Do you really need to read() all of that?

Theo Jepsen <tjepsen@wisc.edu>

A contrived example

struct Record {
    int id;
    char name[32];
};
struct Record records[NUM_RECORDS];

for (int i = 0; i < NUM_RECORDS; i++)
    read(fh, &records[i], sizeof(struct Record));
for (int i = 0; i < NUM_RECORDS; i++)
    if (records[i].id == 7)
        printf("%s\n", records[i].name);

Outline

I/O data-flow analysis tool

Example program: summing two bytes

Tool output: summing two bytes

Example program: parsing XML

Tool output: parsing XML

Which instructions access which bytes?

key1=value1
key2=value2
needle=haystack
foo=bar
document.getElementById('mycode').style.display = 'none'; document.getElementById('touch_lc').contentWindow.setLinksStyle(0.4);
var f = document.getElementById('touch_lc').contentWindow; f.setLinksStyle(0.4); f.highlightIns('400780', 'green');
var f = document.getElementById('touch_lc').contentWindow; f.setLinksStyle(0.4); f.highlightIns('400780', 'green'); f.highlightIns('400820', 'rgb(230, 85, 13)');

What can we observe about the bytes?

Byte classification: k-means clustering

What features can we use for clustering?

Cluster #0 [ 'k e y 1', 'k e y 2', 'f o o', 'b a r' ]
Cluster #1 [ '=', '0xa', '=', '0xa', '=', '0xa', '=', '0xa' ]
Cluster #2 [ 'v a l u e 1', 'v a l u e 2', 'n e e d l e', 'h a y s t a c k' ]

Slicing

slicing is a method for automatically decomposing by analyzing their data flow and control flow. Starting from a subset of a program's behavior, slicing reduces that to a minimal form which still produces that behavior. The reduced , a "slice," is an independent guaranteed to represent faithfully the original within the domain of the specified subset of behavior.

Weiser, Mark. "Program slicing." Proceedings of the 5th international conference on Software engineering. IEEE Press, 1981.

Uses of file slicing

Measuring read() efficiency: results

$ unzip -l archive.minimized.zip
Archive:  archive.minimized.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       33  12-05-2015 13:52   a.txt
     7524  12-05-2015 14:48   LICENSE
---------                     -------
     7557                     2 files

$ unzip archive.minimized.zip a.txt
Archive:  archive.minimized.zip
  inflating: a.txt                   

$ unzip archive.minimized.zip
Archive:  archive.minimized.zip
  inflating: a.txt                   
file #2:  bad zipfile offset (local header sig):  95

Future work

So maybe you don't need to read() all of that!

Get your head out of those books and stop read()ing all those pages, you bookworms.

Any questions about...

Explore more programs here: http://pages.cs.wisc.edu/~tjepsen/fileslice/

/