// Questions / Conclusions:
1. GetFieldID is very expensive: removing it from read() cuts user time
from 37.92s to 19.37s (Case2 vs. Case3). JVM does cache GetFieldID.

2. After removing the read system call (i.e. implementing a dummy read),
user time goes down. Why? Only system time should go down. In Java, user
time goes down from 17.23s to 6.52s (Case5 vs. Case6). Even in C it goes
down from 4.63s to 0.45s. Possible explanations:
(1) Caching. When the read system call is not used, the user process runs
longer between context switches and hence benefits from caching.
Otherwise the read system call is encountered so early that the user
process never gets to execute in a cached state.
(2) Threads? But then why does C show the same effect?

To verify (1) we need to execute buffered reads. We'll need to tune the
buffer size so that one buffer fill corresponds to one context switch.
After this, removing the read system call shouldn't change user time.
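
A minimal C sketch of such a buffered-read experiment. It is an
illustration only, not the MyRead or read.c code used for the numbers in
this file; the names read_file, cpu_now and struct cpu_times are
hypothetical. It consumes the file byte by byte but issues read(2) only
once per buf_size bytes, and uses getrusage to report user time, system
time and context-switch counts, so different buffer sizes can be compared.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

/* CPU seconds and context switches of this process so far. */
struct cpu_times {
    double user;
    double sys;
    long ctx_switches; /* voluntary + involuntary */
};

struct cpu_times cpu_now(void)
{
    struct rusage ru;
    struct cpu_times t = {0.0, 0.0, 0};
    if (getrusage(RUSAGE_SELF, &ru) == 0) {
        t.user = ru.ru_utime.tv_sec + ru.ru_utime.tv_usec / 1e6;
        t.sys = ru.ru_stime.tv_sec + ru.ru_stime.tv_usec / 1e6;
        t.ctx_switches = ru.ru_nvcsw + ru.ru_nivcsw;
    }
    return t;
}

/* Keeps the per-byte work from being optimized away. */
volatile unsigned long byte_sink;

/* Consume the whole file, calling read(2) once per buf_size bytes.
 * buf_size == 1 reproduces the unbuffered, one-syscall-per-byte case.
 * Returns the number of bytes consumed, or -1 on error. */
long long read_file(const char *path, size_t buf_size)
{
    assert(buf_size > 0);
    unsigned char *buf = malloc(buf_size);
    if (buf == NULL)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }
    long long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, buf_size)) > 0) {
        for (ssize_t i = 0; i < n; i++)
            byte_sink += buf[i]; /* stand-in for per-byte user work */
        total += n;
    }
    close(fd);
    free(buf);
    return n < 0 ? -1 : total;
}

/* Time one pass over the file and print a one-line summary. */
void report(const char *path, size_t buf_size)
{
    struct cpu_times a = cpu_now();
    long long bytes = read_file(path, buf_size);
    struct cpu_times b = cpu_now();
    printf("buf=%zu bytes=%lld user=%.2fs sys=%.2fs ctxsw=%ld\n",
           buf_size, bytes, b.user - a.user, b.sys - a.sys,
           b.ctx_switches - a.ctx_switches);
}
```

Calling report(path, 1) and then report(path, 4096) on the same input
gives the unbuffered and buffered figures side by side.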

// buffer size 4096. 
java MyRead /local.everest/jindal/1GB.input  426.03s user 7.28s system 92% cpu 7:48.62 total
java MyRead /local.everest/jindal/1GB.input  395.30s user 7.15s system 96% cpu 6:55.96 total
a.out /local.everest/jindal/1GB.input  63.43s user 6.47s system 97% cpu 1:11.45 total

// same as above but without using read system call
java MyRead MyRead.class  417.26s user 0.46s system 99% cpu 6:58.60 total

This confirms our caching hypothesis: with buffered reads, user time is
not reduced by eliminating the system calls (417.26s vs. 395.30s and
426.03s). Let's see if eliminating native calls helps.

// a dummy read with no native functions. Native is faster (417.26s vs. 773.37s user).
java DummyRead DummyRead.class  773.37s user 1.21s system 98% cpu 13:03.23 total

with 18MB loop and null native method:
	java MyRead  4.98s user 0.11s system 100% cpu 5.081 total

// implementing my own read.  (Case1)
java MyRead /local.everest/jindal/input_file  39.56s user 19.92s system 99% cpu 59.482 total

// After removing GetObjectClass from read()  (Case2)
java MyRead /local.everest/jindal/input_file  37.92s user 17.75s system 100% cpu 55.662 total

// After removing GetFieldID from read()  USER TIME IS CUT IN HALF (Case3)
java MyRead /local.everest/jindal/input_file  19.37s user 17.89s system 100% cpu 37.249 total

// After removing GetFieldID but re-inserting GetObjectClass in read() (Case4)
java MyRead /local.everest/jindal/input_file  22.81s user 18.56s system 100% cpu 41.356 total

// After caching everything (Case5)
java MyRead /local.everest/jindal/input_file  17.23s user 18.63s system 99% cpu 35.861 total
java -native MyRead /local.everest/jindal/input_file  14.50s user 19.10s system 98% cpu 34.272 total

// When no read system call is used. Just return something.  (Case6)
java MyRead /local.everest/jindal/input_file  6.52s user 0.19s system 100% cpu 6.703 total

// same experiment in C
a.out test  0.45s user 0.00s system 101% cpu 0.442 total

// If I read everything in the native read without going back to Java. (Case7)
java MyRead /local.everest/jindal/input_file  11.02s user 18.03s system 100% cpu 29.038 total
java -native MyRead /local.everest/jindal/input_file  7.65s user 18.85s system 97% cpu 27.087 total


// read everything in open and implement dummy read (Case8)
java MyRead /local.everest/jindal/input_file  17.36s user 18.10s system 100% cpu 35.445 total

// results from ../seq/results.
IndividualRead 23.48s user 20.07s system 100% cpu 43.544 total
read.c 4.63s user 18.54s system 100% cpu 23.160 total

----- 18 MB file on crash & burn (RedHat 7), native threads

// Case1
real    1m22.432s
user    0m58.740s
sys     0m23.690s

real    1m22.589s
user    0m59.560s
sys     0m23.030s

real    1m22.480s
user    0m58.380s
sys     0m24.100s

real    1m22.574s
user    0m58.580s
sys     0m23.990s

real    1m22.520s
user    0m59.560s
sys     0m22.960s

// Case2
real    1m20.525s
user    0m56.700s
sys     0m23.820s

real    1m20.463s
user    0m56.520s
sys     0m23.940s

real    1m20.355s
user    0m57.460s
sys     0m22.900s

real    1m20.555s
user    0m56.980s
sys     0m23.570s

real    1m20.660s
user    0m57.450s
sys     0m23.210s

// Case3

real    0m44.305s
user    0m22.610s
sys     0m21.690s

real    0m44.203s
user    0m21.420s
sys     0m22.790s

real    0m44.175s
user    0m21.410s
sys     0m22.760s

real    0m44.230s
user    0m22.040s
sys     0m22.190s

real    0m44.277s
user    0m22.120s
sys     0m22.160s

// Case4

real    0m49.121s
user    0m27.250s
sys     0m21.860s

real    0m49.004s
user    0m27.410s
sys     0m21.600s

real    0m49.071s
user    0m28.270s
sys     0m20.800s

real    0m48.937s
user    0m27.870s
sys     0m21.060s

real    0m49.006s
user    0m28.120s
sys     0m20.890s

// Case5

real    0m42.867s
user    0m20.120s
sys     0m22.750s

real    0m42.828s
user    0m20.440s
sys     0m22.390s

real    0m42.832s
user    0m21.090s
sys     0m21.740s

real    0m42.882s
user    0m20.950s
sys     0m21.930s

real    0m42.892s
user    0m20.430s
sys     0m22.460s

// Case6

real    0m8.878s
user    0m8.810s
sys     0m0.070s

real    0m8.845s
user    0m8.750s
sys     0m0.090s

real    0m8.877s
user    0m8.790s
sys     0m0.090s

real    0m8.878s
user    0m8.780s
sys     0m0.100s

real    0m8.878s
user    0m8.830s
sys     0m0.050s

// Case7

real    0m32.833s
user    0m11.100s
sys     0m21.730s

real    0m32.635s
user    0m10.860s
sys     0m21.770s

real    0m32.623s
user    0m11.210s
sys     0m21.420s

real    0m32.752s
user    0m11.070s
sys     0m21.680s

real    0m32.739s
user    0m10.790s
sys     0m21.950s

// Case8 (calling dummy read 18000000 times)

real    0m40.739s
user    0m19.330s
sys     0m21.410s

real    0m40.703s
user    0m19.160s
sys     0m21.540s

real    0m40.723s
user    0m19.090s
sys     0m21.630s

real    0m40.698s
user    0m19.430s
sys     0m21.270s

real    0m40.744s
user    0m19.120s
sys     0m21.620s

// Case9 (same as Case8 but calling dummy read 1 time only)

real    0m32.685s
user    0m10.830s
sys     0m21.860s

real    0m32.591s
user    0m11.470s
sys     0m21.120s

real    0m32.582s
user    0m10.630s
sys     0m21.950s

real    0m32.600s
user    0m11.500s
sys     0m21.100s

real    0m32.644s
user    0m11.110s
sys     0m21.540s
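
To compare cases, the five runs of each case can be averaged. The sketch
below is a hypothetical helper, not part of the original experiments: it
converts `time` fields such as "0m58.740s" to seconds and takes the mean.
The two sample arrays are the user times of Case1 and Case3 copied from
the runs above; the function and array names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Convert a `time`-style field such as "1m22.432s" to seconds.
 * Returns -1.0 if the text does not have that shape. */
double parse_time(const char *s)
{
    int minutes;
    double seconds;
    if (sscanf(s, "%dm%lfs", &minutes, &seconds) != 2)
        return -1.0;
    return minutes * 60.0 + seconds;
}

/* Mean of `count` time fields, in seconds. */
double mean_time(const char *const samples[], size_t count)
{
    assert(count > 0);
    double sum = 0.0;
    for (size_t i = 0; i < count; i++)
        sum += parse_time(samples[i]);
    return sum / (double)count;
}

/* User times of the five Case1 and Case3 runs listed above. */
const char *const case1_user[] = {
    "0m58.740s", "0m59.560s", "0m58.380s", "0m58.580s", "0m59.560s"
};
const char *const case3_user[] = {
    "0m22.610s", "0m21.420s", "0m21.410s", "0m22.040s", "0m22.120s"
};
```

mean_time(case1_user, 5) is about 58.96s and mean_time(case3_user, 5) is
about 21.92s, so on this machine removing GetFieldID lowers mean user
time by a factor of roughly 2.7.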

