Benchathon Activities

- No porting was done!
- Problem investigation
- Benchmark improvements: validation, workload size, tools
- Versions: v11 going into the benchathon, v14 going out, v15 with any remaining changes agreed at the meeting

Meeting

- Reviewed status and problems
- Discussed analysis data and desired additional data
- Discussed benchmark scaling: problem size vs. iteration count, and the impacts on memory size and I/O content
- Benchmarks to be packaged in zip archives to minimize HTTP connection overhead impact on initial run time (see the sketch below)
- Eliminated benchmark candidates by consensus: 203_linpack, 204_newmst, 207_tsp, 212_gcbench, 223_diners, 225_shock
- Set schedule:
  - Keep benchmark gate open to 12/31/97
  - Intel expects up to 5 additional candidates, real applications
  - Everyone urged to solicit new candidates
  - Member vote planned in March, release in April
- Discussed how to group and report benchmarks and composite metrics
- Discussed other run rule issues
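On the zip packaging point above: a minimal sketch of reading a workload file out of a single archive with java.util.zip, so that one HTTP fetch of the archive replaces many per-file connections. The archive and entry names are hypothetical, and this is an illustration only, not the harness's actual packaging code.

    import java.io.InputStream;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;

    // Sketch: pull one workload file out of a benchmark's zip archive.
    public class ZipWorkload {
        public static void main(String[] args) throws Exception {
            ZipFile archive = new ZipFile("input.zip");              // hypothetical name
            ZipEntry entry  = archive.getEntry("data/workload.dat"); // hypothetical name
            InputStream in  = archive.getInputStream(entry);
            byte[] buf = new byte[4096];
            int n, total = 0;
            while ((n = in.read(buf)) > 0) total += n;
            System.out.println("read " + total + " bytes from archive");
            archive.close();
        }
    }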
The other problem discussed at length was timing variability, from one run to another and from execution to execution in an autorun sequence. Most of the problems were observed with V10 and earlier, where a large console buffer occupied memory and caused excessive garbage collection activity. In V11 and later the amount of output sent to the console is greatly reduced; you can direct console output to your Java console, which may be a file or a system console, or discard it entirely. However, there were some indications that timing variability problems remain in some cases with V11, particularly on lower-memory systems.
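As an illustration of the two console options described above (a sketch, not the harness's actual code), discarding or redirecting output keeps a large console buffer from accumulating on the heap:

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import java.io.PrintStream;

    // Sketch: redirect or discard console output so a large console
    // buffer does not accumulate on the heap and drive garbage collection.
    public class ConsoleOptions {
        // Send all System.out output to a sink that throws it away.
        public static void discardConsole() {
            System.setOut(new PrintStream(new OutputStream() {
                public void write(int b) { /* discard */ }
            }));
        }

        // Send all System.out output to a file instead of the console.
        public static void consoleToFile(String name) throws Exception {
            System.setOut(new PrintStream(new FileOutputStream(name)));
        }
    }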
Anirudha Rahatekar (Intel) suggested some additional controls and instrumentation around the benchmark executions in order to better control the memory environment and reduce variability. These are noted in the "DEVELOPMENT RELEASES" section below and will be available in V15.
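As a rough sketch of the kind of controls and instrumentation meant (runBenchmark() is a hypothetical stand-in for the harness's benchmark call; the actual V15 changes may differ):

    // Sketch: force a collection before each benchmark so runs start from
    // a known heap state, and record timing and heap usage around the run.
    public class MemInstrument {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.gc();                          // start from a known heap state
            long freeBefore = rt.freeMemory();
            long t0 = System.currentTimeMillis();
            runBenchmark();                       // hypothetical benchmark entry point
            long t1 = System.currentTimeMillis();
            long freeAfter = rt.freeMemory();
            System.out.println("time=" + (t1 - t0) + "ms, heap used="
                               + (freeBefore - freeAfter) + " bytes");
        }

        static void runBenchmark() { /* placeholder workload */ }
    }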
Members agreed to perform some tests of variability and share the results with the group. (If you don't want to release absolute numbers, then you can at least give relative percentages.)
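One way to compute such a relative percentage is the standard deviation of the run times divided by their mean; a sketch follows (the sample times are made up):

    // Sketch: report run-to-run variability as a relative percentage
    // (standard deviation / mean), so absolute times need not be shared.
    public class Variability {
        public static double relativePercent(double[] times) {
            double sum = 0.0;
            for (int i = 0; i < times.length; i++) sum += times[i];
            double mean = sum / times.length;
            double ss = 0.0;
            for (int i = 0; i < times.length; i++) {
                double d = times[i] - mean;
                ss += d * d;
            }
            double stddev = Math.sqrt(ss / (times.length - 1));
            return 100.0 * stddev / mean;
        }

        public static void main(String[] args) {
            double[] seconds = { 41.2, 39.8, 44.5, 40.1 };  // made-up run times
            System.out.println(relativePercent(seconds) + "% variation");
        }
    }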
Intel offered to provide additional profile information. The profiles in this chart were collected with the ordinary JDK profile flag, which requires measuring with the JIT turned off. Michael Greene saw substantial differences on some of the benchmarks depending on whether the JIT was enabled, so it was considered important to be able to look at both. Walter Bays thought that he might also be able to get some profiles with the JIT on, but wasn't sure.
There was general agreement that real applications are more important than synthetic ones, although it was noted that for commercial applications the source code is typically not available for inspection to see what the program is doing. The situation is more like BAPCo than traditional SPEC CPU benchmarks, and we need to look at their rationales.
There was no agreement on how benchmarks might be grouped. Many felt that there should be some solid basis for grouping, such as program characteristics or application area. Suggestions included application/synthetic, integer/floating point, or some combination of those divisions. How, or whether, to combine sub-metrics into a composite metric in these cases was discussed with no resolution.
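For reference only, since nothing was agreed: one familiar candidate is the geometric mean of per-benchmark ratios, as used by the SPEC CPU suites. A sketch (the ratios are made up):

    // Sketch: geometric mean of per-benchmark performance ratios, one
    // candidate composite metric (not adopted by the subcommittee).
    public class Composite {
        public static double geoMean(double[] ratios) {
            double logSum = 0.0;
            for (int i = 0; i < ratios.length; i++)
                logSum += Math.log(ratios[i]);
            return Math.exp(logSum / ratios.length);
        }

        public static void main(String[] args) {
            double[] ratios = { 1.2, 0.9, 1.5 };  // made-up ratios vs. a reference
            System.out.println("composite = " + geoMean(ratios));
        }
    }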
"Early" "Late" Nov Close gate continue benchmark search Analyze Dec subcommittee vote OSSC vote Close gate Jan begin member vote Analyze Annual meeting Annual meeting end member vote Feb subcommittee vote release OSSC vote Mar begin member vote Apr end member vote release
In the next month everyone is encouraged to redouble their efforts to acquire additional benchmark candidates, particularly real applications. Benchmarks should be fitted into the SPEC tool harness; Walter has an action item to send out a guide to the steps needed to do this. To give everyone as much time as possible to examine the candidates, and to improve their chances of being accepted into the suite, do not wait until the last day: send information on any prospective candidate as soon as you have it, and get the benchmark out to committee members as soon as possible. Michael Greene now "owns" the benchmark numbers 232 through 236.
We discussed running short versions (e.g. 10%) of the benchmarks for systems without a JIT and with small memories, such as embedded systems or low-end NC's. No resolution was reached. It was thought that perhaps some intermediate problem size (e.g. 20%) would be more appropriate. Attention would have to be paid to both memory size and run time; perhaps one follow-on benchmark would be able to address both embedded systems and low-end NC's.
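A sketch of the two scaling knobs under discussion; all names and numbers here are hypothetical:

    // Sketch: a percentage can shrink the problem size (reducing memory
    // footprint) or the iteration count (reducing run time, not footprint).
    public class Scaling {
        static final int FULL_SIZE  = 100000;  // hypothetical 100% problem size
        static final int FULL_ITERS = 50;      // hypothetical 100% iteration count

        public static void run(int percent, boolean scaleSize) {
            int size  = scaleSize ? FULL_SIZE * percent / 100 : FULL_SIZE;
            int iters = scaleSize ? FULL_ITERS : FULL_ITERS * percent / 100;
            System.out.println("size=" + size + " iters=" + iters);
        }

        public static void main(String[] args) {
            run(10, true);    // 10% problem size, for small-memory systems
            run(20, false);   // 20% iteration count, for shorter runs
        }
    }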
_202_jess         Longer 100% workload
_205_raytrace/    Removed spurious output - KMD/NS
_213_javac/       New longer 100% workload - KMD/NS
_214_deltablue/   New shorter 100% workload - KMD/NS
_222_mpegaudio    Fixed validation problem per subcommittee vote on floating point accuracy
_224_richards/    Restored printout of subunit timings for academic purposes - KMD/NS
_227_mtrt         Fixed validation problem where it was dependent on thread order and now is not (see the sketch below)

Removed the 6 benchmarks eliminated by subcommittee consensus, and put them into a "Removed" group. There were other changes to some of them as well, but these are not particularly important now:
_203_linpack
_204_newmst       New workload. Some double changed to float - KMD
_207_tsp
_212_gcbench      New longer 100% workload. Sized to still fit in 30MB heap space - KMD/NS
_223_diners
_225_shock
These remain in the "Removed" category in case someone wants to work on revising/combining them to try again for the committee's approval. Even though they were removed, we all owe these benchmark authors a big thank-you for their effort, and for the beneficial effects their benchmarks have already had on JVM's during suite development. Should any author not wish to have his code remain in the "Removed/work-in-progress" category, we will pull it from the next release and all SPEC members will be asked to delete all copies of that benchmark in their possession. I will be contacting the authors on that subject soon. As a corollary, if any of these benchmarks provides you with useful insights on your systems' performance and you would like to retain access to it, then it is in your interest to contact the author and work with her on addressing its shortcomings for the suite. (Note also that some of these are freely available on the net.)
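On the _227_mtrt validation fix noted in the change list above: a sketch of one way to make validation independent of thread completion order (the actual fix may differ) is to combine per-thread checksums with a commutative operation:

    // Sketch: per-thread checksums combined by addition, so the final
    // value is the same regardless of which thread finishes first.
    public class OrderFreeCheck {
        private long checksum = 0;

        public synchronized void add(long threadChecksum) {
            checksum += threadChecksum;   // commutative: order-independent
        }

        public boolean validate(long expected) {
            return checksum == expected;
        }

        public static void main(String[] args) {
            OrderFreeCheck c = new OrderFreeCheck();
            c.add(17); c.add(42);         // completion order does not matter
            System.out.println("valid: " + c.validate(59));
        }
    }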
Version 15 should include changes to the tool harness agreed in last week's meeting, primarily aimed at the issue of timing variability.