Paper Title:   "Do Performance Visualization Tools Really Help? A Case Study"

Paper Authors: xxxxxxxxxxxxxxxxx

Please rate the paper from 5 (Excellent) to 1 (Unacceptable).

                Originality: 3

               Significance: 4

         Technical strength: 4

       Written presentation: 3

     Relevance to symposium: 5

Your overall recommendation: 4


Please rate how confident you feel about your assessment (due to factors
such as closeness of the topic to your area of expertise, the level of
detail at which you evaluated the paper, etc.).

Confidence in review:

	 5  =  "I really know this topic"

Reviewer:
1) Please give very detailed comments to explain your ratings and/or 
   to help the authors improve the paper.
-------------------------------------------------------------------------------
Comments to Author(s):

This paper argues, with a case study, that visualization can be effective
in tuning the performance of a parallel program.  Overall, this is a nice
paper and I would like to see it included in the conference.

One of the major themes in the usefulness of these visualizations is
patterns.  Things like the spacing of communication events, parallel vs.
fan-out communication timings, and the uniformity of durations are all
presented: each plays to the human ability to recognize patterns.

Two suggestions (one easy, one more difficult):  (1) Explicitly mention this
point early in the paper; i.e., the reason (in this study) why visualization
worked was that it exposed useful patterns that triggered human recognition
of a problem.  (2) Try to draw some conclusion (wildly speculate?) about
what types of visualization will be most useful; this would be an attempt
to help separate useful visualizations from merely pretty ones.

Serious issue (that could actually kill the paper): how much did the overall
program improve as a result of your study?  You convince me that the
visualizations provide new information; is it useful information?

Following are some specific comments.

* Page 1: the authors argue that "The lack of concrete and convincing examples
  of usefulness of performance visualizations in the literature is one of the
  most significant impediments to more widespread acceptance of such tools in
  both the user and tool-developer communities."

  Come on!  This is self-serving at best.  Tool users don't read the
  "literature".  Perhaps ease of use, ease of installation, support, manuals,
  and availability might have something to do with the problem.

  This paper is worthwhile because it helps make visualization claims more
  concrete.  You're not going to get a bunch of new users if this is published.

* Figure 1: this needs a "time -->" label at the top or bottom of the figure.

* The phrase "in order" appears *a lot* in this paper.  In all cases, you can
  just delete it and you end up with tighter, more direct prose.

* Section 4 needs the most work.  First, it loses focus relative to the rest
  of the paper.  The paper is about visualizations and their effectiveness;
  this section is about alternate strategies for tuning the application.  In
  the last paragraph on p.7, there is a discussion of using simulation to
  evaluate alternatives and then graph them.  This is quite a different tone
  from the rest of the paper.  Only in the last paragraph of Sec 4 do we get
  back to the point.  Trim this section and stay with your strong point.

  The presentation in Sec 4 isn't as good as the rest of the paper.  The
  prose is more complicated and the figures are more difficult to understand.
  Fig 5 needs work.  What are the units of "work" on the x-axis?  What is
  "time" measuring?  It seems to be the average time to send a message.  It
  should say this explicitly.  On both Figs 5 & 6, the legends describe what
  the solid, dashed, and dotted lines mean.  Then, in the figures, there are
  things like "no padding" pointing to the same lines.  This is pretty confusing
  to follow.  You only have 5 curves in each graph; just make each one a
  different line style (solid, dashed, dotted, dot-dash, and dot-dot-dash, for
  example).  Then, next to the figure, have a description like:

	.......      synchronized communication

	- - - -      non-blocking, optimal padding

	etc.

  In the last paragraph of p.7: why did you use a simulation to study this
  change when you had a real SP-2 to run on?

  In Fig 8, you present a new version of the execution, but with no way to
  compare it quantitatively to anything else.

  In general, ALL of your time-line views should have time labeled on the
  x-axis (so that we can understand the magnitude of the issues).

For context, I'm identifying myself as the reviewer and am happy to discuss
the paper and (if it gets accepted) its revision.

Paper Title:   "Performance Visualization and Tuning with Carnival"

Paper Authors: xxxxxxxxxxxxxxxxxxxxxxxx

Please rate the paper from 5 (Excellent) to 1 (Unacceptable).

                Originality: 4

               Significance: 3

         Technical strength: 4

       Written presentation: 3

     Relevance to symposium: 5

Your overall recommendation: 4


Please rate how confident you feel about your assessment (due to factors
such as closeness of the topic to your area of expertise, the level of
detail at which you evaluated the paper, etc.).

Confidence in review:

	 5  =  "I really know this topic"

Reviewer:
1) Please give very detailed comments to explain your ratings and/or 
   to help the authors improve the paper.
-------------------------------------------------------------------------------
Comments to Author(s):

This paper describes a technique for identifying the source of waiting time in
a parallel program.  I like the ideas in this paper, though I didn't always
understand the explanations.

A basic weakness of this paper is that you don't show whether the information
provided by Carnival is actually useful for improving the application program.
It would be nice if you could walk the reader through the steps from
identifying the problem (which you do), to finding the cause of the problem
(which it appears that you do), to the fix and re-evaluation of the program
(these last two, you don't do).

Some sources of my lack of understanding:

1.  The explanation of "steps" and "paths".  It sounds like "steps" are just
    basic blocks (from the bottom of p.3).  Avoid introducing a new term and
    just use "basic block".

    A path is a sequence (or a set, depending on whether you're on p.3 or
    p.4) of steps.  The steps are identified with (among other things) a
    processor ID.  So, it *sounds* like paths might meander between
    processors.  But, from reading the prose, it *sounds* like a path is all
    on one processor.

    So, if I understand these paths, they are something like: an execution
    sequence of basic blocks delineated by synchronization operations.  If
    this is the case, say so in simple words!  If it's not the case, please
    un-confuse me.
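
    To make my interpretation concrete, here is roughly the structure I have
    in mind (a sketch in C; the type and field names are mine, not yours):

        /* Hypothetical: a "step" is a basic block, and a "path" is a run
           of steps on ONE processor, delimited by sync operations. */
        typedef struct {
            int  processor_id;  /* the single processor the path runs on */
            int *step_ids;      /* basic blocks, in execution order      */
            int  n_steps;       /* path length                           */
            int  opening_sync;  /* sync operation that starts the path   */
            int  closing_sync;  /* sync operation that ends the path     */
        } path_t;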

2.  If you have global synchronization, such as a barrier, it is clear how
    to compute the stuff you're doing.  But, if you have random collections
    of pairwise synchronization operations (locks, semaphores, messages),
    then I don't understand how to compute your paths.

    It also seems that, with lots of non-global synchronization, you can get
    combinatorial growth in the number of paths.  Without seeing how you
    actually identify the paths, I can't figure out if you avoid this
    problem.  (A back-of-the-envelope illustration of the worry follows.)
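
    To illustrate (my numbers, not yours): if each of n phases of the program
    can end in any one of k possible pairwise synchronizations, the number of
    distinct sync-delimited paths grows like k^n.  A tiny C sketch:

        #include <stdio.h>

        int main(void) {
            int k = 3;                      /* pairwise sync choices per phase */
            long long paths = 1;
            for (int n = 1; n <= 10; n++) { /* program phases */
                paths *= k;                 /* each phase multiplies paths by k */
                printf("phases=%2d  distinct paths=%lld\n", n, paths);
            }
            return 0;
        }

    Even modest k and n give path counts that quickly become unmanageable.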

3.  Figure 3, and its accompanying prose, gave me some problems.  The text
    on p.6 (2nd & 3rd lines from the bottom) and on p.9 (lines 9-10) talks
    about source code on the right and "execution time profiles for each
    nested scope on the left".  Is this execution profile missing from the
    figure, or am I totally lost?

Other issues:

1.  Paper title: the first half of the paper talks about waiting time
    analysis.  The second half talks about the visualization.  But nowhere
    do you actually do any tuning of the application.  A more representative
    title would be something like: "Waiting Time Analysis and Performance
    Visualization with Carnival".

2.  You mention that you don't handle memory delays.  What about OS delays
    due to multiprogramming?

    With respect to memory delays: do you see any way of using things such as
    hardware counters to incorporate this information?

3.  When you combine similar steps, you don't pay attention to the execution
    length of the steps (you assume that they will be similar).  It is easy
    to construct cases where this isn't true.  Would it be worth checking the
    times and, if they are far enough apart, not combining?  (A sketch of
    what I mean follows.)
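
    Something like this guard (entirely my invention, not the paper's
    mechanism; the tolerance is an arbitrary illustrative choice):

        #include <math.h>
        #include <stdbool.h>
        #include <stdio.h>

        /* Merge two "similar" steps only when their observed execution
           times are within rel_tol of each other. */
        static bool should_combine(double t_a, double t_b, double rel_tol) {
            double longer = fmax(t_a, t_b);
            if (longer == 0.0)
                return true;        /* both steps empty: safe to merge */
            return fabs(t_a - t_b) / longer <= rel_tol;
        }

        int main(void) {
            printf("%d\n", should_combine(1.00, 1.10, 0.25)); /* 1: merge */
            printf("%d\n", should_combine(1.00, 2.50, 0.25)); /* 0: keep separate */
            return 0;
        }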
