Notes on MVP discussion

The MVP paper was split into two parts: characterizing workloads for a desktop multimedia environment, and outlining a chip-multiprocessor architecture for such an environment. We began by discussing the workload characteristics.

We first discussed the numbers provided by the paper (1G "RISC-like" instructions per second) and whether they still hold for modern processors. The general consensus was that a modern processor core could handle this workload. However, changes in the demands of multimedia applications could result in a greater demand for resources. We next discussed what this demand might be.
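As a rough sanity check on that consensus, sustained instruction throughput is roughly clock rate times average IPC. A minimal sketch (the 3 GHz clock and the IPC of 1 are assumed, illustrative figures, not numbers from the paper or the discussion):

    // Back-of-the-envelope check of the 1G instructions/s figure.
    // clock_hz and avg_ipc are assumed example values.
    #include <cstdio>

    int main() {
        const double clock_hz     = 3.0e9; // assumed 3 GHz modern core
        const double avg_ipc      = 1.0;   // conservatively assumed IPC
        const double paper_demand = 1.0e9; // 1G "RISC-like" instructions/s
        double sustained = clock_hz * avg_ipc;
        printf("sustains %.1fG instr/s vs. %.1fG instr/s demanded\n",
               sustained / 1e9, paper_demand / 1e9);
        return 0;
    }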

As hardware becomes faster and faster, software continues to evolve and make use of the newly available resources. Much of the discussion focused on whether desktop applications could or would take advantage of a CMP. Are desktop applications multithreaded? Can they be, and to what extent? Several answers to these questions were attempted. The first argument was that some applications are already multithreaded (Excel, for example), and others can likewise be parallelized; these applications could take advantage of a CMP system. The second argument was that desktop systems rarely, if ever, run only one application: most people listen to music, use instant messaging, and keep several web applications open at the same time, and these independent threads of work could also take advantage of a CMP system. The counterargument to both was: is it needed? How much design complexity is this performance improvement worth? And how can we know programmers will be able to take advantage of the architecture? Parallelizing compilers have improved in recent years, but still have significant trouble extracting thread-level parallelism from general desktop code.
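To make the multithreading argument concrete, here is a minimal sketch of intra-application parallelism of the kind attributed to Excel above. The spreadsheet-style recalculation, the per-cell formula, and the sheet size are hypothetical stand-ins, not anything from the paper; the point is that independent slices of work handed to separate threads are exactly what a CMP can schedule onto separate cores.

    // Recalculate a "sheet" of cells by splitting it across threads.
    // Everything here (cell count, formula) is an illustrative assumption.
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <thread>
    #include <vector>

    int main() {
        const std::size_t num_cells = 1 << 20;  // hypothetical sheet size
        const unsigned num_threads =
            std::max(1u, std::thread::hardware_concurrency());
        std::vector<double> cells(num_cells, 1.0);

        // Each thread recomputes an independent slice of cells.
        auto recalc_slice = [&](std::size_t begin, std::size_t end) {
            for (std::size_t i = begin; i < end; ++i)
                cells[i] = std::sqrt(cells[i]) * 2.0;  // placeholder formula
        };

        std::vector<std::thread> workers;
        const std::size_t chunk = num_cells / num_threads;
        for (unsigned t = 0; t < num_threads; ++t) {
            std::size_t begin = t * chunk;
            std::size_t end =
                (t + 1 == num_threads) ? num_cells : begin + chunk;
            workers.emplace_back(recalc_slice, begin, end);
        }
        for (auto& w : workers) w.join();  // wait for every slice
        return 0;
    }

The second argument (several independent applications running at once) needs no code at all: separate processes are already independent schedulable units.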

We also discussed the specific architecture presented in the paper. The focus of this discussion was on the designers' decision to make the architecture flexible. We specifically discussed their choice not to implement a dedicated discrete cosine transform (DCT) unit. The reasoning for this decision was that even in DCT-intensive applications, the transform is only 20% of execution time. During the other 80% of such an application (and in any application that does not perform the transform at all), a dedicated unit would sit idle. We generally agreed that a flexible architecture was a good design decision.
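The 20%/80% reasoning is Amdahl's law. A quick worked example (the 20% fraction is the one cited in the discussion; the 10x speedup for a hypothetical dedicated DCT unit is an assumed figure):

    // Amdahl's law: overall = 1 / ((1 - f) + f / s).
    #include <cstdio>

    int main() {
        const double dct_fraction = 0.20;  // runtime share of the transform
        const double unit_speedup = 10.0;  // assumed dedicated-unit speedup
        double overall =
            1.0 / ((1.0 - dct_fraction) + dct_fraction / unit_speedup);
        printf("overall speedup: %.2fx\n", overall);  // prints ~1.22x
        return 0;
    }

Even an infinitely fast DCT unit would cap the overall speedup at 1/0.8 = 1.25x, which is why spending the area on flexible resources looked like the better trade.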

A few other interesting notes: we polled the class, "Is performance the most important metric?" Most agreed that it was, with a few dissenting that power consumption and battery life for their laptops was very important. This sparked a short debate about the possibility of a thin-client system: what if you had a low-power laptop that you used to ssh into a more powerful machine to do your work? Some people already do this. The main problem is limited performance when a number of people have sshed into the same machine; your performance is limited by others.

Memory bandwidth, memory bandwidth, memory bandwidth. It was brought up several times that this is a problem that must be dealt with in any such system (as is off-chip bandwidth generally).
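For a sense of scale, a minimal sketch (frame dimensions, pixel depth, and frame rate are assumed example values, not figures from the paper):

    // Raw bandwidth of one uncompressed video stream, before any compute.
    #include <cstdio>

    int main() {
        const double width = 640, height = 480;  // assumed frame size
        const double bytes_per_pixel = 3;        // 24-bit RGB
        const double fps = 30;                   // assumed frame rate
        double bytes_per_sec = width * height * bytes_per_pixel * fps;
        printf("%.1f MB/s just to move raw frames\n", bytes_per_sec / 1e6);
        return 0;  // prints ~27.6 MB/s
    }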