CS 736 Reviews - Spring 2017: The scalable commutativity rule: designing scalable software for multicore processors

1. Summary
This paper describes the scalable commutativity rule, which suggests that whenever two interface operations are commutative, they can be implemented in a scalable way on a shared-memory multicore processor. The authors implement a tool called COMMUTER to determine which pairs of operations in an interface are commutative, and they use their findings to develop sv6, a highly scalable kernel.

2. Problem
It is difficult to develop systems that scale to a large number of cores on a shared-memory multicore processor using a MESI protocol. Cores cannot write shared cache lines in a scalable way, but each core can scalably read and write exclusive cache lines and scalably read shared cache lines. If all cores perform memory accesses in a scalable way, then an increase in the number of cores corresponds to roughly a linear increase in throughput.

Another issue is that existing interfaces, such as POSIX, are inherently impossible to scale to multiple cores. If an interface contains many pairs of operations that produce different results or different system states when executed in reverse order, then the interface is not scalable.

3. Contributions
The authors introduce the idea of SIM-commutativity of a series of actions Y, which basically means that given an initial system state, then for any prefix P of Y, any reordering of P will produce the same returned result and final system state. The authors also define the idea of conflict-freeness, which formalizes the idea that different cores should not write to the same cache line. Then, they provide a formal proof that SIM-commutativity of an interface guarantees the existence of a conflict-free implementation.
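The informal definition above can be checked by brute force on a toy model. The sketch below is only an illustration, not the paper's formalism: it assumes each operation runs on its own thread, treats equal states as indistinguishable, and uses a hypothetical `create_excl` operation on a set of file names.

```python
from itertools import permutations

def create_excl(name):
    """Toy model of an exclusive create: fails if the name already exists."""
    def op(files):
        if name in files:
            return files, "EEXIST"
        return files | {name}, "ok"
    return op

def run(files, ops):
    """Apply (thread, op) pairs in order; return final state and per-thread results."""
    results = {}
    for thread, op in ops:
        files, results[thread] = op(files)
    return files, results

def sim_commutes(files, ops):
    """True if every prefix of every reordering has one outcome in all of its orders."""
    for order in permutations(ops):
        for k in range(1, len(order) + 1):
            outcomes = [run(files, p) for p in permutations(order[:k])]
            if any(o != outcomes[0] for o in outcomes):
                return False
    return True

same = [("t1", create_excl("a")), ("t2", create_excl("a"))]
diff = [("t1", create_excl("a")), ("t2", create_excl("b"))]
print(sim_commutes(frozenset(), diff))       # different names
print(sim_commutes(frozenset(), same))       # same name, file absent
print(sim_commutes(frozenset({"a"}), same))  # same name, file already present
```

The last two calls show the state dependence: two exclusive creates of the same name do not commute when the file is absent, but they do once it exists, because both simply fail.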

Next, the authors implement COMMUTER, a tool that applies the scalable commutativity rule to interfaces and implementations. First, the ANALYZER component determines the exact situations in which the operations of an interface (ignoring the implementation) commute or fail to commute. Next, the TESTGEN component uses the output of ANALYZER to generate test cases that provide as much coverage as possible. Lastly, the MTRACE component uses these tests to check whether the implementation is conflict-free. The authors apply COMMUTER to Linux and to their own experimental kernel, sv6.

4. Evaluation
Only 68% of test cases in Linux scale, but 99% of test cases in sv6 scale. Furthermore, the authors run two microbenchmarks that exercise the fstat and open syscalls, respectively, in the Linux kernel and the corresponding syscalls in sv6. These show that the number of syscalls per second per core degrades significantly in Linux, and also in sv6 when the benchmark is written in a way that breaks the commutativity rule. A macrobenchmark using a mail server shows the same pattern as the core count grows. All of these results show that systems that follow the commutativity rule wherever reasonable generally scale better.

5. Confusion
The proof in Section 3.5 was difficult to follow.

Posted by: Varun Naik | March 14, 2017 07:58 AM

Summary:
This paper puts forth an approach for developing scalable software that can run on multi-core processors. The proposed approach helps developers design scalable software even before the actual implementation. Developers can use the authors' tool, Commuter, to come up with alternative interface designs for complex interfaces and to set a clear scaling target for their implementation: one that is maximally conflict-free.

Problem:
Typically, when developing multi-core software, developers choose workloads and plot performance at varying numbers of cores to identify scalability bottlenecks. Identifying and fixing scalability issues late in the implementation and development cycle can be tedious, and at that stage the improvements may be impractical. The proposed framework/tool helps identify the scalability aspects of an interface at design time, rather than placing requirements on the implementation.

Contributions:
The main contribution is the state-dependent, interface-based and monotonic commutativity rule, which the authors call SIM commutativity. This rule helps first in analyzing the interface’s commutativity, and then in designing an implementation that scales in commutative situations.
The authors describe an automated scalability testing tool called Commuter, developed to identify commutative cases for complex interfaces. This tool automates the process of identifying the conditions under which operations commute and generates commutative tests, taking as input a symbolic model of the interface. They further describe its three components: 1. Analyzer, 2. Testgen and 3. Mtrace. Analyzer takes the symbolic model of the interface and computes the conditions under which the operations of the interface commute. Testgen converts Analyzer’s commutativity conditions into concrete test cases that can be run against an implementation. Mtrace checks whether a particular implementation is conflict-free for each test case.

Evaluations:
The authors evaluate the benefits of Commuter to kernel developers by modeling POSIX file system and virtual memory calls and using it to evaluate and develop a scalable file system and virtual memory system for the sv6 kernel. They evaluate scalability and performance by comparing the sv6 kernel (with optimized POSIX calls) with single core Linux using two micro-benchmarks and an application-level benchmark (a mail server).
The sv6 kernel is found to scale for more of the generated test cases, and to deliver higher throughput in operations per second per core, than the Linux kernel in all the benchmarks.

Confusions
Coming up with a correct model specifying the behavior of complex interfaces seems like a new bottleneck for developers with this approach. How sensitive is their tool to errors in the specification?

Posted by: Lokananda Dhage Munisamappa | March 14, 2017 06:49 AM

Summary
This paper talks about the use of the commutativity rule to reason about the scalability of systems. It states that whenever interface operations commute, they can be implemented in a way that scales. This new approach to scalability starts at the software interface level and helps in detecting scalability opportunities very early in the development cycle.

Problem
The general method of evaluating the scalability of multicore software is to test different workloads, compare their performance at varying numbers of cores and use various tools to identify scalability bottlenecks. There are several drawbacks associated with this process. Fundamental bottlenecks usually remain unclear, and the process happens so late in the development cycle that it becomes impossible to implement some design-level solutions.

Contribution
This paper proposes the use of the commutativity rule to identify scalability opportunities early in the development cycle. Commutative operations have an implementation whose memory accesses are conflict-free during those operations, and thus they provide an opportunity for scalability. The authors developed a new tool named COMMUTER that accepts high-level interface models and generates tests of operations that commute and hence could scale. COMMUTER consists of ANALYZER, which takes an interface model as input and computes commutativity conditions; TESTGEN, which generates test cases based on the commutativity conditions; and MTRACE, which checks whether an implementation is conflict-free for each test case.

Evaluation
The authors evaluated their design on an 80-core machine using the sv6 kernel. The microbenchmarks statbench and openbench were used. When COMMUTER was applied to 18 POSIX calls, Linux scaled for 68% of the tests whereas sv6 scaled for 99% of the tests.

Confusion
>Could you please discuss more about monotonic and non-monotonic versions of commutativity.
>Is this rule used in current systems to provide scalability?

Posted by: Gaurav Mishra | March 14, 2017 05:47 AM

Summary
This paper presents a new approach to scalability that starts at a higher level: the software interface.
This approach enables developers to build more scalable interface designs and provides a clear target to achieve during the implementation stage. It uses the scalable commutativity rule, which states that whenever interface operations are commutative, they can have scalable implementations.

Problem
The workload driven approach to finding scalability bottlenecks is ineffective.
This focuses developer effort on real issues, but has several drawbacks. Different workloads or higher core counts often exhibit new bottlenecks. It’s unclear which bottlenecks are fundamental, so developers may give up without realizing that a scalable solution is possible. Finally, this process happens so late in the development process that design-level solutions such as improved interfaces may be impractical.

Contributions
The main purpose of this paper is to find scalability through commutativity that is state-dependent, interface-based and monotonic. According to the SIM commutativity rule, if the interface design commutes, the system can be implemented in a scalable way. To determine commutativity, the paper proposes the COMMUTER tool, made up of the ANALYZER, TESTGEN, and MTRACE tools. ANALYZER uses symbolic execution to explore the execution paths of the interface model and identify the conditions under which the interface's operations commute. TESTGEN takes the conditions and generates test cases for them. MTRACE runs the generated test cases on a real implementation and reports memory access conflicts and shared variables for tests that fail.

Evaluations
The authors evaluated scalability and performance on real hardware with the sv6 OS. They used two microbenchmarks named openbench and statbench. Each benchmark has two variants: one that uses standard, non-commutative POSIX APIs and another that uses modified, commutative APIs. The microbenchmarks scale almost perfectly, except for the case when statbench does fstat and link, which do not commute.

Confusions
How much is this approach, or are these tools, used in practice for designing real software?

Posted by: Om Jadhav | March 14, 2017 04:01 AM

The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors

1. Summary
This paper addresses the problem of developing scalable multi-threaded applications running on multi-core processors. The paper defines scalability in terms of the number of commutative test cases possible for a given interface. The paper describes a tool called Commuter that generates these test cases. The authors also describe how these test cases can be leveraged to increase the degree of parallelism of an interface, thus increasing scalability.
2. Problem
An interface (the functionality of an application) can be implemented in many different ways. There isn’t any generic way to differentiate good implementations from others. In most cases, developers rework the implementation after the first iteration. This increases the time and effort required to achieve a scalable solution. There isn’t a generic metric to describe the degree of parallelism, and hence the scalability, of an implementation.
3. Contributions
The most notable contribution of the paper is the ability to determine the maximum attainable scalability of a given interface prior to implementation. The paper describes the commutativity rule for judging the scalability of a given implementation. The paper provides a formal proof of the commutativity rule. The paper describes the Commuter tool, which automates the process of generating commutative test cases and determining how many of those test cases have a conflict-free implementation. Commuter has three main components – Analyzer, Testgen and Mtrace. Analyzer takes a symbolic model of an interface and computes the conditions under which its operations commute. Testgen generates test cases based on the commutative operations. Mtrace checks whether a particular implementation is conflict-free for each test case.
4. Evaluation
The authors have used the Commuter tool to model file system and virtual memory POSIX APIs in the sv6 research kernel. The authors have used benchmarks like statbench and openbench to measure the performance of the new sv6 APIs against the Linux kernel. The sv6 kernel clearly scales for more of the test cases than the Linux kernel.
5. Confusion
Why is POSIX’s lowest FD rule bad? Why does it result in poor scalability?

Posted by: Rohit Damkondwar | March 14, 2017 03:49 AM

Summary

The paper deals with the issue of increasing scalability in systems by identifying commutative operations in interface designs and then executing them in parallel across multiple cores.

Problem

The common way of identifying scalability bottlenecks in a multi-core software system is to choose a workload and plot performance at varying numbers of cores. Different workloads point to different bottlenecks, and varying the number of cores poses the same problem. This makes it difficult to find the real bottleneck(s) that hinder scalability.

Contributions

SIM Commutativity

The paper builds on previous work by Rinard and Prabhu on achieving concurrency among commutative operations and extends it to scalability. This is possible because commutative operations can be reordered in any manner and still result in the same system state as produced by a serial execution.

The paper provides the formal definition and proof of the scalable commutativity rule, and also briefly describes how commutative implementations might be built.

It also introduces us to the COMMUTER tool, which is used to determine the commutativity property among operations of an interface model. COMMUTER is composed of three sub-tools: ANALYZER, TESTGEN and MTRACE.

ANALYZER produces the conditions under which the operations of a symbolic interface model commute.

TESTGEN takes the conditions produced by ANALYZER and generates test cases for them.

MTRACE runs each test case against a particular implementation and determines whether it is conflict-free.

Evaluation

It was found that close to a third of all tests of the system calls in the Linux kernel (version 3.8) on ramfs were conflicting. The scalability of Linux was compared against that of sv6. This was done on an 80-core machine using two micro-benchmarks and one application benchmark, and the results show that sv6 clearly outperforms Linux.

Confusion

Are these tools used anywhere in commercial systems currently?

Posted by: Mayur Cherukuri | March 14, 2017 03:15 AM

Summary:
This paper proposes a new approach to detect the scalability opportunities of software for multicore processors very early in the development process, based on interface specifications. This enables developers to build more scalable interface designs and provides a clear target to achieve during implementation. The approach uses the scalable commutativity rule, which states that whenever interface operations are commutative, they can have scalable implementations.

Problem:
The traditional way to evaluate the scalability of software for multiprocessors is to test on different workloads and use tools such as differential profiling to identify bottlenecks. The drawback is that issues are identified only after implementation. This can either prevent developers from realizing that a scalable implementation exists, or make a fix very difficult if the interface design itself is at fault.

Contributions:
The main idea of the paper is to identify scalability opportunities early in the development process just by looking at the high-level interface. The rule of thumb is the scalable commutativity rule. A set of operations is commutative if the result is independent of the order of execution. According to the rule, when operations commute, they can have implementations whose memory accesses are conflict-free and thus scalable. The tool COMMUTER takes a high-level interface model, identifies operations that commute, develops tests of those operations and uses the tests to evaluate the scalability of an implementation. The SIM commutativity used here handles complex, stateful interfaces because it is state-dependent, interface-based and monotonic, i.e. the commutativity of an operation is defined in terms of the specific system state, operation arguments and concurrent operations.

Evaluation:
COMMUTER is applied to 18 POSIX calls, and the results are used to implement a new operating system kernel, sv6. The tests generated from this interface (13,664 in number) are used to test the Linux implementation, which scales for 68% of them. In contrast, sv6 scales for 99% of these tests. This suggests that COMMUTER can help in building more scalable implementations.

Confusion:
Authors argue that some interfaces such as Communication Interface can benefit from weak ordering of operations. Which order(s) exactly can be relaxed in Communication Interface to improve scalability?

Posted by: Pallavi Maheshwara Kakunje | March 14, 2017 03:11 AM

Summary:
The paper argues that current practice to identify scalability opportunities in a multicore system requires implementation details which is not developer friendly because by the time scalability issues are known, implementations become too large and complex to be changed. To solve this problem, it presents a new approach to find scalability opportunities which starts at a higher level i.e. Software Interface. Authors discuss related work and show novelty of their design. They then propose another measure of commutativity – SIM commutativity, which allows developers to recognize SIM-commutative / scalable operations in an interface, which helps in designing a scalable implementation of the system. Authors also build a tool (called COMMUTER) to compute conditions under which operations commute and to test an implementation for conflicts.

Problem:
Measuring the scalability of multicore software by testing the performance of implementations with varying numbers of cores has several issues. To enumerate a few – i) it is difficult for a developer to calculate the degree of scalability as affected by various factors such as different workloads and varying core counts, ii) also, these scalability issues and opportunities become visible very late in the development process, and making changes in the design of an interface or system is difficult and expensive at this stage. Motivated by these challenges, the authors design an approach and implement a tool to pinpoint scalability issues and opportunities at a higher level, i.e. the software interface, which comes before implementation.

Contributions:
1. Frames scalability in terms of conflict-free memory accesses. Postulates that if no core writes a cache line that was read or written by another core, then throughput scales in proportion to the number of cores.
2. Gives a precise, formal definition of the scalable commutativity rule and provides a proof of its correctness. In the process, it defines actions as either invocations or responses, a history as a sequence of actions, and a reordering of a history as a permutation of its actions that preserves the order of each thread's actions.
3. Provides a definition of SI-commutativity and shows that SI-commutativity is insufficient to argue about the scalability of an operation. It then defines its monotonic version, i.e. SIM-commutativity. SIM-commutativity identifies operations that commute, and hence can scale, in some states (though not all). It allows the authors to extend the set of situations that can scale.
4. Discusses various problematic cases in the POSIX interface to highlight issues one faces in designing a scalable interface. A few of the problems are – i) Compound operations: Some operations (such as fork) combine multiple operations, which prohibits commutativity, ii) Overly simplified and deterministic specifications: Some designs (such as allocating the lowest FD for operations like opening a file) prohibit commutativity because they mandate an order in which FDs are allocated, which most applications neither need nor use, iii) Strictly ordered operations: Some interfaces (such as communication interfaces) don’t need a strict order of operation execution in every case. They can benefit from weak ordering in some cases in multi-threaded environments, and iv) Aggressive/synchronous release of resources: It is not necessary to release resources immediately for some operations. Asynchronous release minimizes the aggressive bookkeeping required by these operations.
5. Builds a tool, COMMUTER. COMMUTER takes an interface as input, computes the conditions under which operations commute, and tests an implementation for conflict-freedom under these conditions. It is divided into 3 components – i) ANALYZER: Automates the process of analyzing the commutativity of an interface, ii) TESTGEN: Converts ANALYZER’s commutativity conditions into concrete test cases to apply to an implementation, and iii) MTRACE: Runs the test cases generated by TESTGEN on a real implementation and checks them for conflict-freedom.
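The lowest-FD point in item 4 can be made concrete with a small model. The sketch below is a hedged illustration, not POSIX or sv6 code: `open_lowest` models the "lowest unused FD" rule, while `make_open_any` is a hypothetical allocator that a looser "any unused FD" specification would permit, here by giving each thread its own stride of descriptor numbers.

```python
def open_lowest(used):
    """Model of the POSIX rule: return the lowest unused descriptor."""
    fd = 0
    while fd in used:
        fd += 1
    return used | {fd}, fd

def make_open_any(thread, nthreads):
    """Hypothetical allocator: thread i only hands out fds congruent to i mod nthreads."""
    def op(used):
        fd = thread
        while fd in used:
            fd += nthreads
        return used | {fd}, fd
    return op

def outcome(used, ops):
    """Run (thread, op) pairs in order; return final fd set and per-thread results."""
    results = {}
    for thread, op in ops:
        used, results[thread] = op(used)
    return used, results

used = frozenset({0, 1, 2})
lowest = [(0, open_lowest), (1, open_lowest)]
any_fd = [(0, make_open_any(0, 2)), (1, make_open_any(1, 2))]

# Compare the two execution orders of two concurrent opens.
print(outcome(used, lowest) == outcome(used, lowest[::-1]))
print(outcome(used, any_fd) == outcome(used, any_fd[::-1]))
```

Under the lowest-FD rule each thread's return value depends on which open ran first, so the two opens do not commute and the threads must coordinate on shared allocator state. With the looser rule the results are order-independent, which leaves room for per-thread allocation with no shared writes.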

Evaluation:
Various POSIX file system calls and virtual memory calls were modeled in COMMUTER to evaluate the scalability of Linux. Of 13,664 tests in total, COMMUTER determined that 13,528 were conflict-free on sv6 (with commutable APIs) and 9,389 tests were conflict-free on Linux (with standard/non-commutable APIs). Performance was evaluated using two benchmarks – i) statbench, and ii) openbench. The authors found that the commutative version scales linearly, while the non-commutative version doesn’t. Their evaluation shows that it is possible to achieve a scalable implementation of POSIX by applying the commutativity rule.
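As a quick arithmetic check, the test counts quoted above can be converted into the percentages that other reviews cite (68% for Linux, 99% for sv6). The snippet uses only the three numbers from this review; the variable names are illustrative.

```python
TOTAL_TESTS = 13_664
conflict_free = {"sv6": 13_528, "Linux": 9_389}

def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

for kernel, passed in conflict_free.items():
    print(kernel, percent(passed, TOTAL_TESTS), "% conflict-free,",
          TOTAL_TESTS - passed, "tests with conflicts")
```

This gives roughly 99.0% for sv6 (136 conflicting tests) and 68.7% for Linux (4,275 conflicting tests).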

Confusion: What exactly is the scope (size) of real world operations which can be argued to scale based on commutativity rules proposed by authors? Doesn’t conflict-free memory access pre-requisite (assumption) severely restrict the scope/size of these operations?

Note: Please discard my earlier review. It has minor grammar errors and claims SIM-commutativity to be more flexible than SI, which I believe is opposite of truth.

Posted by: Rahul Singh | March 14, 2017 02:19 AM

Summary
The paper explores scalability at the software interface level. This is helpful because reasoning about scalability can be done before implementation, and no specific hardware is needed. Reasoning about scalability at the software interface level can highlight inherent scalability problems, helps to discover alternative interface designs, and sets a clear scaling target for the interface implementation.

The paper stated the scalable commutativity rule and formally proved SIM-commutative regions have conflict-free implementations. The authors also developed a tool, named COMMUTER which accepts high-level interface models and generates tests of operations that commute and hence could scale. A new research operating system kernel sv6 was also developed.

Problem
Traditionally, people use tools such as differential profiling to reason about scalability on different workloads or higher core counts. This has two main drawbacks. First, it’s unclear which bottlenecks are fundamental, which may result in possible scalable solutions not being found. Second, differential profiling usually happens at a late stage of development, when implementation effort and hardware have already been committed and it is hard to make changes at the design level.

At the same time, current systems usually have scalability bottlenecks. For example, Linux still has many bottlenecks, and it is hard to find out which of them are inherent to its system call interface.

Contribution
1. The authors formally state and prove the scalable commutativity rule, which is “In any situation where several operations commute—meaning there’s no way to distinguish their execution order using the interface—they have an implementation whose memory accesses are conflict-free during those operations.” The rule shows that SIM-commutative regions have conflict-free implementations and leads to a new model of scalability design: analyze commutativity first and then implement based on the commutativity model.
2. The authors derive guidelines for commutative interfaces
a. Decompose compound operations.
b. Embrace specification non-determinism.
c. Permit weak order.
d. Release resource asynchronously.

3. The authors developed a tool named COMMUTER (consisting of ANALYZER, TESTGEN and MTRACE), which is a systematic, test-driven tool for applying the commutativity rule to real implementations.
4. Based on guidelines from COMMUTER, the authors also developed a research operating system called sv6 which scales for 99% of the 13664 tests generated by COMMUTER.
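Guideline (d), releasing resources asynchronously, can be pictured with a deferred reference count. The class below is a hypothetical sketch and not sv6 code: each core updates only its own slot (standing in for a per-core cache line), and a separate `reconcile` step, run later, folds the deltas together and decides whether the object can be freed.

```python
class DeferredRefCount:
    """Toy reference count whose release decision is deferred to reconcile()."""

    def __init__(self, ncores):
        self.deltas = [0] * ncores  # one slot per core, written only by that core
        self.total = 1              # the creator holds the initial reference
        self.freed = False

    def inc(self, core):
        self.deltas[core] += 1

    def dec(self, core):
        # No immediate free: dropping a reference touches only this core's slot.
        self.deltas[core] -= 1

    def reconcile(self):
        """Fold per-core deltas into the total and free the object if it reached zero."""
        self.total += sum(self.deltas)
        self.deltas = [0] * len(self.deltas)
        if self.total == 0:
            self.freed = True
        return self.freed

rc = DeferredRefCount(ncores=2)
rc.inc(0)   # core 0 takes a reference
rc.dec(1)   # core 1 drops the creator's reference
rc.dec(0)   # core 0 drops its own reference
print(rc.freed, rc.reconcile())
```

Because the interface does not promise that the resource disappears the moment the last reference is dropped, increments and decrements on different cores commute and need not write a shared counter.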

Evaluation
To evaluate the system, the authors compared sv6 with Linux 3.5.7 by running microbenchmarks on an 80-core machine (eight 2.4 GHz 10-core Intel E7-8870 chips) with 256 GB of RAM. The results show that sv6 scales much better than Linux. While Linux’s performance drops significantly beyond 20 cores, sv6’s performance stays relatively stable.

Confusion
1. Can you further explain the concept of “specification”, which first appears in Section 3.2?

Posted by: Yunhe Liu | March 14, 2017 02:09 AM

Summary:
The paper argues that current practice to identify scalability opportunities in a multicore system requires implementation details which is not developer friendly because by the time scalability issues are known, implementations become too large and complex to be changed. To solve this problem, it presents a new approach to find scalability opportunities which starts at a higher level i.e. Software Interface. Authors discuss related work and show novelty of their design. The paper then proposes a more flexible measure of commutativity – SIM commutativity, which allows developers to recognize SIM-commutative / scalable operations in an interface, which helps in designing a scalable implementation of the system. Authors also build a tool (called COMMUTER) to compute conditions under which operations commute and to test an implementation for conflicts.

Problem:
Measuring the scalability of multicore software by testing the performance of implementations with varying numbers of cores has several issues. To enumerate a few – i) it is difficult for a developer to calculate the degree of scalability as affected by various factors such as different workloads and varying core counts, ii) also, these scalability issues and opportunities become visible very late in the development process, and making changes in the design of an interface or system is difficult and expensive at this stage. Motivated by these challenges, the authors design an approach and implement a tool to pinpoint scalability issues and opportunities at a higher level, i.e. the software interface, which comes before implementation.

Contributions:
1. Frames scalability in terms of conflict-free memory accesses. Postulates that if no core writes a cache line that was read or written by another core, then throughput scales in proportion to the number of cores.
2. Gives a precise, formal definition of the scalable commutativity rule and provides a proof of its correctness. In the process, it defines actions as either invocations or responses, a history as a sequence of actions, and a reordering of a history as a permutation of its actions that preserves the order of each thread's actions.
3. Provides a definition of SI-commutativity and shows that SI-commutativity is insufficient to argue about the scalability of an operation. It then defines its monotonic version, i.e. SIM-commutativity. SIM-commutativity identifies operations that commute, and hence can scale, in some states (though not all). It allows the authors to extend the set of situations that can scale.
4. Discusses various problematic cases in the POSIX interface to highlight issues one faces in designing a scalable interface. A few of the problems are – i) Compound operations: Some operations (such as fork) combine multiple operations, which prohibits commutativity, ii) Overly simplified and deterministic specifications: Some designs (such as allocating the lowest FD for operations like opening a file) prohibit commutativity because they mandate an order in which FDs are allocated, which most applications neither need nor use, iii) Strictly ordered operations: Some interfaces (such as communication interfaces) don’t need a strict order of operation execution in every case. They can benefit from weak ordering in some cases in multi-threaded environments, and iv) Aggressive/synchronous release of resources: It is not necessary to release resources immediately for some operations. Asynchronous release minimizes the aggressive bookkeeping required by these operations.
5. Builds a tool, COMMUTER. COMMUTER takes an interface as input, computes the conditions under which operations commute, and tests an implementation for conflict-freedom under these conditions. It is divided into 3 components – i) ANALYZER: Automates the process of analyzing the commutativity of an interface, ii) TESTGEN: Converts ANALYZER’s commutativity conditions into concrete test cases to apply to an implementation, and iii) MTRACE: Runs the test cases generated by TESTGEN on a real implementation and checks them for conflict-freedom.

Evaluation:
Various POSIX file system calls and virtual memory calls were modeled in COMMUTER to evaluate the scalability of Linux. Of 13,664 tests in total, COMMUTER determined that 13,528 were conflict-free on sv6 (with commutable APIs) and 9,389 tests were conflict-free on Linux (with standard/non-commutable APIs). Performance was evaluated using two benchmarks – i) statbench, and ii) openbench. The authors found that the commutative version scales linearly, while the non-commutative version doesn’t. Their evaluation shows that it is possible to achieve a scalable implementation of POSIX by applying the commutativity rule.

Confusion: What exactly is the scope (size) of real world operations which can be argued to scale based on commutativity rules proposed by authors? Doesn’t conflict-free memory access pre-requisite (assumption) severely restrict the scope/size of these operations?

Posted by: Rahul Singh | March 14, 2017 02:06 AM

1. Summary
The paper presents commutativity of software interfaces as a new approach to designing scalable software. This allows developers to reason about scalability even before a system is implemented. Furthermore, a toolset that identifies when interfaces commute and evaluates the scalability of an implementation is also designed for this purpose.

2. Problem
Typically, the scalability of multi-core software has been evaluated by choosing a workload, plotting performance for varying numbers of cores and using profiling tools to identify bottlenecks. However, such an approach has several drawbacks. Increasing the number of cores and introducing new workloads inevitably expose different bottlenecks. Also, understanding the bottlenecks is often time-consuming, and by then it may be difficult to improve software interfaces. There was, thus, a need for a novel way to understand the fundamental bottlenecks early in order to design scalable software.

3. Contributions
- The main contribution of this work is interface-driven scalability. As the rule says - ‘whenever interface operations commute, they can be implemented in a way that scales’.
- Commuter is a scalability testing tool which accepts high-level interfaces and generates tests of operations that commute and hence can scale.
- Different components of Commuter include Analyzer, Testgen and Mtrace. Analyzer takes an interface model as input and outputs the commutativity conditions. This saves developers from tedious process of considering large number of interactions in really complex operations. Testgen converts Analyzer’s commutativity conditions into concrete test cases. These test cases help to expose potential scalability problems in the implementation. Finally Mtrace runs these test cases on an implementation and reports any violations of the commutativity rule.

4. Evaluation
Commuter was applied to 18 POSIX calls and guided the implementation of a new OS kernel called sv6. Two microbenchmarks, statbench and openbench, were used to evaluate the scalability and performance of the sv6 kernel on real hardware, and both scale very well. Finally, a simple mail server is used as a more representative real application to evaluate sv6.

5. Confusion
Not exactly a confusion but more of a question: how easy is it to address scalability from the interface perspective for systems in general?

Posted by: Dastagiri Reddy Malikireddy | March 14, 2017 02:00 AM

1. Summary
The paper demonstrates that, unlike generic commutativity, SIM commutativity can be used to prove the existence of a conflict-free implementation of an operation, taking into account the state, the arguments and the concurrent operations. Along with the scalable commutativity rule, it introduces a tool called COMMUTER, which can guide the scalable design of an API.

2. Problem
Conventional methods for evaluating the scalability of multicore software are complex: they involve plotting performance at varying numbers of cores and using tools such as differential profiling. This evaluation is also performed late in the design process, when improved design solutions might have become impractical. This paper tries to solve the problem at a higher level: the software interface.

3. Contributions
The paper introduces the scalable commutativity rule, which considers the state, interface dependency and monotonicity involved in the operations. The paper formally defines this rule and proves why it is correct. Then, based on the scalable commutativity rule, the authors provide guidelines for defining commutative interfaces: break down compound operations (so that each does almost only one thing), give implementations freedom by embracing specification non-determinism, allow weak ordering, and release resources asynchronously.
The paper presents COMMUTER, which gives an automated, systematic approach to applying the commutativity rule to real implementations of scalable software. The paper gives a detailed explanation of the three components of COMMUTER, namely Analyzer, Testgen and MTrace. Then the authors evaluate the scalability of 18 POSIX system calls using COMMUTER.
To learn the cost and complexity of implementing scalable file systems and virtual memory systems, the authors designed and implemented an in-memory file system called ScaleFS and a virtual memory system called RadixVM for sv6, which is their research kernel based on xv6. From this study, the authors come up with patterns for making implementations scale: layer scalability, defer work, optimism before pessimism, and avoid unnecessary reads.

4. Evaluation
The paper provides a systematic evaluation of the scalability of POSIX system calls using COMMUTER. For each pair of system calls, the number of test cases that are conflict-free is calculated and presented as a heat map. The paper first lays out the implementation of ScaleFS and RadixVM and argues their conflict-freedom theoretically. They are then evaluated to show that conflict-freedom translates to scalability on real hardware. Scalability and performance are evaluated with two microbenchmarks (statbench and openbench) and one application-level benchmark (mail server throughput). The paper clearly demonstrates the improvement in scalable performance when compared against single-core Linux.

5. Confusion
A discussion of the different situations in which state, arguments and monotonicity affect commutativity would be helpful. Also, how is the extent of commutativity determined for an interface, and how is scalability decided, when commutativity differs from one situation to another?

Posted by: Sharath Hiremath | March 14, 2017 01:26 AM

Summary
The paper introduces a new approach called SIM commutativity to identify scalability issues in interfaces (such as system calls). The authors also provide a tool called COMMUTER that accepts higher-level interface models and generates test cases that check commutativity, to help increase scalability. The authors provide a new way of tackling this problem: identify scalability issues at the interface design stage and fix them until scalability reaches the target.

Problem
The current methods of analyzing bottlenecks, which involve running benchmarks, plotting graphs and using tools to track down scalability issues, aren't helpful. With complex interfaces, varied workloads and higher core counts, it becomes almost impossible to identify a scalable solution. These debugging sessions also happen long after design and implementation, which makes changes to the interface (to improve scalability) almost impractical.

Contribution
Authors say that a set of operations scales if their implementations have conflict-free memory accesses. They propose a new approach called the scalable commutativity rule, which says that operations that commute have an implementation that is conflict-free and thus scalable. They reason about the relationship between commutativity and concurrency. This notion of commutativity is less stringent than the algebraic one: it depends on state, arguments and concurrent operations (SIM commutativity: state-dependent, interface-based, and monotonic). The main advantage of such an approach is that the interface can be tested for commutativity before it is implemented. A tool named COMMUTER is used to analyze complex interfaces and reason about the commutative cases.

Evaluation
Authors evaluate their approach by showing, using COMMUTER, that sv6 is conflict-free for 99% of the test cases, and through benchmarks and applications that the sv6 implementation speeds up nearly linearly whereas Linux's throughput drops almost to zero. For each pair of system calls, the authors count how many of the test cases generated by COMMUTER have conflicts. Linux has a large number of conflicts for file system call pairs and memory operations (mmap and mprotect specifically). The same pairs of system calls in sv6 have far fewer conflicts (one to two orders of magnitude). The authors also test the throughput of two microbenchmarks (statbench and openbench) and a mail server application. Their implementation provides almost linear speed-up in all three, whereas Linux's throughput nearly hits the floor as the number of cores increases.

Confusion/Question
I was wondering why, in the openbench and mail server test cases, the throughput took a dip over the first 10 cores (similar to Linux) before settling at a consistent level. What's so special about 10 (something related to sockets)?

This looks like a great idea for distributed systems. How well has this work been received in industry?

Posted by: Pradeep Kashyap Ramaswamy | March 14, 2017 01:18 AM

1. Summary
This paper discusses commutativity and its role in allowing more scalable software in multicore environments.
2. Problem
Many OS interfaces are implemented in such a way that their operations cannot commute. A non-commutative operation has a limited ability to scale because it accesses the same information as the operations it overlaps with. It can be difficult to determine whether an interface is truly commutative, especially when there are many different use cases in which it may or may not be.
3. Contributions
This paper introduces a tool that can determine if calls are commutative. It uses three components to first define and analyze the interface being scrutinized, generate a set of tests to cover all possible scenarios in which the operations could interact, and then trace the memory usages of the actual use to determine if it is indeed a commutative operation.
4. Evaluation
They used their tool to analyze the performance and scalability of many POSIX filesystem and memory functions for the standard Linux kernel and an alternative implementation that allows the operations to be far more commutative. They were able to show that the version using the more commutative operations was far more scalable for their test cases. On a machine with 80 cores, the modified kernel saw very minimal performance degradation as the number of cores in use increased while standard Linux saw sharp drop-offs.
5. Confusion
Why did they only analyze memory and file operations? It seems commutable operations should extend well beyond that.

Posted by: Taylor Johnston | March 14, 2017 01:04 AM

1) Summary

The increasing number of cores in microprocessors and machines means that improving performance requires scalable systems code. However, many current interfaces are not scalable, and designing such systems is already difficult. The authors find that if interfaces have commutative operations, then there exists a conflict-free implementation. They produce a tool which given an interface can analyze the scalability of the interface.

2) Problem

With the emergence of new multiprocessor systems, improving system performance requires writing scalable systems code. In particular, performance should increase linearly with the number of cores. In current systems, achieving this is difficult, though.

Observations from distributed systems and databases suggest that interfaces with commutative operations may make the task easier. The authors extend these observations to scalability of systems software.

3) Contributions

The authors' contributions are three-fold. First, the authors state and prove their theorem: if an interface contains commutative operations, there is a conflict-free implementation. This theorem leads to a number of rules of thumb for systems design. The authors propose decomposing large operations and avoiding needlessly strong interface guarantees by allowing nondeterminism, avoiding strict ordering, and releasing resources asynchronously. Overall, the theorem greatly simplifies scalable design problems because it allows the designers to focus on the interface to improve scalability before ever writing any code.
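
To make the nondeterminism hint concrete, here is a small Python sketch contrasting a POSIX-style "lowest available descriptor" allocator with one that may return any unused descriptor. Both classes, the strided per-core partitioning, and the `run` helper are hypothetical illustrations, not the design used in sv6.

```python
class LowestFd:
    """Always returns the lowest unused descriptor, so the result of one
    allocation depends on every allocation that happened before it."""
    def __init__(self):
        self.used = set()           # shared state touched by every caller

    def alloc(self, core):
        fd = 0
        while fd in self.used:
            fd += 1
        self.used.add(fd)
        return fd

class PerCoreFd:
    """Returns any unused descriptor: core c hands out c, c+n, c+2n, ...
    Each core only touches its own slot (assumed partitioning scheme)."""
    def __init__(self, ncores):
        self.ncores = ncores
        self.next_fd = list(range(ncores))

    def alloc(self, core):
        fd = self.next_fd[core]
        self.next_fd[core] += self.ncores
        return fd

def run(make_allocator, order):
    """Each core in `order` allocates one descriptor; returns {core: fd}."""
    allocator = make_allocator()
    return {core: allocator.alloc(core) for core in order}
```

Here `run(LowestFd, [0, 1])` and `run(LowestFd, [1, 0])` give the cores different descriptors, so the two allocations do not commute, while `run(lambda: PerCoreFd(2), ...)` returns the same mapping for both orders.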

The second contribution is an interface analysis tool which helps to find sources of non-conflict-free-ness and non-commutativity in interfaces. The aim of this tool is to further simplify the design task by automating reasoning about complex interfaces.

The third contribution is an analysis of the POSIX system call interface for commutativity. The analysis shows potential bottlenecks in Linux and other POSIX-compliant systems. The authors use this analysis to design sv6, an OS that avoids a number of scalability bottlenecks in traditional systems.

4) Evaluation

The main theorem is simple and intuitive, yet remarkably useful. Likewise, the rules of thumb the authors give in section 4 are useful. Overall, the theorem, the design hints, and the tool they develop work together to greatly simplify, and offer guidance on, a difficult problem.

Moreover, the paper was very well-written. Numerous times as I was reading the paper, the authors anticipated and addressed my objections or questions. The flow of the paper was very natural and its content was pretty comprehensive.

However, the authors never offer much supporting evidence for their rules of thumb. They propose them and give examples of their usefulness, but they do not list or address any caveats with these design hints.

Also, the authors admit that the scalable commutativity rule does not guarantee the existence of a scalable interface for all situations even with commutative operations. But they never address the likelihood that such an interface does not exist at all. The authors assume that for most problems a good interface exists, but are there problems for which no scalable interface can be developed?

5) Confusion

Do the authors intend their section 4 as suggestions or is it intended to be stronger? I don't believe the authors are claiming their rules are necessary or sufficient for scalability, but what are they claiming?

Posted by: Mark Mansi | March 14, 2017 12:54 AM

1. Summary
Clements et al. present the scalable commutativity rule, which relates interface design to, as the name suggests, scalability. The rule states “that whenever interface operations commute, they can be implemented in a way that scales”. Using this rule, they evaluate the scalability of several Linux system calls using COMMUTER.

2. Problem
The workload-driven approach to finding scalability bottlenecks is ineffective. New bottlenecks arise from new workloads and from evolution in hardware (e.g. higher core counts). What the workload-driven approach misses is potential flaws in the interface design. How can we systematically and methodically evaluate interfaces to remedy scalability issues?

3. Contribution
The first contribution is the scalable commutativity rule. It finds its roots in cache coherence, where they allude to MESI-like cache coherence protocols. Scalability can occur when implementations have conflict-free memory accesses. This is backed by their observation in statbench that a single L2 cache miss can severely inhibit scalability.

The second contribution is a three-part pipeline that contains an analyzer, a test generator, and a trace evaluator. The analyzer takes in a Python model which describes the interface and the underlying implementation requirements. In their example, rename has two core data structures, the inode map and the fname to inode map. The result is the commutativity condition, which feeds in as input to the test generator. The test generator creates concrete test cases. Finally, using the test cases, MTRACE runs the entire OS in a modified version of QEMU, logging numerous things but most importantly the memory accesses and C data type information. Using this, they can identify conflicting accesses.

This culminates in, and further justifies, three keys to implementing commutative interfaces: embrace specification non-determinism, permit weak ordering, and release resources asynchronously. Ultimately, the goal is to minimize communication via lock-free data structures and optimistic concurrency.

4. Evaluation
They evaluate this platform on a heavily modified version of the xv6 kernel called sv6 and compare against Linux 3.5.7. Pairs drawn from a subset of system calls were evaluated using the COMMUTER framework, and significant limitations were identified in the Linux kernel. The COMMUTER-driven implementation of sv6 (the components of interest being RadixVM and ScaleFS) demonstrated a significant improvement in conflict-freedom among the commutative system calls.

5. Confusion
The observations make sense, and there are advantages to having both a symbolic-level model and an actual implementation that you can use to crosscheck the output. Can some sort of code analysis technique be used to minimize the effort of building and maintaining these models? While sv6 showed great improvement, it implements only a significantly limited subset of what Linux offers. And, as simply as they put it, why do we not use more lock-free data structures?

Posted by: Dennis Zhou | March 14, 2017 12:37 AM

Summary:
This paper proposes using commutativity rules to help designers build scalable system software by identifying the scalability of interface designs. The authors also design a toolset to identify the ‘conflict free’ scenarios of an interface design and to automatically generate test cases that check whether the implementation fully exploits the scalability opportunities provided by the interface design.
Problem:
Traditionally, people use profiling to evaluate the scalability of systems. However, this approach falls short in two respects. Firstly, it does not convey the real bottleneck of the system, and secondly, profiling can only be done after the system is implemented, which is too late for making good design decisions. Reasoning about scalability bottlenecks is hard before actually implementing the system, and even experienced system designers can hardly cover all conflict cases.
Contributions:
This paper proposes SIM commutativity, which is a novel form of commutativity that is state dependent, interface based and monotonic. Based on SIM, the authors prove by construction that if the interface design commutes, the system can be implemented in a scalable way.
The authors design a toolset, Commuter, to help programmers reason about interface commutativity and check implementation scalability. First, Analyzer takes a symbolic model written in Python and computes the conditions under which interface operations commute. Then Testgen takes in the conditions and generates test cases that should be conflict-free for the implementation. Finally, system designers can use Mtrace to detect whether the implementation is conflict-free under the generated test cases.
Evaluation:
To show that Commuter can actually help kernel developers, the authors implement simplified models of the POSIX file system and virtual memory APIs and find that about one third of the generated test cases are not conflict-free on the Linux implementation, while another operating system, sv6, whose design and implementation were guided by Commuter, is conflict-free in almost all test cases. For performance evaluation on real hardware, the authors use two microbenchmarks and one real-world workload and test sv6 and Linux 3.5.7 on a machine with 80 cores and 256 GB of memory. Results show that sv6 is much more scalable than Linux.
Confusion:
It’s interesting to see a formal proof in a systems paper, but the constructive proof seems to show only that a scalable implementation can always be constructed, by impractical means, whenever the operations commute. System designers, however, are responsible for implementing real systems with practical methods, so the proof is somewhat unconvincing.
Besides, to achieve good coverage, the Analyzer model has to be written in detail. Won’t this add a huge cost to software development?

Posted by: Yanqi Zhang | March 14, 2017 12:17 AM

1. Summary
This paper examines the lack of scalability being utilized in systems and implements the tool COMMUTER to find conditions where operations commute.

2. Problem
The current way of evaluating the scalability of a system is to plot the performance over multiple cores of a set workload. This means that different workloads or different amounts of cores could exhibit new bottlenecks. This evaluation also occurs late in the development process meaning there is no time for improving the interfaces, which also could have their own bottlenecks.

3. Contributions
The drive behind this paper’s research was finding scalability through commutativity. The basic idea is that if operations commute, meaning their results are independent of order, then no communication is necessary between them and so no synchronization issues arise. Since determining commutativity by hand would be hard and tedious, the paper proposes the COMMUTER tool, made up of the ANALYZER, TESTGEN, and MTRACE tools. ANALYZER generates the commutativity conditions, which describe when sets of operations commute. TESTGEN then takes these conditions and generates test cases for them. Finally, MTRACE runs the test cases against an implementation to check each test for conflicts.

4. Evaluation
The tools were tested using several benchmarks on an 80-core system. The paper found that for the ramfs file system on Linux, about one third of all test cases were not conflict-free. Overall, the paper finds across several benchmarks that even Linux fails to obtain the kind of scalability that the commutativity property makes possible.

5. Confusion
Is there a reason that the file descriptors returned are deterministic? The paper makes it seem that this is a very good example of over-determinism and uses relaxing it as a way to add scalability.

Posted by: Brian Guttag | March 13, 2017 11:34 PM

1. Summary
The paper discusses a new approach called SIM commutativity to identify and address scalability issues by focusing on interface operations. The authors present the rationale behind the commutativity rule and explain how it can help design more scalable systems. Tools are developed to identify the conditions in which the interfaces commute and to generate test cases that help verify the scalability of implementations.

2. Problem
The authors argue that the current iterative approach to scalability (design, implement, measure, repeat) is too time-consuming and very ad hoc. As the number of cores increases, new bottlenecks will emerge and cause the same process to be repeated. They make the case for a new approach to help design more scalable systems and help developers understand the scalability bottlenecks in their code. They present the SIM commutativity rule for designing interfaces, so that operations issued by various threads can safely commute or interleave without the need for locks and serialized access.

3. Contributions
1. SIM commutativity rule - They provide a formal specification of the commutativity rule which defines how interfaces ought to be designed for scalability.
2. COMMUTER - Building on top of its definition, a tool called COMMUTER is developed to help developers analyze the commutativity of their interfaces. This tool has three main components:
2.1. ANALYZER - It uses symbolic execution to explore the execution paths used by the interfaces to identify the conditions under which the commutativity of the interfaces hold.
2.2. TESTGEN - This builds on the output of the previous stage to generate test cases to help verify whether the commutativity conditions are indeed being correctly satisfied by an implementation.
2.3. MTRACE - Runs the test cases generated on a real implementation and reports memory access conflicts and shared variables for tests which failed.
3. sv6 - A new OS kernel called sv6 is designed to evaluate the scalability of a system where interface operations commute.
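
The kind of check MTRACE performs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the trace format `(core, kind, addr)`, the 64-byte line size, and the `find_conflicts` helper are invented for this example, whereas the real tool logs accesses from a modified QEMU.

```python
from collections import defaultdict

CACHE_LINE = 64  # bytes per cache line (assumed)

def find_conflicts(trace):
    """trace: iterable of (core, kind, addr), where kind is 'r' or 'w'.

    Returns the set of cache-line numbers that were written by one core
    and read or written by a different core. An empty set means the
    trace is conflict-free in the sense used by the rule."""
    readers = defaultdict(set)
    writers = defaultdict(set)
    for core, kind, addr in trace:
        line = addr // CACHE_LINE
        if kind == "w":
            writers[line].add(core)
        else:
            readers[line].add(core)

    conflicts = set()
    for line, writing_cores in writers.items():
        # Every core that touched a line which at least one core wrote.
        touching = writing_cores | readers.get(line, set())
        if len(touching) > 1:
            conflicts.add(line)
    return conflicts
```

Lines that are only read may be shared by any number of cores, and a line written and read by a single core is also fine; a conflict is reported only when a written line is touched by more than one core.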

4. Evaluation
MTRACE was used to check the test cases against ramfs in Linux kernel version 3.8, and found that 32% of the tests, widely distributed across the system call pairs, were not conflict-free. A common source of access conflicts was found to be shared reference counts. They also ran the same tests against a ramfs-like in-memory file system called ScaleFS in their own kernel, sv6, and found that 99% of the test cases were conflict-free. They also compared the scalability of Linux and sv6 using two microbenchmarks and one application benchmark on an 80-core machine. For all benchmarks, it was found that sv6 could easily scale to 80 cores, whereas Linux’s performance quickly tailed off because of the shared access conflicts. sv6’s superior performance was due to its extensive use of techniques such as scalable data structures, deferring work and decomposing compound interfaces.
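
One common way to avoid conflicts on a shared reference count is to split it into per-core slots and defer summing them. The Python sketch below shows the idea only; `PerCoreCounter` is a hypothetical name, it is not sv6's reference-counting scheme, and in a real kernel each slot would sit on its own cache line.

```python
class PerCoreCounter:
    """Counter split into one slot per core.

    inc/dec write only the calling core's slot, so updates from different
    cores never touch the same slot. total() reads every slot and is the
    expensive, deferred operation."""
    def __init__(self, ncores):
        self.slots = [0] * ncores

    def inc(self, core):
        self.slots[core] += 1

    def dec(self, core):
        self.slots[core] -= 1

    def total(self):
        return sum(self.slots)
```

The trade-off is that the exact count is no longer available cheaply, which is why this pattern goes hand in hand with deferring work such as freeing an object whose count has reached zero.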

5. Confusion
A review of monotonic and non-monotonic SI commutativity will be helpful.

Posted by: Karan Bavishi | March 13, 2017 11:16 PM

1. Summary
The paper introduces the rule: Whenever interface operations commute, they can be implemented in a way that scales. A tool that accepts high-level interface models and generates tests for operations that commute is developed. A highly scalable operating system named sv6 is also demonstrated and evaluated.

2. Problem
With multicore processors becoming the norm, it is extremely important to develop software that scales with the number of processor cores. Many applications depend on the kernel, and hence they don't scale if the kernel doesn't scale. Today, most scaling bottlenecks are identified through profiling techniques. Sometimes it is not apparent whether a scalable solution is even possible, and even when it is, it is often too late in the project to design one.

3. Contributions
The paper introduces the scalable commutativity rule, which is based on the observation that if operations have conflict-free memory accesses, i.e., they do not write cache lines that were read or written by another core, then they scale almost linearly with the number of cores. The rule enables reasoning about scalability very early in the design process. The scalable commutativity rule is formalized in the paper and a proof of correctness is provided. A tool named Commuter is introduced that applies this rule to real systems. The tool has three components: Analyzer, which takes a Python symbolic model and identifies the conditions under which the operations of an interface commute; Testgen, which generates test cases for these commutative operations; and Mtrace, which tests whether the implementation is conflict-free for each test. Several POSIX file system and virtual memory calls were analyzed with Commuter, and it was observed that 69% of system call pairs in Linux scaled for tests generated by Commuter. A scalable implementation of the xv6 kernel named sv6 could achieve scalability for 99% of system call pairs.

4. Evaluation
Scalability and performance on real hardware with sv6 kernel were measured with two microbenchmarks named openbench and statbench. The microbenchmarks scale almost perfectly except for the case when statbench does fstat and link which do not commute. Similar results are obtained for a simple mail server which represents a more realistic workload.

5. Confusion
I did not quite understand the "CONTINUE" actions in section 3.3

Posted by: Suhas Pai | March 13, 2017 11:16 PM

1.Summary
Authors have looked at increasing the scalability of the system by identifying the commutativity in the interface operations and performing those operations in parallel instead of serializing them. A tool called COMMUTER has been introduced which takes interface models as input and tests for commutativity under various conditions.

2. Problem
Some scalability issues are due to poor implementations of interfaces, and these issues could be fixed if developers knew about them in the early stages of development. Once the software is completely built, the complexity of the interface models makes it difficult to analyze whether the interface operations are commutative and whether they could be changed to improve scalability.

3. Contributions
Even though commutativity had been explored before to examine the concurrency of operations, this was the first effort to use a commutativity rule to improve the scalability of systems. SIM commutativity is a version of commutativity which takes into account the state of the system, the arguments passed and the concurrent operations. The rule states that if the operations are SIM commutative, then there exists a conflict-free implementation of them for that specific state, those specific arguments and those concurrent operations. This rule has been formally defined, proved and discussed in the paper. A tool called COMMUTER was designed to identify all the operations that are commutative in complex interface models; its Testgen step then generates test cases to test the commutative operations. Finally, the Mtrace step checks whether the implementation is conflict-free for each of the test cases.

4. Evaluation

The sv6 research kernel was modified to follow the proposed commutativity rule. Microbenchmarks, such as the fstat and link pair, and a mail server benchmark were run on both Linux and the modified sv6 kernel. The graphs show a significant improvement in scalability for the modified sv6 compared to Linux.

5. Confusion
Is the SIM commutativity rule currently used to avoid scalability issues in real systems?

Posted by: Sowrabha Horatti Gopal | March 13, 2017 10:55 PM

1. Summary

This paper claims that the best way to design scalable systems is to provide commutative interface operations. Whenever such operations are available, there exists an implementation that provides true parallelism.

2. Problem

In traditional system design, developers first build an initial version of the system. They then benchmark it against some workload, find scalability bottlenecks, and modify the system to fix them. This loop repeats until some performance objectives are achieved. However, this approach requires a huge amount of manual effort, and a system that ends up scalable for one workload may not do well with another. Moreover, this loop ignores the fact that the real bottleneck may be in the interface of the system. So there exists a need to design the system in a scalable way from the beginning, such that these problems do not arise.

3. Contributions

The basic idea of this paper is that whenever interface operations commute, they can be implemented in a way that scales. This is because whenever the operations are commutable, their results are independent of the order in which they are carried out without any race conditions. The authors also formalize this rule in terms of histories and reorderings of operations in a thread. The authors have also implemented a tool called commuter. This tool takes an interface as an input and generates tests of operations which can scale. The developers can use this tool in the interface design of the system.

4. Evaluation

The authors use the COMMUTER tool to test the Linux kernel for scalable interfaces. Out of the 13,664 cases tested, 32% do not scale even though the interface operations involved commute. These operations could become bottlenecks in many future applications. The authors implement their own scalable kernel, sv6, and COMMUTER finds that in sv6 about 99% of the tested interface operations are scalable. The authors further tested their implementation with the statbench and openbench microbenchmarks and a mail server against an increasing number of cores. The results show that with commutative APIs the operations per second per core for all of these benchmarks stay roughly constant and remain very high, in contrast to Linux, which degrades very quickly.

5. Confusion

Was this system adopted anywhere in industry? How likely is it that it will be?

Posted by: Hasnain Ali Pirzada | March 13, 2017 10:32 PM

1. Summary
This paper introduces the scalable commutativity rule, a guideline for implementing scalable software. The main idea is that whenever operations commute, they can be implemented in a scalable way.

2. Problem
The main problem is that scalability is often considered only during the implementation of an interface, whereas it should be considered, and argued formally, early during the design of the interface. In addition, even when scalability is considered carefully over many rounds of implementation, it is not evaluated in a formal way. Evaluating scalability under certain workloads helps developers identify issues, but makes it hard for them to recognize the fundamental bottlenecks in the software design.

3. Contributions
This paper's contribution is to show us how to formally define commutativity of interface operations. The paper defines SIM (state-dependent, interface-based and monotonic) commutativity and the scalable commutativity rule. Based on the SIM commutativity rule, the authors designed the COMMUTER toolchain, which consists of a tool to calculate the conditions under which operations commute (ANALYZER), a tool to generate concrete test cases (TESTGEN), and a tool to test whether an implementation is scalable (MTRACE).

4. Evaluation
The authors implemented a research OS (sv6) based on guidance from their toolchain COMMUTER, with scalability considered from the early design. The experiments are done on an 80-core machine with 256 GB of memory. To test sv6's scalability, the authors compared it to Linux 3.5.7 with two microbenchmarks and one application benchmark. The two microbenchmarks test the scalability of the link/fstat and open/close operations respectively. The application benchmark is the workload of a simple mail server. The benchmarks show that sv6 has far better scalability than Linux. It would be interesting to see whether sv6 scales better in a more realistic and distributed setting.

5. Confusion
Are symbolic execution methods (including concolic execution) time-consuming when faced with a complex interface? This paper seems to introduce a big idea; has any new progress been made during the last several years? (The first author had an open source version of sv6 on GitHub, but it seems to be no longer maintained.)

Posted by: Cheng Su | March 13, 2017 10:14 PM

Summary:
The paper presents the idea of using commutativity as an interface design requirement to improve scalability. The authors define the scalable commutativity rule and introduce a tool called COMMUTER that automatically generates test cases and tests whether a particular implementation is conflict-free where the interface commutes.

Problem:
Identifying scalability bottlenecks in multicore software is difficult. Different workloads or higher core counts often expose new bottlenecks. Treating scalability as an implementation property rather than an interface property makes matters worse: after the implementation stage, solutions such as improved interfaces are impractical. To solve this problem, the authors suggest a new approach to scalability that starts at the software interface level.

Contributions:
The authors present a novel form of commutativity called SIM commutativity, which is state-dependent, interface-based, and monotonic. The authors show that when operations commute in the context of a specific system state, specific operation arguments, and specific concurrent operations, there exists an implementation that is conflict-free. Since it is difficult to spot and reason about all commutative cases even with the rule, the authors developed a tool called COMMUTER that automates this reasoning. The ANALYZER stage of COMMUTER takes an interface model as input and computes the precise conditions under which that model commutes. The TESTGEN stage takes these conditions and generates test cases, and the MTRACE stage then checks whether a particular implementation is conflict-free for each test case.
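
To make the conflict-freedom check concrete, here is a minimal Python sketch, assuming a memory trace is available as (core, cache line, access kind) tuples. The function name `conflicting_lines` and the trace format are hypothetical, chosen for illustration; this is not MTRACE's actual interface. It flags any cache line that one core writes while another core reads or writes it.

```python
def conflicting_lines(trace):
    """Return the set of cache lines with an access conflict.

    `trace` is an iterable of (core, line, kind) tuples, kind in {"r", "w"}.
    A line conflicts if some core writes it and a different core reads or
    writes it. Lines that are only read, or touched by one core, are fine.
    """
    readers, writers = {}, {}
    for core, line, kind in trace:
        table = writers if kind == "w" else readers
        table.setdefault(line, set()).add(core)

    conflicts = set()
    for line, writing_cores in writers.items():
        touching_cores = writing_cores | readers.get(line, set())
        if len(touching_cores) > 1:
            conflicts.add(line)
    return conflicts


# Two cores bump one shared reference count: both write the same line.
shared_refcount = [(0, "refcnt", "w"), (1, "refcnt", "w")]

# Each core bumps its own counter and only reads the shared inode line.
per_core_counters = [
    (0, "cnt0", "w"), (1, "cnt1", "w"),
    (0, "inode", "r"), (1, "inode", "r"),
]

print(conflicting_lines(shared_refcount))
print(conflicting_lines(per_core_counters))
```

The first trace reports `refcnt` as a conflict, while the second reports none, since shared reads and core-private writes both scale.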

Evaluation:
The authors modeled various POSIX file system and virtual memory calls in COMMUTER to determine their commutativity. The performance and scalability of conflict-free implementations were evaluated by running two microbenchmarks and an application-level benchmark on an 80-core machine. With the configuration that uses commutative APIs, the application benchmark is shown to scale 7.5x from 1 socket to 8 sockets.

Confusion:
The computational complexity of COMMUTER seems to be huge. How feasible would it be to use this tool for designing real software?

Posted by: Neha Mittal | March 13, 2017 10:09 PM

1. Summary
This paper shifts scalability analysis from the post-implementation stage to the software interface level, which exposes more opportunities for scalability.

2. Problem
The problem is how to design and analyze the scalability of multicore software. The conventional approach is to choose a workload, measure its performance over varying numbers of cores, and use tools such as differential profiling to detect the bottlenecks. Such a method has two drawbacks: 1. different workloads and higher core counts expose new bottlenecks, and it is hard to tell which of them are fundamental; 2. the analysis happens too late to make design-level solutions practical.

3. Contributions
Instead of doing a post-implementation analysis, the authors move the analysis to a higher level: the software interface, which can be independent of the implementation. This makes reasoning about scalability possible before an implementation exists and before the necessary hardware is available to measure the implementation's scalability. This idea also helps highlight inherent scalability problems and set clear scaling targets. The second contribution is the formalization of the scalable commutativity rule. This formalization is useful for guiding the design of scalable software interfaces. Based on the formalized commutativity rule, the authors present a systematic, test-driven approach applied to real implementations and construct a tool, COMMUTER, that automates the reasoning about scalability.

4. Evaluation
The authors run experiments on an 80-core machine with eight 2.4 GHz 10-core Intel E7-8870 chips and 256 GB of RAM, and use Linux 3.5.7 running on a single core as the performance baseline. The experiments consist of two microbenchmarks and a real application. Each benchmark has two variants: one that uses standard, non-commutative POSIX APIs and another that accomplishes the same task using the modified, more broadly commutative APIs. Statbench shows that commutative implementations can avoid the overhead of parallel fstat operations. In openbench, the non-commutative open interface limits scalability, while openbench with O_ANYFD (commutative) scales linearly. In the mail server application, using commutative APIs achieves 7.5x scalability.
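
The openbench result can be illustrated with a small Python sketch. POSIX requires open to return the lowest unused file descriptor, so the values two concurrent opens return depend on their order; with O_ANYFD any unused descriptor is acceptable. The per-core partitioning below (core i hands out descriptors i, i + ncores, ...) is an assumption made for illustration, not necessarily how sv6 allocates descriptors, and `open_lowest` / `open_anyfd` are hypothetical model functions, not real system calls.

```python
def open_lowest(fds):
    """POSIX-style model: return the lowest unused descriptor."""
    fd = 0
    while fd in fds:
        fd += 1
    fds.add(fd)
    return fd


def open_anyfd(fds, core, ncores):
    """O_ANYFD-style model (assumed scheme): each core allocates from its
    own stride, so its result does not depend on other cores' allocations."""
    fd = core
    while fd in fds:
        fd += ncores
    fds.add(fd)
    return fd


def results_in_both_orders(open_a, open_b):
    """Run A then B, and B then A, each from an empty descriptor table.
    Returns the (a, b) results for order A,B and for order B,A."""
    table1 = set()
    a1 = open_a(table1)
    b1 = open_b(table1)
    table2 = set()
    b2 = open_b(table2)
    a2 = open_a(table2)
    return (a1, b1), (a2, b2)


lowest = results_in_both_orders(open_lowest, open_lowest)
anyfd = results_in_both_orders(
    lambda t: open_anyfd(t, core=0, ncores=2),
    lambda t: open_anyfd(t, core=1, ncores=2),
)
print(lowest)  # the two orders give each caller different descriptors
print(anyfd)   # both orders give each caller the same descriptor
```

Under the lowest-descriptor rule the callers can observe who ran first, so the two opens do not commute; relaxing the specification removes that ordering dependence.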

5. Confusion
What does monotonic mean for commutativity?
How generalizable is this work? From the evaluation part it seems for each specific task we need to design commutative APIs.

Posted by: Huayu Zhang | March 13, 2017 10:01 PM

1. Summary
This paper focuses on designing scalable software using the commutativity rule. It proposes SIM commutativity to help design scalable software interfaces, and provides the COMMUTER tool to validate and test commutativity.

2. Problems
The problem this paper tries to solve is how to build a scalable system. The previous method is to implement the whole system, choose a workload, plot the performance, identify the bottleneck, and make some changes to the design. However, the evaluation happens too late, so changing the system takes a lot of work. It is also hard to tell whether a bottleneck is caused by the implementation or is fundamental and inherent in the interface.

3. Contributions
This paper gives a formal definition of the scalable commutativity rule and a proof of its correctness. If two operations commute, their results are independent of order, and there is no way to distinguish the execution order through the interface; thus these operations need no waiting or communication between them and can be implemented in a way that scales. Using the commutativity rule, this paper proposes a new method of developing scalable systems: first define the interface, then analyze the interface’s commutativity and compute the conditions under which the operations commute, then implement the details under these conditions. This paper also develops tools such as COMMUTER to help design scalable systems.
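
A minimal Python sketch of what "results are independent of order" means for one pair of operations in one starting state. This is a simplification of the paper's SIM commutativity: it compares final states for exact equality and checks only a single pair, with no monotonicity requirement. The names `creat_excl` and `commutes_in_state` are hypothetical and exist only for this illustration.

```python
import copy


def creat_excl(names, name):
    """Hypothetical model of exclusive file creation in one directory."""
    if name in names:
        return "EEXIST"
    names.add(name)
    return 0


def commutes_in_state(state, op_a, op_b):
    """True if running op_a and op_b in either order from `state` gives
    each operation the same result and leaves the same final state."""
    s1 = copy.deepcopy(state)
    a1 = op_a(s1)
    b1 = op_b(s1)
    s2 = copy.deepcopy(state)
    b2 = op_b(s2)
    a2 = op_a(s2)
    return (a1, b1) == (a2, b2) and s1 == s2


def create_a(names):
    return creat_excl(names, "a")


def create_b(names):
    return creat_excl(names, "b")


print(commutes_in_state(set(), create_a, create_b))  # different names
print(commutes_in_state(set(), create_a, create_a))  # same name, absent
print(commutes_in_state({"a"}, create_a, create_a))  # same name, present
```

Creating two different names commutes, creating the same absent name does not (whoever runs first succeeds), and creating the same name when it already exists commutes again because both calls fail. That last case is why commutativity has to be judged per state rather than per operation.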

4. Evaluation
The evaluation section of this paper contains two parts. First, it uses COMMUTER to analyze Linux and sv6 and identify the number of syscall pairs that do not scale. sv6, which was designed using the commutativity rule, has fewer cases where it can’t scale. Second, on real hardware, the paper tests performance using microbenchmarks and a mail server benchmark. The results show a good scalability improvement, especially for a large number of cores: the throughput per core barely changes even at high core counts.

5. Confusion
(1). How difficult will it be to model an interface? Figure 4 gives an example of the rename interface, but that is a very simple example. For complex interfaces, I think defining the correct model will be time-consuming, and proving that the model itself is correct is another problem.
(2). Is commutativity between pairs of operations equivalent to commutativity of a sequence with many kinds of operations? This paper seems to focus on the commutativity of operation pairs, as in Figure 6.

Posted by: Tianrun Li | March 13, 2017 08:51 PM

Summary

The authors claim good interface design is one of the cornerstones of
building scalable systems. The key idea is to group operations into
conflict-free operations. If a long, complex one cannot be grouped this
way, break it down into states, or parts where, given that certain
things have happened, the remaining bit can be done in parallel w/o
conflicts. They introduce a semi-formal model to mathematically
express this idea. They also wrote an analyzer that implements this
idea for symbolic Python code.

Problem statement

The standard procedure for identifying bottlenecks involves using
some sort of differential profiler and plotting performance vs. number
of cores; whichever resource turns out to be the limit identifies the
bottleneck.

Different workloads or higher core counts often exhibit new
bottlenecks. It’s unclear which bottlenecks are fundamental, so
developers may give up without realizing that a scalable solution is
possible

The authors believe that poor performance is often not due to a lack
of hardware resources but due to poor interface design, which prevents
the software from scaling as expected.

Contribution

Notion of SI scalable commutativity: a fancy way of saying the order
of operations doesn't matter, so the interface design should not let
the user infer/control anything about the order of
operations. Absolute commutativity is nearly impossible in systems, so
some measure of state is kept, and given that state, operations may
commute.

* analyze the interface’s commutativity i.e given state and interface
what are the order constraints?
* then design your system

Since order doesn't matter, no synchronisation is needed between
operations => linear scalability

FIGURING OUT IF SET OF OPERATIONS ARE COMMUTATIVE:

Implementation:
m : Domain -> Range, where m is a map, or what they call an implementation

Domain: (State x Invocation)
Range : (State x Response)

Continue: a valid response indicating that the thread hasn't finished.

An implementation is correct if every response is valid. (Don't need to generate all possible valid responses)

A response is valid if H is a valid history, r is the response and m
can generate H || r. The order of operations to generate r won't
matter. As long as the machine is in the state provided by H all good.

Comments: This idea is very popular in machine learning: given a
workload, construct a conflict graph. Figure out which operations need
to be done serially and which can be done in parallel. Then bunch up
each group onto different cores and scale. Cyclades is an SGD algorithm
that does the exact same thing. They provide a good example using file
creation scalability. The file creation implementation can be modified
into doing something like this, I believe. If no one's already done it,
this could be a fun hobby project.

They wrote ANALYZER, which automates the process of analyzing the
commutativity of an interface, saving developers from the tedious and
error-prone process of considering large numbers of interactions
between complex operations. ANALYZER takes as input a model of the
behavior of an interface, written in a symbolic variant of Python, and
outputs commutativity conditions: expressions in terms of arguments
and state for exactly when sets of operations commute.
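
A rough sketch of the kind of question ANALYZER answers, using plain Python
and brute-force enumeration over a tiny name space in place of the symbolic
execution the real tool uses. The `rename` model, the three-name universe,
and the helper names are all assumptions made for this illustration.

```python
from itertools import product

NAMES = ("a", "b", "c")  # assumed tiny universe of file names


def rename(state, src, dst):
    """Toy model of rename: `state` maps file name -> inode number."""
    if src not in state:
        return "ENOENT"
    if src != dst:
        state[dst] = state.pop(src)
    return 0


def pair_commutes(state, args1, args2):
    """Do rename(*args1) and rename(*args2) commute starting from `state`?"""
    s1 = dict(state)
    r1 = rename(s1, *args1)
    r2 = rename(s1, *args2)
    s2 = dict(state)
    q2 = rename(s2, *args2)
    q1 = rename(s2, *args1)
    return (r1, r2) == (q1, q2) and s1 == s2


def commuting_cases(state):
    """Enumerate every pair of argument tuples that commutes in `state`."""
    arg_choices = list(product(NAMES, repeat=2))
    return [
        (x, y)
        for x, y in product(arg_choices, repeat=2)
        if pair_commutes(state, x, y)
    ]


# With no files, every rename fails with ENOENT, so all 81 pairs commute.
print(len(commuting_cases({})))
# With one file "a", two renames that both move "a" do not commute.
print(pair_commutes({"a": 1}, ("a", "b"), ("a", "c")))
```

Enumeration only works for toy state spaces; the point of a symbolic
analysis is to get the same conditions as expressions over arbitrary
arguments and state.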

Confusion

The idea of monotonicity is a bit unclear.

Posted by: Ari | March 13, 2017 05:44 PM

1. Summary
The scalable commutativity rule says that if an interface allows for commutative parallel operations, there exist implementations that will scale well.

2. Problem
It is difficult to reason about the intrinsic scalability of a system while designing it. Usually, the design process involves building a system and then testing how well it scales. As the number of cores in conventional systems continues to increase, scalability must become a first-order design constraint. However, there are no available formal methods for reasoning about a system’s scalability that are not tied to specific implementation details.

3. Contributions
The authors provide a formal definition of the scalable commutativity rule for interface design. Informally, it says that if an interface’s operations are commutative, it is possible to build an implementation of that interface that will scale. Commutativity need only hold for a specific system state for the rule to apply, which allows more opportunities for scaling to be discovered.
They integrate the scalable commutativity rule into a formal checker tool that checks an interface for scalability and produces test code for implementations of that interface.
They apply their tester to a subset of the POSIX interface and identify opportunities to make it more scalable, including allowing calls to only read the fields they need and allowing for non-determinism in the interface definition.
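
As a sketch of why reading only the needed fields helps, the Python model below compares a stat call that always returns the link count with one that takes a field list. The inode layout and the functions `link`, `fstat`, `fstatx`, and `commutes` are hypothetical models written for this illustration, not the kernel's code.

```python
def link(inode):
    """Model of link(): adds one more name for the file."""
    inode["nlink"] += 1
    return 0


def fstat(inode):
    """Model of fstat(): always returns every field, including nlink."""
    return dict(inode)


def fstatx(inode, fields):
    """Model of a stat variant that returns only the requested fields."""
    return {name: inode[name] for name in fields}


def commutes(inode, op_a, op_b):
    """Same per-operation results and same final inode in both orders?"""
    s1 = dict(inode)
    a1 = op_a(s1)
    b1 = op_b(s1)
    s2 = dict(inode)
    b2 = op_b(s2)
    a2 = op_a(s2)
    return (a1, b1) == (a2, b2) and s1 == s2


def stat_size_only(inode):
    return fstatx(inode, ("size",))


inode = {"size": 10, "nlink": 1}
print(commutes(inode, fstat, link))           # fstat observes nlink
print(commutes(inode, stat_size_only, link))  # size is unaffected by link
```

The full fstat sees a different link count depending on whether link ran first, so the pair does not commute; the size-only query cannot tell the two orders apart.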

4. Evaluation
They run the tool on part of the POSIX FS interface and identify many areas for improvement.
They implement their improvements in a scalable virtual memory system (RadixVM) and file system (ScaleFS). Both are implemented in the sv6 research kernel.
They run microbenchmarks for file stating and creation, both of which wipe the floor with normal Linux in terms of scalability.
They also run a mail-server benchmark, with similar good results.

5. Confusion
In explaining the “Commuter” tool (Section 5.1), they comment that it uses “the definition of SIM commutativity […] except that it assumes the specification is sequentially consistent.” What do they mean by that? Is that a big deal? Is it just a trivial detail? It feels like it might be a big deal, but I’m not sure.

Posted by: Mitchell Manar | March 13, 2017 02:18 PM

1. summary
The paper defines the SIM commutativity rule, which reasons about scalability from the interface and thus enables developers to formulate scalability expectations for a given interface design. The authors also provide the COMMUTER tool to help developers analyze interface commutativity and generate test cases to verify an implementation.

2. Problem
a). To make programs scale to multiple cores.
b). Current practice suffers from an iterative process for improving scalability: design, implement, measure, repeat.
c). Need a model to formulate the scalability potential and make trade-offs clear for multi-core conditions.
d). The commutativity rule is useful, but previous work still has a gap that prevents applying it directly to scalability.

3. Contributions
a). Scalability reasoning clearly defined by the SIM commutativity rule.
b). Defines the commutativity rule in two steps: reordering & prefix.
c). Four guidelines addressing problems of POSIX or more general interfaces: decompose compound operations, embrace specification non-determinism, permit weak ordering, and release resources asynchronously.
d). A prototype tool, COMMUTER: a systematic, test-driven approach to applying the commutativity rule to implementations.
e). Conflict coverage to test under the condition of multi-core uncertainty.

4. Evaluation
a). microbenchmark: fstat, link pair
b). benchmark: mail-server
c). The graphed results are generally good and indicate a great scalability improvement. But the evaluation covers only a very small part of the evaluation space: just one pair, one operation, and a not very general application.
d). I'm wondering about Fig. 7 (3), 1 core - 10 cores: it seems that the scalability is the same as the older one, although the overall performance is much better.

5. Confusion
a). They just use pairs (said "typically" on p. 9), but is this really sufficient? Is there any model to demonstrate that pairwise access conflicts dominate other cases, such as a set of operations that cannot be reordered?
b). It is not very clear what they do to achieve full conflict coverage.

Posted by: Jing Liu | March 13, 2017 08:36 AM
