This research was conducted by Ben Liblit, Yingjun Lyu, Rajdeep Mukherjee, Omer Tripp, and Yanjun Wang. The paper was presented at the 12th ACM SIGPLAN International Workshop on the State Of the Art in Program Analysis (SOAP 2023).
Running static analysis rules in the wild, as part of a commercial service, demands special consideration of time limits and scalability given the large and diverse real-world workloads on which the rules are evaluated. Furthermore, these rules do not run in isolation, which exposes opportunities to reuse partial evaluation results across rules. In our work on Amazon CodeGuru Reviewer and its underlying rule-authoring toolkit, the Guru Query Language (GQL), we have encountered performance and scalability challenges and identified corresponding optimization opportunities, such as caching, indexing, and customization of analysis scope, which rule authors can take advantage of as built-in GQL constructs. Our experimental evaluation on a dataset of open-source GitHub repositories shows a 3× speedup with perfect recall for indexing-based configurations, and a 2× speedup with a 51% increase in the number of findings for the caching-based optimization.
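To make the two optimization ideas concrete, the following is a minimal Python sketch, not GQL's actual API: all names and the toy "program" representation are hypothetical. It shows an index built once over program facts so rules avoid full scans, and a memoized shared sub-analysis whose result is reused by multiple rules instead of being recomputed per rule.

```python
# Hedged sketch (hypothetical names, not GQL syntax): illustrates indexing nodes by
# kind and caching a shared partial analysis result across rules.

from collections import defaultdict
from functools import lru_cache

# A toy "program" represented as flat (node_id, kind, payload) facts.
PROGRAM = [
    (1, "method", "readFile"),
    (2, "call",   "open"),
    (3, "call",   "close"),
    (4, "method", "writeFile"),
    (5, "call",   "open"),
]

# Indexing: build a kind -> nodes index once, so each rule queries the index
# instead of rescanning every node.
def build_index(program):
    index = defaultdict(list)
    for node in program:
        index[node[1]].append(node)
    return index

INDEX = build_index(PROGRAM)

# Caching: an expensive shared sub-analysis (a stand-in for something like call
# resolution) is memoized so every rule that needs it reuses a single result.
@lru_cache(maxsize=None)
def calls_by_name(name):
    # A real analyzer might walk a call graph here; this just filters the index.
    return tuple(node for node in INDEX["call"] if node[2] == name)

# Two toy rules sharing the cached sub-analysis instead of recomputing it.
def rule_unclosed_open():
    # Flag open() calls only if no close() call exists anywhere (deliberately crude).
    return [] if calls_by_name("close") else [n[0] for n in calls_by_name("open")]

def rule_count_opens():
    return len(calls_by_name("open"))

if __name__ == "__main__":
    print("possible unclosed opens:", rule_unclosed_open())
    print("open() call count:", rule_count_opens())
```

In this sketch, both rules call `calls_by_name`, so the second rule hits the memoized result rather than repeating the work; that cross-rule reuse is the intuition behind the caching construct, while `build_index` captures the intuition behind indexing.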
The full paper is available as a single PDF document. A suggested BibTeX citation record is also available.