CS 300 Programming II (canvas)
CS 354 Machine Organization and Programming (course page)
Basic Metal Lathe and Milling Machine Training, The Bodgery Makerspace
Guest Lecturer: rustc, CS 536 Compilers (video, slides)
Guest Lecturer: Unsafe Rust, CS 538 Programming Languages (slides)
CS 302 Intro Programming (TA)
CS 361S Network Security, UT Austin (TA) (course page)
Characterizing Physical Memory Fragmentation.
Mark Mansi and Michael M. Swift. 2024.
PDF (open access!),
External fragmentation of physical memory occurs when adjacent, differently sized regions of allocated physical memory are freed at different times, causing free memory to become physically discontiguous. It can significantly degrade system performance and efficiency, such as by reducing the ability to use huge pages, a critical optimization on modern large-memory systems. For decades, system developers have sought to avoid and mitigate fragmentation, but few prior studies quantify and characterize it in production settings.
Moreover, prior work often artificially fragments physical memory to create more realistic performance evaluations, but their fragmentation methodologies are ad hoc and unvalidated. Out of 13 papers, we found 11 different methodologies, some of which were subsequently found inadequate. The importance of addressing fragmentation necessitates a validated and principled methodology.
Our work fills these gaps in knowledge and methodology. We conduct a study of memory fragmentation in production by observing 248 machines in the Computer Sciences Department at the University of Wisconsin-Madison for a week. We identify six key memory usage patterns, and find that Linux's file cache and page reclamation systems are major contributors to fragmentation because they often obliviously break up contiguous memory. Finally, we create andúril, a tool to artificially fragment memory during experimental research evaluations. While andúril ultimately fails as a scientific tool, we discuss its design ideas, merits, and failings in the hope that they may inspire future research.
Modernizing Operating System Kernel Memory Management for 1st-Party Datacenter Workloads.
Mark Mansi (Doctoral Dissertation advised by Michael M. Swift).
University of Wisconsin - Madison. May 2023.
Memory management (MM) is a core responsibility of operating system kernels. It has a crucial impact on system performance, efficiency, and cost, especially in large-scale deployments. Meanwhile, computing hardware and software have evolved significantly since the first operating systems were designed. In particular, our work is inspired by two high-level trends: increasing memory sizes and the rise of warehouse-scale computers. The core claim of our work is that system designers must use systematic analysis of MM operations and measurement-based design that prioritizes reasoning about system behavior as a first-class concern. This includes building new tools and methodologies for analyzing system behavior and eliminating heuristic-based policies and ad hoc algorithms from MM.
Our work identifies three key problems for modern MM, and takes steps to address them. First, we identify the Capacity Scaling Problem, wherein system software fails to run as well on systems with terabytes of DRAM as it does on current systems. We design, implement, and validate 0sim, a simulator for detecting Capacity Scaling issues in system software using existing commodity hardware. We demonstrate 0sim's utility for debugging and quantifying problems and for prototyping solutions.
Second, we identify sources of kernel MM design and behavioral complexity that hinder performance debugging and improvement, especially at scale. We design, implement, and evaluate Cost-Benefit Memory Management (CBMM), a novel memory management approach based on the idea that all MM operations have a cost and a benefit to userspace and that the benefit should outweigh the cost.
Finally, we quantify the importance of physical memory fragmentation as the need for huge pages and other contiguous memory allocations grows. We conduct a study of physical memory fragmentation in live systems by instrumenting 248 infrastructure and compute cluster machines at UW-Madison's Computer Sciences Department and Center for High-Throughput Computing. We identify memory reclamation and file cache usage as key sources of memory fragmentation on Linux, make several observations about common MM behavior across observed systems, and draw conclusions about several potential improvements to MM.
STYX: Exploiting SmartNIC Capability to Reduce Datacenter Memory Tax.
Houxiang Ji, Yan Sun, Mark Mansi, Yifan Yuan, Jinghan Huang, Reese Kuper, Michael M. Swift, Nam Sung Kim.
PDF/video/slides (open access!; hopefully available by Sept 2023),
Memory optimization kernel features, such as memory deduplication, are designed to improve the overall efficiency of systems like datacenter servers, and they have proven to be effective. However, when invoked, these kernel features notably disrupt the execution of applications, intensively consuming the server CPU's cycles and polluting its caches. To minimize such disruption, we propose STYX, a framework for offloading the intensive operations of these kernel features to SmartNIC (SNIC). STYX first RDMA-copies the server's memory regions, on which these kernel features intend to operate, to an SNIC's memory region, exploiting SNIC's RDMA capability. Subsequently, leveraging SNIC's (underutilized) compute capability, STYX makes the SNIC CPU perform the intensive operations of these kernel features. Lastly, STYX RDMA-copies their results back to a server's memory region, based on which it performs the remaining operations of the kernel features. To demonstrate the efficacy of STYX, we re-implement two memory optimization kernel features in Linux: (1) memory deduplication (ksm) and (2) compressed cache for swap pages (zswap), using the STYX framework. We then show that a system with STYX provides a 55-89% decrease in 99th-percentile latency of co-running applications, compared to a system without STYX, while preserving the benefits of these kernel features.
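The three-step offload flow above can be sketched at a high level. This is a conceptual Python sketch, not the authors' implementation (which is kernel C on real SmartNIC hardware); all names here (`rdma_copy`, `find_duplicate_pages`, `styx_dedup`) are hypothetical, and an in-memory list copy stands in for an actual RDMA transfer.

```python
def rdma_copy(src):
    """Stand-in for an RDMA transfer: returns a copy of the region."""
    return list(src)

def find_duplicate_pages(pages):
    """The intensive part (here: a dedup scan), run on the SNIC CPU."""
    seen = {}
    dups = []
    for i, page in enumerate(pages):
        key = hash(page)
        if key in seen:
            dups.append((seen[key], i))   # (kept page, duplicate page)
        else:
            seen[key] = i
    return dups

def styx_dedup(server_pages):
    # 1. RDMA-copy the candidate region from server memory to SNIC memory.
    snic_pages = rdma_copy(server_pages)
    # 2. Run the CPU- and cache-intensive scan on the SNIC, not the server.
    results = find_duplicate_pages(snic_pages)
    # 3. RDMA-copy the results back; the server only does the cheap
    #    remaining work (e.g., remapping duplicates to one shared page).
    return rdma_copy(results)

pages = ["AAAA", "BBBB", "AAAA", "CCCC", "BBBB"]
print(styx_dedup(pages))  # pairs of (original, duplicate) page indices
```

The point of the structure is that the server CPU never touches the scanned pages, so its cycles and caches are left to the co-running applications.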
FBMM: Using the VFS for Extensibility in Kernel Memory Management.
Bijan Tabatabai, Mark Mansi, Michael M. Swift.
PDF (open access!), ACM DL,
Modern memory hierarchies are increasingly complex, with more memory types and richer topologies. Unfortunately, kernel memory managers lack the extensibility that many other parts of the kernel use to support diversity. This makes it difficult to add and deploy support for new memory configurations, such as tiered memory: engineers must navigate and modify the monolithic memory management code to add support, and custom kernels are needed to deploy such support until it is upstreamed.
We take inspiration from filesystems and note that VFS, the extensible interface for filesystems, supports a huge variety of filesystems for different media and different use cases, and importantly, has interfaces for memory management operations such as controlling virtual-to-physical mapping and handling page faults.
We propose writing memory management systems as filesystems using VFS, bringing extensibility to kernel memory management. We call this idea File-Based Memory Management (FBMM). Using this approach, many recent memory management extensions, e.g., tiering support, can be written without modifying existing memory management code. We prototype FBMM in Linux to show that the overhead of extensibility is low (within 1.6%) and that it enables useful extensions.
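The shape of the idea can be sketched in a few lines. This is a toy Python sketch of FBMM's structure, not the prototype itself (which is kernel C built on VFS); the class and method names below are hypothetical stand-ins for VFS-style hooks.

```python
class MemFS:
    """Base 'memory filesystem' interface, analogous to VFS operations.
    Each MM policy is written as a separate filesystem, so new policies
    need no changes to the core memory management code."""
    def mmap(self, length):
        return {"length": length, "mapped": {}}
    def fault(self, region, addr):
        """Called on a page fault to pick a backing page for addr."""
        raise NotImplementedError

class TieredMemFS(MemFS):
    """Example extension: place pages in the fast tier until it is
    exhausted, then fall back to the slow tier."""
    def __init__(self, fast_pages):
        self.fast_left = fast_pages
    def fault(self, region, addr):
        tier = "fast" if self.fast_left > 0 else "slow"
        if tier == "fast":
            self.fast_left -= 1
        region["mapped"][addr] = tier
        return tier

fs = TieredMemFS(fast_pages=1)
region = fs.mmap(length=2 * 4096)
print(fs.fault(region, 0x0000), fs.fault(region, 0x1000))  # fast slow
```

Because the tiering policy lives entirely in its own "filesystem," it can be loaded and deployed like any other filesystem module rather than patched into a monolithic memory manager.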
Policy/mechanism separation in the Warehouse-Scale OS.
Mark Mansi and Michael M. Swift. 2023.
PDF (open access!),
"As many of us know from bitter experience, the policies provided in extant operating systems, which are claimed to work well and behave fairly 'on the average', often fail to do so in the special cases important to us" [Wulf et al. 1974]. Written in 1974, these words motivated moving policy decisions into user-space. Today, as warehouse-scale computers (WSCs) have become ubiquitous, it is time to move policy decisions away from individual servers altogether. Built-in policies are complex and often exhibit bad performance at scale. Meanwhile, the highly-controlled WSC setting presents opportunities to improve performance and predictability.
We propose moving all policy decisions from the OS kernel to the cluster manager (CM), in a new paradigm we call Grape CM. In this design, the role of the kernel is reduced to monitoring, sending metrics to the CM, and executing policy decisions made by the CM. The CM uses metrics from all kernels across the WSC to make informed policy choices, sending commands back to each kernel in the cluster. We claim that Grape CM will improve performance, transparency, and simplicity. Our initial experiments show how the CM can identify the optimal set of huge pages for any workload or improve memcached latency by 15%.
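The proposed split between kernels and the cluster manager can be sketched as follows. This is an illustrative Python sketch of the Grape CM design, not an implementation; the metric names and the toy reclaim policy are hypothetical.

```python
class Kernel:
    """Per-server kernel: monitors, reports metrics, and executes
    commands. It makes no policy decisions of its own."""
    def __init__(self, name, free_mem_gb):
        self.name = name
        self.free_mem_gb = free_mem_gb
        self.commands = []
    def report_metrics(self):
        return {"node": self.name, "free_mem_gb": self.free_mem_gb}
    def execute(self, command):
        self.commands.append(command)   # mechanism only

class ClusterManager:
    """Makes all policy decisions using metrics from every kernel."""
    def decide(self, metrics):
        # Toy policy: reclaim memory on nodes that are running low.
        return {m["node"]: ("reclaim" if m["free_mem_gb"] < 4 else "noop")
                for m in metrics}

kernels = [Kernel("a", 2), Kernel("b", 16)]
cm = ClusterManager()
decisions = cm.decide([k.report_metrics() for k in kernels])
for k in kernels:
    k.execute(decisions[k.name])
```

The interesting property is that the CM sees metrics from the whole WSC at once, so it can make decisions (e.g., which huge pages to build where) that no individual kernel has enough information to make well.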
CBMM: Financial Advice for Kernel Memory Managers.
Mark Mansi, Bijan Tabatabai, Michael M. Swift.
PDF/video/slides (open access!), PDF with full data,
First-party datacenter workloads present new challenges to kernel memory management (MM), which allocates and maps memory and must balance competing performance concerns in an increasingly complex environment. In a datacenter, performance must be both good and consistent to satisfy service-level objectives. Unfortunately, current MM designs often exhibit inconsistent, opaque behavior that is difficult to reproduce, decipher, or fix, stemming from (1) a lack of high-quality information for policymaking, (2) the cost-unawareness of many current MM policies, and (3) opaque and distributed policy implementations that are hard to debug. For example, the Linux huge page implementation is distributed across 8 files and can lead to page fault latencies in the 100s of ms.
In search of a MM design that has consistent behavior, we designed Cost-Benefit MM (CBMM), which uses empirically based cost-benefit models and pre-aggregated profiling information to make MM policy decisions. In CBMM, policy decisions follow the guiding principle that userspace benefits must outweigh userspace costs. This approach balances the performance benefits obtained by a kernel policy against the cost of applying it. CBMM has competitive performance with Linux and HawkEye, a recent research system, for all the workloads we ran, and in the presence of fragmentation, CBMM is 35% faster than Linux on average. Meanwhile, CBMM nearly always has better tail latency than Linux or HawkEye, particularly on fragmented systems. It reduces the cost of the most expensive soft page faults by 2-3 orders of magnitude for most of our workloads, and reduces the frequency of 10-1000-μs-long faults by around 2 orders of magnitude for multiple workloads.
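The guiding principle can be illustrated with a toy huge-page decision. This Python sketch only shows the shape of a cost-benefit check; the model form and the numbers are illustrative, not CBMM's actual empirical models.

```python
def decide_huge_page(profile, est_fault_cost_us, est_tlb_savings_us):
    """Promote a region to a huge page only if the expected userspace
    benefit (TLB-miss time saved, weighted by how hot the region is
    according to pre-aggregated profiling data) outweighs the expected
    userspace cost (the page-fault latency of building the huge page)."""
    benefit = est_tlb_savings_us * profile.get("hotness", 0.0)
    return benefit > est_fault_cost_us

# A hot region with large expected TLB savings gets promoted...
print(decide_huge_page({"hotness": 0.9},
                       est_fault_cost_us=100, est_tlb_savings_us=500))
# ...while a cold region does not, avoiding an expensive soft page
# fault that would never pay for itself.
print(decide_huge_page({"hotness": 0.05},
                       est_fault_cost_us=100, est_tlb_savings_us=500))
```

Making the cost side explicit is what lets this style of policy avoid the pathological multi-millisecond soft page faults described above: an operation whose cost cannot be justified is simply not performed.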
0sim: Preparing System Software for a World with Terabyte-scale Memories.
Mark Mansi and Michael M. Swift.
PDF, video, slides, code, ACM DL,
Recent advances in memory technologies mean that commodity machines may soon have terabytes of memory; however, such machines remain expensive and uncommon today. Hence, few programmers and researchers can debug and prototype fixes for scalability problems or explore new system behavior caused by terabyte-scale memories.
To enable rapid, early prototyping and exploration of system software for such machines, we built and open-sourced the 0sim simulator. 0sim uses virtualization to simulate the execution of huge workloads on modest machines. Our key observation is that many workloads follow the same control flow regardless of their input. We call such workloads data-oblivious. 0sim harnesses data-obliviousness to make huge simulations feasible and fast via memory compression.
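The core trick can be demonstrated in a few lines. This is a simplified illustration, not 0sim's implementation (which compresses guest memory under a virtual machine): because a data-oblivious workload's control flow does not depend on its memory contents, the simulator can hand it highly compressible data, so "terabytes" of simulated memory fit in a small amount of real host memory.

```python
import zlib

# One 4 KB page of a data-oblivious workload: the simulator is free to
# fill it with compressible data (here, all zeros) without changing the
# workload's behavior.
simulated_page = b"\x00" * 4096

compressed = zlib.compress(simulated_page)
ratio = len(simulated_page) / len(compressed)
print(f"4096-byte page stored in {len(compressed)} bytes (~{ratio:.0f}x)")
```

At ratios like this, a simulated terabyte of such pages needs only a few gigabytes of host memory, which is what makes the 1TB-on-31GB simulations below feasible.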
0sim is accurate enough for many tasks and can simulate a guest system 20-30x larger than the host with 8x-100x slowdown for the workloads we observed, with more compressible workloads running faster. For example, we simulate a 1TB machine on a 31GB machine, and a 4TB machine on a 160GB machine. We give case studies to demonstrate the utility of 0sim. For example, we find that for mixed workloads, the Linux kernel can create irreparable fragmentation despite dozens of GBs of free memory, and we use 0sim to debug unexpected failures of memcached with huge memories.
An Improved Hybrid AMPM and ISB Prefetcher.
Mark Mansi (Undergraduate Thesis under Calvin Lin).
TR-2220. University of Texas at Austin. Spring 2016.
In their recent work, Jain and Lin propose a hybrid prefetcher composed of the Access Map Pattern Matching Prefetcher (AMPM) and the Irregular Stream Buffer (ISB), two state-of-the-art prefetchers [Jain13]. While their hybrid achieves high performance, it can also waste memory bandwidth by making many inaccurate prefetches, which can hurt system performance in bandwidth-constrained environments. We aim to improve their work along two fronts. First, our design improves the accuracy of AMPM. Second, our design identifies whether the current workload is regular or irregular and uses this information to toggle between AMPM and ISB in the hybrid.
While our design does not fully meet these goals, we do show that it improves AMPM accuracy by 7% while decreasing coverage by only 3%. Our solution improves Hybrid accuracy by 4% and bandwidth consumption by 8% on average, though it decreases speedup by 17% compared with the original hybrid.
Effect of marital status on clinical outcome of heart failure.
Taylor Watkins, Mark Mansi, Jennifer Thompson, Ishak Mansi, Roy Parish.
Journal of Investigative Medicine. June 2013. (Summer project with father :) ).
Background: Despite significant advances in pharmacological and nonpharmacological treatment of heart failure (HF), there are more than 1 million HF visits annually to the emergency department. Studies indicate that HF clinical outcome is affected not only by medical interventions but also by social factors such as marital status.
Objectives: This study aimed to determine the effect of marital status of HF patients on clinical outcome of HF in a high-risk population.
Methods: We reviewed data collected for The Joint Commission in patients admitted with HF at a university hospital serving a high-risk population in Louisiana during the period from June 2003 to September 2004 and followed up until December 2008. Patients were divided into 2 groups, namely, married patients and unmarried patients (including single, divorced, and widowed) based on self-reporting. Primary outcome measures were in-hospital survival and time to readmission. Secondary outcome measures were HF admission rate, average B-type natriuretic peptide, and average troponin-I levels throughout the follow-up period.
Results: Of 646 reviewed records, 542, representing 357 patients, were included in the analysis. Of these, 105 patients were married and 245 were unmarried; marital status was missing for 7 patients. Mean (SD) of follow-up period was 2.39 (1.6) years. Marital status was not a significant variable for in-hospital death (hazard ratio, 0.71; 95% confidence interval, 0.35-1.49), or for time to readmission for HF (hazard ratio, 1.16; 95% confidence interval, 0.86-1.56); multiple linear regression analysis identified married status as an independent variable for average B-type natriuretic peptide (parameter estimate = -0.26, P = 0.02) but not for HF admission rate or average troponin-I levels.
Conclusions: Married status was not associated with better clinical outcome in HF patients in a high-risk population.
I did my PhD at UW Madison in the SCAIL group advised by Prof. Mike Swift, graduating in May 2023. I did my BSCS at UT Austin. My PhD research is about OS kernel memory management for datacenter workloads in light of new memory technologies and software paradigms. Generally my core interest is system software, such as operating systems and distributed systems -- not just using such systems, but how to build them or make them better. I'm also interested in pretty much everything adjacent to systems -- architecture, compilers, programming languages, etc.
I enjoy working on open-ended problems and building things that have never been built before. I fight a strong urge to build everything from scratch, driven by curiosity and a love of making things. I also enjoy organizing and sharing knowledge. These loves extend to my hobbies and extra-curricular interests, too. I've enjoyed contributing to open-source software projects -- most notably, the Rust Programming Language. I was Working Group Co-lead for WG-rustc-dev-guide, which created and maintains a living book about how the Rust compiler works and how to contribute to it. During the pandemic, I switched to a less digital hobby: machining (i.e., subtractive metal manufacturing). I am a machining trainer at The Bodgery.
STRAIGHT: Hazardless Processor Architecture Without Register Renaming
This is probably my all-time favorite paper. It draws an in-hindsight-obvious connection between register renaming in microarchitectures and single-static assignment in compilers: the processor is doing something that the compiler already does. By eliminating the redundancy, we can massively simplify processor hardware. Very clever!
Memory resource management in VMware ESX server
This is another all-time favorite. This classic kernel memory management paper describes several now-common memory management mechanisms and ESX Server's policies for using them, such as same-page merging and balloon drivers. Its idle memory tax is one of the earliest principled, economics-based approaches to memory management policy that I know of (a topic near and dear to my work). The paper is also very well written and easy to read.
Mesh: compacting memory management for C/C++ applications
This paper proposes an allocator design that massively reduces fragmentation in userspace memory allocators by a very clever and simple use of randomization.
MOD: Minimally Ordered Durable Datastructures for Persistent Memory
Another clever/elegant solution! This paper makes the observation that immutable data structures from functional programming languages make good foundations for building durable data structures for persistent memory.
Software-Defined Far Memory in Warehouse-Scale Computers
This paper discusses Google's "far-memory" system: a cluster-wide strategy for memory compression. I love this paper because I think it is a harbinger of the way memory management should go at scale. Centralized planning, aggregating stats/metrics from across the cluster, increased efficiency. Fascinating!
I personally hate black text on white background... it hurts my eyes; too bright. So originally I made this website white on solid black.
But then a friend told me that my solid black background hurt their eyes because it took time to adjust from other websites which had white backgrounds.
So I tried this, and we both seemed to like it.
"Whatever you do, work at it with all your heart, as working for the Lord, not for human masters, since you know that you will receive an inheritance from the Lord as a reward. It is the Lord Christ you are serving."
Colossians 3:23-24 NIV
Last updated: Jan 7, 2024