Project 1: Intro to Ceph

Important Dates

Due: Mon, 10/1. Details below.

Project Teams: Ideally, each project team should consist of three people. Please contact me (Remzi) if you wish to construct a team with a different number of participants.

Overview

This project introduces you to the focus of the project side of the course: the Ceph distributed storage system. Ceph was originally developed by Sage Weil, who will be assisting us this semester, so you are getting a rare opportunity not only to work with an interesting and important system, but also to work with a person who is intimately familiar with it.

Notes from Sage

ceph daemonperf (like ceph daemon ..., too) must be run from the same host the daemon is on. It's the tool that spits out a line per second for the most interesting performance counters.
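For example, to watch the live counters of one OSD (a minimal sketch; osd.0 is a placeholder for whichever daemon actually runs on that host):

    # must be run on the host where osd.0 lives
    ceph daemonperf osd.0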

To get an interesting result from Part 3, you probably need to collect time series data for a given counter from all daemons and then plot them alongside each other over time to see how they compare.
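One way to collect that data is to poll each daemon's admin socket on an interval and log the JSON output, then extract the counter you care about afterward. A rough sketch, assuming you run a loop like this on each OSD's own host (the daemon name osd.0 and the log file name are made up):

    # append a timestamped perf dump every second
    while true; do
        date +%s >> perf_osd0.log
        ceph daemon osd.0 perf dump >> perf_osd0.log
        sleep 1
    done

Repeat for the other daemons, then pull out one counter per timestamp from each log and plot the series side by side with whatever plotting tool you prefer.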

Details

Part 0: Background

The first thing to do is to read the Ceph paper, which was published at OSDI '06. Although it is an old paper, it presents the fundamentals of the system well. It would, of course, be much easier to read this paper at the end of the semester, once we have learned some of the core techniques needed to build distributed systems; however, we don't have that luxury. Thus, read it a few times to understand what you can, and discuss it carefully with your project teammates.

Also worth looking through is the Ceph webpage. The documentation is fairly extensive, so you probably won't get through it by simply sitting down and reading. Instead, start by getting familiar with what is there. We'll also point you to specific parts of the documentation below.

Part 1: Ceph Deployment

This project will focus on Ceph performance. You will run various testing workloads against Ceph and use tools to understand its performance. You will also create graphs to show what you have found.

However, before we can run, we must learn to walk. In this project, this means learning to create a cluster (likely with Google Cloud Platform, as described below), and then deploying Ceph. You'll have to learn a bit about GCP for this project, so start reading the documentation. For the initial setup, use Google persistent disks on each machine as the storage device for Ceph.

To read more about Ceph deployment, you should click here. You'll be using ceph-deploy. Read all about it, and get to work deploying a small object storage cluster, say on 4 or 8 machines (a sketch is below). When you are done with this, you should feel happy! Getting a system up and running is never easy, especially one as complicated as Ceph.
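If you are unsure where to begin, the overall flow with ceph-deploy looks roughly like the following. This is only a sketch: the hostnames (node1 through node4) and the data disk (/dev/sdb) are assumptions, so substitute whatever your instances and persistent disks are actually called, and note that older ceph-deploy releases use osd prepare/activate instead of osd create.

    # run from an admin node that can ssh to the others
    ceph-deploy new node1                        # new cluster, node1 as initial monitor
    ceph-deploy install node1 node2 node3 node4  # install Ceph packages everywhere
    ceph-deploy mon create-initial               # bring up the initial monitor(s)
    ceph-deploy admin node1 node2 node3 node4    # push config and admin keyring
    ceph-deploy osd create --data /dev/sdb node2 # one OSD per data disk
    ceph-deploy osd create --data /dev/sdb node3
    ceph-deploy osd create --data /dev/sdb node4
    ceph health                                  # hope for HEALTH_OK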

Getting Help: In life, and in computer systems, sometimes you need help. In other cases, however, you just need to work a little harder to figure things out. The tough part is knowing which situation you are in, and acting accordingly. In this project, you may run into trouble now and then. When you do, first try hard -- with your partners -- to figure out what is going on. Then ask on Piazza, to see if classmates can help. There is a third place to go for help, however, and that is the Ceph IRC channel. Please do not flood it with simple questions; when you get truly, deeply stuck, however, it can be a lifesaver. Use this resource wisely.

Once you have Ceph up and running, congratulations! The yucky part is done, and now we can get into the science of understanding complex systems via performance measurement.

Part 2: Basic Performance

In this part of the project, you'll configure Ceph in a number of different ways, and use basic tools to measure its performance. You'll mostly use the tool rados bench for this part; read about that in the documentation.

A good place to start is the set of instructions found here. Follow these instructions to get performance numbers for the local (Google persistent) disks, then the network between nodes, and then, using rados bench, the RADOS storage cluster (don't worry about the Ceph block device or gateway parts). Make graphs showing the results of your study. How did the local disks perform? How much performance did you get from the network? Are those numbers what you expected? How much performance did you get for various write and read tests using rados bench? How did it compare to the raw numbers?
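As a concrete starting point, each measurement has roughly the following shape; the pool name (testpool) and the mount path are assumptions, so adjust to your setup:

    # raw disk: sequential write throughput, bypassing the page cache
    dd if=/dev/zero of=/mnt/osd-disk/testfile bs=4M count=256 oflag=direct

    # network: run a server on one node, a client on another
    iperf -s            # on node2
    iperf -c node2      # on node1

    # RADOS: create a pool, then run write and read benchmarks
    ceph osd pool create testpool 128
    rados bench -p testpool 30 write --no-cleanup
    rados bench -p testpool 30 seq
    rados bench -p testpool 30 rand

Note that the seq and rand read tests need objects to read, which is why the write test uses --no-cleanup.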

Lastly, play around with the many different options rados bench offers. What parameters can you use to focus solely on small-object performance, or large-object performance? What kind of results do you see? Can you re-create (at smaller scale) any of the figures from the original Ceph paper? How do the numbers compare now versus then?
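For instance, the -b flag controls object size and -t the number of concurrent operations; a sketch, with values that are just starting points:

    # small objects: 4 KB writes, 16 in flight
    rados bench -p testpool 30 write -b 4096 -t 16 --no-cleanup
    # large objects: 4 MB writes, more concurrency
    rados bench -p testpool 30 write -b 4194304 -t 32 --no-cleanup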

Part 3: Tracing During Operation

To get a deeper look at what is going on while Ceph is running, you'll again use the rados bench tool to create some workloads (both write-heavy and read-heavy ones). However, you'll also use some tools to monitor what is going on in Ceph at a lower level.

One interesting thing to look at is the usage of each drive. During a write-intensive benchmark, use ceph osd df to watch each drive and its usage. Make a graph of per-drive usage over time. How good is Ceph at balancing utilization?
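A simple way to build that graph is to sample ceph osd df on an interval and parse the output afterward; a sketch, where the log name and interval are arbitrary:

    # one JSON snapshot of per-OSD utilization every 5 seconds
    while true; do
        ceph osd df --format json >> osd_df.log
        echo >> osd_df.log
        sleep 5
    done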

Another thing worth learning about is Ceph placement groups. Use the ceph pg command to learn more about this. What are placement groups? What can you learn about them as the system runs?
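Some subcommands worth trying as the benchmark runs (start with the docs to interpret the output):

    ceph pg stat       # one-line summary of placement group states
    ceph pg dump       # full per-PG table: state, acting OSDs, object counts
    ceph pg map 1.7f   # which OSDs a given PG maps to (1.7f is a made-up id)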

Other useful data can be found via ceph status and various performance counters. What else can you measure while the system is running? Find the one statistic you think is most interesting and record it over time. What can you learn by monitoring this value?
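Two easy places to look; which statistic is "most interesting" is up to you:

    ceph -s     # cluster-wide health, capacity, and client I/O rates
    rados df    # per-pool object counts and space usage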

Part 4: Heterogeneity

Thus far, you have created a homogeneous cluster using only persistent disks. In this part, you will create a heterogeneous cluster, with local SSDs on all but one node and a (slower) persistent disk on the remaining node. Now run benchmarks (using rados bench) that write to the system, and monitor how the drives fill up. How does this system perform, compared to one with all local SSDs, under a write-heavy workload?

One thing to play around with is the primary affinity of an OSD. What does this setting do? Shift the primary affinity around so as to avoid, to the extent possible, the persistent disk (see the sketch below). How does this change performance under write-heavy workloads? Read-heavy ones? Can affinity be a general solution to the heterogeneity problem?
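Primary affinity is set per OSD as a weight between 0 and 1. A sketch, assuming osd.3 is the OSD backed by the slow persistent disk (older Ceph releases may require enabling the mon_osd_allow_primary_affinity option first):

    # make osd.3 unlikely to be chosen as a primary
    ceph osd primary-affinity osd.3 0.0
    # restore the default
    ceph osd primary-affinity osd.3 1.0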

Machines To Use

For this project, you can use any set of machines you like. However, two resources are specifically available to you that likely make the most sense. The first is Google's Cloud Platform (GCP). Google has provided some credits for each of you, which should be enough for this and perhaps other projects. However, spend the free credits wisely, so as to reduce the need to get more.

You'll have to learn how to create VM instances with different local storage options, so spend some time reading and learning about this. Also, try to use the cheapest configurations you can! You'll get some credits for free, but not an infinite amount.
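As a starting point, here is roughly what the two storage options look like with the gcloud CLI. This is a sketch only: the instance names, machine type, zone, and disk size are all placeholders, and flags change over time, so check the current GCP documentation.

    # a node with one local SSD (NVMe interface)
    gcloud compute instances create ceph-node1 \
        --machine-type n1-standard-2 --zone us-central1-a \
        --local-ssd interface=NVME

    # a node with an extra standard (slower) persistent disk instead
    gcloud compute instances create ceph-node2 \
        --machine-type n1-standard-2 --zone us-central1-a \
        --create-disk size=100GB,type=pd-standard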

The second is CloudLab, a resource we have created in conjunction with groups at Utah and Clemson. More details about this will be available soon, perhaps for the next project, so concentrate on GCP for now.

Handing It In

To turn this project in, you'll meet with me and bring some graphs describing what you have measured. You'll then explain what you did, showing me the measurements and results outlined above. We'll have a signup sheet as the date approaches, and I'll also give a little more detail in class.