An Introduction To Zplot

by Remzi H. Arpaci-Dusseau
(translated from the original paper based on the Tcl version)

Abstract

We introduce Zplot, a Python library for making two-dimensional data plots. Zplot provides a simple set of primitives that allow users to input and manipulate data, plot said data in a variety of formats, and decorate the resulting graphs with axes, labels, and other textual accents. Zplot then outputs encapsulated PostScript and PDF for ease of inclusion in technical documents, and SVG for inclusion of generated plots in modern web pages.

Introduction

Zplot is a simple Python library that allows the creation of two-dimensional data graphics in a flexible and powerful manner. Typical graphs are created with only a few lines of code, and complex and intricate graphs can be produced from only tens of lines of code. Additionally, because Zplot graph-creation is simply Python, one can bring to bear all the power of programming to create visualizations. Repetitive tasks can be performed in loops, and useful primitives can be encapsulated in functions.

In this document, we describe Zplot. First, we give an overview of the tool and the basic primitives it provides. Then, we describe each of the basic routines in more detail, showing how they can be combined to produce a wide range of interesting graphs. Zplot drawing routines are all built upon a set of low-level generic drawing commands that can produce PostScript, PDF, or SVG graphic formats; these commands hide many of the details of generating correct PostScript, PDF, or SVG from the rest of Zplot, boiling down most activities to simple drawing commands that place lines, shapes, and text on the drawing surface.

Overview

We now describe the basic primitives provided by Zplot. Let us start with a typical (if simple) graph as an example, and use this to drive the discussion of the different elements of Zplot. A typical graphing script might be written as follows:

# import the library
from zplot import *
import sys

# describe the drawing surface
ctype = 'eps' if len(sys.argv) < 2 else sys.argv[1]
c = canvas(ctype, title='example1', dimensions=['3in', '2.4in'])

# load some data
t = table(file='example1.data')

# make a drawable region for a graph
d = drawable(canvas=c, xrange=[0,10], yrange=[0,10],
             coord=['0.5in','0.4in'], dimensions=['2.3in','1.7in'])

# make some axes
axis(drawable=d, title='A Sample Graph', xtitle='The X-Axis',
     ytitle='The Y-Axis')

# plot the points
p = plotter()
p.points(drawable=d, table=t, xfield='x', yfield='y', style='triangle',
         linecolor='red', fill=True, fillcolor='red')

# finally, output the graph to a file
c.render()



Figure 1: The most bare-boned of plots that one can make with Zplot.

The EPS version of this graph is shown here; a PDF version of this graph is shown here; it is made from this data set using this script.

In this example, the user first describes the drawing surface by creating a canvas object and specifying its dimensions. At this point the user also specifies the type of the canvas, which can currently be 'eps' (for encapsulated PostScript), 'pdf' (for PDF), or 'svg' (for SVG). Here, as in many of the example graphs, the script generates the output file 'example1.eps' if no arguments are passed to it, but it can be directed to generate PDF or SVG simply by passing 'pdf' or 'svg' as its first argument, which the script hands to the canvas (thus generating 'example1.pdf' or 'example1.svg', respectively).
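The argument handling described above can be isolated in a few lines of plain Python. This is only a sketch: choose_canvas_type() and output_filename() are hypothetical helpers written for illustration, not part of Zplot, and the validation step is an addition that the example script does not perform.

```python
# The three canvas types named in the text.
SUPPORTED_TYPES = ('eps', 'pdf', 'svg')

def choose_canvas_type(argv):
    """Return the canvas type requested in argv, defaulting to 'eps'."""
    ctype = 'eps' if len(argv) < 2 else argv[1]
    if ctype not in SUPPORTED_TYPES:
        raise ValueError('unknown canvas type: %r' % ctype)
    return ctype

def output_filename(title, ctype):
    """Hypothetical helper: the file name the text says the script produces."""
    return '%s.%s' % (title, ctype)

# No argument: fall back to EPS.
print(output_filename('example1', choose_canvas_type(['example1.py'])))
# An explicit 'pdf' argument selects PDF output.
print(output_filename('example1', choose_canvas_type(['example1.py', 'pdf'])))
```

The two calls print example1.eps and example1.pdf, matching the file names mentioned in the text.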

Then, the user creates a table object to load data from a file, in this case a file called example1.data. The table object provides some simple ways to read input files, and later plotting routines expect to read their data from such tables.

Now wishing to plot the data, the user creates a drawable region by creating a drawable object. Doing so defines where on the canvas the drawable is, and also how to map data points onto the drawing surface (e.g., the range of x values and y values that map onto this drawable). Note that interesting graphs can use more than one drawable to great effect.
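To make the mapping idea concrete, here is a minimal sketch of a linear data-to-canvas mapping, using the ranges, position, and size from the example script. It is not Zplot's implementation: to_points() and make_mapper() are hypothetical names, and treating bare numbers as PostScript points (72 per inch) is an assumption of the sketch.

```python
def to_points(length):
    """Convert a length such as '0.5in' to points (72 per inch)."""
    if isinstance(length, str) and length.endswith('in'):
        return float(length[:-2]) * 72.0
    # Assumption: a bare number is already expressed in points.
    return float(length)

def make_mapper(data_range, origin, size):
    """Return a function mapping a data value linearly onto one canvas axis."""
    lo, hi = data_range
    start, extent = to_points(origin), to_points(size)
    def mapper(value):
        return start + (value - lo) / float(hi - lo) * extent
    return mapper

# Values taken from the first example script.
map_x = make_mapper([0, 10], '0.5in', '2.3in')
map_y = make_mapper([0, 10], '0.4in', '1.7in')

# The data point (5, 5) lands in the middle of the drawable.
print(map_x(5), map_y(5))
```

Changing only the data range passed to make_mapper() moves the same data value to a different place on the canvas, which is the effect a second drawable with a different yrange relies on.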

With a drawable defined, the user can create a plotter object and call one of a variety of plotting routines (e.g., points()) to plot the data onto the drawable. The plotting routines generally take a large number of arguments, enabling a wide variety of plots to be produced; in this case, the user chooses to draw a red triangle at each (x,y) point of the graph.

The user also adds some graphical and textual decorations to help clarify the graph (in this case, by simply creating an axis object, which the script does just before plotting), and finally renders the graph to a file by calling the render() method of the canvas object. We now describe each of these primitives in more detail.

Note that each of these routines takes a large number of optional parameters. Read more about them in the documentation.

Tables

There are numerous routines available to users to input and manipulate data; these are found in the table object and related methods. The most commonly used approach is simply to create a table object and pass in a file name; creating a table in this way will read the file into memory and thus make it ready to be plotted. A typical data file (such as example1.data above) looks like this:

# x y
0 0
1 1
2 2
3 3
4 6
...
9 4
10 8

The first line contains the schema for the table, with names for each column; these names are subsequently used to refer to the data when manipulating it or drawing it to the screen. If no schema is specified, one can simply refer to each column by the names 'c0', 'c1', and so forth. The default is to use whitespace as a separator; however, one can specify a different separator (such as a comma or colon) as need be.
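The file format just described is simple enough to parse by hand. The sketch below is not Zplot's reader: read_table() is a hypothetical function, values are kept as strings, and skipping later '#' lines as comments is an assumption.

```python
def read_table(lines, separator=None):
    """Parse lines of text into (column_names, rows).

    A leading line that starts with '#' is treated as the schema; without
    one, columns are named 'c0', 'c1', and so on. separator=None splits on
    any run of whitespace, which is the default behavior of str.split().
    """
    names, rows = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            # Only a '#' line that precedes all data is taken as the schema.
            if names is None and not rows:
                names = [n.strip() for n in line[1:].split(separator)]
            continue
        rows.append([f.strip() for f in line.split(separator)])
    if names is None:
        width = len(rows[0]) if rows else 0
        names = ['c%d' % i for i in range(width)]
    return names, rows

names, rows = read_table(['# x y', '0 0', '1 1', '4 6'])
print(names)  # ['x', 'y']
print(rows)   # [['0', '0'], ['1', '1'], ['4', '6']]

# No schema line and a colon separator: default column names are used.
print(read_table(['1:2', '3:4'], separator=':')[0])  # ['c0', 'c1']
```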

One powerful aspect of tables in Zplot is that they utilize SQLite; this allows one to perform database-like selections over data and thus subset and manipulate data readily. Here is an example that selects the rows of a table with y-values above 5 into a new table, thi, and plots green circles around those points (the results are shown in Figure 2 below).

# load some data
t = table(file='example2.data')
thi = table(table=t, where='y > 5')

...

# plot the points
p = plotter()
p.points(drawable=d, table=t, xfield='x', yfield='y', style='triangle',
         linecolor='red', fill=True, fillcolor='red')
p.points(drawable=d, table=thi, xfield='x', yfield='y', style='circle',
         linecolor='green', size='5', linewidth=2)



Figure 2: Table Selection. The example uses a simple table selection to find y-values that are greater than 5. Then, these points are plotted as green circles.

This example is shown in PostScript here; it is made from this data set using this script.

There are a number of other useful table functions which are not covered here, mostly for manipulating and summarizing data; see the table method APIs for more information. For example, the update() method allows arbitrary SQL updates to be performed.
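For readers unfamiliar with SQLite, the sketch below shows the kind of selection and update involved, using Python's standard sqlite3 module directly. It illustrates the SQL only and makes no claim about how Zplot stores tables internally; the table name t and the column types are assumptions.

```python
import sqlite3

# Only the rows visible in the sample data file; the elided rows are omitted.
rows = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 6), (9, 4), (10, 8)]

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE t (x INTEGER, y INTEGER)')
conn.executemany('INSERT INTO t (x, y) VALUES (?, ?)', rows)

# The same predicate that the script above passes as where='y > 5'.
high = conn.execute('SELECT x, y FROM t WHERE y > 5 ORDER BY x').fetchall()
print(high)  # [(4, 6), (10, 8)]

# An UPDATE statement of the kind update() is described as allowing.
conn.execute('UPDATE t SET y = y * 2 WHERE x < 2')
doubled = conn.execute('SELECT x, y FROM t WHERE x < 2 ORDER BY x').fetchall()
print(doubled)  # [(0, 0), (1, 2)]

conn.close()
```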

Drawable

The drawable is likely the most important abstraction that Zplot implements. A drawable is created by instantiating a drawable object. The powerful aspect of a drawable is that it enables a user to place multiple (potentially overlapping) drawable regions onto the drawing surface. This feature can be used to implement a number of interesting graphs. For example, in the figure below (Figure 5 in this FAST paper), two regions of the graph are of interest but hard to see due to their small size. Thus, one can create two additional drawables and plot closeups of the data in those regions.



Figure 3: Nested Plots. A plot from an earlier paper of ours is recreated. Two closeups are made in the lower graph, with only a few lines of Python code required.

The FAST paper graph can be seen here; it is made from this file and this file using this script.

This example also demonstrates a number of parameters that the drawable object can be passed when creating it. For example, a user can specify its exact position with the coord parameter and its size with the dimensions parameter.



Figure 4: Multiple Y Axes. The script creates two drawables, the right one with a y-range that is twice as high as the left one. The same data is plotted on both.

Multiple drawables can also be used to plot data with multiple y axes in a simple and straightforward manner. In this example, we plot the same data from the example above, except onto an overlapping drawable that maps the y range from 0 up to 20 (instead of 0 to 10).

This third example is shown here; it is made from this data set using this script (this script is also shown below). The script creates two drawables, the right one with a y-range that is twice as high as the left one. The same data is plotted on both drawables, but with different scales.

# import the library
from zplot import *
import sys

# describe the drawing surface
ctype = 'eps' if len(sys.argv) < 2 else sys.argv[1]
c = canvas(ctype, title='example3', dimensions=['3.3in', '2.4in'])

# load some data
t = table(file='example3.data')

# make a drawable region for a graph
d1 = drawable(canvas=c, xrange=[0,10], yrange=[0,10],
             coord=['0.5in','0.4in'], dimensions=['2.3in','1.7in'])
d2 = drawable(canvas=c, xrange=[0,10], yrange=[0,20],
             coord=['0.5in','0.4in'], dimensions=['2.3in','1.7in'])

# make some axes
axis(drawable=d1, title='A Sample Graph', xtitle='The X-Axis',
     ytitle='The Y-Axis')
axis(drawable=d2, style='y', title='', ytitle='The Second Y-Axis',
     yaxisposition=10, yauto=[0,20,4], labelstyle='in', ticstyle='in')

# plot the points
p = plotter()
p.points(drawable=d1, table=t, xfield='x', yfield='y', style='triangle',
         linecolor='red', fill=True, fillcolor='red')
p.points(drawable=d2, table=t, xfield='x', yfield='y', style='triangle',
         linecolor='green', fill=True, fillcolor='green')
    
# finally, output the graph to a file
c.render()

The Plotter Object

The plotter object is used to plot data onto drawables. It provides numerous plotting methods to get this job done; those that appear in this document include points(), line(), and verticalbars().



Figure 5: Multiple Plot Types. This example plots a number of different plot types, as described in each title. Of course, many other variations are possible.

This next example, shown in Figure 5 above, presents a number of different possibilities from the above, all combined into a single script. The EPS graph is shown here; PDF here; it is made from this data set using this script. One interesting point is that multiple types from above can be combined to make more interesting plots; for example, a box-and-whiskers type plot is simply a combination of vertical intervals, vertical bars, and points.
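As an illustration of how one data set can feed several layers, the sketch below computes the values that the three layers of a box-and-whiskers plot would need. It uses only the standard statistics module; the dictionary keys are hypothetical names chosen for this sketch, not Zplot field names, and the quartile method is Python's default rather than anything Zplot prescribes.

```python
import statistics

def box_summary(samples):
    """Return the values each layer of a box-and-whiskers plot would need."""
    q1, median, q3 = statistics.quantiles(samples, n=4)
    return {
        'ylo': min(samples),   # bottom of the vertical interval (whisker)
        'yhi': max(samples),   # top of the vertical interval (whisker)
        'boxlo': q1,           # bottom of the vertical bar (box)
        'boxhi': q3,           # top of the vertical bar (box)
        'median': median,      # the single point drawn inside the box
    }

summary = box_summary([2, 4, 4, 5, 7, 9, 10, 12])
print(summary)
```

Each x position in the plot would get one such summary; the interval, bar, and point layers are then drawn in that order so that the median point ends up on top.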

Another example plots a number of different patterns in a set of stacked bars. As one can see, patterns such as diagonal lines and triangles can be used to fill a region, allowing for the creation of bar graphs with many different types of data within. The example is shown here; it is made from this data set using this script.



Figure 6: Multiple Patterns. This example plots a number of different patterns in a set of stacked bars. As one can see, patterns such as diagonal lines and triangles can be used to fill a region. The example also includes a legend.

Axes, Tics, and Labels

A single complex object supports the generation of axes, tic marks, and labels for a graph. It is (not surprisingly) called the axis object. It has too many arguments to describe here in any detail; see the documentation page for details. However, it is often quite simple to use. For example, to specify the title, label for the x-axis, and label for the y-axis, one simply does the following:

axis(drawable=d, title='Title', xtitle='X-Axis', ytitle='Y-Axis')

Internal algorithms compute reasonable locations for said labels (depending on whether tic marks are used, for example). Further, when the guesses are wrong, one can use a shift argument to move the text to a more appropriate location (e.g., the titleshift argument can be passed the value [3,0] to bump it 3 points to the right). Many of the other options deal with customizations such as font selection, rotation, color, and so forth.

Legend

Finally, Zplot provides support in most plotting routines for the addition of a legend via a legend object; see documentation here. Each given plot method (such as line()) takes an optional legend parameter which specifies the legend object, and a legendtext parameter which indicates the name to be associated with the data. The script should subsequently call the draw() method to place the legend on the screen and control its appearance. This script contains the following example (some lines omitted for brevity):

...
L = legend()

p = plotter()
p.verticalbars(drawable=d, table=b5, xfield='x', yfield='y', fill=True, 
               fillcolor='darkgray', bgcolor='white', barwidth=0.9, legend=L,
               legendtext='Stuff', linewidth=0.5)
...
L.draw(canvas=c, coord=d.map([6,8]), down=True, width=15, height=15)
c.render()

PostScript, PDF, and SVG Generation

Zplot is built on top of a number of underlying canvas primitives. Three types of canvases are currently supported: PostScript (actually, encapsulated PostScript, or EPS for short), PDF, and SVG graphics. The user can select which to output by simply specifying it when creating a canvas object. The EPS format is particularly useful for inclusion in LaTeX-generated papers, whereas SVG can be useful for web pages; PDF is useful in many scenarios. Each type of canvas provides the basic ability to draw shapes as well as place text into the figure the user is making. Furthermore, the primitives provided by the canvases are used by the plotting routines and thus make said plotting code (such as that found in the creation of the different types of plots) canvas-neutral.

One method found in these classes is line(), which lets the user draw a line directly on the canvas. The method is passed a set of coordinates and some basic information about the line, and it then produces a line that connects the coordinates in the resulting PostScript, PDF, or SVG. All primitives take coordinates in PostScript points, each of which is 1/72nd of an inch. The line method takes additional arguments that allow the addition of an arrow to the end of the line; we omit these parameters for the sake of space; see this for more information.
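To give a feel for what a canvas-neutral line primitive has to produce, the sketch below emits the same path once as PostScript operators and once as an SVG element. This is an illustration under stated assumptions, not Zplot's actual output: the function names are hypothetical, and the SVG version flips the y coordinate because SVG places its origin at the top left while PostScript places it at the bottom left.

```python
def line_as_postscript(coords, linewidth=1.0):
    """Emit PostScript operators that stroke a path through coords."""
    (x0, y0), rest = coords[0], coords[1:]
    ops = ['newpath', '%g %g moveto' % (x0, y0)]
    ops += ['%g %g lineto' % (x, y) for x, y in rest]
    ops += ['%g setlinewidth' % linewidth, 'stroke']
    return '\n'.join(ops)

def line_as_svg(coords, canvas_height, linewidth=1.0):
    """Emit an SVG polyline; SVG's y axis points down, so y is flipped."""
    points = ' '.join('%g,%g' % (x, canvas_height - y) for x, y in coords)
    return ('<polyline points="%s" fill="none" stroke="black" '
            'stroke-width="%g"/>' % (points, linewidth))

# Coordinates are in units of 1/72 inch, measured from the bottom left.
coords = [(36, 36), (72, 108), (144, 72)]
print(line_as_postscript(coords))
print(line_as_svg(coords, canvas_height=144))
```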

An example of some of the subtle differences in lines is shown here; it is made using this script.

There are a few other raw drawing methods, such as box(), circle(), and polygon(), all of which make the shape one might expect, given the name.

Each of these shape routines takes a variety of arguments that describe its coordinates, and then all take three different sets of arguments that characterize the line around the shape (e.g., linewidth, linejoin, linecap, linedash), the fill of the shape (fill, fillstyle), and the background color behind any non-solid fill pattern (bgcolor). The line descriptors match those of the line() method above, and the background color is straightforward. Most interesting, then, is the variety and flexibility provided by the pattern descriptions.

Parameters further allow users to specify a fill pattern for a region. The most important parameter is fillstyle, which determines how the region is filled. Current styles that are supported include solid, hline, vline, dline1, dline2, circle, square, triangle, utriangle. Each pattern takes two arguments to determine its contents: a fillsize and fillskip. Within a given pattern, fillsize determines the size of each element in the pattern, and fillskip the space between each element.
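One way to picture how fillsize and fillskip interact is to compute where the elements of a pattern would sit inside a box. The sketch below does this for a simple grid. It is an assumption about the geometry, not Zplot's fill code: pattern_centers() is a hypothetical function, and it assumes each element is fillsize wide with fillskip of empty space before the next one.

```python
def pattern_centers(x1, y1, x2, y2, fillsize, fillskip):
    """Return centers of pattern elements laid on a grid inside a box.

    Assumption of this sketch: elements are fillsize wide and separated by
    fillskip, so the grid pitch is fillsize + fillskip.
    """
    pitch = fillsize + fillskip
    if fillsize <= 0 or pitch <= 0:
        raise ValueError('fillsize must be positive and fillskip non-negative')
    half = fillsize / 2.0
    centers = []
    y = y1 + half
    while y + half <= y2:
        x = x1 + half
        while x + half <= x2:
            centers.append((x, y))
            x += pitch
        y += pitch
    return centers

# A 10x10 box with 2-point elements and 2-point gaps holds a 3x3 grid.
centers = pattern_centers(0, 0, 10, 10, fillsize=2, fillskip=2)
print(len(centers))  # 9
print(centers[:3])   # [(1.0, 1.0), (5.0, 1.0), (9.0, 1.0)]
```

Increasing fillskip thins the pattern out, while increasing fillsize makes each element larger; both reduce the number of elements that fit.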

An example shape and pattern collection is shown in EPS here; in PDF here; data is available here; it is made using this script.

The last canvas method we describe is text(), which draws text onto the canvas. Most of its parameters are straightforward. However, the most crucial argument to understand is the anchor. This parameter describes how the text should be anchored relative to the coordinate that was passed to the routine. The parameter takes the form of a comma-separated string 'xanchor,yanchor', where xanchor specifies the anchoring of the text in the x direction (either l for left, c for center, or r for right), and yanchor the anchoring in the y direction (l for low, c for center, and h for high). The figure below shows the different possible anchors (the coordinates passed to the text drawing routine are highlighted with a red circle).



Figure 7: Text Anchors. This example shows how to specify text anchors.

The anchoring EPS graph is shown here; PDF here; it is made using this script.
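The anchor string can be read as a pair of offsets applied to the text's bounding box. The sketch below computes where the lower-left corner of that box ends up for a given anchor. It is an interpretation for illustration only: anchored_origin() is hypothetical, the box width and height are supplied by the caller (real text measurement needs font metrics), and the reading that 'l' places the coordinate at the left or low edge of the text is an assumption.

```python
def anchored_origin(x, y, width, height, anchor='c,c'):
    """Return the lower-left corner of a text box anchored at (x, y)."""
    xanchor, yanchor = anchor.split(',')
    # How far the box extends to the left of / below the anchor coordinate.
    xshift = {'l': 0.0, 'c': width / 2.0, 'r': float(width)}[xanchor]
    yshift = {'l': 0.0, 'c': height / 2.0, 'h': float(height)}[yanchor]
    return (x - xshift, y - yshift)

# A 40x10 text box anchored at (100, 50) in three different ways.
print(anchored_origin(100, 50, 40, 10, 'l,l'))  # (100.0, 50.0)
print(anchored_origin(100, 50, 40, 10, 'c,c'))  # (80.0, 45.0)
print(anchored_origin(100, 50, 40, 10, 'r,h'))  # (60.0, 40.0)
```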

Programmability

One of the major advantages of Zplot versus other plotting packages is that the user simply writes Python. Thus, one can write functions and use code to simplify the task at hand; one is not limited by some artificial graph-specification language.

In this example, we define two new functions: label_with_arrow() and circle_with_text(). The first draws a label at a particular spot on the canvas, and the second draws a circle with some text inside of it. Each is just a plain Python function that calls various Zplot primitives to get its work done.

The example also shows how one can use classic programming constructs such as loops and randomization to ease the creation of interesting visualizations. In this example, we create a number of random green dots via a simple loop.
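The random-dot idea needs nothing beyond the standard library to generate its coordinates. The sketch below shows one way to do so. random_dots() is a hypothetical helper rather than code from the example script, and the seed parameter is an addition that makes a figure reproducible from run to run.

```python
import random

def random_dots(count, x_range, y_range, seed=None):
    """Return count random (x, y) positions inside the given data ranges."""
    rng = random.Random(seed)
    return [(rng.uniform(*x_range), rng.uniform(*y_range))
            for _ in range(count)]

# Twenty positions inside a 0..10 by 0..10 data range.
dots = random_dots(20, (0, 10), (0, 10), seed=42)
for x, y in dots[:3]:
    # In a real script, each position would be handed to a drawing call.
    print('%.2f %.2f' % (x, y))
```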

The EPS graph is shown here; PDF here; it is made using this script. The SVG is shown below in Figure 8.



Figure 8: Programmability. This example shows how you can use Python programmability to make certain types of visualizations more readily.

Related Work

Zplot was a tool born of frustration with gnuplot. Gnuplot provides excellent support for simple line graphs and scatter plots, as well as numerous other graph types. However, its lack of reasonable support for bar charts was one of the main driving forces behind Zplot. One positive of gnuplot was its PostScript driver; the PostScript produced by gnuplot was clear and easy to read, sparking an initial interest in that language, and thus (indirectly) making Zplot possible. Great PostScript resources, for those who are interested, are the blue book, the red book, and (to some extent) the green book; all are available online.

A number of good SVG resources are also available online. We found Jenkov's tutorials to be particularly useful.

It is somewhat harder to find good resources for generating raw PDFs; PDF files are a bit intricate in that they force the inclusion of many precise byte offsets for fast lookup of objects, and there generally is a lack of examples (as compared to PostScript, for which there are numerous good resources). The PDF reference is useful and covers everything, but does not include many detailed examples; the SyncFusion book is quite useful in this regard.
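The byte-offset requirement is easiest to appreciate in a small example. The sketch below assembles a minimal, empty one-page PDF and records the offset of each object for the cross-reference table. It is a hand-written illustration of the file layout, not code taken from Zplot, and build_minimal_pdf() is a hypothetical name.

```python
def build_minimal_pdf():
    """Assemble a one-page, empty PDF and its cross-reference table."""
    objects = [
        b'<< /Type /Catalog /Pages 2 0 R >>',
        b'<< /Type /Pages /Kids [3 0 R] /Count 1 >>',
        b'<< /Type /Page /Parent 2 0 R /Resources << >> '
        b'/MediaBox [0 0 216 172.8] >>',
    ]
    out = bytearray(b'%PDF-1.4\n')
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b'%d 0 obj\n' % number + body + b'\nendobj\n'
    xref_start = len(out)  # byte offset of the cross-reference table itself
    out += b'xref\n0 %d\n' % (len(objects) + 1)
    out += b'0000000000 65535 f \n'  # entry for the reserved object 0
    for offset in offsets:
        out += b'%010d 00000 n \n' % offset  # each entry is exactly 20 bytes
    out += b'trailer\n<< /Size %d /Root 1 0 R >>\n' % (len(objects) + 1)
    out += b'startxref\n%d\n%%%%EOF\n' % xref_start
    return bytes(out), offsets

pdf, offsets = build_minimal_pdf()
for number, offset in enumerate(offsets, start=1):
    # Each recorded offset must land exactly on the object's header.
    assert pdf[offset:].startswith(b'%d 0 obj' % number)
print(offsets)
```

Any change to the length of an object shifts every later offset, which is why a PDF generator has to track positions as it writes rather than patching them in afterwards.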

As Zplot was demonstrated to others, some were reminded of Ploticus, which is a more powerful and complete tool than gnuplot and is capable of producing a large variety of interesting graph types. Many of the features found in Zplot are also found in Ploticus (e.g., a Ploticus area is akin to a Zplot drawable), and we often use examples from the Ploticus web page to determine whether Zplot can easily do what Ploticus already does. However, Ploticus is complex and harder to modify, comprising over 60,000 lines of C code. Zplot, in contrast, consists of a few thousand lines of Python. This comparison is certainly unfair, as Zplot is not as feature-rich as Ploticus, but the point remains.

Conclusions

We have introduced Zplot, a pure Python library for drawing PostScript, PDF, and SVG figures. Zplot provides a number of powerful but simple tools for making beautiful two-dimensional plots. Dive into the documentation, or, better yet, look at existing examples to learn more.
