We introduce Zplot, a Python library for making two-dimensional data plots. Zplot provides a simple set of primitives that allow users to input and manipulate data, plot said data in a variety of formats, and decorate the resulting graphs with axes, labels, and other textual accents. Zplot then outputs encapsulated PostScript and PDF for ease of inclusion in technical documents, and SVG for inclusion of generated plots in modern web pages.
Zplot is a simple Python library that allows the creation of two-dimensional data graphics in a flexible and powerful manner. Typical graphs are created with only a few lines of code, and complex and intricate graphs can be produced from only tens of lines of code. Additionally, because Zplot graph-creation is simply Python, one can bring to bear all the power of programming to create visualizations. Repetitive tasks can be performed in loops, and useful primitives can be encapsulated in functions.
In this document, we describe Zplot. First, we give an overview of the tool and the basic primitives it provides. Then, we describe each of the basic routines in more detail, showing how they can be combined to produce a wide range of interesting graphs. Zplot drawing routines are all built upon a set of low-level generic drawing commands that can produce PostScript, PDF, or SVG graphic formats; these commands hide many of the details of generating correct PostScript, PDF, or SVG from the rest of Zplot, boiling down most activities to simple drawing commands that place lines, shapes, and text on the drawing surface.
We now describe the basic primitives provided by Zplot. Let us start with a typical (if simple) graph as an example, and use this to drive the discussion of the different elements of Zplot. A typical graphing script might be written as follows:
    # import the library
    from zplot import *
    import sys

    # describe the drawing surface
    ctype = 'eps' if len(sys.argv) < 2 else sys.argv[1]
    c = canvas(ctype, title='example1', dimensions=['3in', '2.4in'])

    # load some data
    t = table(file='example1.data')

    # make a drawable region for a graph
    d = drawable(canvas=c, xrange=[0,10], yrange=[0,10],
                 coord=['0.5in','0.4in'], dimensions=['2.3in','1.7in'])

    # make some axes
    axis(drawable=d, title='A Sample Graph', xtitle='The X-Axis',
         ytitle='The Y-Axis')

    # plot the points
    p = plotter()
    p.points(drawable=d, table=t, xfield='x', yfield='y', style='triangle',
             linecolor='red', fill=True, fillcolor='red')

    # finally, output the graph to a file
    c.render()
The EPS version of this graph is shown here; a PDF version of this graph is shown here; it is made from this data set using this script.
In this example, the user first describes the drawing surface by creating a canvas object and specifying its dimensions. At this point the user also specifies the type of the canvas, which can currently be 'eps' (for PostScript), 'pdf' (for PDF), or 'svg' (for SVG). Here, as in many of the example graphs, the script generates the output file 'example1.eps' if no arguments are passed to the script, but it can be directed to generate PDF or SVG simply by passing 'pdf' or 'svg' as the first command-line argument, which the script hands to the canvas (thus generating 'example1.pdf' or 'example1.svg', respectively).
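The way the example script picks its output format can be sketched in isolation. The snippet below is a minimal illustration and not part of Zplot: the helper name output_name is hypothetical, and rejecting unknown types with a ValueError is our own assumption rather than documented Zplot behavior.

```python
def output_name(argv, title="example1"):
    """Mirror the example script: default to 'eps', else use argv[1]."""
    ctype = "eps" if len(argv) < 2 else argv[1]
    # Assumption: only the three canvas types named in the text are valid.
    if ctype not in ("eps", "pdf", "svg"):
        raise ValueError(f"unsupported canvas type: {ctype!r}")
    # The output file is named after the canvas title plus the type.
    return ctype, f"{title}.{ctype}"


print(output_name(["example1.py"]))         # no argument: EPS output
print(output_name(["example1.py", "svg"]))  # explicit SVG output
```

Keeping the format choice on the command line means the same script can feed both a LaTeX document and a web page without being edited.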
Then, the user creates a table object to load data from a file called example1.data. The table object provides some simple ways to read input files, and later plotting routines expect to input data from such tables.
Now wishing to plot the data, the user creates a drawable region by instantiating a drawable object; doing so defines where on the canvas the drawable is, and also how to map data points onto the drawing surface (e.g., the range of x values and y values that map onto this drawable). Note that interesting graphs can use more than one drawable to great effect.
With a drawable defined, the user can create a plotter object and call one of a variety of plotting routines (e.g., points()) to plot the data onto the drawable. The plotting routines generally take a large number of arguments, enabling a wide variety of plots to be produced; in this case, the user chooses to draw a red triangle at each (x,y) point of the graph.
The user also adds some graphical and textual decorations to help clarify the graph (in this case, by simply creating an axis object), and finally renders the graph to a file by calling the render() method of the canvas object. We now describe each of these primitives in more detail.
Note that each of these routines takes a large number of optional parameters. Read more about them in the documentation.
There are numerous routines available to users to input and manipulate data; these are found in the table object and related methods. The most commonly used approach is simply to create a table object and pass in a file name; creating a table in this way will read the file into memory and thus make it ready to be plotted. A typical data file (such as example1.data above) looks like this:
    # x y
    0 0
    1 1
    2 2
    3 3
    4 6
    ...
    9 4
    10 8
The first line contains the schema for the table, with names for each column; these names are subsequently used to refer to the data when manipulating it or drawing it to the screen. If no schema is specified, one can simply refer to each column by the names 'c0', 'c1', and so forth. The default is to use whitespace as a separator; however, one can specify a different separator (such as a comma or colon) as need be.
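To make the file format concrete, here is a minimal sketch of a reader that follows the rules just described: an optional schema line, default column names c0, c1, and so forth, and a configurable separator. It is an illustration only; the function read_table is hypothetical and is not how Zplot itself loads data (Zplot's table object does that work), and values are simply kept as strings.

```python
def read_table(text, separator=None):
    """Parse delimited text into a list of row dicts (illustration only).

    A first line starting with '#' is treated as the schema; without one,
    columns are named c0, c1, ... A separator of None means whitespace.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    names = None
    if lines and lines[0].lstrip().startswith("#"):
        header = lines[0].lstrip()[1:]
        names = [n.strip() for n in header.split(separator)]
        lines = lines[1:]
    rows = []
    for ln in lines:
        fields = [f.strip() for f in ln.split(separator)]
        if names is None:
            # No schema line: fall back to positional column names.
            names = [f"c{i}" for i in range(len(fields))]
        rows.append(dict(zip(names, fields)))
    return rows


with_schema = read_table("# x y\n0 0\n1 1\n4 6\n")
without_schema = read_table("1,2\n3,4\n", separator=",")
print(with_schema[2], without_schema[0])
```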
One powerful aspect of tables in Zplot is that they utilize SQLite; this allows one to perform database-like selections over data and thus subset and manipulate data readily. Here is an example that selects data from a table with y-values above 5 by creating a new table thi, and plots green circles around said points (the results of which are shown in the figure linked below).
    # load some data
    t = table(file='example2.data')
    thi = table(table=t, where='y > 5')
    ...

    # plot the points
    p = plotter()
    p.points(drawable=d, table=t, xfield='x', yfield='y', style='triangle',
             linecolor='red', fill=True, fillcolor='red')
    p.points(drawable=d, table=thi, xfield='x', yfield='y', style='circle',
             linecolor='green', size='5', linewidth=2)
This example is shown in PostScript here; it is made from this data set using this script.
There are a number of other useful table functions which are not covered here, mostly for manipulating and summarizing data; see the table method APIs for more information. For example, the update() method allows arbitrary SQL updates to be performed.
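For readers less familiar with SQLite, the kind of selection and update described above can be sketched directly with Python's standard sqlite3 module. This is not Zplot code and does not show how Zplot lays out its tables internally; the table name t, the column types, and the rows (the visible rows of the sample listing, with the elided ones left out) are assumptions made for illustration.

```python
import sqlite3

# Visible rows from the sample data listing; elided rows are omitted.
rows = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 6), (9, 4), (10, 8)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x REAL, y REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", rows)

# The same predicate as where='y > 5' in the script above.
thi = conn.execute("SELECT x, y FROM t WHERE y > 5 ORDER BY x").fetchall()

# An arbitrary SQL update: double y for the first two x values.
conn.execute("UPDATE t SET y = y * 2 WHERE x < 2")
doubled = conn.execute("SELECT x, y FROM t WHERE x < 2 ORDER BY x").fetchall()

print(thi)
print(doubled)
```

Expressing subsets as SQL predicates keeps the plotting calls unchanged: the same points() call works on the full table and on any selection derived from it.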
The drawable is likely the most important abstraction that Zplot implements. A drawable is created by instantiating a drawable object. The powerful aspect of a drawable is that it enables a user to place multiple (potentially overlapping) drawable regions onto the drawing surface. This feature can be used to implement a number of interesting graphs. For example, in the figure below (Figure 5 in this FAST paper), two regions of the graph are of interest but hard to see due to their small size. Thus, one can create two additional drawables and plot closeups of the data in those regions.
The FAST paper graph can be seen here; it is made from this file and this file using this script. Two closeups are made in the lower graph, with only a few lines of code required to do so.
This example also demonstrates a number of parameters that the drawable object can be passed when creating it. For example, a user can specify its exact position with the coord parameter and its size with the dimensions parameter.
Multiple drawables can also be used to plot data with multiple y axes in a simple and straightforward manner. In this example, we plot the same data from the example above, except onto an overlapping drawable that maps the y range from 0 up to 20 (instead of 0 to 10).
This third example is shown here; it is made from this data set using this script (this script is also shown below). The script creates two drawables, the right one with a y-range that is twice as high as the left one. The same data is plotted on both drawables, but with different scales.
    # import the library
    from zplot import *
    import sys

    # describe the drawing surface
    ctype = 'eps' if len(sys.argv) < 2 else sys.argv[1]
    c = canvas(ctype, title='example3', dimensions=['3.3in', '2.4in'])

    # load some data
    t = table(file='example3.data')

    # make a drawable region for a graph
    d1 = drawable(canvas=c, xrange=[0,10], yrange=[0,10],
                  coord=['0.5in','0.4in'], dimensions=['2.3in','1.7in'])
    d2 = drawable(canvas=c, xrange=[0,10], yrange=[0,20],
                  coord=['0.5in','0.4in'], dimensions=['2.3in','1.7in'])

    # make some axes
    axis(drawable=d1, title='A Sample Graph', xtitle='The X-Axis',
         ytitle='The Y-Axis')
    axis(drawable=d2, style='y', title='', ytitle='The Second Y-Axis',
         yaxisposition=10, yauto=[0,20,4], labelstyle='in', ticstyle='in')

    # plot the points
    p = plotter()
    p.points(drawable=d1, table=t, xfield='x', yfield='y', style='triangle',
             linecolor='red', fill=True, fillcolor='red')
    p.points(drawable=d2, table=t, xfield='x', yfield='y', style='triangle',
             linecolor='green', fill=True, fillcolor='green')

    # finally, output the graph to a file
    c.render()
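To see why the same data lands at different heights on the two drawables, it helps to write out the data-to-canvas mapping. The sketch below assumes a plain linear scale and uses the y ranges, offset, and height from the script above; make_mapper is a hypothetical helper written for this explanation and is not a Zplot routine.

```python
POINTS_PER_INCH = 72.0


def make_mapper(data_range, origin_in, length_in):
    """Return a function mapping a data value to a canvas position in points.

    data_range: [low, high] of the data axis (e.g., yrange)
    origin_in:  offset of the drawable on the canvas, in inches
    length_in:  extent of the drawable along this axis, in inches
    """
    lo, hi = data_range

    def to_points(value):
        # Fraction of the way along the axis, then scaled to the drawable.
        fraction = (value - lo) / (hi - lo)
        return (origin_in + fraction * length_in) * POINTS_PER_INCH

    return to_points


# Same placement and size, different y ranges (as for d1 and d2 above).
y_left = make_mapper([0, 10], 0.4, 1.7)
y_right = make_mapper([0, 20], 0.4, 1.7)

print(y_left(8), y_right(8))
```

A y value of 8 sits 80% of the way up the first drawable but only 40% of the way up the second, which is exactly the effect of the second y-axis.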
The plotter object is used to plot data onto drawables. It provides numerous plotting methods to get this job done:
This next example, shown in Figure 5 above, presents a number of different possibilities from the above, all combined into a single script. The EPS graph is shown here; PDF here; it is made from this data set using this script. One interesting point is that multiple types from above can be combined to make more interesting plots; for example, a box-and-whiskers type plot is simply a combination of vertical intervals, vertical bars, and points.
Another example plots a number of different patterns in a set of stacked bars. As one can see, patterns such as diagonal lines and triangles can be used to fill a region, allowing for the creation of bar graphs with many different types of data within. The example is shown here; it is made from this data set using this script.
A single complex object supports the generation of axes, tic marks, and labels for a graph. It is (not surprisingly) called the axis object. It has too many arguments to describe here in any detail; see the documentation page for details. However, it is often quite simple to use. For example, to specify the title, label for the x-axis, and label for the y-axis, one can simply do the following:
axis(drawable=d, title='Title', xtitle='X-Axis', ytitle='Y-Axis')
Internal algorithms compute reasonable locations for said labels (depending on whether tic marks are used, for example). Further, when the guesses are wrong, one can use a shift argument to move the text to a more appropriate location (e.g., the titleshift argument can be passed the value [3,0] to bump it 3 points to the right). Many of the other options deal with customizations such as font selection, rotation, color, and so forth.
Finally, Zplot provides support in most plotting routines for the addition of a legend via a legend object; see documentation here. Each given plot method (such as line()) takes an optional legend parameter which specifies the legend object, and a legendtext parameter which indicates the name to be associated with the data. The script should subsequently call the draw() method to place the legend on the screen and control its appearance. This script contains the following example (some lines omitted for brevity):
    ...
    L = legend()
    p = plotter()
    p.verticalbars(drawable=d, table=b5, xfield='x', yfield='y', fill=True,
                   fillcolor='darkgray', bgcolor='white', barwidth=0.9,
                   legend=L, legendtext='Stuff', linewidth=0.5)
    ...
    L.draw(canvas=c, coord=d.map([6,8]), down=True, width=15, height=15)
    c.render()
Zplot is built on top of a number of underlying canvas primitives. Three types of canvases are currently supported: PostScript (actually, encapsulated PostScript, or EPS for short), PDF, and SVG graphics. The user can select which to output by simply specifying it when creating a canvas object. The EPS format is particularly useful for inclusion in LaTeX-generated papers, whereas SVG can be useful for web pages; PDF is useful in many scenarios. Each type of canvas provides the basic ability to draw shapes as well as place text into the figure the user is making. Furthermore, the primitives provided by the canvases are used by the plotting routines and thus make said plotting code (such as that found in the creation of the different types of plots) canvas-neutral.
One method found in these classes is line(), which lets the user draw a line directly on the canvas. The method is passed a set of coordinates and some basic information about the line, and then produces a line that connects the coordinates in the resulting PostScript, PDF, or SVG. All primitives take coordinates in PostScript points, each of which is 1/72nd of an inch. The line method takes additional arguments that allow the addition of an arrow to the end of the line; we omit these parameters for the sake of space; see this for more information.
An example of some of the subtle differences in lines is shown here; it is made using this script.
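Because the raw primitives work in the 1/72-inch unit described above while the examples specify sizes as strings such as '3in', a small conversion helper makes the relationship explicit. The sketch below is hypothetical and written only for illustration: the name to_points, and the rule that a bare number is already in canvas units, are our assumptions rather than Zplot's documented parsing behavior.

```python
UNITS_PER_INCH = 72.0


def to_points(value):
    """Convert '3in'-style strings to 1/72-inch units; pass numbers through."""
    if isinstance(value, str) and value.endswith("in"):
        return float(value[:-2]) * UNITS_PER_INCH
    # Assumption: anything else is already expressed in canvas units.
    return float(value)


# The canvas from the first example is 3in wide and 2.4in tall.
width, height = to_points("3in"), to_points("2.4in")
print(width, height)
```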
There are a few other raw drawing methods, such as box(), circle(), and polygon(), all of which make the shape one might expect, given the name. Each takes parameters that describe the line around the shape (linewidth, linejoin, linecap, linedash), the fill of the shape (fill, fillstyle), and the background color behind any non-solid fill pattern (bgcolor). The line descriptors match those of the line() method above, and the background color is straightforward. Most interesting, then, is the variety and flexibility provided by the pattern descriptions.
Parameters further allow users to specify a fill pattern for a region. The most important parameter is fillstyle, which determines how the region is filled. Current styles that are supported include solid, hline, vline, dline1, dline2, circle, square, triangle, utriangle. Each pattern takes two arguments to determine its contents: a fillsize and fillskip. Within a given pattern, fillsize determines the size of each element in the pattern, and fillskip the space between each element.
An example shape and pattern collection is shown in EPS here; in PDF here; data is available here; it is made using this script.
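One way to picture the interaction of fillsize and fillskip is to compute where the stripes of an hline-style pattern would fall inside a box. The sketch below is purely an illustration of the size-then-gap idea; the function hline_stripes is hypothetical, and the exact placement Zplot uses (where the first stripe starts, how partial stripes are clipped) is an assumption on our part.

```python
def hline_stripes(y_bottom, y_top, fillsize, fillskip):
    """Return (start, end) vertical extents of each horizontal stripe.

    Each stripe is fillsize tall, and consecutive stripes are separated
    by fillskip of empty space. Stripes that do not fit are dropped.
    """
    if fillsize <= 0 or fillskip < 0:
        raise ValueError("fillsize must be positive and fillskip non-negative")
    stripes = []
    y = y_bottom
    while y + fillsize <= y_top:
        stripes.append((y, y + fillsize))
        y += fillsize + fillskip  # element, then the gap before the next one
    return stripes


print(hline_stripes(0, 10, 1, 3))
```

Raising fillskip while holding fillsize fixed thins the pattern out, whereas raising fillsize makes each element heavier; varying the two together is what lets many bars in one graph remain distinguishable.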
The last canvas method we describe is text(), which draws text onto the screen. Most of its parameters are straightforward. However, the most crucial argument to understand is the anchor. This parameter describes how the text should be anchored relative to the coordinate that was passed to the routine. The parameter takes the form of a comma-separated string 'xanchor,yanchor', where xanchor specifies the anchoring of the text in the x direction (either l for left, c for center, or r for right), and yanchor the anchoring in the y direction (l for low, c for center, and h for high). The figure below shows the different possible anchors (the coordinates passed to the text drawing routine are highlighted with a red circle).
The anchoring EPS graph is shown here; PDF here; it is made using this script.
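The anchor convention can also be stated as arithmetic. The sketch below computes where the lower-left corner of a piece of text would end up for a given anchor string, assuming the text's bounding box (width and height) is already known. The helper anchored_origin is hypothetical and is not a Zplot function; it only illustrates the 'xanchor,yanchor' convention described above.

```python
def anchored_origin(x, y, width, height, anchor="c,c"):
    """Return the lower-left corner of a text box anchored at (x, y).

    anchor is 'xanchor,yanchor' with xanchor in {l, c, r} and
    yanchor in {l, c, h}, as described in the text.
    """
    xanchor, yanchor = anchor.split(",")
    # Shift left by none, half, or all of the width.
    dx = {"l": 0.0, "c": -width / 2.0, "r": -width}[xanchor]
    # Shift down by none, half, or all of the height.
    dy = {"l": 0.0, "c": -height / 2.0, "h": -height}[yanchor]
    return (x + dx, y + dy)


for a in ("l,l", "c,c", "r,h"):
    print(a, anchored_origin(100, 50, 40, 10, a))
```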
One of the major advantages of Zplot versus other plotting packages is that the user simply writes Python. Thus, one can write functions and use code to simplify the task at hand; one is not limited by some artificial graph-specification language.
In this example, we define two new functions: label_with_arrow() and circle_with_text(). The first draws a label at a particular spot on the canvas, and the second draws a circle with some text inside of it. Each is just a plain Python function that calls various Zplot primitives to get its work done.
The example also shows how one can use classic programming constructs such as loops and randomization to ease the creation of interesting visualizations. In this example, we create a number of green random dots via a simple loop.
The EPS graph is shown here; PDF here; it is made using this script. The SVG is shown below in Figure 8.
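The random-dots idea needs nothing beyond ordinary Python. The sketch below generates the dot positions only; the helper random_dots and its seed parameter are hypothetical, and the drawing step is left as a comment because the exact arguments of the canvas circle() method are not given in this document.

```python
import random


def random_dots(count, x_range, y_range, seed=None):
    """Return count random (x, y) positions inside the given data ranges."""
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    return [(rng.uniform(*x_range), rng.uniform(*y_range))
            for _ in range(count)]


dots = random_dots(20, [0, 10], [0, 10], seed=1)
for x, y in dots:
    # Here each position would be mapped onto the drawable and handed
    # to a drawing primitive (e.g., a green circle).
    pass

print(len(dots), "dot positions generated")
```

Passing a fixed seed is a small design choice worth keeping: it makes the figure identical from run to run, which matters when the graph is regenerated for a paper.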
Zplot was a tool born of frustration with gnuplot. Gnuplot provides excellent support for simple line graphs and scatter plots, as well as numerous other graph types. However, its lack of reasonable support for bar charts was one of the main driving forces behind Zplot. One positive of gnuplot was its PostScript driver; the PostScript produced by gnuplot was clear and easy to read, sparking an initial interest in that language, and thus (indirectly) making Zplot possible. Great PostScript resources, for those who are interested, are the blue book, red book, and (to some extent) the green book; all are available online.
A number of good SVG resources are also available online. We found Jenkov's tutorials to be particularly useful.
It is somewhat harder to find good resources for generating raw PDFs; PDF files are a bit intricate in that they force the inclusion of many precise byte offsets for fast lookup of objects, and there generally is a lack of examples (as compared to PostScript, for which there are numerous good resources). The PDF reference is useful and covers everything, but does not include many detailed examples; the SyncFusion book is quite useful in this regard.
As Zplot was demonstrated to others, some were reminded of Ploticus, which is a more powerful and complete tool than gnuplot and is capable of producing a large variety of interesting graph types. Many of the features found in Zplot are also found in Ploticus (e.g., a Ploticus area is akin to a Zplot drawable), and we often use examples from the Ploticus web page to determine whether Zplot can easily do what Ploticus already does. However, Ploticus is complex and harder to modify, comprising over 60,000 lines of C code. Zplot, in contrast, consists of a few thousand lines of Python. This comparison is certainly unfair, as Zplot is not as feature-rich as Ploticus, but the point remains.
We have introduced Zplot, a pure Python library for drawing PostScript, PDF, and SVG figures. Zplot provides a number of powerful but simple tools for making beautiful two-dimensional plots. Dive into the documentation, or, better yet, look at existing examples to learn more.