Be sure you are acquainted with my collaboration policy and my late policy, both found on the main course webpage. If you are working with a partner, be sure to include both of your names.
XML (Extensible Markup Language) is a language for describing data in a particular structured way. Many computer programs use XML-based file formats. One example you may be familiar with is iTunes, Apple's digital music player.
In this assignment you will use stacks to check whether an iTunes XML document is syntactically well-formed. This is essentially a fancy version of the parentheses-matching problem we looked at in class. In order to complete this assignment, you'll need these files:
XMLChecker.java
is the Java file that you'll be editing and handing in.
Stack.java
is my implementation of the stack data structure. You will use this file, but will not need to edit it or turn it in.
ListNode.java
is my implementation of a linked-list node, and is used by Stack.java. You will use this file, but will not need to edit it or turn it in.
playlist.xml
is an XML description of an actual iTunes playlist.
endcut.xml
is a copy of playlist.xml that I have damaged.
startcut.xml
is another copy of playlist.xml that has been damaged in a different way.
mismatch.xml
is yet another damaged copy of playlist.xml.
If you view playlist.xml in the Firefox web browser, then you'll see a hierarchical, "tree-like" view of its contents; Firefox understands the structure of XML files. If you view endcut.xml, startcut.xml, and mismatch.xml in Firefox, then the program complains that they are not well-formed XML files; that's what this assignment is all about.
To understand what's going on, examine the playlist.xml file in a text editor. As you can see, an XML document is just a bunch of text, but sprinkled with nuggets enclosed in < and >, such as <dict>; these are called tags.
The first two lines in playlist.xml are special. The first line has an ?xml tag, which indicates that this is an XML file. This tag has two attributes, naming the intended XML version and text encoding. The second line has a !DOCTYPE tag, which has an attribute describing what kind of XML document we're looking at. In this assignment we're going to ignore these two special lines.
After the first two lines, the file takes on a rigid structure. There are three kinds of tags. What distinguishes them is whether they have a slash / in them, and where.
Some tags, such as plist and dict, have no slash; these are opening tags.
Other tags, such as /dict and /plist near the end of the file, start with a slash; these are closing tags.
Still other tags end with a slash; these are empty tags. An empty tag such as <dog/> is shorthand for <dog></dog>. The only empty tag in playlist.xml is <true/>.
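To make the three kinds of tags concrete, here is a minimal sketch that classifies a tag by where its slash is. It assumes the tag text has already had its surrounding < and > removed (for example "dict", "/dict", or "true/"). The class name TagKindDemo and the method classify are hypothetical names chosen for illustration; they are not part of the provided files.

```java
// Hypothetical helper for illustration; not part of the provided XMLChecker.java.
class TagKindDemo {
    enum Kind { OPENING, CLOSING, EMPTY }

    // Classifies tag text whose surrounding < and > were already removed,
    // for example "dict", "/dict", or "true/".
    static Kind classify(String tag) {
        if (tag.startsWith("/")) {
            return Kind.CLOSING;   // slash at the start
        }
        if (tag.endsWith("/")) {
            return Kind.EMPTY;     // slash at the end
        }
        return Kind.OPENING;       // no leading or trailing slash
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"plist", "/plist", "true/"}) {
            System.out.println(tag + " -> " + classify(tag));
        }
    }
}
```

Checking the leading slash first means the order of the two tests matters only for odd inputs such as "/", which do not occur in the files for this assignment.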
playlist.xml is a nest of opening and closing tags. The whole file (after the first two lines) is contained in a single <plist> </plist> pair. Inside this plist pair is a single <dict> </dict> pair. Inside that dict pair are many keys and other things.
The important point here is that all tags (after the first two lines) are properly nested. If a tag is opened inside a matching pair of opening/closing tags, then it is closed inside that same matching pair. Put another way, each closing tag matches the most-recently-opened-but-not-yet-closed tag. We say that the XML file is well-formed if the tags nest properly like this. In this assignment you will write a program to check whether a given XML file is well-formed.
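The rule that each closing tag matches the most recently opened, not yet closed tag is the same rule that governs parentheses. As a reminder, here is a small sketch of bracket matching with a stack. It uses java.util.ArrayDeque from the standard library as a stand-in for the course's Stack.java, whose methods may differ, and the names BracketDemo, isBalanced, and pairsWith are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Illustration only: java.util.ArrayDeque stands in for the course's Stack.java.
class BracketDemo {
    // Returns true if every bracket in text is closed by the matching kind,
    // in the reverse of the order in which the brackets were opened.
    static boolean isBalanced(String text) {
        Deque<Character> open = new ArrayDeque<>();
        for (char c : text.toCharArray()) {
            if (c == '(' || c == '[' || c == '{') {
                open.push(c);                       // remember the opener
            } else if (c == ')' || c == ']' || c == '}') {
                if (open.isEmpty() || !pairsWith(open.pop(), c)) {
                    return false;                   // nothing open, or wrong kind
                }
            }
        }
        return open.isEmpty();                      // leftovers were never closed
    }

    static boolean pairsWith(char opener, char closer) {
        return (opener == '(' && closer == ')')
            || (opener == '[' && closer == ']')
            || (opener == '{' && closer == '}');
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("([]{})")); // true
        System.out.println(isBalanced("([)]"));   // false
        System.out.println(isBalanced("(()"));    // false
    }
}
```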
The file XMLChecker.java already contains some code. Read through it and familiarize yourself with what it is doing. The first chunk of code is a check to make sure the user has passed a single XML file to be validated. Notice that the usage should look like this once everything has been compiled:
java XMLChecker filename.xml
The code then stores the contents of the file in a string--you should have seen code like this in previous courses. Then comes something you may not have seen before--our String of XML is parsed down to a list of tags using something called a "regular expression." Regular expressions are a type of pattern matching that you will see frequently in your CS career. It is not required that you understand them for this class. However, if you are interested, Wikipedia's article is actually pretty decent. You're also welcome to come ask me about it during office hours.
Anyway, the important thing to know is that this code finds every tag, that is, the text between each "<" and ">" pair, and stores it in an ArrayList called tagList.
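If you are curious how a regular expression can pull out the tags, here is a rough sketch. The pattern shown is an assumption made for illustration and may not be the one used in XMLChecker.java; the names TagExtractDemo and extractTags are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only; the provided XMLChecker.java may use a different pattern.
class TagExtractDemo {
    // Assumed pattern: "<", then one or more characters other than ">", then ">".
    // The parentheses capture the text between the angle brackets.
    private static final Pattern TAG = Pattern.compile("<([^>]+)>");

    static List<String> extractTags(String xml) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(xml);
        while (m.find()) {
            tags.add(m.group(1)); // keep only the captured text, without < and >
        }
        return tags;
    }

    public static void main(String[] args) {
        String xml = "<dict><key>Name</key><true/></dict>";
        System.out.println(extractTags(xml)); // [dict, key, /key, true/, /dict]
    }
}
```

Note that the text between tags ("Name" above) is discarded; only the tags themselves matter for checking nesting.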
Finally, the code prints out the entire list of tags. That's not what we want it to do, though. Here's where you come in.
Your assignment is to replace the line that prints out the tags with a bunch of code that uses a stack to check the well-formedness of the XML. You will be using basically the same algorithm we covered in class (Tuesday, 6/21). Essentially, each XML tag is a different kind of parenthesis, with opening (e.g. dict) and closing (e.g. /dict) variants. Remember that an empty tag (e.g. true/) opens and immediately closes itself; how do you treat this? (Although <true/> is the only empty tag in playlist.xml, your program should be able to handle any empty tag.)
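One possible shape for the check is sketched below. It is only a sketch under several assumptions: it uses java.util.ArrayDeque rather than the course's Stack.java, it assumes the tag list may still contain the two special tags (those starting with ? or !) and skips them, and it assumes an opening tag may carry attributes after its name. The names TagNestingDemo and firstProblem are hypothetical, and the messages are not the required wording.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Illustration only: java.util.ArrayDeque stands in for the course's Stack.java.
class TagNestingDemo {
    // Returns null if the tags nest properly; otherwise a short description
    // of the first problem, including the open tags (inner-most first).
    static String firstProblem(List<String> tags) {
        Deque<String> open = new ArrayDeque<>();
        for (String tag : tags) {
            if (tag.startsWith("?") || tag.startsWith("!")) {
                continue; // assumption: the two special tags may appear; skip them
            }
            if (tag.endsWith("/")) {
                continue; // an empty tag opens and closes itself
            }
            if (tag.startsWith("/")) {
                String name = tag.substring(1).trim();
                if (open.isEmpty()) {
                    return "closing tag /" + name + " found with no tag open";
                }
                if (!open.peek().equals(name)) {
                    return "closing tag /" + name + " found, but open tags are " + open;
                }
                open.pop();
            } else {
                // assumption: keep only the tag name, dropping any attributes
                open.push(tag.trim().split("\\s+")[0]);
            }
        }
        return open.isEmpty() ? null : "file ended with open tags " + open;
    }

    public static void main(String[] args) {
        System.out.println(firstProblem(List.of("plist", "dict", "true/", "/dict", "/plist")));
        System.out.println(firstProblem(List.of("plist", "dict", "integer", "/sminteger")));
    }
}
```

Treating an empty tag as a no-op is one reasonable answer to the question above; pushing it and popping it immediately would behave the same way.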
If the XML is well-formed, then your program should print out a happy message to that effect. If the XML is not well-formed, then there are several different ways in which it could have failed. You should print out a detailed error message to tell the user what went wrong. Part of your error message should be the contents of the stack at the time the error occurred. This helps the user figure out where the bad part of the XML file is. For example, here's what the program might do when handed a well-formed XML file:
ealexand$ java XMLChecker playlist.xml
The file playlist.xml correctly nests XML tags.
Here's what the program might do with an XML file that closes tags incorrectly:
ealexand$ java XMLChecker mismatch.xml
The file mismatch.xml does not correctly nest XML tags. The tag /sminteger was found where the following tags were expected to be closed (starting with the inner-most tag): integer dict dict dict plist
I have provided you with three damaged iTunes XML files, in addition to the undamaged playlist.xml. Make sure that your program works correctly on all of them. You may want to test your program on other iTunes XML files as well; I'm sure you can find someone who uses iTunes, to get more samples.
To hand in your program, copy XMLChecker.java and any other files needed to create XMLChecker.class (not including the files with which I provided you) to the following directory:
~cs367-1/handin/login-name/P1
Use your actual CS login name (not your UW NetID!) in place of login-name.
Be sure that a comment at the top of the XMLChecker.java file gives the name(s) of the author(s) of the code.
If you worked with a partner, then only one of you should turn in the program files as specified above. It doesn't matter which partner hands in the program files.
Do not copy any ".class" files, and do not create any subdirectories in your handin directory. Note that for this assignment, you should test your program thoroughly, but you do not need to hand in your test data.
You will be graded on program correctness, error checking, the usefulness of your error messages, and general style.
Note: this assignment has been adapted from a similar assignment given by Carleton College professor Josh Davis.