Statistics Department Administrative Databases * Location and naming conventions * Editing a database * Checking fields * Sorting the records * Printing an ASCII report * Creating a LaTeX table * Some difficulties Location and naming conventions The administrative databases are organized as tables (flat-files) with tab-separated records. They can be manipulated in emacs forms-mode, with perl scripts, or as S data frames. You could probably slurp them into a spreadsheet as well, if we had a reasonable spreadsheet with which to work. These flat files are located in the AFS directory /p/stat/Data/admin. Each database should have a ".form" file associated with it. The form file is emacs-lisp code that defines the fields for forms mode. Thus "phd_info" has its fields (and the forms-mode display format) defined in "phd_info.form". Editing a database in emacs forms-mode If you put the following function definition in your ~/.emacs file, you will be able to use M-x get-db to begin editing a database. (defun get-db nil "Use forms mode to edit an administrative database" (interactive) (let* ((db-directory "/p/stat/Data/admin/") (extension ".form") (elen (length extension))) (forms-find-file (concat db-directory (completing-read "Database: " (mapcar 'list (mapcar '(lambda (form-name) (substring form-name 0 (- elen))) ; strip off the extension (directory-files db-directory nil (concat ".*\\" extension "$") )))) extension)))) The documentation for forms-mode can be found in the info pages accessed through C-h i in emacs. Checking fields A perl script called clean_fields.pl is available in the database directory. This script checks that the number of fields in each record is correct (emacs forms-mode enforces this too) and that there are no leading or trailing blanks except for empty fields which are replaced by a single blank. It can be used from the shell as % clean_fields.pl This script creates a backup file before modifying the database. There is only one backup file, however, so be careful using this if you are not sure that your edits should be committed. some of these checks are also implemented in an emacs-lisp function called db-check-fields. The .form file will often define this function as the forms-modified-record-hook. Sorting the records If the sort is a simple lexicographic sort on the entire record (such as when the records are defined with the first two fields being lastname and firstname), it can be implemented as a forms-write-file-hook. For more complicated sorts, a sample perl script called sort_phd_info.pl is available in the database directory. Only the first part of the script should ever need to be modified to sort other databases. The database name and the comparison routine are defined at the top. The names $a and $b are magic for the comparisons. The field names are those defined in the .form file associated with the database. They are used as $a->{'grad-yr'}, etc. In perl terminology, they are used as the keys in a hash or associative array. The comparison operator for numeric fields is "<=>". For text fields use "cmp". Printing an ASCII report Perl has good facilities for quickly defining a report format. Usually you just need to define the number of lines per page, the top-of-page format, and the format for each record. A sample report is given as report_phd_info.pl in the database directory. Because we often store the fields "lastname" and "firstname" but we want to print them as "lastname, firstname", I call a subroutine field_mod before printing each record. The example definition is sub field_mod { $r->{'name'} = $r->{'lastname'} . ", " . $r->{'firstname'}; } If you don't need to modify any fields, just define this as an empty subroutine. Do not delete the definition entirely or perl will become confused. That report uses some fancy facilities to split the thesis title over two lines while distributing the rest of the information on the two lines. The report is designed to be printed by % perl report_phd_info.pl | enscript -rBfCourier8 Manipulating the data in S Use % S ... > PhDs <- read.table("/p/stat/Data/admin/phd_info", sep = "\t") to read the data into an S data frame. I usually also pull the field names out of the .form file and assign them as the names attribute. Creating a LaTeX table The approach is similar to creating an ASCII report. I will draft up a sample later. Difficulties with this approach * There is only user caution to protect the databases. We also have the db_name.bak file and the files in the OldFiles directory. * It is possible to do a join in the database. It is not entirely obvious how to do this, however. Again, I can work up an example. * Sharon's machine needs to be upgraded to perl 5.000 to use the perl scripts. We can either install Debian or install the latest Slackware if we want to do a complete upgrade. ------------------------------------------------------------------------------- Last modified: Thu Jul 6 11:12:52 1995 by Douglas Bates bates@stat.wisc.edu