With the explosion of new audio and video content on the Web, it's more important than ever to use accurate and comprehensive metadata to get the most out of that content. Developing Quality Metadata is an advanced user guide that will help you improve your metadata by making it accurate and coherent with your own solutions. This book is designed to get you thinking about solving problems in a proactive and productive way by including practical descriptions of powerful programming tools and user techniques using several programming languages. For example, you can use shell scripting as part of the graphic arts and media production process, or you can use a popular spreadsheet application to drive your workflow. The concepts explored in this book are framed within the context of a multimedia professional working on the Web or in broadcasting, but they are relevant to anyone responsible for a growing library of content, be it audio-visual, text, or financial.
Cliff Wootton was the technical systems architect in the BBC News Interactive TV group. This team pioneered the "News Loops" service, which was nominated for a BAFTA Technology award and won a Royal Television Society Award for Technical Innovation. His current research projects are investigating new ways to build interactive content creation tools for the emerging IPTV platforms.
Acknowledgments
Introduction
Framing the Problem
Metadata
Object Modeling Your Data
Transfer and Conversion
Dealing with Raw Data
Character Mapping and Code Sets
Data Fields
Fields, Records, and Tables
Times, Dates, Schedules, and Calendars
Names, Addresses, and Contacts
Spatial Data and Maps
Paint Me a Picture
Roll Tape!
Rights Issues
Integrating with Enterprise Systems
Data Exchange Formats
XML-Based Tools and Processes
Interfaces and APIs
Scripting Layers
UNIX Command Line Tools
Power Tools
Automation with Shell Scripts
Automation with AppleScript
Script Automation in Windows
Compiled and Interpreted Languages
GUI Tools and Processes
Building Tools
Keep It Moving
Publishing Systems
Adding Intelligence and Metrics
Lateral Thinking
The Bottom Line
Tutorials
Calling Shell Commands from AppleScript
Calling AppleScript from Shells
Calling Visual Basic from AppleScript
Calling Visual Basic from UNIX
Calling UNIX Shell Commands from C
Calling Java from C Language
Calling C from Java
What Your Web Server Log Can Tell You
Monitoring Your Operating System Logs
Measuring and Monitoring Disk Usage
Wrapping FTP Transfers in a Script
Wrapping gzip in a Shell Script
Many-to-Many Relationships
Phonetic Searches
Fuzzy Searching and Sorting
Finding Buffer Truncation Points
Cleaning Unstructured Data
Sorting Out Address Data
Time Data Cleaning
Removing Duplicates
Converting TSV to XML
Removing Macros from Word Documents
Removing All Hyperlinks from Word
Recognizing U.S. Zip Codes
Recognizing UK Postal Codes
Finding Variable Names in Source Code
Finding Double-Quoted Strings
Finding Single-Quoted Strings
Finding Currency Values
Finding Time Values
Recovering Text from Corrupted Documents
Extracting Text from PDF Files
Mail and HTTP Header Removal
ISO 8601 Date Format Output
Relative Date Tool (the Date)
Zip/Postal Code-to-Location Mapping
Shortest Distance Between Two Towns
Dealing with Islands
Calculate Centroid of Area
Extracting Text from Illustrator
Generating Candidate Keywords
Extracting Metadata from Word Documents
Extracting Metadata from Image Files
Extract Metadata from a QuickTime Movie
Discovering Formats with File Magic
Extracting Hyperlinks from Word Documents
Extracting URLs from Raw Text
Testing URL Hyperlinks
Dictionary Lookups via Dict.org
Look Up the Online Dictionary from a Web Page
Check for Editorial Integrity
Publish a Spreadsheet SYLK File
Publish a Word RTF Document
Publish an Adobe SVG
Converting XML to HTML with XSLT
Making Excel Spreadsheets with AppleScript
Making Word Documents with AppleScript
Scripting Alpha Channels in Photoshop
Searching and Editing Word Docs
Creating a Script Wrapper for Microsoft Word
Putting It on the Desktop
Remote Renderers and Compilers
Data Exchange Containers
Metadata Standards
A Simple Metadata Dictionary
Code Sets
Regular Expressions 101
Glossary
Bibliography
Webliography
Index
Overview