Write Up for CS736

Database Support for Terabyte-sized File Systems

Ying Lu, Jin Wang, Qing Wang

Abstract:

With advances in storage technology, high-capacity disks are now widely available. A notebook computer can have more than 10 GB of disk space, enough to accommodate tens of thousands of files. Finding a particular file among so many becomes a tedious task for most users. Current operating systems provide only limited facilities for locating files, such as the find command in UNIX and Explorer in Windows. It is desirable to have facilities that let users retrieve a file in a more convenient way.

Our proposed system provides several ways for a user to locate a file within a file system: by the metadata of files, by the keywords of plain text files, and by semantic information from special files, which includes the header information of mail files and the keywords of source code files. A B+Tree is used as the indexing structure instead of the traditional inverted file because of its simplicity, efficient access time, and adaptation to file growth. Our query interface consists of UNIX-like commands. Users can issue boolean queries using and and or operations on various types of files and data patterns. Query results are presented as a virtual directory in which results are listed as file names and directories. Previous queries are cached to avoid redundant searches for the same pattern. This also provides the flexibility of moving down and up the virtual directory hierarchy of a set of related queries within the same scope. Four main components make up our preliminary version of the Database-Supported Terabyte-sized File System. The interactions between the components and their implementation details are provided in the paper.
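As a rough illustration of the ideas above, the following minimal sketch shows boolean and/or keyword queries with caching of previous results. It is not the system's implementation: the class name KeywordIndex, its methods, and the file paths are hypothetical, and a sorted Python list searched with bisect stands in for the B+Tree.

```python
from bisect import bisect_left, insort


class KeywordIndex:
    """Hypothetical keyword index; a sorted list stands in for the B+Tree."""

    def __init__(self):
        self._keys = []       # keywords kept in sorted order
        self._postings = {}   # keyword -> set of file paths
        self._cache = {}      # (op, keywords) -> cached result
        self.lookups = 0      # counts index probes, to show cache hits

    def add(self, keyword, path):
        """Record that the file at `path` contains `keyword`."""
        if keyword not in self._postings:
            insort(self._keys, keyword)
            self._postings[keyword] = set()
        self._postings[keyword].add(path)
        self._cache.clear()   # cached results may be stale after an update

    def lookup(self, keyword):
        """Binary-search the sorted keys and return the matching paths."""
        self.lookups += 1
        i = bisect_left(self._keys, keyword)
        if i < len(self._keys) and self._keys[i] == keyword:
            return set(self._postings[keyword])
        return set()

    def query(self, op, *keywords):
        """Evaluate an 'and' or 'or' query, reusing cached results."""
        if op not in ("and", "or"):
            raise ValueError("op must be 'and' or 'or'")
        key = (op, tuple(sorted(keywords)))
        if key in self._cache:
            return self._cache[key]
        sets = [self.lookup(k) for k in keywords]
        if not sets:
            result = frozenset()
        elif op == "and":
            result = frozenset(set.intersection(*sets))
        else:
            result = frozenset(set.union(*sets))
        self._cache[key] = result
        return result


index = KeywordIndex()
index.add("btree", "/home/user/notes.txt")
index.add("btree", "/home/user/src/index.c")
index.add("cache", "/home/user/src/index.c")
print(sorted(index.query("and", "btree", "cache")))
print(sorted(index.query("or", "btree", "cache")))
```

Repeating the same query returns the cached result without probing the index again, which is the redundancy the caching described above is meant to avoid.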

Available as: postscript

Link to Source Code