This project investigates and explores the Google Books datasets. I worked on it under the supervision of Prof. Michael Gleicher, who originally proposed the idea.
All samples here are live online and backed by the back-end database, so you can play with them.
The server is currently a temporary one.
Statistics:
- Estimated size of the n-grams datasets: 360,717,742,667 entries.
- Estimated size of the 1-grams datasets: 472,000,000 entries.
- Estimated size of the 2-grams datasets: 6,622,000,000 entries.
- For more statistics, please see the statistics page!
Table Structure:
- books: id (int), grams (string), number (int), year (int), mcount (int), pcount (int), vcount (int), language (int)
- agg_years: year (int), words (int), number (int), language (int)
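To make the table layout concrete, here is a minimal sketch that recreates both tables in an in-memory SQLite database and runs one query. The choice of SQLite, the sample rows, and the helper `counts_by_year` are assumptions for illustration only; the project's actual database engine and the exact meaning of each column are not stated here.

```python
import sqlite3

# Hypothetical recreation of the two tables, using in-memory SQLite.
# Column names and types follow the table structure listed above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE books (
        id INTEGER PRIMARY KEY,
        grams TEXT,
        number INTEGER,
        year INTEGER,
        mcount INTEGER,
        pcount INTEGER,
        vcount INTEGER,
        language INTEGER
    );
    CREATE TABLE agg_years (
        year INTEGER,
        words INTEGER,
        number INTEGER,
        language INTEGER
    );
""")

# Made-up sample rows, not real dataset values.
conn.executemany(
    "INSERT INTO books (grams, number, year, mcount, pcount, vcount, language) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("olympics", 1, 1980, 150, 90, 45, 0),
        ("olympics", 1, 1976, 120, 80, 40, 0),
        ("bush", 1, 1976, 60, 50, 30, 0),
    ],
)


def counts_by_year(connection, gram):
    """Return (year, mcount) pairs for one n-gram, ordered by year."""
    cursor = connection.execute(
        "SELECT year, mcount FROM books WHERE grams = ? ORDER BY year",
        (gram,),
    )
    return cursor.fetchall()


print(counts_by_year(conn, "olympics"))
```

The query uses a bound parameter rather than string formatting, which keeps user-typed search terms from being interpreted as SQL.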
How to access the data in the database?
- Code for accessing the database will be released in a few days. Until then, it is available upon request!
Sentence Visualization
Description:
Phrases are separated by ','.
I set up a simple web search engine for 1-grams:
Multi-word search, with words separated by ',':
Here is a graph search engine for word counts by year, decade, and century:
Here is a graph search engine for the words ring, which analyzes character strings:
Here is a graph search engine for 3-grams words:
Here is a page on data quality:
Sample homepage:
Here is a graph search engine for word counts by year, decade, and century:
Sample homepage:
Normalized Data:
Sample homepage:
Description:
The normalization magnifies or shrinks the influence of factors such as the number of books, and focuses on the time period of interest.
Example: top 10 by my algorithm:
thefe, fuch, fome, moft, muft, firft, fhould, himfelf, againft, themfelves
Description:
Interesting words are aggregated and popped up to the top of the dataset.
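As a rough illustration of what normalizing against a factor can look like, the sketch below divides a word's yearly count by the total word count for that year and restricts the result to a chosen time window. This is not the algorithm that produced the top 10 above, which is not described here. The function `normalize`, its parameters, and the sample numbers are all hypothetical.

```python
def normalize(counts, totals, start=None, end=None):
    """Return year -> count / total, limited to the window [start, end].

    counts: year -> occurrences of one word (hypothetical input shape)
    totals: year -> total words that year (e.g. agg_years.words, an assumption)
    Years with a missing or zero total are skipped to avoid dividing by zero.
    """
    result = {}
    for year, count in counts.items():
        if start is not None and year < start:
            continue
        if end is not None and year > end:
            continue
        total = totals.get(year, 0)
        if total > 0:
            result[year] = count / total
    return result


# Made-up numbers: the raw count doubles, but the relative frequency drops.
word_counts = {1900: 5, 1950: 10, 2000: 40}
year_totals = {1900: 100, 1950: 1000, 2000: 0}
print(normalize(word_counts, year_totals))
print(normalize(word_counts, year_totals, start=1940))
```

Dividing by the yearly total shrinks the effect of there simply being more books in later years, which is one of the factors mentioned above.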
Sentence
Description:
Phrases are separated by ','.
3Grams Words
Sample homepage:
Multi-words Search Graph
Sample homepage:
Description:
Phrases are separated by ','.
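A comma-separated query of this kind can be split into individual phrases along the following lines. This is a small sketch, not the code behind the search page; `parse_query` is a hypothetical name, and trimming whitespace and dropping empty entries are assumptions about the desired behavior.

```python
def parse_query(query):
    """Split a ','-separated query into trimmed, non-empty phrases."""
    phrases = []
    for part in query.split(","):
        phrase = part.strip()
        if phrase:  # ignore empty entries such as "a,,b" or a trailing comma
            phrases.append(phrase)
    return phrases


print(parse_query("olympics, world war , ,bush"))
```

Multi-word phrases such as "world war" are kept intact, since only the comma acts as a separator.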
Events Detection
Sample homepage:
Words Percent
A pie chart showing the word count by year, decade, and century.
Sample homepage:
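Grouping yearly counts into decades or centuries, and turning them into pie-chart shares, can be sketched as follows. The helpers `aggregate` and `shares` and the sample counts are hypothetical. Buckets start at round years here (1900 to 1999 counts as one century), which is a simplifying assumption.

```python
from collections import defaultdict


def aggregate(counts, bucket_size):
    """Sum year -> count into buckets of bucket_size years (1, 10 or 100)."""
    totals = defaultdict(int)
    for year, count in counts.items():
        bucket_start = year - year % bucket_size
        totals[bucket_start] += count
    return dict(totals)


def shares(totals):
    """Convert bucket totals into fractions of the overall sum."""
    overall = sum(totals.values())
    if overall == 0:
        return {}
    return {bucket: value / overall for bucket, value in totals.items()}


# Made-up yearly counts for one word.
yearly = {1901: 2, 1909: 3, 1910: 5}
print(aggregate(yearly, 10))           # by decade
print(aggregate(yearly, 100))          # by century
print(shares(aggregate(yearly, 10)))   # slice sizes for a pie chart
```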
Words Tree
Explores the correlation between character strings.
Application homepage:
Sample homepage:
Words Ring
Explores the correlation between character strings.
Application homepage:
Sample homepage:
An Example: Olympics
Description:
In the small trend graph, we can see some striking points (higher than their surrounding points).
The large detailed trend graph gives more accurate information. These points are:
the 1976 Montreal Summer Olympics, 1980 Moscow, 1988 Seoul, 1996 Atlanta, 2000 Sydney, and 2004 Athens.
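Points that stand above their neighbors can also be picked out automatically. The sketch below marks a year as a peak when its count is strictly higher than both the previous and the next year in the series. It is a simple illustration under that assumption, not necessarily how the graphs on this page were analyzed, and the function name `local_peaks` and the sample series are made up.

```python
def local_peaks(series):
    """Return the years whose count is higher than both adjacent years.

    series: year -> count. The first and last years are never reported,
    because they have only one neighbor.
    """
    years = sorted(series)
    peaks = []
    for prev, cur, nxt in zip(years, years[1:], years[2:]):
        if series[cur] > series[prev] and series[cur] > series[nxt]:
            peaks.append(cur)
    return peaks


# Made-up counts shaped like a trend with two bumps.
trend = {
    1974: 10, 1975: 12, 1976: 30, 1977: 14,
    1978: 13, 1979: 15, 1980: 28, 1981: 16,
}
print(local_peaks(trend))
```

On noisy real data this rule would report many small bumps, so a minimum height or a wider comparison window would likely be needed.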
An Example: Bush
Description:
In the small trend graph, we can see two peaks:
One falls in George H. W. Bush's administration (1989-1993), and the other in George W. Bush's administration (2001-2009).
For more interesting examples, please see here.
A test sample
Description:
This is my first, very simple sample. I tested it on the Google dataset just to see whether there are any interesting graphs to explore.
1-grams word history trend Text Search
Description:
A text search engine for word history trends.
1-grams word history trend Graph Search
Description:
A graph search engine for word history trends: overview (left) and details (right).
Note: view this page with Firefox or Chrome.