Introduction
This page lists some statistics for the Google Books n-gram datasets.
Server
The database is currently hosted on a temporary server.
- Database: table books in kf_proj5 on server http://sepc111.se.cuhk.edu.hk:8080/phpPgAdmin/
- Username and password: available upon request.
Statistical Information:
- Estimated size of all n-gram datasets: 360,717,742,667 entries.
- Estimated size of the 1-gram dataset: 470,000,000 entries. The counts below are cumulative.
  - Part 0 populated: ~47,323,010 entries (2011-09-05)
  - Part 1 populated: ~94,720,238 entries (2011-09-06)
  - Part 2 populated: ~142,089,474 entries (2011-09-07)
  - Part 3 populated: ~189,298,194 entries (2011-09-08)
  - Part 4 populated: ~236,491,492 entries (2011-09-09)
  - Part 5 populated: ~283,791,303 entries (2011-09-09)
  - Part 6 populated: ~331,037,776 entries (2011-09-10)
  - Part 7 populated: ~378,340,852 entries (2011-09-11)
  - Part 8 populated: ~425,549,835 entries (2011-09-12)
  - Part 9 populated: ~472,764,897 entries (2011-09-12)
- Estimated size of the 2-gram dataset: 6,600,000,000 entries. The counts below are cumulative and continue from the 1-gram total.
  - Part 0 populated: ~538,992,318 entries (2011-09-14)
  - Part 1 populated: ~605,212,331 entries (2011-09-15)
  - Part 2 populated: ~671,487,875 entries (2011-09-16)
  - Part 3 populated: ~737,659,606 entries (2011-09-19)
  - Part 4 populated: ~803,897,338 entries (2011-09-20)
  - Part 5 populated: ~870,199,893 entries (2011-09-21)
  - Part 6 populated: ~936,437,090 entries (2011-09-22)
  - Part 7 populated: ~1,002,595,580 entries (2011-09-24)
  - Part 8 populated: ~1,068,803,052 entries (2011-09-25)
  - Part 9 populated: ~1,135,162,108 entries (2011-09-26)
  - Part 10 populated: ~1,201,537,350 entries (2011-09-27)
- After the database was restructured, the 2-gram dataset (estimated size: 6,600,000,000 entries) was loaded again. The counts below are cumulative.
  - Part 0 populated: ~66,227,421 entries (2011-09-29)
  - Part 1 populated: ~132,447,334 entries (2011-09-30)
  - Part 2 populated: ~198,722,978 entries (2011-10-01)
  - Part 3 populated: ~264,894,709 entries (2011-10-03)
  - Part 4 populated: ~331,132,441 entries (2011-10-05)
  - Part 5 populated: ~397,434,996 entries (2011-10-07)
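
The progress figures above lend themselves to a quick back-of-the-envelope check. The following Python sketch assumes that the listed counts are cumulative (each part adds to the previous total) and takes the 6,600,000,000 estimate at face value; the variable and function names are illustrative only. It derives the size of each loaded part, the fraction of the 2-gram dataset loaded so far, and a rough guess at how many parts the full dataset would need.

```python
from datetime import date

# Estimate quoted in the list above.
ESTIMATED_2GRAMS = 6_600_000_000

# (part, cumulative entries, date) copied from the list after the database
# was restructured. Assumption: the counts are cumulative totals.
progress = [
    (0, 66_227_421, date(2011, 9, 29)),
    (1, 132_447_334, date(2011, 9, 30)),
    (2, 198_722_978, date(2011, 10, 1)),
    (3, 264_894_709, date(2011, 10, 3)),
    (4, 331_132_441, date(2011, 10, 5)),
    (5, 397_434_996, date(2011, 10, 7)),
]


def per_part_sizes(rows):
    """Turn cumulative totals into the number of entries added by each part."""
    sizes = []
    previous = 0
    for _, cumulative, _ in rows:
        sizes.append(cumulative - previous)
        previous = cumulative
    return sizes


sizes = per_part_sizes(progress)
loaded = progress[-1][1]

# Share of the estimated 2-gram dataset that has been loaded so far.
fraction_done = loaded / ESTIMATED_2GRAMS

# Rough projection: if every part is about as large as the average part so
# far, this many parts would be needed to reach the estimate.
average_part = loaded / len(progress)
estimated_total_parts = ESTIMATED_2GRAMS / average_part

print(f"entries per part: {sizes}")
print(f"loaded so far: {fraction_done:.1%}")
print(f"projected number of parts: {estimated_total_parts:.0f}")
```

The projection is only as good as the estimate it starts from, so it should be read as an order-of-magnitude figure rather than a schedule.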
Table Structure:
- books
  - Columns: id (int), grams (string), number (int), year (int), mcount (int), pcount (int), vcount (int), language (int)
- agg_years
  - Columns: year (int), words (int), number (int), language (int)
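
For readers who want to experiment with the layout locally, the two tables can be mirrored in an in-memory SQLite database. This is only a sketch: the real server appears to run PostgreSQL (it is reached through phpPgAdmin), SQLite is used here purely as a stand-in, the mapping of "int" to INTEGER and "string" to TEXT is an assumption, and the sample row is made up for illustration.

```python
import sqlite3

# In-memory stand-in database; the real data lives on the project server.
conn = sqlite3.connect(":memory:")

# Column names and order follow the table structure listed above.
# Assumption: "int" maps to INTEGER and "string" maps to TEXT.
conn.executescript(
    """
    CREATE TABLE books (
        id       INTEGER,
        grams    TEXT,
        number   INTEGER,
        year     INTEGER,
        mcount   INTEGER,
        pcount   INTEGER,
        vcount   INTEGER,
        language INTEGER
    );
    CREATE TABLE agg_years (
        year     INTEGER,
        words    INTEGER,
        number   INTEGER,
        language INTEGER
    );
    """
)

# A made-up row, only to show the column order; it is not real data.
conn.execute(
    "INSERT INTO books VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    (1, "hello world", 2, 2000, 10, 8, 5, 0),
)

# Read the column names back from the schema.
book_columns = [row[1] for row in conn.execute("PRAGMA table_info(books)")]
agg_columns = [row[1] for row in conn.execute("PRAGMA table_info(agg_years)")]
print(book_columns)
print(agg_columns)
```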
How to access the data in the database?
- Code for accessing the database will be released in a few days. Until then, it is available upon request.
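
Until the official access code is available, the following Python sketch shows one way a lookup against the books table might be phrased. It is not the project's code: the function name build_gram_query is hypothetical, the choice of columns is an assumption based on the table structure above, and the `%s` placeholders assume a PostgreSQL DB-API driver such as psycopg2. The sketch only builds the query and its parameters; it does not connect to the server, because the credentials are not public.

```python
def build_gram_query(gram, first_year, last_year):
    """Return (sql, params) for looking up one n-gram over a range of years.

    Hypothetical helper: values are passed as parameters rather than being
    pasted into the SQL string, so the driver handles quoting.
    """
    sql = (
        "SELECT year, mcount, pcount, vcount "
        "FROM books "
        "WHERE grams = %s AND year BETWEEN %s AND %s "
        "ORDER BY year"
    )
    return sql, (gram, first_year, last_year)


sql, params = build_gram_query("hello world", 1990, 2000)
print(sql)
print(params)

# With a live connection from a DB-API driver, the query would be run as:
#     cursor.execute(sql, params)
```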