Sources
=======
The two websites we crawled for movie data are:
  1. Google Play
  2. iTunes Store

Method of crawling the sources
------------------------------
On each website, we crawled movies genre by genre: for each genre, we visited
every page listed under it and crawled every movie on those pages. In this way
we were able to crawl all the movies listed under the genres that we were
interested in.
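
The genre-by-genre traversal described above can be sketched as follows. This
is a minimal illustration, not the crawler we actually ran: the function name
`crawl_by_genre` and the two callbacks `list_pages` and `list_movies` are
hypothetical stand-ins for site-specific fetching and parsing code, and
de-duplicating movies by `id` is an assumption.

```python
from typing import Callable, Dict, Iterable


def crawl_by_genre(
    genres: Iterable[str],
    list_pages: Callable[[str], Iterable[str]],
    list_movies: Callable[[str], Iterable[dict]],
) -> Dict[str, dict]:
    """Visit every page under every genre and collect the movies found.

    list_pages(genre) yields the pages listed under a genre and
    list_movies(page) yields one dict per movie on a page. Both are
    hypothetical callbacks supplied by the caller.
    """
    movies: Dict[str, dict] = {}
    for genre in genres:
        for page in list_pages(genre):
            for movie in list_movies(page):
                # Assumption: a movie can appear on several pages or under
                # several genres, so only the first copy per id is kept.
                movies.setdefault(movie["id"], movie)
    return movies
```

Passing the fetching logic in as callbacks keeps the traversal identical for
both websites; only the callbacks would differ per source.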

Google Play
-----------
This website is Google's digital media store where users can purchase apps,
books, music, movies, etc. online.
https://play.google.com/store/movies

The attributes in the Google Play table (tableA.json):
* id [TEXT] (The unique identifier of the movie in the Google Play database.)
* name [TEXT] (Name of the movie.)
* year [INTEGER] (Year of the movie release.)
* genre [TEXT] (Genre of movie, e.g., Romance, Comedy, etc.)
* description [TEXT] (Short description about the movie or the storyline.)
* actors [TEXT] (The list of lead actors in the movie.)
* writers [TEXT] (The list of screen-writers of the movie.)
* producers [TEXT] (The list of producers of the movie.)
* directors [TEXT] (The list of directors of the movie.)
* content_rating [TEXT] (Motion Picture Association of America film rating.
  Suitability of the movie to the audience, e.g., PG-13.)
* rating [TEXT] (A rating on a five-star scale given for the movie.)
* price [TEXT] (The cost to purchase the movie.)
* offer_type [TEXT] (The type of offer available for the movie, e.g., Buy HD,
  Rent HD, etc.)
* all_offers [TEXT] (All the available offers with prices and offer type for
  each.)
* similar_movies_id [TEXT] (The list of movie identifiers of similar movies.)

Number of tuples in the Google Play table = 9233
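
A quick sanity check of a table against its attribute list and tuple count
could look like the sketch below. It assumes the table file is a single JSON
array of objects, which this document does not state; adjust the loader if
the file is laid out differently. The helper names `load_table` and
`missing_attributes` are hypothetical.

```python
import json
from typing import Dict, Iterable, List, Set


def load_table(path: str) -> List[dict]:
    """Load a table, assuming the file holds one JSON array of objects."""
    with open(path, encoding="utf-8") as handle:
        rows = json.load(handle)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of tuples")
    return rows


def missing_attributes(rows: List[dict], expected: Iterable[str]) -> Dict[int, Set[str]]:
    """Map the index of each incomplete tuple to the attributes it lacks."""
    expected_set = set(expected)
    problems: Dict[int, Set[str]] = {}
    for index, row in enumerate(rows):
        missing = expected_set - set(row)
        if missing:
            problems[index] = missing
    return problems
```

With such helpers, `len(load_table("tableA.json"))` can be compared with the
tuple count reported above, and `missing_attributes` can be called with the
attribute names from the list.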

NOTE: Google Play does not provide a mechanism for crawling all the movies
hosted on its website. We could only extract the movies reachable by iterating
through the top-selling and new-release pages in each genre. Nevertheless, we
extracted more than 9000 movie entries using this method.

iTunes Store
------------
This website is Apple's media library for users to purchase and download apps,
music, television shows, movies, etc.
https://itunes.apple.com/us/genre/movies/id33

The attributes in the iTunes table (tableB.json):
* id [TEXT] (The unique identifier of the movie in the iTunes Store database.)
* name [TEXT] (Name of the movie.)
* year [INTEGER] (Year of the movie release.)
* month [TEXT] (Month of the movie release.)
* genre [TEXT] (Genre of movie, e.g., Romance, Comedy, etc.)
* description [TEXT] (Short description about the movie or the storyline.)
* price [TEXT] (The cost to purchase the movie.)
* actors [TEXT] (The list of lead actors in the movie.)
* writers [TEXT] (The list of screen-writers of the movie.)
* producers [TEXT] (The list of producers of the movie.)
* directors [TEXT] (The list of directors of the movie.)
* content_rating [TEXT] (Motion Picture Association of America film rating.
  Suitability of the movie to the audience, e.g., PG-13.)
* rating [TEXT] (A rating on a five-star scale given for the movie.)
* number_of_ratings [TEXT] (The number of people who have rated the movie.)
* similar_movies_id [TEXT] (The list of movie identifiers of similar movies.)
* rotten_tomatoes_tomatometer [TEXT] (The fraction of positive reviews on the
  Rotten Tomatoes website.)
* rotten_tomatoes_average_rating [TEXT] (The average rating on the Rotten
  Tomatoes website.)

Number of tuples in the iTunes table = 17341
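
To see at a glance which attributes the two tables have in common, the
attribute lists above can be compared directly. The snippet below is only an
illustration; the attribute names are transcribed from the two lists in this
document, and the variable names are hypothetical.

```python
# Attribute names transcribed from the Google Play table list (tableA.json).
GOOGLE_PLAY = [
    "id", "name", "year", "genre", "description", "actors", "writers",
    "producers", "directors", "content_rating", "rating", "price",
    "offer_type", "all_offers", "similar_movies_id",
]

# Attribute names transcribed from the iTunes table list (tableB.json).
ITUNES = [
    "id", "name", "year", "month", "genre", "description", "price", "actors",
    "writers", "producers", "directors", "content_rating", "rating",
    "number_of_ratings", "similar_movies_id", "rotten_tomatoes_tomatometer",
    "rotten_tomatoes_average_rating",
]

# Attributes present in both tables, in Google Play order.
shared = [name for name in GOOGLE_PLAY if name in ITUNES]
# Attributes that appear in only one of the tables.
google_play_only = [name for name in GOOGLE_PLAY if name not in ITUNES]
itunes_only = [name for name in ITUNES if name not in GOOGLE_PLAY]

print(len(shared), "shared attributes")
print("Google Play only:", google_play_only)
print("iTunes only:", itunes_only)
```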

NOTE: We were not interested in certain arcane movie categories (such as
Middle Eastern movies) on either source website. Hence, we decided not to
crawl those genres.

