http://www.yelp.com/dataset_challenge/
Download the dataset and place all the yelp_academic_dataset_*.json files in a directory called data/, which is listed in .gitignore.
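If data/ isn't ignored in your checkout yet, a one-line entry covers it (this is just the expected contents, check your own .gitignore):

$ cat .gitignore
data/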
You should have the following installed:
- scipy
- numpy
- textblob
- sklearn (can be installed through pip install -U scikit-learn)
- nltk
- sqlalchemy
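If you're starting from scratch, everything can be installed in one shot (note that sklearn ships on PyPI as scikit-learn):

$ pip install scipy numpy textblob scikit-learn nltk sqlalchemy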
After you install textblob, you need to run the following:
python -m textblob.download_corpora
After you install nltk, you need to run the following:
>>> import nltk
>>> nltk.download()
We are using SQLite with SQLAlchemy as the ORM.
To auto-create the database schema and populate the tables, run import_data_to_sql() from parse.py:
$ python -i parse.py
>>> import_data_to_sql()
Great, it's imported. Now what? Read the docs. You can run some cool queries against the three models we have: Tip, Business, and Review (see db.py).
An example query:
>>> session.query(Tip).filter(Tip.bid == 'U7jOpLoLXYphWFqS6JO8mQ').all()
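A couple more query patterns that tend to be useful. The column names below (Business.name, Review.stars) are assumptions for illustration, so check db.py for the actual fields on each model:

>>> # count businesses whose name mentions pizza (Business.name is assumed)
>>> session.query(Business).filter(Business.name.like('%Pizza%')).count()
>>> # grab five highly-rated reviews (Review.stars is assumed)
>>> session.query(Review).filter(Review.stars >= 4).limit(5).all()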
If you want to change the database schema (add another field or something), you'll also need to re-import the data. Delete data/store.db and a new database will be created the next time you run the import.
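For example, after editing a model in db.py, the full reset looks like this:

$ rm data/store.db
$ python -i parse.py
>>> import_data_to_sql()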