Machine learning for 3ml
Use machine learning to find the "nearest neighbours" for each story in our database using tf-idf. This can be used to provide users with a list of similar stories to the one they are reading.
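Create and activate a virtual environment, then install the dependencies:
$ python -m venv ./venv
$ source ./venv/bin/activate   # fish shell: source ./venv/bin/activate.fish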
$ pip install jupyterlab scikit-learn pandas nltk
Or use the requirements file (created using pip freeze) if you have problems:
pip install -r requirements.txt
Dump the stories table to stories.csv:
psql --csv -d my3ml -o stories.csv -c 'select id,title,content from story where enabled and not archived'
Activate the virtual environment created above and start JupyterLab:
source ./venv/bin/activate
jupyter-lab 3ml_stories.ipynb
Use the browser to open the notebook.
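For orientation, the tf-idf nearest-neighbour idea can be sketched roughly as below. This is a minimal sketch, not the notebook's actual code: the function name nearest_story_ids, the English stop-word list, the cosine metric and the choice of 10 neighbours are all assumptions, and any NLTK preprocessing the notebook may do is left out. It only assumes the id and content columns produced by the psql dump.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors


def nearest_story_ids(stories: pd.DataFrame, k: int = 10) -> pd.DataFrame:
    """Return one row per story with the ids of its k most similar stories.

    Hypothetical helper: expects 'id' and 'content' columns.
    """
    # A story cannot have more neighbours than there are other stories.
    k = min(k, len(stories) - 1)

    # Turn each story into a sparse tf-idf vector (assumed settings).
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(stories["content"].fillna(""))

    # Ask for k + 1 neighbours because each story is normally its own
    # closest match; the story itself is filtered out below.
    model = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(tfidf)
    _, indices = model.kneighbors(tfidf)

    ids = stories["id"].to_numpy()
    rows = []
    for i, neighbours in enumerate(indices):
        others = [int(ids[j]) for j in neighbours if j != i][:k]
        rows.append({"id": int(ids[i]), "nearest_neighbours": others})
    return pd.DataFrame(rows)
```

Under these assumptions, something like nearest_story_ids(pd.read_csv("stories.csv")).to_csv("stories_nn.csv", index=False) would produce a file with one row per story, though the column layout of the notebook's real output may differ.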
The notebook creates a CSV file, stories_nn.csv. This can be edited using vim macros to convert it to SQL, i.e. lines of the form
update story set nearest_neighbours='{968, 964, 963, 965, 985, 966, 967, 989, 988, 969}' where id=950;
Then run the file using psql:
psql -f stories_nn.sql -d <db_or_url>
TODO: create the SQL file directly from the pandas dataframe.
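One possible way to address the TODO and skip the vim step is sketched below. The helper names update_statements and write_sql are hypothetical, and the input is assumed to be pairs of (story id, list of neighbour ids); the statements follow the form shown above.

```python
from typing import Iterable, List, Sequence, Tuple


def update_statements(rows: Iterable[Tuple[int, Sequence[int]]]) -> List[str]:
    """Build one UPDATE statement per (story id, neighbour ids) pair."""
    statements = []
    for story_id, neighbours in rows:
        # Coerce everything to int so only integer literals reach the SQL text.
        array_literal = ", ".join(str(int(n)) for n in neighbours)
        statements.append(
            f"update story set nearest_neighbours='{{{array_literal}}}' "
            f"where id={int(story_id)};"
        )
    return statements


def write_sql(rows: Iterable[Tuple[int, Sequence[int]]],
              path: str = "stories_nn.sql") -> None:
    """Write the statements to a file that psql -f can run."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(update_statements(rows)) + "\n")
```

With a dataframe df whose columns are named id and nearest_neighbours (an assumption about the notebook's output), write_sql(zip(df["id"], df["nearest_neighbours"])) would create stories_nn.sql directly.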