ThreeMinuteLearning/3ml-ml

Machine learning for 3ml

Use machine learning (tf-idf) to find the "nearest neighbours" of each story in our database, i.e. the stories whose text is most similar to it. The result can be used to show users a list of stories similar to the one they are reading.
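To illustrate the idea, here is a minimal pure-Python sketch of tf-idf weighting followed by a cosine-similarity neighbour search. It is not the notebook's code: the notebook presumably relies on scikit-learn (for example TfidfVectorizer and NearestNeighbors), and the function names, the tokenizer and the four sample stories below are all made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Very rough tokenizer (assumption): lowercase words only, no stemming.
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised {term: weight} dict per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed idf: log((1 + n) / (1 + df)) + 1
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    # Vectors are already normalised, so the dot product is the cosine.
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def nearest_neighbours(ids, docs, k=10):
    """Map each id to the ids of its k most similar documents."""
    vectors = tfidf_vectors(docs)
    result = {}
    for i, story_id in enumerate(ids):
        scored = [
            (cosine(vectors[i], vectors[j]), ids[j])
            for j in range(len(ids))
            if j != i
        ]
        # Highest similarity first; ties broken by id for determinism.
        scored.sort(key=lambda s: (-s[0], s[1]))
        result[story_id] = [other for _, other in scored[:k]]
    return result


# Made-up sample data standing in for the story table.
stories = {
    1: "The dragon guarded the castle and its gold",
    2: "A knight rode to the castle to fight the dragon",
    3: "Tomatoes grow best in warm, sunny gardens",
    4: "Water your garden tomatoes every morning",
}
neighbours = nearest_neighbours(list(stories), list(stories.values()), k=1)
print(neighbours)
```

With these four stories, the two castle stories end up as each other's nearest neighbour, and so do the two tomato stories. The brute-force pairwise comparison is quadratic in the number of stories, which is fine for a sketch but is one reason to prefer a library implementation on a real table.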

Setup

Use venv to set up Python for the project:

$ python -m venv ./venv
$ source ./venv/bin/activate    # fish shell: source ./venv/bin/activate.fish
$ pip install jupyterlab scikit-learn pandas nltk

Or use the requirements file (created using pip freeze) if you have problems:

pip install -r requirements.txt

Extract story data as a CSV file

Dump the stories table to stories.csv:

psql --csv -d my3ml -o stories.csv -c 'select id,title,content from story where enabled and not archived'
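As a quick sanity check on the export, the file can be read back in Python. The sketch below assumes the CSV has the header row id,title,content that psql --csv produces for the query above; load_stories is a hypothetical helper, and the in-memory sample rows are invented so the example runs without a database.

```python
import csv
import io


def load_stories(csv_file):
    """Read id, title and content from an open CSV file into a list of dicts."""
    reader = csv.DictReader(csv_file)
    return [
        {"id": int(row["id"]), "title": row["title"], "content": row["content"]}
        for row in reader
    ]


# Tiny in-memory stand-in for stories.csv (made-up rows).
sample = io.StringIO(
    "id,title,content\n"
    '1,First story,"A short, quoted body"\n'
    "2,Second story,Another body\n"
)
stories = load_stories(sample)
print(len(stories), stories[0]["title"])
```

For the real file, open it with open("stories.csv", newline="", encoding="utf-8") and pass the file object to load_stories. In the notebook, pandas.read_csv("stories.csv") is likely the more convenient route.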

Running the notebook

source ./venv/bin/activate

jupyter-lab 3ml_stories.ipynb

Use the browser to open the notebook.

Update the database

The notebook creates a CSV file, stories_nn.csv. Convert it to SQL (for example with vim macros) and save the result as stories_nn.sql, so that each line has the form:

update story set nearest_neighbours='{968, 964, 963, 965, 985, 966, 967, 989, 988, 969}' where id=950;

TODO: create the SQL file directly from the pandas dataframe.

Then run the file using psql -f stories_nn.sql -d <db_or_url>
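One possible way to address the TODO is to build the update statements in Python instead of editing the CSV by hand. This is only a sketch: format_update and write_sql are hypothetical helpers, the input is assumed to be pairs of a story id and a list of neighbour ids, and the sample rows are invented.

```python
import io


def format_update(story_id, neighbour_ids):
    """Build one update statement in the form shown above."""
    # int() guards against non-numeric values ending up in the SQL text.
    ids = ", ".join(str(int(n)) for n in neighbour_ids)
    return (
        f"update story set nearest_neighbours='{{{ids}}}' "
        f"where id={int(story_id)};"
    )


def write_sql(rows, out):
    """Write one update statement per (story_id, neighbour_ids) pair."""
    for story_id, neighbour_ids in rows:
        out.write(format_update(story_id, neighbour_ids) + "\n")


# Made-up rows; in the notebook these would come from the dataframe.
rows = [(950, [968, 964, 963]), (951, [950, 952])]
buffer = io.StringIO()
write_sql(rows, buffer)
sql_text = buffer.getvalue()
print(sql_text, end="")
```

To produce the file itself, pass an open file instead of the StringIO buffer, e.g. with open("stories_nn.sql", "w") as f: write_sql(rows, f). If the dataframe had columns named id and nearest_neighbours (an assumption about its layout), zip(df["id"], df["nearest_neighbours"]) would supply the rows.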
