Use tmtoolkit to fit multiple LDA models in parallel. #18

Open
SeppeDeWinter wants to merge 1 commit into main from update-topic_modeling
@SeppeDeWinter
Collaborator

This change makes it easier for a user to explore multiple hyperparameter values for LDA modeling.

The function run_topic_modeling now accepts a list of values for each of the following parameters:

  • n_topics
  • alpha
  • eta

When a list of values is given for one or more of these parameters, multiple models are fit in parallel so that the user can compare hyperparameter settings and pick the best one. This is done under the hood using tmtoolkit. The function now returns a list of topic models along with quality metrics.
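As a rough illustration of how scalar-or-list arguments could be expanded into one parameter set per model, here is a minimal sketch. The helper name build_parameter_grid and the choice of a full Cartesian product are assumptions for illustration, not necessarily what this PR implements. The resulting list of dicts has the same shape as the varying_parameters argument that tmtoolkit's evaluate_topic_models expects.

```python
from itertools import product


def build_parameter_grid(n_topics, alpha, eta):
    """Expand scalar-or-list hyperparameters into one dict per model to fit.

    Hypothetical helper: assumes every combination of the given values
    should be fitted (full Cartesian product).
    """

    def as_list(value):
        # A single value is treated as a one-element list.
        return list(value) if isinstance(value, (list, tuple)) else [value]

    return [
        {"n_topics": k, "alpha": a, "eta": e}
        for k, a, e in product(as_list(n_topics), as_list(alpha), as_list(eta))
    ]


# Two values for n_topics and alpha, a single value for eta.
grid = build_parameter_grid(n_topics=[20, 40], alpha=[0.1, 0.5], eta=0.01)
for params in grid:
    print(params)
```

Each dict in grid describes one model, so the number of models to fit grows multiplicatively with the number of values per parameter.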

After a model has been selected, it can be added to the AnnData object using the new function add_topic_modeling_result.

tmtoolkit provides multiple functionalities to evaluate topic models, see https://tmtoolkit.readthedocs.io/en/latest/topic_modeling.html#Evaluation-of-topic-models.

For this reason, the loglikelihood function is removed, given that it is already implemented in tmtoolkit.

This change does introduce a new dependency. We could consider making the topic modeling dependencies optional, given that topic modeling is a more advanced use case. For instance:

pip install tfmindi[topic]
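If the dependency were made optional, the topic modeling code would need to fail with a helpful message when tmtoolkit is missing. A minimal sketch of such a guard follows. The helper name require_optional is hypothetical, and the "topic" extra is only the name suggested above, not an existing extra.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency or raise an error naming the pip extra.

    Hypothetical helper: `extra` is the name of an optional dependency
    group that the package would have to define.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f"Install it with: pip install tfmindi[{extra}]"
        ) from err


# Intended use inside the topic modeling module (not executed here):
# tmtoolkit = require_optional("tmtoolkit", "topic")
```

Importing lazily like this keeps the base install light, while users who call the topic modeling functions get a clear instruction instead of a bare ImportError.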

@LukasMahieu
Collaborator

Okay, looks interesting, will have to test this.
We already have an "evaluate_topic_models", which does something similar but unoptimized and only for a range over n_topics. Maybe this functionality fits better there?

@SeppeDeWinter
Collaborator Author

> Okay, looks interesting, will have to test this. We already have an "evaluate_topic_models", which does something similar but unoptimized and only for a range over n_topics. Maybe this functionality fits better there?

True! That way we don't have to change the API.
I'm not sure how easy it is to automatically detect the optimal parameters, though. I also have not fully explored how strongly the results differ between parameter settings.
