@plaguss plaguss commented Sep 5, 2024

Description

This PR adds a general TextClassification task, which can be used for both single-label and multi-label classification.

It also adds UMAP, DBSCAN and TextClustering, which are closely related to each other and are used to generate clusters of text.

The TextClassification task was defined as part of a pipeline for text clustering, so it may be biased towards that use case, but it should be general enough to work for common text classification tasks.
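
For readers unfamiliar with the clustering stage mentioned above, here is a minimal pure-Python sketch of density-based clustering in the style of DBSCAN. It is an illustration only and not the code added in this PR: the `dbscan` function, its `eps` and `min_samples` parameters, and the toy 2-D points (standing in for text embeddings after dimensionality reduction) are all assumptions made for the example.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks noise (illustrative only)."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)  # j is also a core point, keep expanding
    return labels


# Toy 2-D points standing in for dimensionality-reduced text embeddings.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),   # first dense group
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),   # second dense group
    (10.0, 0.0),                          # isolated point
]
labels = dbscan(points, eps=0.5, min_samples=2)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

In a text clustering pipeline, each resulting cluster could then be given a human-readable label by a classification step, which is the role the label inference mentioned in the TODO below would play.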

TODO: Create an example in the docs of a full text clustering pipeline, including the label inference.

@plaguss plaguss requested review from gabrielmbmb and removed request for gabrielmbmb September 5, 2024 09:26
@plaguss plaguss self-assigned this Sep 5, 2024
@plaguss plaguss added the enhancement New feature or request label Sep 5, 2024
@plaguss plaguss linked an issue Sep 5, 2024 that may be closed by this pull request
github-actions bot commented Sep 5, 2024

Documentation for this PR has been built. You can view it at: https://distilabel.argilla.io/pr-948/

codspeed-hq bot commented Sep 5, 2024

CodSpeed Performance Report

Merging #948 will not alter performance

Comparing text-clustering (c0cbe15) with develop (28ecbc4)

Summary

✅ 1 untouched benchmarks

@plaguss plaguss changed the title Add TextClassification task Add TextClassification, UMAP, DBSCAN and TextClustering tasks Sep 9, 2024
@plaguss plaguss added this to the 1.4.0 milestone Sep 10, 2024
@plaguss plaguss marked this pull request as ready for review September 11, 2024 13:05
@plaguss plaguss requested a review from gabrielmbmb September 11, 2024 13:05
@gabrielmbmb gabrielmbmb left a comment

LGTM!

@plaguss plaguss merged commit f0067b8 into develop Sep 16, 2024
@plaguss plaguss deleted the text-clustering branch September 16, 2024 04:22

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

[FEATURE] add TextClassificationLabeler Task

3 participants