Unhelpful error in VisualizeEmbeddings when docs not set  #17

@zilch42

Hi there,

Small thing, I think it may be helpful to have some error checking on whether docs have been set when they are needed.

I was running through the example notebook and set the newsgroup embeddings directly with tmt.embeddings = embeddings rather than calculating them (I use them all the time, so I keep them saved), but I didn't set the documents anywhere.

When I got to tmt.visualizeEmbeddings(131,78).show(), it threw the following error, raised in _check_CS_SS:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[14], line 1
----> 1 tmt.visualizeEmbeddings(131,78).show()

File c:\path\lib\site-packages\topictuner\basetuner.py:316, in BaseHDBSCANTuner.visualizeEmbeddings(self, min_cluster_size, min_samples, width, height, markersize, opacity)
    310     VizDF["wrappedText"] = [
    311         "Topic #: " + str(topic) + "

" + text
    312         for topic, text in zip(VizDF["topics"], wrappedText)
    313     ]
    314 else:
    315     VizDF["wrappedText"] = [
--> 316         "Topic #: " + str(topic) for topic in self.runHDBSCAN()
    317     ]
    318 for topiclabel in set(VizDF["topics"]):
    319     topicDF = VizDF.loc[VizDF["topics"] == topiclabel]

File c:\path\lib\site-packages\topictuner\basetuner.py:94, in BaseHDBSCANTuner.runHDBSCAN(self, min_cluster_size, min_samples)
     88 def runHDBSCAN(self, min_cluster_size: int = None, min_samples: int = None):
     89     """
     90     Cluster the target embeddings (these will be the reduced embeddings when
     91     run as a TMT instance. Per HDBSCAN, min_samples must be more than 0 and less than
     92     or equal to min_cluster_size.
     93     """
---> 94     min_cluster_size, min_samples = self._check_CS_SS(
...
--> 408         raise ValueError("Cannot set min_cluster_size==None")
    409 if min_cluster_size == 1:
    410     raise ValueError("min_cluster_size must be more than 1")

ValueError: Cannot set min_cluster_size==None

This wasn't very helpful, as the real problem was that no docs had been set; it had nothing to do with min_cluster_size or min_samples.

Setting tmt.docs = docs resolves the issue.
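
For illustration, here is a rough sketch of the kind of check I have in mind. It is not TopicTuner's actual code: TunerSketch, _check_docs and visualize are hypothetical names made up for this example, and the real fix might belong somewhere else. Judging from the traceback, the no-docs branch calls self.runHDBSCAN() with no arguments, so it may also be enough to pass min_cluster_size and min_samples through there.

```python
class TunerSketch:
    """Hypothetical stand-in for the tuner; not TopicTuner's real class."""

    def __init__(self, embeddings=None, docs=None):
        self.embeddings = embeddings
        self.docs = docs

    def _check_docs(self):
        # Fail early with a message that names the actual problem,
        # instead of surfacing an unrelated min_cluster_size error later.
        if self.docs is None:
            raise ValueError(
                "docs have not been set; assign them (e.g. tmt.docs = docs) "
                "before calling a method that needs the documents"
            )

    def visualize(self, min_cluster_size, min_samples):
        # Assumption: this method needs docs, so validate before doing any work.
        self._check_docs()
        return (
            f"would plot {len(self.docs)} docs with "
            f"min_cluster_size={min_cluster_size}, min_samples={min_samples}"
        )


if __name__ == "__main__":
    tmt = TunerSketch(embeddings=[[0.1, 0.2], [0.3, 0.4]])
    try:
        tmt.visualize(131, 78)
    except ValueError as err:
        print(err)  # the message now points at the missing docs
    tmt.docs = ["first document", "second document"]
    print(tmt.visualize(131, 78))
```

With a guard like this, the error would point at the missing docs directly rather than at min_cluster_size.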
