from pecos.xmc import Indexer
import scipy.sparse as smat
import numpy as np

label_embeddings = np.array(
    [[-9.21174158,  5.11299655],
     [-8.59250195, -1.11406841],
     [-4.30549653,  3.99404334],
     [-4.43811548,  4.68773409],
     [-6.00330942,  7.96222741],
     [-6.87172864,  8.01769469],
     [-8.86330667,  4.96141572],
     [-4.3774397 ,  4.60103839],
     [-6.42845615,  7.20886612],
     [-9.69681323, -2.32416397]], dtype=np.float32)
# ground truth for label clusters
target_label_clusters = np.array([0, 2, 3, 3, 1, 1, 0, 3, 1, 2])
label_embeddings = smat.csr_matrix(label_embeddings)
chain = Indexer.gen(
    feat_mat=label_embeddings,
    indexer_type="hierarchicalkmeans",
    max_leaf_size=3,
    spherical=False,
)
I constructed a small synthetic label_embeddings matrix to make the problem easier to see.

If we plot label_embeddings in 2D, the points should be partitioned into the four groups given by target_label_clusters.
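The expected partition can also be sanity-checked numerically, without a plot. The sketch below is not part of PECOS and says nothing about how its indexer works; it only computes the mean of each expected cluster and tests whether every point is closest (in Euclidean distance, matching spherical=False) to the centroid of its own cluster. The variable names `points`, `target`, `centroids`, and `nearest` are chosen here for illustration.

```python
import math
from collections import defaultdict

# Points and expected cluster IDs copied from the snippet above.
points = [
    (-9.21174158, 5.11299655), (-8.59250195, -1.11406841),
    (-4.30549653, 3.99404334), (-4.43811548, 4.68773409),
    (-6.00330942, 7.96222741), (-6.87172864, 8.01769469),
    (-8.86330667, 4.96141572), (-4.3774397, 4.60103839),
    (-6.42845615, 7.20886612), (-9.69681323, -2.32416397),
]
target = [0, 2, 3, 3, 1, 1, 0, 3, 1, 2]

# Centroid of each expected cluster (plain mean of its members).
members = defaultdict(list)
for p, c in zip(points, target):
    members[c].append(p)
centroids = {
    c: (sum(x for x, _ in ps) / len(ps), sum(y for _, y in ps) / len(ps))
    for c, ps in members.items()
}

# Assign every point to its nearest centroid under Euclidean distance.
nearest = [
    min(centroids, key=lambda c: math.dist(p, centroids[c])) for p in points
]
print(nearest == target)  # True: the expected partition is self-consistent
```

If this check holds, the expected grouping is at least a stable assignment for plain Euclidean k-means with four centroids on this data.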
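However, when I set a breakpoint at codes = clib.run_clustering, the returned codes were [1, 0, 1, 3, 2, 2, 1, 3, 3, 0].
Compared with target_label_clusters = [0, 2, 3, 3, 1, 1, 0, 3, 1, 2], and allowing for the different numbering of cluster IDs, two points are misassigned:
codes[2] = 1, but it should be in cluster 3 (together with points 3 and 7); codes[8] = 3, but it should be in cluster 2 (together with points 4 and 5).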
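Because cluster IDs are arbitrary, the two label vectors cannot be compared element by element. A minimal sketch of a permutation-aware comparison follows; the helpers `groups` and `mismatches` are hypothetical names introduced here, not PECOS functions. It maps each returned ID to the expected ID it overlaps most and reports the points that still disagree.

```python
from collections import Counter, defaultdict

target = [0, 2, 3, 3, 1, 1, 0, 3, 1, 2]  # expected cluster IDs
codes = [1, 0, 1, 3, 2, 2, 1, 3, 3, 0]   # IDs observed at the breakpoint


def groups(labels):
    """Return the partition induced by labels as a set of index sets."""
    by_label = defaultdict(set)
    for idx, lab in enumerate(labels):
        by_label[lab].add(idx)
    return {frozenset(s) for s in by_label.values()}


def mismatches(pred, truth):
    """Map each predicted ID to its majority truth ID, then list disagreements."""
    overlap = defaultdict(Counter)
    for p, t in zip(pred, truth):
        overlap[p][t] += 1
    mapping = {p: c.most_common(1)[0][0] for p, c in overlap.items()}
    wrong = [i for i, (p, t) in enumerate(zip(pred, truth)) if mapping[p] != t]
    return mapping, wrong


same_partition = groups(codes) == groups(target)
mapping, wrong = mismatches(codes, target)
print(same_partition)  # False
print(wrong)           # [2, 8]
```

The majority mapping is only a heuristic for aligning IDs, but on this example it is unambiguous and singles out the same two points, 2 and 8, as the description above.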
I can't figure out why such simple features are not clustered correctly.