Enhanced clustering #290

@orenbenkiki

I am working on Slanter, a port of an R package that provides some specialized ways of reordering matrix rows and columns for display. Since such matrices are also often clustered, I needed and implemented reorder_hclust, which allows computing a desired branch order from outside the Clustering package. One thing led to another and I ended up also creating ehclust, which extends the hclust API with several features:

  1. Being able to specify some preferred order of the leaves and computing a branch order that tries to follow this order as much as possible (e.g., clustering cells and, as much as possible, reordering the branches so that young cells are to the left and old cells are to the right).

  2. Being able to force a strict order on the leaves, that is, constrain the tree nodes such that each covers a contiguous range of leaves in that order (e.g., sorting cells by age and then clustering them such that each node covers cells between some minimal and maximal age).

  3. Being able to split the data into distinct groups such that each group is clustered separately (so each one gets a node in the tree) and then clustering the groups to form a complete tree (e.g., assigning a type to each cell and creating a clustering that obeys these type distinctions, clustering the cells within each type and then clustering the types).
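
To make the first two features concrete, here is a minimal, language-neutral sketch in plain Python (not Julia, and not the actual ehclust or reorder_hclust code). All names in it (`leaves`, `reorder`, `ordered_hclust`) are hypothetical. It assumes a tree is a nested pair of leaf indices, that "following the preferred order" means putting the child with the lower mean preferred position first, and that the constrained clustering uses single linkage; the real implementation may use different criteria.

```python
from statistics import mean

# A tree is either a leaf (an int index) or a pair (left, right) of subtrees.

def leaves(tree):
    """Return the leaf indices of a tree in left-to-right order."""
    if isinstance(tree, int):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def reorder(tree, preferred):
    """Feature 1 (sketch): flip branches so the child whose leaves have the
    lower mean preferred position comes first. The topology is unchanged.
    preferred[i] is the desired position of leaf i (an assumed convention)."""
    if isinstance(tree, int):
        return tree
    left, right = (reorder(child, preferred) for child in tree)
    left_pos = mean(preferred[i] for i in leaves(left))
    right_pos = mean(preferred[i] for i in leaves(right))
    return (left, right) if left_pos <= right_pos else (right, left)

def ordered_hclust(dist):
    """Feature 2 (sketch): agglomerative clustering where only adjacent
    clusters may merge, so every node covers a contiguous range of leaves.
    dist is a square distance matrix whose rows are already in the strict
    leaf order. Single linkage is an assumption made for brevity."""
    clusters = list(range(len(dist)))
    while len(clusters) > 1:
        def linkage(k):
            # Single-linkage distance between cluster k and its right neighbor.
            return min(dist[i][j]
                       for i in leaves(clusters[k])
                       for j in leaves(clusters[k + 1]))
        best = min(range(len(clusters) - 1), key=linkage)
        # Replace the two adjacent clusters with their merged node.
        clusters[best:best + 2] = [(clusters[best], clusters[best + 1])]
    return clusters[0]

ages = [1, 2, 10, 11]  # made-up example data, already sorted
dist = [[abs(a - b) for b in ages] for a in ages]
tree = ordered_hclust(dist)
flipped = reorder(((2, 3), (1, 0)), preferred=[0, 1, 2, 3])
```

In this sketch `reorder` can only flip the two children of each node, so the preferred order is followed as far as the tree topology allows, while `ordered_hclust` guarantees the strict order by never considering a merge of non-adjacent clusters.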

The code is at https://github.com/tanaylab/Slanter.jl/blob/main/src/enhanced_hclust.jl and the documentation is available at https://tanaylab.github.io/Slanter.jl/v0.1.0/

These functions don't really belong in Slanter. It would be nice if they (or some version of them) could find a home in Clustering. I could convert this into a PR; in theory, I could just add the new parameters to hclust, as the new API is a backward-compatible extension of the current one.
