Compute clustering (Ward hierarchical)

The option use_hashing=False must be set for the feature extraction. Recommended options also include use_idf=1, sublinear_tf=0, and binary=0.

Ward hierarchical clustering is generally slower than k-means; however, the run time can be reduced by decreasing the following parameters:

  • lsi_components: the number of dimensions used for the Latent Semantic Indexing decomposition (e.g. from 150 to 50)
  • n_neighbors: the number of neighbors used to construct the connectivity (e.g. from 10 to 5)

  • URL: /api/v0/clustering/ward-hc

  • Method: POST

  • URL Params: None

  • Data Params:

    • dataset_id: dataset id
    • n_clusters: the number of clusters
    • lsi_components: (optional) apply LSI with lsi_components dimensions before clustering (default: None). Only k-means can run without the dimensionality reduction provided by LSI; both “birch” and “ward_hc” require this option to be a positive integer.
    • n_neighbors: the number of neighbors for each sample, used to compute the connectivity matrix (see AgglomerativeClustering and kneighbors_graph)
  • Success Response: HTTP 200

     {"id": <str>}
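
The request described above could be issued from Python roughly as follows. This is a minimal sketch, not a reference client: the server address in BASE_URL, the helper names build_ward_hc_payload and submit_ward_hc, and the choice to send the data parameters as a JSON body are all assumptions made for illustration. Only the endpoint path, the parameter names, and the shape of the success response come from the description above.

```python
import json
import urllib.request

# Assumption: placeholder address of a running server; adjust to your setup.
BASE_URL = "http://localhost:5001"
ENDPOINT = "/api/v0/clustering/ward-hc"


def build_ward_hc_payload(dataset_id, n_clusters, lsi_components, n_neighbors):
    """Assemble the data parameters for a Ward hierarchical clustering request.

    Hypothetical helper: it only checks the constraint stated in the
    parameter list, namely that lsi_components is a positive integer.
    """
    if not isinstance(lsi_components, int) or lsi_components <= 0:
        raise ValueError("ward_hc requires lsi_components to be a positive integer")
    return {
        "dataset_id": dataset_id,
        "n_clusters": n_clusters,
        "lsi_components": lsi_components,
        "n_neighbors": n_neighbors,
    }


def submit_ward_hc(payload, base_url=BASE_URL):
    """POST the payload and return the "id" field of the success response.

    Assumption: the server accepts a JSON-encoded request body.
    """
    request = urllib.request.Request(
        base_url + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))["id"]
```

For example, build_ward_hc_payload("<dataset id>", 10, lsi_components=50, n_neighbors=5) uses the reduced values mentioned above to shorten the run time. Passing the result to submit_ward_hc would then return the id string from the response, provided a server is reachable at the assumed address.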