Compute clustering (Ward hierarchical)
The option use_hashing=False must be set for the feature extraction. Recommended options also include use_idf=1, sublinear_tf=0, and binary=0.
Ward hierarchical clustering is generally slower than k-means; however, the run time can be reduced by decreasing the following parameters:
lsi_components: the number of dimensions used for the Latent Semantic Indexing decomposition (e.g. from 150 to 50)
n_neighbors: the number of neighbors used to construct the connectivity (e.g. from 10 to 5)
URL:
/api/v0/clustering/ward-hc
Method:
POST
URL Params: None
Data Params:
dataset_id: dataset id
n_clusters: the number of clusters
lsi_components: (optional) apply LSI with lsi_components before clustering (default None). Only k-means can function without the dimensionality reduction provided by LSI; both “birch” and “ward_hc” require this option to be a positive integer.
n_neighbors: number of neighbors for each sample, used to compute the connectivity matrix (see AgglomerativeClustering and kneighbors_graph)
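Since lsi_components must be a positive integer for this endpoint, a client may want to check it before sending anything. The following is a minimal sketch of such a check, not part of the documented API: the helper name build_ward_hc_payload is hypothetical, the dataset id is a placeholder, and serializing the parameters as JSON is an assumption about how the server expects them.

```python
import json


def build_ward_hc_payload(dataset_id, n_clusters, lsi_components, n_neighbors):
    """Assemble the data parameters for POST /api/v0/clustering/ward-hc.

    Hypothetical client-side helper, not part of the documented API.
    """
    # "ward_hc" requires lsi_components to be a positive integer.
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if (isinstance(lsi_components, bool)
            or not isinstance(lsi_components, int)
            or lsi_components <= 0):
        raise ValueError(
            f"lsi_components must be a positive integer, got {lsi_components!r}"
        )
    return {
        "dataset_id": dataset_id,
        "n_clusters": n_clusters,
        "lsi_components": lsi_components,
        "n_neighbors": n_neighbors,
    }


# Placeholder dataset id; 50 and 5 are the reduced values suggested above.
payload = build_ward_hc_payload("<dataset-id>", n_clusters=10,
                                lsi_components=50, n_neighbors=5)
body = json.dumps(payload)  # assumes the server accepts a JSON request body
```

Passing lsi_components=None, which is the documented default, raises a ValueError here rather than leaving the error to the server.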
Success Response:
HTTP 200
{"id": <str>}
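As an illustration of the request and response shapes described here, the sketch below builds the POST request with the Python standard library and extracts the returned id. Several details are assumptions rather than documented behavior: the server address in BASE_URL, the JSON encoding of the data parameters, and the helper names make_ward_hc_request and parse_clustering_id are all hypothetical.

```python
import json
import urllib.request

# Hypothetical server address; replace with the actual host and port.
BASE_URL = "http://localhost:5001"


def make_ward_hc_request(base_url, params):
    """Build (but do not send) the POST request for Ward clustering."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v0/clustering/ward-hc",
        data=json.dumps(params).encode("utf-8"),  # assumed JSON body
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_clustering_id(response_body):
    """Return the "id" string from a successful (HTTP 200) response body."""
    return json.loads(response_body)["id"]


request = make_ward_hc_request(BASE_URL, {
    "dataset_id": "<dataset-id>",  # placeholder, not a real id
    "n_clusters": 10,
    "lsi_components": 50,
    "n_neighbors": 5,
})

# To actually send it against a running server:
# with urllib.request.urlopen(request) as response:
#     clustering_id = parse_clustering_id(response.read())
```

The request is built separately from sending it so that the URL, method, and body can be inspected without a running server.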