freediscovery.cluster.Birch

class freediscovery.cluster.Birch(threshold=0.5, branching_factor=50, n_clusters=3, compute_labels=True, copy=True, compute_sample_indices=False)

Non-online version of the Birch clustering algorithm.

This is a patched version of sklearn.cluster.Birch that can store the indices of the samples belonging to each subcluster in the hierarchy (scikit-learn/scikit-learn#8808). As a result, this version does not support online learning; however, it:

  • makes it easier to explore the hierarchy of clusters
  • can scale better than sklearn.cluster.Birch to high-dimensional data
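To make the idea of per-subcluster sample indices concrete, here is a minimal pure-Python sketch of a Birch clustering feature (CF) that also records which samples it contains. It is not freediscovery's `_CFSubcluster`: the class `CFSubclusterSketch` and its field names are hypothetical, chosen only to show that the CF statistics (count, linear sum, sum of squared norms) merge additively, while the list of indices has to grow with every absorbed sample.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class CFSubclusterSketch:
    """Hypothetical CF summary of one subcluster; not freediscovery's _CFSubcluster."""

    n_samples: int
    linear_sum: List[float]  # component-wise sum of the member points
    squared_sum: float  # sum of the squared norms of the member points
    samples_id: List[int] = field(default_factory=list)  # plays the role of samples_id_

    @classmethod
    def from_point(cls, x: Sequence[float], idx: int) -> "CFSubclusterSketch":
        return cls(1, [float(v) for v in x], float(sum(v * v for v in x)), [idx])

    def merge(self, other: "CFSubclusterSketch") -> None:
        # The CF statistics are additive, so merging them never needs the raw points...
        self.n_samples += other.n_samples
        self.linear_sum = [a + b for a, b in zip(self.linear_sum, other.linear_sum)]
        self.squared_sum += other.squared_sum
        # ...whereas keeping membership means storing every sample index explicitly.
        self.samples_id.extend(other.samples_id)

    @property
    def centroid(self) -> List[float]:
        return [v / self.n_samples for v in self.linear_sum]


a = CFSubclusterSketch.from_point([0.0, 1.0], idx=0)
a.merge(CFSubclusterSketch.from_point([0.3, 1.0], idx=1))
print(a.n_samples, a.samples_id)  # 2 [0, 1]
```

The fixed-size CF triple is what lets plain Birch summarize data compactly; the index list is the extra cost paid for being able to recover subcluster membership afterwards.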

See the user manual.

For general information about the Birch algorithm, see the sklearn.cluster.Birch documentation and the scikit-learn User Guide.

Parameters:
  • args (other parameters) – See sklearn.cluster.Birch
  • compute_sample_indices (bool, default False) – Whether the indices of samples belonging to each hierarchical subcluster should be included in the _CFSubcluster.samples_id_ attribute. This option can have some memory overhead.
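One way to picture the memory overhead of compute_sample_indices=True, assuming that every subcluster at every level of the tree keeps the indices of all samples beneath it, is the toy two-level hierarchy below. The names `leaf_samples`, `children` and `parent_samples` are hypothetical and do not come from freediscovery; the sketch only illustrates that each sample index ends up stored once per tree level.

```python
from typing import Dict, List

# Hypothetical two-level hierarchy: leaf subclusters hold the indices they absorbed,
# and each top-level subcluster lists its child leaves.
leaf_samples: Dict[str, List[int]] = {"leaf_a": [0, 1, 2], "leaf_b": [3, 4], "leaf_c": [5]}
children: Dict[str, List[str]] = {"top_1": ["leaf_a", "leaf_b"], "top_2": ["leaf_c"]}


def parent_samples(child_names: List[str]) -> List[int]:
    """A parent subcluster's indices are the union of its children's indices."""
    return sorted(i for name in child_names for i in leaf_samples[name])


samples_id: Dict[str, List[int]] = dict(leaf_samples)
samples_id.update({name: parent_samples(kids) for name, kids in children.items()})

n_samples, n_levels = 6, 2
stored = sum(len(ids) for ids in samples_id.values())
# Each sample index is stored once per level, so storage grows like n_samples * n_levels.
print(stored, n_samples * n_levels)  # 12 12
```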

Examples

>>> from freediscovery.cluster import Birch
>>> X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
>>> brc = Birch(branching_factor=50, n_clusters=None, threshold=0.5,
... compute_labels=True)
>>> brc.fit(X)
Birch(branching_factor=50, compute_labels=True,
   compute_sample_indices=False, copy=True, n_clusters=None,
   threshold=0.5)
>>> brc.predict(X)
array([0, 0, 0, 1, 1, 1])

fit(X, y=None)

Build a CF Tree for the input data.

Parameters:
  • X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data.
  • y (Ignored) – Not used, present for API consistency by convention.
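The heart of building the CF Tree is the threshold test applied as each sample arrives: the sample is absorbed by the nearest subcluster if the radius of the merged subcluster stays within `threshold`, otherwise it starts a new subcluster. The sketch below is a simplification that keeps a single flat level and ignores `branching_factor` and node splitting; the function name `build_leaf_subclusters` is hypothetical. Applied to the six points of the class example, it reproduces the two groups that `predict` reports there.

```python
import math
from typing import List, Sequence


def build_leaf_subclusters(X: Sequence[Sequence[float]], threshold: float):
    """Flat, single-level simplification of Birch insertion (no tree, no splits)."""
    subclusters: List[dict] = []  # each holds n, linear sum "ls" and squared-norm sum "ss"
    labels: List[int] = []  # index of the subcluster that absorbed each sample
    for x in X:
        x_sq = sum(v * v for v in x)
        best, best_dist = None, math.inf
        for k, sc in enumerate(subclusters):  # find the nearest existing centroid
            centroid = [v / sc["n"] for v in sc["ls"]]
            dist = math.dist(x, centroid)
            if dist < best_dist:
                best, best_dist = k, dist
        if best is not None:
            sc = subclusters[best]
            n = sc["n"] + 1
            ls = [a + b for a, b in zip(sc["ls"], x)]
            ss = sc["ss"] + x_sq
            # radius**2 = mean squared distance to the centroid = ss/n - ||ls/n||**2
            sq_radius = ss / n - sum((v / n) ** 2 for v in ls)
            if sq_radius <= threshold ** 2:  # merged subcluster is tight enough: absorb
                sc.update(n=n, ls=ls, ss=ss)
                labels.append(best)
                continue
        subclusters.append({"n": 1, "ls": [float(v) for v in x], "ss": float(x_sq)})
        labels.append(len(subclusters) - 1)
    return subclusters, labels


X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
subclusters, labels = build_leaf_subclusters(X, threshold=0.5)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because only the CF statistics are needed for the radius test, the raw points never have to be revisited once absorbed.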

fit_predict(X, y=None)

Performs clustering on X and returns cluster labels.

Parameters:
  • X (ndarray, shape (n_samples, n_features)) – Input data.
  • y (Ignored) – Not used, present for API consistency by convention.
Returns: labels – Cluster labels.
Return type: ndarray, shape (n_samples,)

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits the transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters:
  • X (numpy array of shape [n_samples, n_features]) – Training set.
  • y (numpy array of shape [n_samples]) – Target values; optional, and ignored by Birch, whose fit does not use y.
Returns: X_new – Transformed array.
Return type: numpy array of shape [n_samples, n_features_new]

get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (boolean, optional) – If True, return the parameters for this estimator and for contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any
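The deep flag matters only when an estimator contains other estimators; Birch's own parameters are all plain values, so for Birch deep=True and deep=False return the same mapping. The toy class below is a hypothetical stand-in (not scikit-learn's BaseEstimator) that sketches how deep=True exposes nested parameters under `<component>__<parameter>` names.

```python
from typing import Any, Dict


class ToyEstimator:
    """Hypothetical stand-in for a scikit-learn-style estimator (not a real class)."""

    def __init__(self, **params: Any) -> None:
        self._params = dict(params)

    def get_params(self, deep: bool = True) -> Dict[str, Any]:
        out = dict(self._params)
        if deep:
            # Expose parameters of nested estimators as "<component>__<parameter>".
            for name, value in self._params.items():
                if hasattr(value, "get_params"):
                    for sub_name, sub_value in value.get_params(deep=True).items():
                        out[f"{name}__{sub_name}"] = sub_value
        return out


inner = ToyEstimator(threshold=0.5, n_clusters=None)
outer = ToyEstimator(birch=inner, verbose=False)
print(sorted(outer.get_params(deep=True)))
# ['birch', 'birch__n_clusters', 'birch__threshold', 'verbose']
```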

predict(X)

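Predict cluster labels for X using the centroids_ of the subclusters.

The computation avoids the row norms of X: since ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2 and ||x||^2 is the same for every centroid c, dropping it does not change which centroid is nearest.

Parameters: X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data.
Returns: labels – Cluster label assigned to each sample.
Return type: ndarray, shape (n_samples,)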
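A minimal pure-Python sketch of that row-norm shortcut: score each centroid by ||c||^2 - 2 x·c and take the argmin. The function name `predict_sketch` and the two example centroids (chosen to match the two groups in the class example) are hypothetical; the library works on the fitted subcluster centroids with vectorised array code.

```python
from typing import List, Sequence


def predict_sketch(X: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]]) -> List[int]:
    """Nearest-centroid labels computed without the row norms ||x||**2."""
    c_sq = [sum(v * v for v in c) for c in centroids]  # ||c||**2, computed once per centroid
    labels = []
    for x in X:
        # ||x - c||**2 = ||x||**2 - 2 x.c + ||c||**2; the ||x||**2 term is identical for
        # every centroid, so omitting it leaves the argmin unchanged.
        scores = [cs - 2 * sum(a * b for a, b in zip(x, c)) for c, cs in zip(centroids, c_sq)]
        labels.append(min(range(len(scores)), key=scores.__getitem__))
    return labels


X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
print(predict_sketch(X, centroids=[[0, 1], [0, -1]]))  # [0, 0, 0, 1, 1, 1]
```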

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns: self
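The routing of `<component>__<parameter>` names can be sketched with a small helper that groups keys by component. `split_nested_params` is a hypothetical name, and "birch" below stands for an assumed pipeline step name; the helper splits at the first `__` only, so a deeper name such as `a__b__c` yields component `a` with the remainder `b__c` left for that component to route further.

```python
from typing import Any, Dict


def split_nested_params(params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Group '<component>__<parameter>' keys by component ('' = the outer object)."""
    grouped: Dict[str, Dict[str, Any]] = {}
    for key, value in params.items():
        component, sep, name = key.partition("__")  # split at the first "__" only
        if not sep:  # no "__": the parameter belongs to the outer object itself
            component, name = "", key
        grouped.setdefault(component, {})[name] = value
    return grouped


print(split_nested_params({"birch__threshold": 0.3, "verbose": True}))
# {'birch': {'threshold': 0.3}, '': {'verbose': True}}
```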

transform(X)

Transform X into the subcluster-centroid space.

Each dimension represents the distance from the sample point to one subcluster centroid.

Parameters: X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data.
Returns: X_trans – Transformed data.
Return type: {array-like, sparse matrix}, shape (n_samples, n_clusters)
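A minimal sketch of this transformation, assuming Euclidean distance as in sklearn.cluster.Birch: one row per sample and one column per centroid. The function name `transform_sketch` and the example centroids are hypothetical.

```python
import math
from typing import List, Sequence


def transform_sketch(X: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]]) -> List[List[float]]:
    """Row i, column j holds the Euclidean distance from sample i to centroid j."""
    return [[math.dist(x, c) for c in centroids] for x in X]


D = transform_sketch([[0, 1], [0.3, -1]], centroids=[[0, 1], [0, -1]])
print(D[0])  # [0.0, 2.0]
```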

Examples using freediscovery.cluster.Birch