freediscovery.cluster.Birch

class freediscovery.cluster.Birch(threshold=0.5, branching_factor=50, n_clusters=3, compute_labels=True, copy=True, compute_sample_indices=False)

Non-online version of the Birch clustering algorithm.

This is a patched version of sklearn.cluster.Birch that can store the indices of the samples belonging to each subcluster in the hierarchy (scikit-learn/scikit-learn#8808). As a result, this version does not support online learning; however, it

makes it easier to explore the hierarchy of clusters

can scale better for high-dimensional data

See user manual.

For general information about the Birch algorithm, see the sklearn.cluster.Birch documentation and the scikit-learn User Guide.

Parameters:	args (other parameters) – See `sklearn.cluster.Birch`.
	compute_sample_indices (bool, default False) – Whether the indices of samples belonging to each hierarchical subcluster should be included in the `_CFSubcluster.samples_id_` attribute. This option can have some memory overhead.

Examples

>>> from freediscovery.cluster import Birch
>>> X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
>>> brc = Birch(branching_factor=50, n_clusters=None, threshold=0.5,
... compute_labels=True)
>>> brc.fit(X)
... 
Birch(branching_factor=50, compute_labels=True,
   compute_sample_indices=False, copy=True, n_clusters=None,
   threshold=0.5)
>>> brc.predict(X)
array([0, 0, 0, 1, 1, 1])

fit(X, y=None)

Build a CF Tree for the input data.

Parameters:	X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data.
	y (Ignored)

fit_predict(X, y=None)

Perform clustering on X and return cluster labels.

Parameters:	X (ndarray, shape (n_samples, n_features)) – Input data.
Returns:	y – cluster labels
Return type:	ndarray, shape (n_samples,)

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits the transformer to X and y with the optional parameters fit_params and returns a transformed version of X.

Parameters:	X (numpy array of shape [n_samples, n_features]) – Training set.
	y (numpy array of shape [n_samples]) – Target values.
Returns:	X_new – Transformed array.
Return type:	numpy array of shape [n_samples, n_features_new]

get_params(deep=True)

Get parameters for this estimator.

Parameters:	deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns:	params – Parameter names mapped to their values.
Return type:	mapping of string to any

predict(X)

Predict data using the centroids_ of subclusters.

This avoids computing the row norms of X.

Parameters:	X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data.
Returns:	labels – Labelled data.
Return type:	ndarray, shape (n_samples,)
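
The remark about row norms can be read as follows: since ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2, the ||x||^2 term is the same for every centroid and does not affect which centroid is nearest. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name `nearest_centroid_labels` and the centroid values are hypothetical and chosen for illustration only.

```python
def nearest_centroid_labels(X, centroids):
    """Assign each sample to its nearest centroid without computing ||x||^2."""
    # Squared norms of the centroids are computed once, up front.
    centroid_sq_norms = [sum(v * v for v in c) for c in centroids]
    labels = []
    for x in X:
        # ||c||^2 - 2 x.c differs from ||x - c||^2 only by ||x||^2,
        # which is constant for a given sample, so the argmin is unchanged.
        scores = [
            sq_norm - 2.0 * sum(xi * ci for xi, ci in zip(x, c))
            for c, sq_norm in zip(centroids, centroid_sq_norms)
        ]
        labels.append(min(range(len(scores)), key=scores.__getitem__))
    return labels


X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
centroids = [[0.0, 1.0], [0.0, -1.0]]  # hypothetical centroids, not fitted values
labels = nearest_centroid_labels(X, centroids)
print(labels)  # [0, 0, 0, 1, 1, 1]
```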

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns:	self
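
The `<component>__<parameter>` naming convention can be illustrated with a short sketch. This is not scikit-learn's implementation; `split_nested_params` and the component name `birch` are hypothetical and only show how a double underscore separates a component from the parameter it owns.

```python
def split_nested_params(params):
    """Separate top-level parameters from <component>__<parameter> entries."""
    own, nested = {}, {}
    for key, value in params.items():
        # partition() splits on the first "__" only, so deeper nesting
        # stays in sub_key for the component to resolve itself.
        name, delim, sub_key = key.partition("__")
        if delim:
            nested.setdefault(name, {})[sub_key] = value
        else:
            own[name] = value
    return own, nested


own, nested = split_nested_params(
    {"threshold": 0.3, "birch__n_clusters": None, "birch__branching_factor": 20}
)
print(own)     # {'threshold': 0.3}
print(nested)  # {'birch': {'n_clusters': None, 'branching_factor': 20}}
```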

transform(X)

Transform X into the space of subcluster centroids.

Each dimension represents the distance from the sample point to each cluster centroid.

Parameters:	X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data.
Returns:	X_trans – Transformed data.
Return type:	{array-like, sparse matrix}, shape (n_samples, n_clusters)
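
To make the shape of the transformed data concrete, here is a minimal pure-Python sketch that assumes Euclidean distance; it is not the library's code, and `distances_to_centroids` and the centroid values are hypothetical. Each output row has one entry per centroid.

```python
import math


def distances_to_centroids(X, centroids):
    """Return one row per sample holding its distance to every centroid."""
    # math.dist computes the Euclidean distance between two points (Python 3.8+).
    return [[math.dist(x, c) for c in centroids] for x in X]


centroids = [[0.0, 1.0], [0.0, -1.0]]  # hypothetical centroids
X_trans = distances_to_centroids([[0, 1], [0, -1]], centroids)
print(X_trans)  # [[0.0, 2.0], [2.0, 0.0]]
```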

Examples using `freediscovery.cluster.Birch`

Exploring BIRCH cluster hierarchy