(threshold=0.5, branching_factor=50, n_clusters=3, compute_labels=True, copy=True, compute_sample_indices=False)
Non-online version of the Birch clustering algorithm
This is a patched version of
that allows storing the indices of samples belonging to each subcluster in the hierarchy (scikit-learn/scikit-learn#8808). As a result, this version does not allow online learning; however, it:
- makes it easier to explore the hierarchy of clusters
- can scale better for high dimensional data
See user manual.
For general information about the Birch algorithm, see the
documentation and the scikit-learn User Guide.
Parameters: - args (other parameters) – See
- compute_sample_indices (bool, default False) – Whether the indices of samples belonging to each hierarchical
subcluster should be included in the
attribute. This option can have some memory overhead.
>>> from freediscovery.cluster import Birch
>>> X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
>>> brc = Birch(branching_factor=50, n_clusters=None, threshold=0.5,
...             compute_labels=True)
>>> ...
Birch(branching_factor=50, compute_labels=True, compute_sample_indices=False, copy=True, n_clusters=None, threshold=0.5)
>>> brc.predict(X)
array([0, 0, 0, 1, 1, 1])
(X, y=None)
Build a CF Tree for the input data.
Parameters: - X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data.
- y (Ignored) – not used, present for API consistency by convention.
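To make the roles of `threshold` and `compute_sample_indices` more concrete, here is a minimal single-level sketch of how a clustering feature (count, linear sum, squared sum) can absorb samples while recording their indices. It is not the library's implementation: the `Subcluster` class, its `sample_indices` attribute and `build_flat_subclusters` are hypothetical names, and the real CF Tree is hierarchical and splits nodes according to `branching_factor`, which this sketch ignores.

```python
import math


class Subcluster:
    """Hypothetical clustering feature: count, linear sum, squared sum."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.squared_sum = 0.0
        # Illustrative analogue of what compute_sample_indices=True records.
        self.sample_indices = []

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius_if_added(self, x):
        # Radius of the subcluster that would result from absorbing x:
        # sqrt(mean squared norm - squared norm of the new centroid).
        n = self.n + 1
        ls = [s + v for s, v in zip(self.linear_sum, x)]
        ss = self.squared_sum + sum(v * v for v in x)
        sq_radius = ss / n - sum((s / n) ** 2 for s in ls)
        return math.sqrt(max(sq_radius, 0.0))

    def add(self, index, x):
        self.n += 1
        self.linear_sum = [s + v for s, v in zip(self.linear_sum, x)]
        self.squared_sum += sum(v * v for v in x)
        self.sample_indices.append(index)


def build_flat_subclusters(X, threshold):
    """Merge each sample into the nearest subcluster if the radius allows it."""
    subclusters = []
    for i, x in enumerate(X):
        best = None
        if subclusters:
            best = min(subclusters, key=lambda sc: math.dist(sc.centroid(), x))
        if best is not None and best.radius_if_added(x) <= threshold:
            best.add(i, x)
        else:
            new_subcluster = Subcluster(len(x))
            new_subcluster.add(i, x)
            subclusters.append(new_subcluster)
    return subclusters


X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
subclusters = build_flat_subclusters(X, threshold=0.5)
index_lists = [sc.sample_indices for sc in subclusters]
```

Keeping one index list per subcluster is also what makes the memory overhead of this option grow with the number of samples.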
(X, y=None)
Performs clustering on X and returns cluster labels.
Parameters: - X (ndarray, shape (n_samples, n_features)) – Input data.
- y (Ignored) – not used, present for API consistency by convention.
Returns: labels – cluster labels
Return type: ndarray, shape (n_samples,)
(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
Parameters: - X (numpy array of shape [n_samples, n_features]) – Training set.
- y (numpy array of shape [n_samples]) – Target values.
Returns: X_new – Transformed array.
Return type: numpy array of shape [n_samples, n_features_new]
(deep=True)
Get parameters for this estimator.
Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any
(X)
Predict data using the
of subclusters.
Avoid computation of the row norms of X.
Parameters: X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data.
Returns: labels – Labelled data.
Return type: ndarray, shape (n_samples,)
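The remark about row norms can be illustrated with a small sketch. Since ||x - c||² = ||x||² - 2 x·c + ||c||², and ||x||² is the same for every centroid, the nearest centroid can be found from the last two terms alone. The function and variable names below are hypothetical, and the centroids are made up for the example rather than taken from a fitted model.

```python
def predict_nearest(X, centroids, centroid_labels):
    """Assign each sample the label of its nearest centroid.

    The squared norm of each sample is never computed: it does not
    change which centroid minimises the distance.
    """
    centroid_sq_norms = [sum(v * v for v in c) for c in centroids]
    labels = []
    for x in X:
        # "Reduced" distance: ||c||^2 - 2 * x.c for every centroid c.
        reduced = [
            sq_norm - 2.0 * sum(a * b for a, b in zip(x, c))
            for c, sq_norm in zip(centroids, centroid_sq_norms)
        ]
        nearest = min(range(len(centroids)), key=reduced.__getitem__)
        labels.append(centroid_labels[nearest])
    return labels


X = [[0, 1], [0.3, 1], [-0.3, 1], [0, -1], [0.3, -1], [-0.3, -1]]
centroids = [[0.0, 1.0], [0.0, -1.0]]  # assumed subcluster centroids
labels = predict_nearest(X, centroids, centroid_labels=[0, 1])
```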
(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form
so that it’s possible to update each component of a nested object.
Return type: self
(X)
Transform X into the space of subcluster centroids.
Each dimension represents the distance from the sample point to one cluster centroid.
Parameters: X ({array-like, sparse matrix}, shape (n_samples, n_features)) – Input data.
Returns: X_trans – Transformed data.
Return type: {array-like, sparse matrix}, shape (n_samples, n_clusters)
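As a rough illustration of the transformed output, the sketch below computes one Euclidean distance per (sample, centroid) pair. It assumes Euclidean distance and uses a hypothetical helper name and made-up centroids; it is not the library's code.

```python
import math


def transform_to_centroid_distances(X, centroids):
    """Return a list of rows, one per sample, with one distance per centroid."""
    return [[math.dist(x, c) for c in centroids] for x in X]


X = [[0, 1], [0.3, 1], [0, -1]]
centroids = [[0.0, 1.0], [0.0, -1.0]]  # assumed centroids
X_trans = transform_to_centroid_distances(X, centroids)
```

Each row of `X_trans` has as many entries as there are centroids, so three samples and two centroids give a 3 × 2 result.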