freediscovery.cluster.utils._merge_clusters

freediscovery.cluster.utils._merge_clusters(X, rename=False)

Compute the union of all clusters.

Used to determine which cluster_id a document should belong to if at least one of its lexicons suggests that it is a duplicate.

Approximate time complexity: O(n_samples*n_features).
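The merge can be pictured as a union-find (disjoint-set) pass over the documents. The sketch below is a hypothetical illustration, not the freediscovery implementation: the name merge_clusters_sketch is invented here, and it assumes that X[i, j] is the cluster id of document i according to lexicon j, so that two documents end up together whenever they share an id in any column (transitively). Each cell is visited once with an average O(1) dictionary lookup, which is in line with the stated O(n_samples*n_features) cost.

```python
import numpy as np


def merge_clusters_sketch(X, rename=False):
    """Hypothetical union-find sketch, NOT the library's implementation.

    Assumes X[i, j] is the cluster id of document i under lexicon j.
    Documents sharing an id in any column are merged (transitively).
    """
    X = np.asarray(X)
    n_samples, n_features = X.shape
    parent = list(range(n_samples))  # every document starts as its own root

    def find(i):
        # Walk up to the root, halving the path along the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller document index as the root
            if ra < rb:
                parent[rb] = ra
            else:
                parent[ra] = rb

    # First document seen for each (column, cluster id) pair
    first_seen = {}
    for i in range(n_samples):
        for j in range(n_features):
            key = (j, X[i, j])
            if key in first_seen:
                union(i, first_seen[key])
            else:
                first_seen[key] = i

    # Label each document with the index of its root document
    labels = np.array([find(i) for i in range(n_samples)])
    if rename:
        # Relabel to consecutive integers starting at 0
        _, labels = np.unique(labels, return_inverse=True)
    return labels


X = [[0, 5], [1, 5], [1, 6], [2, 7]]
print(merge_clusters_sketch(X))               # documents 0, 1, 2 merge; 3 stays apart
print(merge_clusters_sketch(X, rename=True))
```

Here documents 0 and 1 share id 5 in the second column, and documents 1 and 2 share id 1 in the first column, so all three are merged even though documents 0 and 2 share no id directly.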

Parameters:
    X (array (n_samples, n_features)) – input array with the cluster ids to merge
    rename (bool) – if True, relabel the output so that the cluster ids lie between 0 and len(unique(cluster_id))

Returns:
    cluster_id (array (n_samples)) – the merged cluster id of each document
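A relabeling like the one the rename option describes can be done with numpy.unique and return_inverse=True, which maps each id to its position among the sorted unique ids. Whether freediscovery uses exactly this call is an assumption; the input array below is made-up example data.

```python
import numpy as np

# Hypothetical merged ids before renaming (made-up example data)
cluster_id = np.array([7, 7, 3, 12, 3])

# unique_ids holds the sorted distinct ids; renamed holds, for each element,
# its index in unique_ids, i.e. consecutive ids 0 .. len(unique_ids) - 1
unique_ids, renamed = np.unique(cluster_id, return_inverse=True)
print(unique_ids.tolist())  # [3, 7, 12]
print(renamed.tolist())     # [1, 1, 0, 2, 0]
```

Because the relabeling follows sorted order, the original ids are not preserved, but documents that shared an id still share one after renaming.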