freediscovery.cluster.utils._merge_clusters

freediscovery.cluster.utils._merge_clusters(X, rename=False)

Compute the union of all clusters.

Used to determine which cluster_id a document should belong to if at least one of its lexicons suggests that it is a duplicate.

Approximate time complexity: O(n_samples * n_features).

Parameters:
  • X (array (n_samples, n_features)) – input array whose columns hold the cluster ids to merge
  • rename (bool) – if True, relabel the output so that the cluster ids are consecutive integers from 0 to len(unique(cluster_id)) - 1

Returns:
  • cluster_id (array (n_samples)) – the merged cluster id of each document
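The merge can be sketched as a union-find (disjoint-set) pass, assuming that two documents belong to the same merged cluster whenever they share a cluster id in any column, either directly or through a chain of other documents. This is a hypothetical illustration: `merge_clusters_sketch` is not part of FreeDiscovery, and it is not necessarily how `_merge_clusters` itself is implemented. The `rename` step uses `numpy.unique(..., return_inverse=True)` to produce consecutive ids.

```python
import numpy as np


def merge_clusters_sketch(X, rename=False):
    """Hypothetical union-find sketch of merging per-column cluster ids.

    Documents i and k end up in the same merged cluster if they share a
    cluster id in any column, directly or via a chain of other documents.
    """
    X = np.asarray(X)
    n_samples, n_features = X.shape
    parent = list(range(n_samples))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        # Attach the larger root under the smaller one, so every root is
        # the smallest document index in its merged cluster.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # One pass per column: link each document to the first document seen
    # with the same cluster id in that column.
    for j in range(n_features):
        first_seen = {}
        for i in range(n_samples):
            key = X[i, j]
            if key in first_seen:
                union(first_seen[key], i)
            else:
                first_seen[key] = i

    # Label each document with the root (smallest index) of its cluster.
    y = np.array([find(i) for i in range(n_samples)])
    if rename:
        # Relabel to consecutive integers 0 .. n_unique - 1.
        _, y = np.unique(y, return_inverse=True)
    return y
```

For example, with `X = [[0, 5], [1, 5], [2, 6], [2, 7]]`, documents 0 and 1 share cluster 5 in the second column and documents 2 and 3 share cluster 2 in the first, so the sketch returns `[0, 0, 2, 2]`, or `[0, 0, 1, 1]` with `rename=True`. Each column is scanned once and union-find operations are close to constant time, which is consistent with the O(n_samples * n_features) estimate above.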