freediscovery.cluster.BirchSubcluster
class freediscovery.cluster.BirchSubcluster(**args)
A container class for the BIRCH cluster hierarchy.
This is a dict-like container used to store each subcluster in the cluster hierarchy computed by freediscovery.cluster.Birch. A given subcluster links to its parent and children subclusters in the hierarchy through the following attributes:
- parent : BirchSubcluster, the parent container
- children : list of BirchSubcluster, contains the children subclusters
Each subcluster stores the following dictionary keys:
- document_id : list, a list of document / sample ids contained in this subcluster (excluding its children).
- document_id_accumulated : a list of document / sample ids contained in this subcluster and its children. Only available when this class was built using birch_hierarchy_wrapper() with the compute_document_id=True parameter. It can be re-computed with the document_id_accumulated class property.
- cluster_size : int, the number of samples contained in this subcluster and its children. This corresponds to the length of the document_id_accumulated property. Only available when this class was built using birch_hierarchy_wrapper() with the compute_document_id=True parameter.
Other keys may be user-computed as necessary.
See User Manual for more details.
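The layout described above can be illustrated with a minimal, self-contained sketch. SketchSubcluster and its add_child helper are hypothetical names written for this example, not the FreeDiscovery class; the sketch only mimics the documented structure (a dict with parent / children links and a document_id key) and assumes that accumulated ids are gathered depth-first.

```python
class SketchSubcluster(dict):
    """Hypothetical stand-in for a dict-like subcluster container."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.parent = None      # link to the parent container
        self.children = []      # links to the children containers

    def add_child(self, child):
        # Helper invented for this sketch: attach a child and set its parent.
        child.parent = self
        self.children.append(child)

    @property
    def document_id_accumulated(self):
        # Ids stored in this node, followed by the ids of all descendants.
        ids = list(self.get("document_id", []))
        for child in self.children:
            ids.extend(child.document_id_accumulated)
        return ids


root = SketchSubcluster(document_id=[0])
left = SketchSubcluster(document_id=[1, 2])
right = SketchSubcluster(document_id=[3])
root.add_child(left)
root.add_child(right)

# Mirror the documented cluster_size key: the length of the accumulated ids.
root["cluster_size"] = len(root.document_id_accumulated)
print(root.document_id_accumulated, root["cluster_size"])  # [0, 1, 2, 3] 4
```

Storing per-node data as dictionary keys while keeping the hierarchy links as plain attributes keeps the two concerns separate, which is the split the description above suggests.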
Note
This class descends from freediscovery.externals.jwzthreading.Container, originally used to represent e-mail threads obtained with the JWZ algorithm in jwzthreading, though it is general enough to represent other hierarchical structures, such as a BIRCH cluster hierarchy. In FreeDiscovery this class is primarily used for documents. As a result, the variables/methods containing the term “document” have the same meaning as “sample” in the general scikit-learn context.
- clear() → None. Remove all items from D.
- copy() → a shallow copy of D
- current_depth
  Compute the depth in the hierarchy of the current container.
- document_count
  Count of all documents in the children subclusters.
- document_id_accumulated
  Returns the list of document / sample ids contained in this subcluster or any of its children.
- flatten()
  Return a flattened version of the hierarchical tree.
  Returns: list of Container, a flat list of containers.
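As a rough illustration of what flattening a hierarchy means, the sketch below collects every node of a small tree into one list. The Node class is hypothetical, and the pre-order traversal (a node first, then each of its subtrees) is an assumption; the order used by the actual method is not stated here.

```python
class Node:
    """Hypothetical tree node used only for this sketch."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def flatten(self):
        # Pre-order traversal: the node itself, then each subtree in turn.
        flat = [self]
        for child in self.children:
            flat.extend(child.flatten())
        return flat


tree = Node("root", [Node("a", [Node("a1")]), Node("b")])
names = [node.name for node in tree.flatten()]
print(names)  # ['root', 'a', 'a1', 'b']
```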
- fromkeys()
  Returns a new dict with keys from iterable and values equal to value.
- get(k[, d]) → D[k] if k in D, else d. d defaults to None.
- has_descendant(ctr)
  Check if ctr is a descendant of this container.
  Parameters: ctr (Container) – possible descendant container.
  Returns: True if ctr is a descendant of self, else False.
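A descendant check of this kind can be sketched as an iterative depth-first search. The Node class and the free function below are hypothetical, and the sketch assumes that a container counts as its own descendant because the search starts from the container itself; the actual method may treat that case differently.

```python
from collections import deque


class Node:
    """Hypothetical tree node used only for this sketch."""

    def __init__(self, children=None):
        self.children = children or []


def has_descendant(start, ctr):
    # Depth-first search; `seen` guards against revisiting a node
    # if the structure accidentally contains a cycle.
    stack = deque([start])
    seen = set()
    while stack:
        node = stack.pop()
        if node is ctr:
            return True
        seen.add(id(node))
        for child in node.children:
            if id(child) not in seen:
                stack.append(child)
    return False


leaf = Node()
other = Node()
root = Node([Node([leaf])])
print(has_descendant(root, leaf), has_descendant(root, other))  # True False
```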
- is_dummy
  Check if the container has some content.
- items() → a set-like object providing a view on D's items
- keys() → a set-like object providing a view on D's keys
- limit_depth(max_depth=None)
  Truncate the tree to the provided maximum depth.
  Parameters: max_depth (int) – hierarchy depth to which the tree is truncated.
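One way to picture depth truncation is the recursive sketch below. It is not the library's implementation: the Node class is hypothetical, and the sketch assumes that the starting node sits at depth 0, that max_depth=None means no truncation, and that the children of any node at max_depth are dropped in place.

```python
class Node:
    """Hypothetical tree node used only for this sketch."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def limit_depth(self, max_depth=None):
        # None means "no limit": leave the tree untouched.
        if max_depth is None:
            return self
        if max_depth == 0:
            # Nothing below this level is kept.
            self.children = []
        else:
            for child in self.children:
                child.limit_depth(max_depth - 1)
        return self


tree = Node("root", [Node("a", [Node("a1", [Node("a1x")])]), Node("b")])
tree.limit_depth(1)
print([c.name for c in tree.children], tree.children[0].children)  # ['a', 'b'] []
```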
- pop(k[, d]) → v, remove specified key and return the corresponding value.
  If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() → (k, v), remove and return some (key, value) pair as a 2-tuple; but raise KeyError if D is empty.
- remove_child(child)
  Remove a child from the container.
  Parameters: child (Container) – Child to remove.
- root
  Get the root container.
  Returns: Container, the top-most level container.
- setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
- tree_size
  Recursively count the number of children containers. The current container is also included in the count.
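The navigation properties documented in this section (current_depth, root, tree_size) can be sketched on a hypothetical Node class as follows. This is an illustration under stated assumptions, not the library code; in particular, giving the root a depth of 0 is an assumption.

```python
class Node:
    """Hypothetical tree node used only for this sketch."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    @property
    def current_depth(self):
        # Assumption: the root container has depth 0.
        return 0 if self.parent is None else self.parent.current_depth + 1

    @property
    def root(self):
        # Follow parent links until a node without a parent is reached.
        node = self
        while node.parent is not None:
            node = node.parent
        return node

    @property
    def tree_size(self):
        # The current node counts as 1, plus the size of every subtree.
        return 1 + sum(child.tree_size for child in self.children)


top = Node()
mid = Node(parent=top)
leaf = Node(parent=mid)
sibling = Node(parent=top)
print(top.tree_size, leaf.current_depth, leaf.root is top)  # 4 2 True
```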
- update([E, ]**F) → None. Update D from dict/iterable E and F.
  If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
  If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
  In either case, this is followed by: for k in F: D[k] = F[k]
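The three update rules above are those of the built-in dict, so they can be tried on a plain dictionary. The keys used here are illustrative only, and the sketch assumes the container does not override the standard dict behaviour.

```python
d = {"document_id": [1, 2]}

# E has a .keys() method (a mapping): for k in E: D[k] = E[k]
d.update({"cluster_size": 2})

# E lacks .keys() (an iterable of pairs): for k, v in E: D[k] = v
d.update([("label", "invoices")])

# Keyword arguments F are applied last, so they win on conflicts.
d.update({"label": "drafts"}, label="final")

print(d)  # {'document_id': [1, 2], 'cluster_size': 2, 'label': 'final'}
```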
- values() → an object providing a view on D's values