API
This is the API reference for the FreeDiscovery Python package.
Datasets
freediscovery.datasets.load_dataset ([name, ...]) |
Download a benchmark dataset. |
Feature extraction
freediscovery.feature_weighting.SmartTfidfTransformer ([...]) |
TF-IDF weighting and normalization with the SMART IR notation |
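To illustrate what the three-letter SMART notation encodes, here is a minimal pure-Python sketch of the common "ltc" scheme: logarithmic term frequency (l), inverse document frequency (t), and cosine normalization (c). The function name `smart_ltc` and the toy documents are hypothetical and are not part of FreeDiscovery; this is not how `SmartTfidfTransformer` is implemented, only a sketch of the weighting it is named after.

```python
import math
from collections import Counter


def smart_ltc(docs):
    """Weight tokenized documents with the SMART 'ltc' scheme.

    Hypothetical helper for illustration; not a FreeDiscovery function.
    `docs` is a list of token lists; returns one {term: weight} dict per doc.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        # 'l': 1 + log(tf), 't': log(N / df)
        w = {
            term: (1.0 + math.log(count)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # 'c': cosine (L2) normalization; all-zero vectors stay zero.
        norm = math.sqrt(sum(v * v for v in w.values()))
        weighted.append({t: (v / norm if norm else 0.0) for t, v in w.items()})
    return weighted


docs = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
    ["apple", "cherry", "cherry"],
]
vectors = smart_ltc(docs)
```

After cosine normalization each non-zero document vector has unit length, and a term that occurs in every document receives a weight of zero because its idf is log(1).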
Categorization
freediscovery.neighbors.NearestNeighborRanker ([...]) |
A nearest neighbor ranker. |
Clustering
freediscovery.cluster.Birch ([threshold, ...]) |
Non-online version of the BIRCH clustering algorithm |
freediscovery.cluster.BirchSubcluster (**args) |
A container class for BIRCH cluster hierarchy |
freediscovery.cluster.birch_hierarchy_wrapper (birch) |
Wrap BIRCH cluster hierarchy with a container class |
freediscovery.cluster.ClusterLabels (vect, model) |
Calculate the cluster labels. |
Near Duplicates detection
freediscovery.near_duplicates.SimhashNearDuplicates ([...]) |
Near duplicates detection using the simhash algorithm. |
freediscovery.near_duplicates.IMatchNearDuplicates ([...]) |
Near duplicates detection using the randomized I-Match algorithm. |
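As a rough illustration of the idea behind simhash-based near-duplicate detection, the sketch below builds a 64-bit fingerprint from a list of tokens and compares fingerprints by Hamming distance. The helpers `simhash` and `hamming_distance`, the choice of MD5 as the token hash, and the unweighted tokens are all assumptions made for this example; it does not reproduce the `SimhashNearDuplicates` interface or its internals.

```python
import hashlib

N_BITS = 64  # fingerprint width assumed for this sketch


def simhash(tokens):
    """Return a 64-bit simhash fingerprint for a list of string tokens.

    Hypothetical helper for illustration; not a FreeDiscovery function.
    """
    counts = [0] * N_BITS
    for token in tokens:
        # Hash each token to a 64-bit integer (first 8 bytes of its MD5 digest).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(N_BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes win.
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


doc_a = "the quick brown fox jumps over the lazy dog".split()
doc_b = "the quick brown fox jumped over the lazy dog".split()
distance = hamming_distance(simhash(doc_a), simhash(doc_b))
```

Documents that share most of their tokens tend to produce fingerprints with a small Hamming distance, so near duplicates can be found by thresholding that distance. The threshold itself depends on the collection and is not fixed here.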
Semantic search
freediscovery.search.Search (vectorizer, tfidf) |
(Semantic) search in a document collection |
IO
freediscovery.io.parse_smart_tokens (text) |
Parse a dataset stored in the SMART tokenized format, used in particular for the RCV1-v2 dataset, http://www.jmlr.org/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm (cf. |
Metrics
This module aims to extend sklearn.metrics with a few additional metrics.
freediscovery.metrics.recall_at_k_score (...) |
Recall after retrieving k documents from the collection |
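The metric can be sketched in a few lines of plain Python. The function below is a hypothetical stand-in named `recall_at_k`, which assumes binary relevance labels and takes `k` as an absolute number of documents; the actual signature and the handling of `k` in `recall_at_k_score` may differ, so treat this only as an illustration of the definition.

```python
def recall_at_k(y_true, y_score, k):
    """Fraction of relevant documents found among the k top-scored ones.

    Hypothetical helper for illustration; not the FreeDiscovery function.
    y_true  : list of 0/1 relevance labels
    y_score : list of ranking scores (higher means ranked earlier)
    k       : number of documents retrieved (assumed to be an integer count)
    """
    n_relevant = sum(y_true)
    if n_relevant == 0:
        return 0.0  # no relevant documents; recall defined as 0 in this sketch
    # Indices of documents sorted by decreasing score.
    ranked = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    hits = sum(y_true[i] for i in ranked[:k])
    return hits / n_relevant


y_true = [1, 0, 1, 0, 1]
y_score = [0.9, 0.8, 0.7, 0.2, 0.1]
recall_top3 = recall_at_k(y_true, y_score, k=3)  # 2 of 3 relevant documents retrieved
```

Here the three highest-scored documents contain two of the three relevant ones, so the recall at k=3 is 2/3; retrieving all five documents brings it to 1.0.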