FreeDiscovery
1.4.dev0


freediscovery.search.Search

class freediscovery.search.Search(vectorizer, tfidf, lsi=None)

(Semantic) search in a document collection

Parameters:
  • vectorizer ({CountVectorizer, HashingVectorizer}) – the (fitted) vectorizer that was used to extract tokens from the document collection
  • tfidf ({TfidfTransformer, SmartTfidfTransformer}) – the (fitted) IDF transformer used to weight and normalize the bag-of-words / n-gram features
  • lsi (TruncatedSVD) – (optional) an LSI model fitted on the vectorized document-term matrix. If provided, the search is a semantic search. default=None
fit(X)

Fit using a document-term matrix (optionally in the LSI space)

Parameters:
  • X (ndarray) – the sparse document-term array (if lsi was not used) or the dense document / LSI term array (if lsi was provided)
search(text, metric='cosine')

Perform the search operation, using free text as the query

Parameters:
  • text (str) – the search query text
  • metric (str) – the output metric to use
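
To illustrate what a text query scored with the 'cosine' metric amounts to, here is a minimal standard-library sketch. It is not FreeDiscovery's implementation: the helper names (tokenize, fit_idf, tfidf_vector, cosine_search), the smoothed IDF formula, and the sample documents are all assumptions made for this example. The query is vectorized with the same vocabulary and weights as the collection, then compared against every document.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    # Smoothed inverse document frequency (an assumed weighting scheme).
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(text, idf):
    # Weight term counts by IDF, then L2-normalize so that a plain
    # dot product between two vectors equals their cosine similarity.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine_search(query, doc_vectors, idf):
    # One similarity score per document, in collection order.
    q = tfidf_vector(query, idf)
    return [sum(w * d.get(t, 0.0) for t, w in q.items()) for d in doc_vectors]


docs = [
    "contract renewal terms and pricing",
    "quarterly pricing report",
    "office party schedule",
]
idf = fit_idf(docs)
doc_vectors = [tfidf_vector(d, idf) for d in docs]

scores = cosine_search("pricing terms", doc_vectors, idf)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

The first document shares both query terms and so ranks highest, while the third shares none and scores zero. When an LSI model is supplied, the same comparison would be made in the reduced LSI space rather than on the raw term weights.
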
search_id(internal_id, metric='cosine')

Perform the search operation, using a document from the collection as the query

Parameters:
  • internal_id (int) – the internal_id of the document used as a search query
  • metric (str) – the output metric to use
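
Searching by document id can be pictured as reusing a row of the fitted matrix as the query vector. The sketch below is a plain-Python illustration under stated assumptions, not the library's code: search_by_id and l2_normalize are hypothetical helpers, raw term counts stand in for TF-IDF weights, and internal_id is simply taken to be the row index (how FreeDiscovery assigns internal ids is not described here).

```python
import math
from collections import Counter


def l2_normalize(counts):
    # Scale a term-count vector to unit length.
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()}


def search_by_id(internal_id, doc_vectors):
    # Use the stored vector of an existing document as the query and
    # score it against every document with cosine similarity.
    q = doc_vectors[internal_id]
    return [sum(w * d.get(t, 0.0) for t, w in q.items()) for d in doc_vectors]


docs = [
    "contract renewal terms and pricing",
    "contract renewal pricing update",
    "office party schedule",
]
# Assumption: raw term counts instead of TF-IDF, to keep the sketch short.
doc_vectors = [l2_normalize(Counter(d.split())) for d in docs]

scores = search_by_id(0, doc_vectors)
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked, scores)
```

The query document always matches itself with a similarity of about 1.0, so callers of a search of this kind typically skip the first hit when looking for related documents.
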

© Copyright 2016 - 2017, Grossman Labs LLC. Last updated on Jan 31, 2019.