freediscovery.search.Search
class freediscovery.search.Search(vectorizer, tfidf, lsi=None)
(Semantic) search in a document collection.
Parameters:
- vectorizer ({CountVectorizer, HashingVectorizer}) – the (fitted) vectorizer that was used to extract tokens from the document collection
- tfidf ({TfidfTransformer, SmartTfidfTransformer}) – the (fitted) IDF transformer used to weight and normalize the bag-of-words/n-gram features
- lsi (TruncatedSVD) – (optional) an LSI model fitted on the vectorised document-term matrix. If provided, the search is semantic. default=None
fit(X)
Fit using a document-term matrix (optionally in the LSI space).
Parameters: X (ndarray) – the sparse document-term matrix (if lsi was not used) or the dense document-by-LSI-component array (if lsi was provided)
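The following is a minimal sketch of how the constructor arguments and the two forms of X described above might be prepared with scikit-learn. The toy corpus, the variable names, and the choice of n_components=2 are illustrative assumptions, and the sketch assumes the LSI model is fitted on the TF-IDF-weighted matrix. The Search calls appear only as comments that mirror the signatures documented above, so the block runs without FreeDiscovery installed.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Toy corpus, made up for illustration only.
docs = [
    "the contract was signed in march",
    "the email discusses the signed contract",
    "lunch menu for the cafeteria",
    "quarterly earnings report and forecast",
]

# Fitted vectorizer: extracts tokens and builds document-term counts.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Fitted IDF transformer: weights and normalizes the counts.
tfidf = TfidfTransformer()
X_sparse = tfidf.fit_transform(counts)  # sparse, shape (n_documents, n_terms)

# Optional LSI model (assumption: fitted on the weighted matrix).
lsi = TruncatedSVD(n_components=2, random_state=0)
X_dense = lsi.fit_transform(X_sparse)   # dense, shape (n_documents, n_components)

# Documented signatures, shown as comments only:
#   Search(vectorizer, tfidf).fit(X_sparse)      # search in the term space
#   Search(vectorizer, tfidf, lsi).fit(X_dense)  # semantic search in the LSI space
```

Without lsi, X keeps one column per term and stays sparse. With lsi, each document is reduced to a short dense vector, which is why fit accepts either form.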