freediscovery.lsi.LSI
class freediscovery.lsi.LSI(cache_dir='/tmp/', dsid=None, mid=None, verbose=False)
Document categorization using Latent Semantic Indexing (LSI)
Parameters: - cache_dir (str) – folder where the model will be saved
- dsid (str) – dataset id
- mid (str) – LSI model id (the dataset id will be inferred)
- verbose (bool, optional) – print progress messages
Methods
- __init__([cache_dir, dsid, mid, verbose])
- delete() – Delete a trained model
- get_dsid(cache_dir, mid)
- get_params() – Get model parameters
- get_path(mid)
- list_models()
- load(mid) – Load the computed features from cache specified by mid
- predict(index, y[, accumulate, chunk_size]) – Predict the document relevance using a previously trained LSI model
- transform(n_components[, n_iter]) – Perform the SVD decomposition
predict(index, y, accumulate='nearest-max', chunk_size=100)
Predict the document relevance using a previously trained LSI model
Parameters: - index (array-like, shape (n_samples)) – document indices of the training set
- y (array-like, shape (n_samples)) – target binary class relative to index
- accumulate (str, optional, default='nearest-max') – if accumulate == 'nearest-max', the cosine distance to the closest relevant or non-relevant document is used as the classification score; if accumulate == 'centroid-max', the centroid of the relevant documents is used as the query vector.
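The two accumulate modes can be illustrated with a minimal sketch in plain Python. This is not the library's implementation: the helper names (cosine_similarity, nearest_score, centroid_score) and the 2-D vectors are hypothetical, and the sketch uses cosine similarity (higher means closer) rather than a distance. How the library combines the relevant and non-relevant scores into a final prediction is not specified here and is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def nearest_score(doc, examples):
    """Similarity of doc to its closest example (the 'nearest' idea)."""
    return max(cosine_similarity(doc, e) for e in examples)


def centroid_score(doc, examples):
    """Similarity of doc to the mean of the examples (the 'centroid' idea)."""
    n = len(examples)
    centroid = [sum(column) / n for column in zip(*examples)]
    return cosine_similarity(doc, centroid)


# Hypothetical LSI vectors for labelled training documents.
relevant = [[1.0, 0.0], [0.0, 1.0]]
non_relevant = [[-1.0, 0.0]]
doc = [1.0, 0.0]

near_rel = nearest_score(doc, relevant)      # closest relevant document
near_non = nearest_score(doc, non_relevant)  # closest non-relevant document
cent_rel = centroid_score(doc, relevant)     # centroid of relevant documents
```

In this toy case the document coincides with one relevant example, so the nearest score is higher than the centroid score, which averages over both relevant examples.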
transform(n_components, n_iter=5)
Perform the SVD decomposition
Parameters: - n_components (int) – number of selected singular values (number of LSI dimensions)
- n_iter (int) – number of iterations for the stochastic SVD algorithm
Returns: - mid (str) – model id
- lsi (BaseEstimator) – the TruncatedSVD object
- exp_var (float) – the explained variance of the SVD decomposition
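Since the returned lsi object is described as a TruncatedSVD, the decomposition step can be sketched directly with scikit-learn. This is an illustration under assumptions, not the code inside transform: the random matrix X stands in for a real document-term matrix, and computing exp_var as the sum of explained_variance_ratio_ is a guess at what the returned value represents.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Hypothetical document-term matrix: 20 documents, 10 terms.
rng = np.random.RandomState(0)
X = rng.rand(20, 10)

# n_components: number of LSI dimensions kept.
# n_iter: number of iterations of the randomized SVD solver.
lsi = TruncatedSVD(n_components=3, n_iter=5, random_state=0)
X_lsi = lsi.fit_transform(X)

# Assumption: exp_var is the total fraction of variance retained.
exp_var = lsi.explained_variance_ratio_.sum()
print(X_lsi.shape, exp_var)
```

Increasing n_components retains more variance at the cost of a larger model; n_iter trades accuracy of the randomized solver against run time.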