freediscovery.lsi.LSI

class freediscovery.lsi.LSI(cache_dir='/tmp/', dsid=None, mid=None, verbose=False)

Document categorization using Latent Semantic Indexing (LSI)

Parameters:

cache_dir (str) – folder where the model will be saved
dsid (str) – dataset id
mid (str) – LSI model id (the dataset id will be inferred)
verbose (bool, optional) – print progress messages

__init__(cache_dir='/tmp/', dsid=None, mid=None, verbose=False)

Methods

`__init__`([cache_dir, dsid, mid, verbose])
`delete`()	Delete a trained model
`get_dsid`(cache_dir, mid)
`get_params`()	Get model parameters
`get_path`(mid)
`list_models`()
`load`(mid)	Load the computed features from cache specified by mid
`predict`(index, y[, accumulate, chunk_size])	Predict the document relevance using a previously trained LSI model
`transform`(n_components[, n_iter])	Perform the SVD decomposition

load(mid)

Load the computed features from the cache specified by mid

predict(index, y, accumulate='nearest-max', chunk_size=100)

Predict the document relevance using a previously trained LSI model

Parameters:

index (array-like, shape (n_samples)) – document indices of the training set
y (array-like, shape (n_samples)) – target binary class relative to index
accumulate (str, optional, default='nearest-max') – if accumulate == 'nearest-max', the cosine distance to the closest relevant / non-relevant document is used as the classification score; if accumulate == 'centroid-max', the centroid of the relevant documents is used as the query vector.
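
The two accumulate modes described above can be illustrated with a small NumPy sketch. This is not the FreeDiscovery implementation: the function name `score_documents`, its `X_lsi` argument (documents already projected into LSI space), and the choice to return the relevant and non-relevant scores separately are all assumptions made for illustration. The sketch uses cosine similarity, which is one minus the cosine distance.

```python
import numpy as np


def _normalize_rows(X):
    """Scale each row to unit L2 norm so that dot products are cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def score_documents(X_lsi, index, y, accumulate="nearest-max"):
    """Hypothetical helper, not part of FreeDiscovery.

    X_lsi : (n_docs, n_components) documents in LSI space
    index : indices of the training documents within X_lsi
    y     : binary labels (1 = relevant, 0 = non-relevant) aligned with index;
            both classes are assumed to be present
    """
    X = _normalize_rows(np.asarray(X_lsi, dtype=float))
    index = np.asarray(index)
    y = np.asarray(y)
    X_train = X[index]

    if accumulate == "nearest-max":
        # Similarity of every document to every training document.
        sim = X @ X_train.T
        # Keep only the closest relevant and the closest non-relevant neighbour.
        return sim[:, y == 1].max(axis=1), sim[:, y == 0].max(axis=1)

    if accumulate == "centroid-max":
        # Use the mean of the relevant training documents as a single query vector.
        centroid = X_train[y == 1].mean(axis=0)
        norm = np.linalg.norm(centroid)
        if norm > 0:
            centroid = centroid / norm
        return X @ centroid, None

    raise ValueError("unknown accumulate mode: %r" % accumulate)


# Toy data: document 0 is labelled relevant, document 1 non-relevant.
X_demo = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1]])
sim_relevant, sim_non_relevant = score_documents(X_demo, index=[0, 1], y=[1, 0])
sim_centroid, _ = score_documents(X_demo, [0, 1], [1, 0], accumulate="centroid-max")
```

With a single relevant training document the two modes give the same relevant score, since the centroid is that document itself; they diverge once several relevant documents are labelled.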

transform(n_components, n_iter=5)

Perform the SVD decomposition

Parameters:

n_components (int) – number of selected singular values (number of LSI dimensions)
n_iter (int) – number of iterations for the stochastic SVD algorithm
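
To show how n_components and n_iter typically interact, here is a minimal NumPy sketch of a randomized (stochastic) truncated SVD. It is an illustration under assumptions, not the code this class runs: the function name `randomized_svd_sketch` and the `oversample` and `seed` parameters are hypothetical, and the source does not say which SVD routine the library actually calls.

```python
import numpy as np


def randomized_svd_sketch(A, n_components, n_iter=5, oversample=10, seed=0):
    """Hypothetical helper: approximate the top n_components singular triplets of A."""
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    # Work in a slightly larger subspace than requested for better accuracy.
    k = min(n_components + oversample, min(A.shape))

    # Random projection followed by orthonormalization gives a first range estimate.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((A.shape[1], k)))

    # Each power iteration sharpens the estimate of the dominant subspace.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Exact SVD of the small projected matrix, then map back to the original space.
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :n_components], s[:n_components], Vt[:n_components]


# Toy term-document-like matrix of exact rank 3.
rng_demo = np.random.default_rng(42)
A_demo = rng_demo.standard_normal((50, 3)) @ rng_demo.standard_normal((3, 20))
U_demo, s_demo, Vt_demo = randomized_svd_sketch(A_demo, n_components=3, n_iter=5)

# Document coordinates in the 3-dimensional LSI space.
X_lsi_demo = U_demo * s_demo
```

A larger n_iter generally improves accuracy when the singular values decay slowly, at the cost of extra passes over the matrix; n_components sets the dimensionality of the resulting LSI space.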

Returns: