freediscovery.parsers.EmailParser

class freediscovery.parsers.EmailParser(cache_dir=u'/tmp/', dsid=None, verbose=False)

Parse emails

Parameters:	cache_dir (str, default='/tmp/') – directory in which to save temporary and regression files
	dsid (str) – load an existing dataset
	verbose (bool) – print progress messages

Methods

`__init__`([cache_dir, dsid, verbose])
`delete`()	Delete the current dataset
`get_params`()	Get the vectorizer parameters
`list_datasets`()	List all datasets in the working directory
`load`(dsid)	Load computed features from disk
`search`(filenames)	Return the document ids that correspond to the provided filenames, without preserving order.
`transform`(data_dir[, file_pattern, ...])	Parse all emails in data_dir

Attributes

n_samples_ Number of documents in the dataset

search(filenames)

Return the document ids that correspond to the provided filenames, without preserving order.

Parameters:	filenames (list[str]) – list of filenames (relative to data_dir)
Returns:	indices – corresponding list of document ids (order is not preserved)
Return type:	array[int]
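
Because the returned ids are not aligned with the input list, a caller that needs a per-file mapping has to rebuild it. The sketch below does not call FreeDiscovery: `search_unordered` is a hypothetical stand-in that only mimics the documented contract (ids for the requested filenames, in no guaranteed order), and `indexed_files` is made-up data that assumes the caller knows which filename belongs to each document id.

```python
def search_unordered(indexed_files, filenames):
    """Hypothetical stand-in for EmailParser.search.

    Returns the ids of the requested filenames in document-id order,
    not in the order the filenames were requested.
    """
    wanted = set(filenames)
    return [doc_id for doc_id, name in enumerate(indexed_files) if name in wanted]


# Made-up index: position in the list plays the role of the document id.
indexed_files = ["a/1.eml", "a/2.eml", "b/3.eml", "b/4.eml"]
query = ["b/4.eml", "a/1.eml"]

ids = search_unordered(indexed_files, query)

# Rebuild a filename -> id mapping, then restore the query order.
id_by_name = {indexed_files[i]: i for i in ids}
ordered_ids = [id_by_name[name] for name in query]
```

Here `ids` follows document-id order while `ordered_ids` follows the order of `query`, which is the step a caller would need to add when the order matters.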

transform(data_dir, file_pattern=u'.*', dir_pattern=u'.*', encoding=u'latin-1')

Parse all emails in data_dir
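
The default value `'.*'` for `file_pattern` and `dir_pattern` suggests that both are regular expressions used to select which files under `data_dir` get parsed. The exact matching rules are not stated here, so the following is only a sketch under the assumption that each pattern is applied with `re.search` to the file name and to the directory part respectively; `filter_paths` is a hypothetical helper, not part of the library.

```python
import re


def filter_paths(paths, file_pattern=".*", dir_pattern=".*"):
    """Hypothetical helper illustrating regex-based file selection.

    Assumed semantics: keep a path when dir_pattern matches its directory
    part and file_pattern matches its file name (both via re.search).
    """
    kept = []
    for path in paths:
        directory, _, name = path.rpartition("/")
        if re.search(dir_pattern, directory) and re.search(file_pattern, name):
            kept.append(path)
    return kept


paths = ["inbox/msg1.eml", "inbox/notes.txt", "sent/msg2.eml"]

all_paths = filter_paths(paths)  # defaults match everything
eml_only = filter_paths(paths, file_pattern=r"\.eml$")
inbox_eml = filter_paths(paths, file_pattern=r"\.eml$", dir_pattern=r"^inbox$")
```

With the defaults every path is kept; narrowing `file_pattern` or `dir_pattern` restricts the selection without changing the directory that is scanned.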