freediscovery.parsers.EmailParser

class freediscovery.parsers.EmailParser(cache_dir=u'/tmp/', dsid=None, verbose=False)[source]

Parse emails

Parameters:
  • cache_dir (str, default='/tmp/') – directory in which to save temporary and regression files
  • dsid (str) – load an existing dataset
  • verbose (bool) – print progress messages
__init__(cache_dir=u'/tmp/', dsid=None, verbose=False)[source]

Methods

__init__([cache_dir, dsid, verbose])
delete() Delete the current dataset
get_params() Get the vectorizer parameters
list_datasets() List all datasets in the working directory
load(dsid) Load computed features from disk
search(filenames) Return the document ids that correspond to the provided filenames, without preserving order.
transform(data_dir[, file_pattern, ...]) Parse all emails in data_dir

Attributes

n_samples_ Number of documents in the dataset
delete()[source]

Delete the current dataset

get_params()[source]

Get the vectorizer parameters

list_datasets()[source]

List all datasets in the working directory

load(dsid)[source]

Load computed features from disk

n_samples_

Number of documents in the dataset

search(filenames)[source]

Return the document ids that correspond to the provided filenames, without preserving order.

Parameters:filenames (list[str]) – list of filenames (relative to data_dir)
Returns:indices – list of the corresponding document ids (order is not preserved)
Return type:array[int]
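
Because the order is not preserved, the returned ids cannot be paired with the input filenames by position. The sketch below illustrates that consequence with a standard-library stand-in; `search_ids` and the sample filenames are hypothetical and are not FreeDiscovery code. It returns a plain list rather than an array, and it ignores filenames that are not in the index (the documentation above does not say how the real method treats them).

```python
def search_ids(indexed_filenames, query_filenames):
    """Return the positions in indexed_filenames of the queried files.

    Hypothetical stand-in, not FreeDiscovery code: ids come back in
    index order, so they do not line up with the order of the query.
    """
    wanted = set(query_filenames)
    return [doc_id for doc_id, name in enumerate(indexed_filenames)
            if name in wanted]


# Filenames are given relative to the data directory.
indexed = ["inbox/a.eml", "inbox/b.eml", "sent/c.eml"]
ids = search_ids(indexed, ["sent/c.eml", "inbox/a.eml"])

# Rebuild the filename-to-id pairing explicitly instead of relying on position.
by_name = {indexed[doc_id]: doc_id for doc_id in ids}
```

A caller that needs the id of a particular file should therefore look it up through a mapping like `by_name` rather than assume that the first id belongs to the first filename.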
transform(data_dir, file_pattern=u'.*', dir_pattern=u'.*', encoding=u'latin-1')[source]

Parse all emails in data_dir
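
To make the role of `file_pattern`, `dir_pattern` and `encoding` concrete, here is a minimal standard-library sketch of a directory walk that filters by regular expression and parses each matching file as an email. `parse_emails` is a hypothetical helper, not the library's implementation. It assumes that `file_pattern` is matched with `re.match` against the file's base name and `dir_pattern` against the directory path relative to `data_dir`; the documentation above does not state how the real method applies these patterns.

```python
import email
import os
import re
import tempfile


def parse_emails(data_dir, file_pattern=".*", dir_pattern=".*",
                 encoding="latin-1"):
    """Hypothetical stand-in for a transform-like step (stdlib only)."""
    file_re = re.compile(file_pattern)
    dir_re = re.compile(dir_pattern)
    parsed = {}
    for root, _dirs, files in os.walk(data_dir):
        # Assumption: directories are filtered on their relative path.
        rel_dir = os.path.relpath(root, data_dir)
        if not dir_re.match(rel_dir):
            continue
        for name in sorted(files):
            # Assumption: files are filtered on their base name.
            if not file_re.match(name):
                continue
            path = os.path.join(root, name)
            with open(path, encoding=encoding) as fh:
                message = email.message_from_string(fh.read())
            # Key each message by its path relative to data_dir.
            parsed[os.path.relpath(path, data_dir)] = message
    return parsed


# Build a tiny throwaway dataset: one email and one unrelated file.
with tempfile.TemporaryDirectory() as data_dir:
    inbox = os.path.join(data_dir, "inbox")
    os.makedirs(inbox)
    with open(os.path.join(inbox, "1.eml"), "w", encoding="latin-1") as fh:
        fh.write("From: a@example.com\nSubject: Hello\n\nBody text\n")
    with open(os.path.join(inbox, "notes.txt"), "w", encoding="latin-1") as fh:
        fh.write("not an email\n")
    # Only files whose names end in .eml are parsed.
    messages = parse_emails(data_dir, file_pattern=r".*\.eml$")
```

With the default patterns (`.*`) every file under `data_dir` is treated as an email, so a narrower `file_pattern` is worth passing whenever the directory also holds other kinds of files.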