freediscovery.threading.EmailThreading

class freediscovery.threading.EmailThreading(cache_dir=u'/tmp/', dsid=None, mid=None, decode_header=False)[source]

JWZ Email threading class

Parameters:
  • cache_dir (str) – folder where the model will be saved
  • dsid (str, optional) – dataset id
  • mid (str, optional) – model id
__init__(cache_dir=u'/tmp/', dsid=None, mid=None, decode_header=False)[source]

Methods

__init__([cache_dir, dsid, mid, decode_header])
delete() – Delete a trained model
get_dsid(cache_dir, mid)
get_params() – Get model parameters
get_path(mid)
list_models()
thread([index, group_by_subject, ...]) – Thread the emails
delete()[source]

Delete a trained model

get_params()[source]

Get model parameters

thread(index=None, group_by_subject=False, sort_by_key=u'message_idx', sort_missing=-1)[source]

Thread the emails

Parameters:
  • index (array-like, shape (n_samples)) – document indices of the training set
  • group_by_subject (boolean, default=False) – group emails by subject
Returns:
  • tree (array (n_samples)) – the id of the parent element in the tree
  • root_idx (array (n_samples)) – the id of the root element in the tree
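
The relationship between the tree and root_idx arrays can be illustrated with a minimal, self-contained sketch. It is neither FreeDiscovery's implementation nor the full JWZ algorithm: it only follows In-Reply-To links and ignores References headers and subject grouping. The function name thread_by_reply, the dict field names, and the -1 marker for messages without a parent are assumptions made for illustration.

```python
def thread_by_reply(messages):
    """Link each message to the message it replies to.

    `messages` is a list of dicts with a 'message_id' key and an optional
    'in_reply_to' key (hypothetical field names, not FreeDiscovery's).
    Returns (tree, root_idx), two lists with one entry per message.
    """
    # Map each Message-ID to its position in the input list.
    position = {m["message_id"]: i for i, m in enumerate(messages)}

    # tree[i] is the position of the parent of message i, or -1 when the
    # message has no known parent (the -1 sentinel is an assumption).
    tree = [position.get(m.get("in_reply_to"), -1) for m in messages]

    root_idx = []
    for i in range(len(messages)):
        node = i
        seen = {node}
        # Walk up the parent links until a root is reached; `seen` guards
        # against malformed headers that would create a cycle.
        while tree[node] != -1 and tree[node] not in seen:
            node = tree[node]
            seen.add(node)
        root_idx.append(node)
    return tree, root_idx


messages = [
    {"message_id": "a"},                            # starts a thread
    {"message_id": "b", "in_reply_to": "a"},        # reply to a
    {"message_id": "c", "in_reply_to": "b"},        # reply to b
    {"message_id": "d"},                            # unrelated message
    {"message_id": "e", "in_reply_to": "missing"},  # parent not in the set
]

tree, root_idx = thread_by_reply(messages)
print(tree)      # [-1, 0, 1, -1, -1]
print(root_idx)  # [0, 0, 0, 3, 4]
```

Messages that share a root_idx value belong to the same thread, so grouping by that array recovers the individual conversations.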