freediscovery.io.parse_smart_tokens
freediscovery.io.parse_smart_tokens(text)

Parse a dataset stored in the SMART tokenized format, which is used in particular for the RCV1-v2 dataset; see Appendix B.12.i of http://www.jmlr.org/papers/volume5/lewis04a/lyrl2004_rcv1v2_README.htm
Parameters:
    text (str) – the full text of the dataset
Returns:
    result – the parsed dataset as an OrderedDict, with document IDs as keys and a string of tokens as values
Return type:
    dict
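The source of this function is not reproduced here, so the following is only a minimal sketch of what parsing this format involves. It assumes a layout in which each document starts with a ".I <id>" line, followed by a ".W" line and then the document's tokens (possibly spread over several lines), with blank lines between documents. The name parse_smart_tokens_sketch, the conversion of IDs to int, and the toy input are hypothetical illustrations, not the freediscovery implementation.

```python
from collections import OrderedDict


def parse_smart_tokens_sketch(text):
    """Hypothetical sketch: parse SMART-style text into {doc_id: tokens}.

    Assumed layout (not taken from the freediscovery source):
        .I <document id>
        .W
        <tokens, possibly over several lines>
    """
    result = OrderedDict()  # preserves the order documents appear in
    doc_id = None
    tokens = []

    for line in text.splitlines():
        line = line.strip()
        if line.startswith(".I "):
            # A new document begins: store the previous one, if any.
            if doc_id is not None:
                result[doc_id] = " ".join(tokens)
            doc_id = int(line.split()[1])  # assumption: IDs are integers
            tokens = []
        elif line == ".W" or not line:
            # Section marker or blank separator line: nothing to collect.
            continue
        elif doc_id is not None:
            tokens.extend(line.split())

    # Store the last document in the input.
    if doc_id is not None:
        result[doc_id] = " ".join(tokens)
    return result


# Toy input with made-up IDs and tokens, for illustration only.
sample = """.I 10
.W
alpha beta
gamma delta

.I 11
.W
epsilon zeta
"""
docs = parse_smart_tokens_sketch(sample)
print(docs)
```

Joining the tokens with single spaces discards the original line breaks inside each document, which fits the documented return value of one string of tokens per document ID.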