Release history
Version 1.3.1
May 22, 2018
Version 1.3.0
Oct 1, 2017
New features
- Additional TF-IDF weighting schemes and pivoted normalization (#164)
- Exposed the wrapper functions to visualize Birch hierarchical trees in the Python package (#175)
- Better separation between the FD engine (REST API) and the FD Python package.
- Support for both Python 2.7 and 3.5+ for the Python package. The FD engine remains Python 3.5+ only.
Enhancements
- Improved documentation and examples
- Added compatibility with scikit-learn 0.19.0 (#169), which fixes several issues found with 0.18.1.
API Changes
- In POST /api/v0/feature-extraction, the parameters binary, use_idf and sublinear_tf are replaced by a single parameter, weighting, that defines term-document weighting and normalization using the SMART notation (#164)
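To illustrate what a SMART weighting string encodes, here is a minimal sketch in plain Python. It is not FreeDiscovery's implementation: it covers only a small, assumed subset of the SMART letters (one each for term frequency, document frequency and normalization) and leaves out pivoted normalization. The function name smart_weights is hypothetical.

```python
import math
from collections import Counter


def smart_weights(docs, scheme="ltc"):
    """Weight tokenized documents with a three-letter SMART scheme.

    Illustrative subset of the SMART letters (hypothetical helper):
      1st letter, term frequency:     n = raw count, l = 1 + log(tf), b = binary
      2nd letter, document frequency: n = none, t = idf = log(N / df)
      3rd letter, normalization:      n = none, c = cosine (unit L2 norm)
    Returns one {term: weight} dict per document.
    """
    tf_code, df_code, norm_code = scheme
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        vec = {}
        for term, tf in Counter(doc).items():
            # Term-frequency component.
            if tf_code == "n":
                w = float(tf)
            elif tf_code == "l":
                w = 1.0 + math.log(tf)
            elif tf_code == "b":
                w = 1.0
            else:
                raise ValueError("unsupported tf letter: %r" % tf_code)
            # Document-frequency (idf) component.
            if df_code == "t":
                w *= math.log(float(n_docs) / df[term])
            elif df_code != "n":
                raise ValueError("unsupported df letter: %r" % df_code)
            vec[term] = w
        # Normalization component: scale the document vector to unit length.
        if norm_code == "c":
            norm = math.sqrt(sum(w * w for w in vec.values()))
            if norm > 0:
                vec = {term: w / norm for term, w in vec.items()}
        elif norm_code != "n":
            raise ValueError("unsupported normalization letter: %r" % norm_code)
        weighted.append(vec)
    return weighted
```

For example, smart_weights([["a", "a", "b"], ["b", "c"]], "nnn") returns the raw counts, while "ltc" applies log term frequency, idf and cosine normalization in turn. With the "t" letter, a term that appears in every document gets weight 0, because log(N / df) = log(1) = 0.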
Version 1.0
May 2, 2017
New features
- Ability to add / remove documents in an existing processed dataset using the /api/v0/feature-extraction/{dsid}/append and /api/v0/feature-extraction/{dsid}/delete URL endpoints
- Pagination in search and document categorization with the batch_id and batch_size parameters.
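The batch_id and batch_size parameters describe page-based access to a ranked result list. The sketch below shows the slicing arithmetic under two assumptions that these notes do not confirm: batch_id is 0-based, and the last batch may be shorter than batch_size. The helpers get_batch and n_batches are hypothetical and run client-side on an in-memory list.

```python
def get_batch(results, batch_id, batch_size):
    """Return one page of results (hypothetical helper).

    ASSUMPTION: batch_id is 0-based; the last batch may be shorter.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if batch_id < 0:
        raise ValueError("batch_id must be non-negative")
    start = batch_id * batch_size
    return results[start:start + batch_size]


def n_batches(n_results, batch_size):
    """Number of batches needed to cover n_results (ceiling division)."""
    return -(-n_results // batch_size)
```

With 25 results and batch_size=10, n_batches(25, 10) is 3 and get_batch(results, 2, 10) holds the final 5 results; a batch_id past the end simply yields an empty list.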
Enhancements
- Better handling of data persistence, which leads to faster response times for all URL endpoints, in particular semantic search and categorization. This breaks backward compatibility for the internal data format: datasets need to be re-processed and models re-trained.
- Additional tests for categorization and semantic search
API Changes
- The nn_metric parameter was renamed to metric; a new metric, cosine-positive, was added
- Breaking change: by default, the cosine similarity score is used.
- The /email-parser/* endpoints are removed and merged into the /feature-extraction/ endpoint, thus unifying data ingestion.
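Cosine similarity, now the default score, measures the angle between two document vectors and lies in [-1, 1]. The sketch below computes it for dense vectors. The cosine_positive variant is an assumption inferred from the metric's name (negative similarities clipped to 0); these notes do not define it, so check the API documentation before relying on that behaviour. Both function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two dense vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention for empty documents: treat as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


def cosine_positive(u, v):
    """ASSUMPTION: cosine similarity with negative values clipped to 0."""
    return max(0.0, cosine_similarity(u, v))
```

For example, cosine_similarity([1, 0], [-1, 0]) is -1.0, while cosine_positive on the same vectors gives 0.0.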
Version 0.9
Jan 28, 2017
New features
Enhancements
- Categorization and semantic search support sorting documents and filtering out those scoring below a user-provided threshold (PR #96)
- Categorization returns only the max_result_categories categories with the highest score.
- The similarity and ML scores can now be scaled to the [0, 1] range using the nn_metric and ml_output input parameters (PR #101).
- The REST API documentation is generated automatically from the code (using an OpenAPI specification), which keeps the code and the docs consistent (PR #85)
- Adapted the clustering and duplicate detection API to return structured objects indexed by document_id (and optionally rendering_id)
- Improved test coverage and simplified the API overall
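The sorting, threshold filtering, top-category selection and [0, 1] scaling described above can be sketched client-side as follows. This is an illustration, not the server code: min_score is an assumed name for the user-provided threshold, all three helpers are hypothetical, and scale_cosine assumes the [0, 1] scaling is the affine map (s + 1) / 2 from the cosine range [-1, 1].

```python
def sort_and_filter(doc_scores, min_score=None):
    """Sort (document_id, score) pairs by decreasing score.

    Pairs scoring below min_score (hypothetical threshold name) are dropped.
    """
    kept = [(doc_id, score) for doc_id, score in doc_scores
            if min_score is None or score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def top_categories(category_scores, max_result_categories=1):
    """Keep the max_result_categories highest-scoring categories of one document."""
    ranked = sorted(category_scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_result_categories]


def scale_cosine(score):
    """ASSUMPTION: map a cosine similarity from [-1, 1] to [0, 1]."""
    return (score + 1.0) / 2.0
```

For example, top_categories({"legal": 0.7, "hr": 0.4, "spam": 0.1}, 2) keeps legal and hr, and scale_cosine(0.0) gives 0.5.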
API Changes
- The following endpoints, which accept a request body, are changed from the GET to the POST method (PR #94), in accordance with the HTTP/1.1 spec, section 4.3:
  - /api/v0/metrics/categorization
  - /api/v0/metrics/clustering
  - /api/v0/metrics/duplicate-detection
  - /api/v0/feature-extraction/{dsid}/id-mapping/flat
  - /api/v0/feature-extraction/{dsid}/id-mapping/nested
  - /api/v0/email-parser/{dsid}/index
- Significant changes in the categorization REST API to accommodate multi-class cases
- The endpoint /api/v0/feature-extraction/{dsid}/id-mapping/flat is removed, while /api/v0/feature-extraction/{dsid}/id-mapping/nested is renamed to /api/v0/feature-extraction/{dsid}/id-mapping.
- Removed the /categorization/<mid>/test endpoint, which is superseded by /metrics/categorization.
- The internal_id is no longer exposed in the public API
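Since these endpoints now take their parameters in a POST request body, a client sends JSON instead of a query string. The sketch below builds such a request with the standard library without sending it. The base URL and the payload keys (y_true, y_pred) are placeholders, not the documented request schema, and build_post_request is a hypothetical helper.

```python
import json
from urllib.request import Request


def build_post_request(base_url, path, payload):
    """Build (but do not send) a JSON POST request to a REST endpoint.

    ASSUMPTION: base_url and the payload keys are placeholders; consult the
    REST API documentation for the actual request schema.
    """
    body = json.dumps(payload).encode("utf-8")
    return Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_post_request(
    "http://localhost:5001/",            # placeholder server address
    "/api/v0/metrics/categorization",
    {"y_true": [1, 0], "y_pred": [1, 1]},  # hypothetical payload
)
```

Against a running server, the request could then be sent with urllib.request.urlopen(req).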
Version 0.8
Feb. 25, 2017
New features
Enhancements
API Changes
- Ability to associate external document_id and rendition_id fields when ingesting documents; document categorization can now be used with these external ids.
- All the wrapper classes are made private in the Python API
- The same categorization and clustering endpoints can operate either in the document-term space or in the LSI space (PR #57)