Release history
Version 1.3.1
May 22, 2018
Version 1.3.0
Oct 1, 2017
New features
- Additional TF-IDF weighting schemes and pivoted normalization (#164)
- Exposed the wrapper functions to visualize Birch hierarchical trees in the Python package (#175)
- Better separation between the FD engine (REST API) and the FD Python package.
- Support for both Python 2.7 and 3.5+ for the Python package. The FD engine remains Python 3.5+ only.
Enhancements
- Improved documentation and examples
- Added compatibility with scikit-learn 0.19.0 (#169), which fixed several issues found in 0.18.1.
API Changes
- In POST /api/v0/feature-extraction, the parameters binary, use_idf and sublinear_tf are replaced by a single parameter, weighting, that defines the term-document weighting and normalization using the SMART notation (#164).
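A SMART code combines one letter per step of the weighting. The Python sketch below illustrates the idea on a small, assumed subset of letters (term frequency n/l/b, document frequency n/t, normalization n/c) applied to per-document term counts. It is not the FreeDiscovery implementation: the helper name smart_weighting, the three-letter format and the classic idf = log(N / df) are assumptions, and the values actually accepted by weighting (including pivoted normalization) should be checked in the API reference. Under these assumptions, the removed binary, sublinear_tf and use_idf options roughly map to the letters b, l and t.

```python
import math
from typing import Dict, List


def smart_weighting(counts: List[Dict[str, int]], scheme: str = "ltc") -> List[Dict[str, float]]:
    """Weight per-document term counts with a three-letter SMART-style code.

    Hypothetical helper (not the FreeDiscovery API) supporting a small subset:
      scheme[0] term frequency: n = raw count, l = 1 + log(count), b = binary (1)
      scheme[1] document frequency: n = none, t = idf = log(N / df)
      scheme[2] normalization: n = none, c = cosine (unit L2 norm)
    """
    if len(scheme) != 3 or scheme[0] not in "nlb" or scheme[1] not in "nt" or scheme[2] not in "nc":
        raise ValueError(f"unsupported scheme {scheme!r}")
    tf_code, df_code, norm_code = scheme
    n_docs = len(counts)

    # Document frequency: number of documents in which each term occurs
    df: Dict[str, int] = {}
    for doc in counts:
        for term, count in doc.items():
            if count > 0:
                df[term] = df.get(term, 0) + 1

    weighted = []
    for doc in counts:
        vec: Dict[str, float] = {}
        for term, count in doc.items():
            if count <= 0:
                continue
            # Term-frequency component
            if tf_code == "n":
                w = float(count)
            elif tf_code == "l":
                w = 1.0 + math.log(count)
            else:  # "b"
                w = 1.0
            # Document-frequency (idf) component
            if df_code == "t":
                w *= math.log(n_docs / df[term])
            vec[term] = w
        # Normalization component
        if norm_code == "c":
            norm = math.sqrt(sum(w * w for w in vec.values()))
            if norm > 0:
                vec = {term: w / norm for term, w in vec.items()}
        weighted.append(vec)
    return weighted


docs = [{"apple": 2, "pear": 1}, {"apple": 1}]
print(smart_weighting(docs, "nnn"))  # [{'apple': 2.0, 'pear': 1.0}, {'apple': 1.0}]
```

For instance, "ltc" yields log-scaled, idf-weighted, unit-length vectors, while "nnn" leaves the raw counts unchanged.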
Version 1.0
May 2, 2017
New features
- Ability to add / remove documents in an existing processed dataset using the /api/v0/feature-extraction/{dsid}/append and /api/v0/feature-extraction/{dsid}/delete URL endpoints
- Pagination in search and document categorization with the batch_id and batch_size parameters.
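Paginating a ranked result list with batch_id and batch_size can be sketched as follows. The sketch assumes that batch_id is a zero-based page index and batch_size the number of results per page; get_batch is a hypothetical helper, not part of the FreeDiscovery API, and the server's exact conventions (for example how an out-of-range batch is reported) may differ.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def get_batch(results: Sequence[T], batch_id: int, batch_size: int) -> Tuple[List[T], bool]:
    """Return one page of a ranked result list and whether more pages follow.

    Assumed semantics (illustration only): batch_id is a zero-based page index,
    batch_size is the number of results per page.
    """
    if batch_id < 0 or batch_size <= 0:
        raise ValueError("batch_id must be >= 0 and batch_size > 0")
    start = batch_id * batch_size
    stop = start + batch_size
    page = list(results[start:stop])
    has_more = stop < len(results)
    return page, has_more


ranked = [f"doc{i}" for i in range(7)]
print(get_batch(ranked, 2, 3))  # (['doc6'], False)
```

Returning a has_more flag lets a client keep requesting batches until the last one without knowing the total number of results in advance.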
Enhancements
- Better handling of data persistence, which leads to faster response times for all URL endpoints, in particular for semantic search and categorization. This breaks backward compatibility of the internal data format: datasets need to be re-processed and models re-trained.
- Additional tests for categorization and semantic search
API Changes
- The nn_metric parameter was renamed to metric; a new metric, cosine-positive, was added.
- Breaking change: by default, the cosine similarity score is used.
- The /email-parser/* endpoints are removed and merged into the /feature-extraction/ endpoint, thus unifying data ingestion.
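Cosine similarity lies in [-1, 1], and negative values do occur when document vectors have negative components, as in an LSI projection. The sketch below contrasts cosine with one plausible reading of cosine-positive, in which negative similarities are clipped to 0; that definition, like the helper names cosine_similarity and similarity, is an assumption made for illustration rather than the documented behaviour.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two dense vectors; the result lies in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def similarity(u: Sequence[float], v: Sequence[float], metric: str = "cosine") -> float:
    s = cosine_similarity(u, v)
    if metric == "cosine":
        return s
    if metric == "cosine-positive":
        # Assumed behaviour: negative similarities are clipped to 0, giving [0, 1]
        return max(s, 0.0)
    raise ValueError(f"unsupported metric {metric!r}")


print(similarity([1.0, 0.0], [-1.0, 0.0]))                     # -1.0
print(similarity([1.0, 0.0], [-1.0, 0.0], "cosine-positive"))  # 0.0
```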
Version 0.9
Jan 28, 2017
New features
Enhancements
- Categorization and semantic search support sorting documents and filtering out those below a user-provided threshold (PR #96).
- Categorization returns only the max_result_categories categories with the highest scores.
- The similarity and ML scores can now be scaled to the [0, 1] range using the nn_metric and ml_output input parameters (PR #101).
- The REST API documentation is generated automatically from the code (using an OpenAPI specification), which enforces consistency between the code and the docs (PR #85).
- Adapted the clustering and duplicate detection API to return structured objects indexed by document_id (and optionally rendition_id).
- Improved test coverage and overall simplified the API.
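Sorting, threshold filtering and keeping only the best max_result_categories entries can be expressed as a single rank-then-truncate step. The following is a minimal sketch over an in-memory {id: score} mapping; rank_and_filter, threshold and max_results are hypothetical names, and whether the server keeps scores exactly equal to the threshold is an assumption (here they are kept).

```python
from typing import Dict, List, Optional, Tuple


def rank_and_filter(scores: Dict[str, float],
                    threshold: Optional[float] = None,
                    max_results: Optional[int] = None) -> List[Tuple[str, float]]:
    """Sort (id, score) pairs by decreasing score, drop low scores, keep the top ones.

    Illustration only: threshold stands in for a user-provided cut-off and
    max_results for a limit such as max_result_categories.
    """
    if max_results is not None and max_results < 1:
        raise ValueError("max_results must be >= 1")
    pairs = list(scores.items())
    if threshold is not None:
        # Filter out entries scoring below the user-provided threshold
        pairs = [(key, s) for key, s in pairs if s >= threshold]
    # Highest score first; ties broken by id so the order is deterministic
    pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return pairs if max_results is None else pairs[:max_results]


category_scores = {"finance": 0.82, "legal": 0.35, "hr": 0.61}
print(rank_and_filter(category_scores, max_results=2))  # [('finance', 0.82), ('hr', 0.61)]
```

The same function applies to search results (document id to similarity) and to categories (category name to score).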
API Changes
- The following endpoints accepting a request body are changed from the GET to the POST method (PR #94), in accordance with the HTTP/1.1 spec, section 4.3:
  - /api/v0/metrics/categorization
  - /api/v0/metrics/clustering
  - /api/v0/metrics/duplicate-detection
  - /api/v0/feature-extraction/{dsid}/id-mapping/flat
  - /api/v0/feature-extraction/{dsid}/id-mapping/nested
  - /api/v0/email-parser/{dsid}/index
- Significant changes in the categorization REST API to accommodate multi-class cases
- The endpoint /api/v0/feature-extraction/{dsid}/id-mapping/flat is removed, while /api/v0/feature-extraction/{dsid}/id-mapping/nested is renamed to /api/v0/feature-extraction/{dsid}/id-mapping.
- Removed the /categorization/<mid>/test endpoint, which is superseded by /metrics/categorization.
- The internal_id is no longer exposed in the public API.
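RFC 7231 (section 4.3.1) states that a payload within a GET request has no defined semantics and that sending one might cause some existing implementations to reject the request; POST avoids this. The sketch below only builds such a request with the Python standard library, without sending it. The base URL http://localhost:5001, the helper build_post_request and the y_true / y_pred body fields are hypothetical placeholders, not the documented schema of /api/v0/metrics/categorization.

```python
import json
import urllib.request


def build_post_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) a POST request that carries a JSON body."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical body: the field names are placeholders, not the real schema
payload = {"y_true": [1, 0, 1], "y_pred": [1, 1, 1]}
req = build_post_request("http://localhost:5001", "/api/v0/metrics/categorization", payload)
print(req.get_method(), req.full_url)
# Sending it requires a running server, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```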
Version 0.8
Feb. 25, 2017
New features
Enhancements
API Changes
- Ability to associate external document_id and rendition_id fields when ingesting documents; document categorization can now be used with these external ids.
- All the wrapper classes are made private in the Python API.
- The same categorization and clustering endpoints can operate either in the document-term space or in the LSI space (PR #57).
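A minimal sketch of how files could be described with external ids at ingestion time, assuming a dataset_definition list of records with file_path, document_id and rendition_id keys. The payload layout, the file paths and the helper build_dataset_definition are assumptions for illustration; check the API reference for the exact schema. Since several renditions of one document share a document_id, the sketch treats the (document_id, rendition_id) pair as the unique key.

```python
import json
from typing import Iterable, List, Tuple


def build_dataset_definition(entries: Iterable[Tuple[str, int, int]]) -> List[dict]:
    """Convert (file_path, document_id, rendition_id) tuples into records.

    Raises ValueError if the same (document_id, rendition_id) pair appears twice.
    """
    seen = set()
    records = []
    for file_path, document_id, rendition_id in entries:
        key = (document_id, rendition_id)
        if key in seen:
            raise ValueError(f"duplicate (document_id, rendition_id): {key}")
        seen.add(key)
        records.append({
            "file_path": file_path,
            "document_id": document_id,
            "rendition_id": rendition_id,
        })
    return records


entries = [
    ("/data/doc_1.txt", 1, 0),
    ("/data/doc_1_ocr.txt", 1, 1),  # second rendition of document 1
    ("/data/doc_2.txt", 2, 0),
]
payload = {"dataset_definition": build_dataset_definition(entries)}
print(json.dumps(payload, indent=2))
```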