Compute duplicates
Feature extraction must be run with use_hashing=False. Recommended options also include use_idf=1, sublinear_tf=0, and binary=0.
URL:
/api/v0/duplicate-detection/
Method:
POST
URL Params: None
Data Params:
dataset_id: dataset id
method: str, default="simhash". Method used for duplicate detection. One of "simhash", "i-match"
Success Response:
HTTP 200
{"id": <str>}
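
A request to this endpoint might be built as in the minimal sketch below. It is an illustration, not a reference client. The base URL (host and port), the JSON encoding of the data parameters, and the helper names build_duplicate_detection_request and compute_duplicates are all assumptions made here; only the path, the POST method, the two parameters, and the "id" field of the response come from the description above.

```python
import json
import urllib.request

# Assumption: the server address is not given in this section; adjust as needed.
BASE_URL = "http://localhost:5001"
ENDPOINT = "/api/v0/duplicate-detection/"
ALLOWED_METHODS = ("simhash", "i-match")


def build_duplicate_detection_request(dataset_id, method="simhash", base_url=BASE_URL):
    """Build (but do not send) the POST request for duplicate detection.

    Hypothetical helper. Assumes the data parameters are sent as a JSON body.
    """
    if method not in ALLOWED_METHODS:
        raise ValueError("method must be one of %r, got %r" % (ALLOWED_METHODS, method))
    payload = {"dataset_id": dataset_id, "method": method}
    return urllib.request.Request(
        base_url + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def compute_duplicates(dataset_id, method="simhash", base_url=BASE_URL):
    """Send the request and return the "id" field of the JSON response.

    Hypothetical helper; requires a running server at base_url.
    """
    request = build_duplicate_detection_request(dataset_id, method, base_url)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["id"]
```

With a server running, compute_duplicates("<dataset id>") would return the id from the success response. The dataset id must refer to features extracted with use_hashing=False.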