I'm using Prodigy 1.9.8. I've modified the textcat recipe so that it only returns examples with a score higher than 0.7. I ran it with textcat.teach and excluded some dataset IDs. However, examples that should have been excluded were still displayed during teaching. Is this a bug in Prodigy, or is something missing in my custom sorter?
I used the following command to teach -
prodigy textcat.teach --label lab db_model_2 db2 /tmp/trained_model/ ~/new_dataset.jsonl -e db_pattern,db1
The db_pattern dataset was created with the new match recipe. I compared the output of this teach session (db_model_2) with db_pattern: the same text has the same input_hash, but the task_hash is different. I wonder whether that's the problem. Here is the custom_sorter function in the textcat recipe:
def custom_sorter(scored_examples):
    for score, example in scored_examples:
        # your own logic here to decide whether to send out an
        # example for annotation
        if score > 0.70:
            yield example
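To make the suspected mismatch concrete, here is a minimal sketch of what I think I would need if the exclusion is keyed on the task hash rather than the input hash. It is plain Python and does not call Prodigy itself. The helper exclude_by_input_hash, the threshold parameter, and the hash values are all hypothetical and only for illustration. In a real recipe the set of excluded hashes would presumably be loaded from the excluded datasets (db_pattern and db1), for example through the database's input-hash lookup, which I have not verified against 1.9.8.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

Example = Dict[str, object]


def custom_sorter(
    scored_examples: Iterable[Tuple[float, Example]], threshold: float = 0.70
) -> Iterator[Example]:
    """Yield only the examples whose score is above the threshold."""
    for score, example in scored_examples:
        if score > threshold:
            yield example


def exclude_by_input_hash(
    stream: Iterable[Example], excluded_input_hashes: Set[int]
) -> Iterator[Example]:
    """Hypothetical helper: drop examples whose _input_hash was already seen.

    Comparing on _input_hash ignores differences in _task_hash, which is
    the situation described above (same text, different task hash).
    """
    for example in stream:
        if example.get("_input_hash") not in excluded_input_hashes:
            yield example


# Made-up hashes standing in for what db_pattern and db1 might contain.
excluded = {111, 222}

scored = [
    (0.95, {"text": "already in db_pattern", "_input_hash": 111, "_task_hash": 9001}),
    (0.80, {"text": "new example", "_input_hash": 333, "_task_hash": 9002}),
    (0.40, {"text": "low score", "_input_hash": 444, "_task_hash": 9003}),
]

kept = list(exclude_by_input_hash(custom_sorter(scored), excluded))
print([eg["text"] for eg in kept])
```

With this filter, the first example is dropped even though its task hash differs from the one stored in the excluded dataset, and the third is dropped by the score threshold. If Prodigy can be told to exclude by input hash directly, that would be preferable to a hand-written filter like this one.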