Multilabel JSONL format & active learning

Hi @suryaiitkgp!

My colleague @Jette16 reminded me that there's an alternative data format for multilabel classification. Every possible label is listed as an entry in "options", and the labels that were accepted (selected) for that text are listed in "accept". For example:

{"options": [{"id": "OTHER"}, {"id": "baking"}, {"id":"bread"}, {"id":"chicken"}, {"id":"eggs"}, {"id":"equipment"}, {"id": "food-safety"},{"id":"meat"}, {"id":"sauce"}, {"id":"storage-method"}, {"id":"substitutions"}], "accept":["OTHER","baking"], "text": "How can I get chewy chocolate chip cookies?\n<p>My chocolate chips cookies are always too crisp. How can I get chewy cookies, like those of Starbucks?</p>\n<hr/>\n<p>Thank you to everyone who has answered. So far the tip that had the biggest impact was to chill and rest the dough, however I also increased the brown sugar ratio and increased a bit the butter. Also adding maple syrup helped. </p>\n"}
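To make the relationship between the two fields concrete, here's a small plain-Python sketch that parses one line in this format and expands it into a per-label 0/1 mapping. It only uses the standard `json` module, not a Prodigy API. The line is a shortened version of the example above, and `to_label_flags` is a hypothetical helper name.

```python
import json

# Shortened version of the example line above (same fields, fewer options, trimmed text).
line = (
    '{"options": [{"id": "OTHER"}, {"id": "baking"}, {"id": "bread"}], '
    '"accept": ["OTHER", "baking"], '
    '"text": "How can I get chewy chocolate chip cookies?"}'
)

def to_label_flags(record):
    """Hypothetical helper: map every option id to 1 if it is in "accept", else 0."""
    accepted = set(record.get("accept", []))
    return {opt["id"]: int(opt["id"] in accepted) for opt in record["options"]}

record = json.loads(line)
print(to_label_flags(record))
```

Options that are not listed in "accept" simply count as not applying to that text.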

As a quick test, could you save the example above as test.jsonl and run:

prodigy db-in import_data ./test.jsonl --rehash
prodigy train ./model --textcat-multilabel import_data --eval-split 0.2 --base-model blank:en
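Before importing a larger file with db-in, a quick sanity check can catch formatting mistakes early. This is a minimal plain-Python sketch, not a Prodigy command. `check_multilabel_jsonl` is a hypothetical helper, and it only checks what's visible in the example above: a "text" field, options given as objects with an "id", and accepted labels that actually appear among the options.

```python
import json

def check_multilabel_jsonl(lines):
    """Return (line_number, problem) pairs; an empty list means no problems were found."""
    problems = []
    for n, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as err:
            problems.append((n, f"invalid JSON: {err}"))
            continue
        if not isinstance(record, dict):
            problems.append((n, "line is not a JSON object"))
            continue
        if "text" not in record:
            problems.append((n, "missing 'text'"))
        options = record.get("options", [])
        if not all(isinstance(opt, dict) and "id" in opt for opt in options):
            problems.append((n, "every option must be an object with an 'id'"))
            continue
        option_ids = {opt["id"] for opt in options}
        unknown = set(record.get("accept", [])) - option_ids
        if unknown:
            problems.append((n, f"accepted labels not in options: {sorted(unknown)}"))
    return problems

# Hypothetical sample lines: one valid, one with a typo in "accept", one truncated.
sample = [
    '{"text": "chewy cookies?", "options": [{"id": "baking"}, {"id": "OTHER"}], "accept": ["OTHER", "baking"]}',
    '{"text": "cookie dough?", "options": [{"id": "baking"}], "accept": ["bakin"]}',
    '{"text": "broken"',
]
for n, problem in check_multilabel_jsonl(sample):
    print(f"line {n}: {problem}")
```

To check a real file, pass it an open file handle, e.g. `check_multilabel_jsonl(open("test.jsonl", encoding="utf8"))`.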

This gives us a reproducible test case, so we can rule out other issues. If it works, try converting your data to this format.
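If your data currently stores labels in a different shape, the conversion is a small scripting job. The sketch below assumes, purely hypothetically, that each input record has a "text" field and a "cats" dict mapping each label to 1/0 (or True/False); adjust the field names to whatever your data actually uses. Like the example above, every output line lists the same full label set as options, and only the labels that apply go into "accept".

```python
import json

def to_options_accept(record, labels):
    """Convert a hypothetical {"text": ..., "cats": {label: 0/1}} record to options/accept.

    Labels missing from "cats" are treated as not accepted.
    """
    cats = record.get("cats", {})
    return {
        "text": record["text"],
        "options": [{"id": label} for label in labels],  # same full label set on every line
        "accept": [label for label in labels if cats.get(label)],
    }

def to_jsonl_lines(records, labels):
    """Serialize each converted record as one JSON line (JSONL)."""
    return [json.dumps(to_options_accept(r, labels)) for r in records]

# Hypothetical input data; replace with your own records and field names.
source = [
    {"text": "How can I get chewy chocolate chip cookies?",
     "cats": {"OTHER": 1, "baking": 1, "bread": 0}},
    {"text": "How long can cooked chicken stay in the fridge?",
     "cats": {"chicken": 1, "storage-method": 1}},
]
labels = sorted({label for r in source for label in r["cats"]})
for line in to_jsonl_lines(source, labels):
    print(line)
```

To get a file for db-in, write each line followed by a newline to something like converted.jsonl. Collecting the labels from the data only works if every label occurs at least once; if you already know your full label set, passing it in explicitly is safer.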