Hi @suryaiitkgp!
My colleague @Jette16 reminded me that there's an alternative data format for multilabel annotations. It lists every label under "options", and the accepted (selected) labels go in "accept". For example:
```json
{"options": [{"id": "OTHER"}, {"id": "baking"}, {"id":"bread"}, {"id":"chicken"}, {"id":"eggs"}, {"id":"equipment"}, {"id": "food-safety"},{"id":"meat"}, {"id":"sauce"}, {"id":"storage-method"}, {"id":"substitutions"}], "accept":["OTHER","baking"], "text": "How can I get chewy chocolate chip cookies?\n<p>My chocolate chips cookies are always too crisp. How can I get chewy cookies, like those of Starbucks?</p>\n<hr/>\n<p>Thank you to everyone who has answered. So far the tip that had the biggest impact was to chill and rest the dough, however I also increased the brown sugar ratio and increased a bit the butter. Also adding maple syrup helped. </p>\n"}
```
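If you'd rather generate records like this than write them by hand, here's a minimal Python sketch. `make_multilabel_task` and `LABELS` are hypothetical names for illustration, not part of Prodigy, and the check that every accepted label also appears in "options" is just a sanity check I'm assuming is sensible, not a documented Prodigy requirement. The text is shortened to the question title for brevity.

```python
import json

def make_multilabel_task(text, labels, accepted):
    """Build one multilabel task: every label in "options", selected ones in "accept".

    Hypothetical helper for illustration; not a Prodigy API.
    """
    unknown = set(accepted) - set(labels)
    if unknown:
        # Catch typos early: each accepted label should also be an option id
        raise ValueError(f"accepted labels not in options: {sorted(unknown)}")
    return {
        "options": [{"id": label} for label in labels],
        "accept": list(accepted),
        "text": text,
    }

# Label set taken from the example record
LABELS = ["OTHER", "baking", "bread", "chicken", "eggs", "equipment",
          "food-safety", "meat", "sauce", "storage-method", "substitutions"]

# Text shortened to the question title for brevity
task = make_multilabel_task(
    "How can I get chewy chocolate chip cookies?", LABELS, ["OTHER", "baking"]
)

# A .jsonl file holds one JSON object per line; json.dumps without indent
# produces a single line (newlines inside the text are escaped as \n)
line = json.dumps(task)
print(line)
```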
To test this, could you save the example above as test.jsonl and run:
```
prodigy db-in import_data ./test.jsonl --rehash
prodigy train ./model --textcat-multilabel import_data --eval-split 0.2 --base-model blank:en
```
This gives us a reproducible test case to rule out other issues. If it works, try converting your data to this format.
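How the conversion looks depends on how your data is stored now, which I haven't seen. As a rough sketch, assuming it's one record per text/label pair with "text", "label" and "answer" keys (like binary accept/reject annotations), you could group the records by text in Python. `binary_to_multilabel`, `write_jsonl` and the sample records are hypothetical, for illustration only:

```python
import json

def binary_to_multilabel(records, labels):
    """Group one-label-per-record annotations into one multilabel task per text.

    ASSUMPTION: each input record has "text", "label" and "answer" keys,
    where answer is "accept", "reject" or "ignore". Adjust to your format.
    """
    grouped = {}  # text -> accepted labels; dicts keep insertion order (3.7+)
    for rec in records:
        accepted = grouped.setdefault(rec["text"], [])
        # Only accepted labels go into "accept"; rejected/ignored ones are left out
        if rec.get("answer") == "accept" and rec["label"] not in accepted:
            accepted.append(rec["label"])
    # Texts with no accepted label still get a task, with an empty "accept" list
    return [
        {"options": [{"id": lab} for lab in labels], "accept": accepted, "text": text}
        for text, accepted in grouped.items()
    ]

def write_jsonl(path, tasks):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")

# Hypothetical input records, for illustration only
records = [
    {"text": "How do I store fresh bread?", "label": "bread", "answer": "accept"},
    {"text": "How do I store fresh bread?", "label": "storage-method", "answer": "accept"},
    {"text": "How do I store fresh bread?", "label": "baking", "answer": "reject"},
]
tasks = binary_to_multilabel(records, ["baking", "bread", "storage-method"])
write_jsonl("converted.jsonl", tasks)
```

The resulting file could then be imported with `prodigy db-in` in the same way as test.jsonl.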