textcat-multilabel annotations format

Gio · January 17, 2024, 7:41pm

Hi there, I'm new to Prodigy but very excited to start training models, see how predictive my current labels are, and try some active learning from there!

Could anyone help me with the correct format of the annotations needed for textcat-multilabel?
I have several labels, of which multiple could be assigned to a text at once.
I am currently trying to import a JSONL file containing my existing annotations (not made in Prodigy), where each line is a dict with the following format:
{"text": " I can't complain, you've got to take the rough with the smooth.", "cats": {"30": false, "12": false, "24": false, "19": true, "25": false, "23": false, "11": false, "32": false, "36": false, "13": false, "33": false, "15": false, "28": false, "14": false, "20": false, "17": false, "16": false, "27": false, "37": true, "38": false, "21": false, "10": false, "31": false, "29": false, "22": false}}

prodigy db-in imports these into my database, but reports:

✔ Created unstructured dataset 'verbal' in database SQLite
✔ Imported 984 annotated examples and saved them to 'verbal' (session
2024-01-17_20-32-34) in database SQLite
Found and keeping existing "answer" in 0 examples

Sure enough, when I try to train, I get this error:
TypeError: [E930] Received invalid get_examples callback in MultiLabel_TextCategorizer.initialize. Expected function that returns an iterable of Example objects but got: []
Any help on how to adjust my JSONL to the correct format would be greatly appreciated. Thanks!

ryanwesslen · January 19, 2024, 1:50pm

hi @Gio,

Thanks for your question, and welcome to the Prodigy community!

Check out this thread or this thread.

Hope this helps!

Gio · January 26, 2024, 2:15pm

Very helpful, thank you very much!

The following format worked a charm:
{"options": [{"id": "OTHER"}, {"id": "baking"}, {"id":"bread"}, {"id":"chicken"}, {"id":"eggs"}, {"id":"equipment"}, {"id": "food-safety"},{"id":"meat"}, {"id":"sauce"}, {"id":"storage-method"}, {"id":"substitutions"}], "accept":["OTHER","baking"], "text": "How can I get chewy chocolate chip cookies?\n<p>My chocolate chips cookies are always too crisp. How can I get chewy cookies, like those of Starbucks?</p>\n<hr/>\n<p>Thank you to everyone who has answered. So far the tip that had the biggest impact was to chill and rest the dough, however I also increased the brown sugar ratio and increased a bit the butter. Also adding maple syrup helped. </p>\n"}
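For anyone arriving with data shaped like the "cats" records in the original question, here is a minimal sketch of converting them into the "options"/"accept" format above. It assumes every key in "cats" is a label name and that a value of true means the label applies. The function names (cats_to_choice, convert_lines, convert_file) are hypothetical helpers, not part of Prodigy. The options list contains only "id" entries, mirroring the format that worked here.

```python
import json
from typing import Dict, Iterable, List


def cats_to_choice(record: Dict) -> Dict:
    """Convert {"text", "cats"} into {"text", "options", "accept"}.

    Assumption: each key in "cats" is a label and True means it applies.
    Labels are sorted so the options list has a stable order.
    """
    cats: Dict[str, bool] = record["cats"]
    labels: List[str] = sorted(cats)
    return {
        "text": record["text"],
        # Every label becomes a selectable option, as in the working format.
        "options": [{"id": label} for label in labels],
        # Only the labels marked true go into "accept".
        "accept": [label for label in labels if cats[label]],
    }


def convert_lines(lines: Iterable[str]) -> List[Dict]:
    """Parse JSONL lines (skipping blank ones) and convert each record."""
    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        converted.append(cats_to_choice(json.loads(line)))
    return converted


def convert_file(in_path: str, out_path: str) -> int:
    """Read a cats-style JSONL file, write the converted JSONL, return the count."""
    with open(in_path, encoding="utf8") as f_in:
        records = convert_lines(f_in)
    with open(out_path, "w", encoding="utf8") as f_out:
        for rec in records:
            f_out.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return len(records)
```

Applied to the example record from the question, "accept" would contain "19" and "37", the two labels marked true. The converted file (for example, written by a hypothetical call such as convert_file("annotations.jsonl", "annotations_multilabel.jsonl")) can then be imported with prodigy db-in as before.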
