Prodigy says all sentences are done although some are left

I am annotating sentences using Prodigy. There are about 500 sentences in the database, but after every few annotations, Prodigy pops up a message indicating that all sentences are done. If I cancel the last annotation and redo it, it starts showing the remaining examples in the UI again. Is this a bug?

Thanks for the report. Which recipe are you using, and how are you loading in your texts? And did you customise the batch size?

It sounds like, for some reason, your queue runs out of examples and the new examples fetched in the background aren’t enough to fill it back up in time. One thing you could try is to run the command with PRODIGY_LOGGING=basic. This will output log statements for everything that’s going on behind the scenes, including API requests and the number of tasks that are sent back and forth.

I think I figured out why I was seeing this issue. I had changed my tag set from IOB to just entities vs. others, i.e. instead of B-PER, I-PER, B-ORG, I-ORG, O etc., I made it PER, ORG, O etc. (as I did not see much difference in results). I don’t fully understand how this caused the problem, but it went away when I switched back to IOB notation. So now I just added an I- prefix to all tags except O, and it no longer shows that message.

Oh, so your label set included the full IOB tags? Do you have an example of the code you ran? I’m curious to see how this might have impacted the stream and how we could possibly prevent that (or show a better error or warning).

In general, Prodigy will handle the IOB / BILUO mapping for you, including the O label. So if you label a span PER, the included tokens will receive the respective BILUO tags when you train the model. The ner.batch-train recipe also lets you set the --no-missing flag, to explicitly tell Prodigy how to handle untagged tokens. If you set the flag, the annotations are assumed to be gold standard and all unlabelled tokens will be assigned O and treated as not part of an entity. Otherwise, unlabelled tokens will be considered unknown, which obviously has a different effect on the model.
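If it helps to see that mapping in action, here is a rough sketch using spaCy’s own helper (spacy.gold.biluo_tags_from_offsets in spaCy 2.x; it moved to spacy.training.offsets_to_biluo_tags in v3). The sentence and offsets below are just made up for illustration:

import spacy
from spacy.gold import biluo_tags_from_offsets  # spaCy 2.x; in v3 use spacy.training.offsets_to_biluo_tags

nlp = spacy.blank("en")
doc = nlp("Nadim Ladki works in Tokyo")

# Character-offset spans with plain labels, the way Prodigy stores them
entities = [(0, 11, "PER"), (21, 26, "LOC")]

# Per-token BILUO tags derived from the spans; untouched tokens become "O"
print(biluo_tags_from_offsets(doc, entities))
# ['B-PER', 'L-PER', 'O', 'O', 'U-LOC']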

This lets you train from both gold-standard annotations, as well as sparse annotations created using the binary active learning-powered annotation modes. There’s also an ner.gold-to-spacy recipe that lets you convert a Prodigy dataset to spaCy’s training format, with an option to export BILUO tags.
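For reference, spaCy’s offset-based training format (v2 style) that such a conversion targets looks roughly like this; the text and labels here are invented for the example:

# One (text, annotations) pair per example; "entities" holds
# (start_char, end_char, label) triples, mirroring Prodigy's spans
TRAIN_DATA = [
    ("Nadim Ladki works in Tokyo",
     {"entities": [(0, 11, "PER"), (21, 26, "LOC")]}),
]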

I did not write any code for this part; I just used Prodigy’s ner.iob-to-gold recipe. I am unable to attach IOB files here, but here is an example:

File 1:
SOCCER|O -|O JAPAN|B-LOC GET|O LUCKY|O WIN|O ,|O CHINA|B-PER IN|O SURPRISE|O DEFEAT|O .|O
Nadim|B-PER Ladki|I-PER
AL-AIN|B-LOC ,|O United|B-LOC Arab|I-LOC Emirates|I-LOC 1996-12-06|O

File 2:
SOCCER|O -|O JAPAN|LOC GET|O LUCKY|O WIN|O ,|O CHINA|PER IN|O SURPRISE|O DEFEAT|O .|O
Nadim|PER Ladki|PER
AL-AIN|LOC ,|O United|LOC Arab|LOC Emirates|LOC 1996-12-06|O

I converted both of these using the ner.iob-to-gold recipe.
Output for File 1:
{"_input_hash": -376269529, “_task_hash”: 1927151313, “no_missing”: true, “spans”: [{“end”: 14, “label”: “LOC”, “start”: 9, “text”: “JAPAN”}, {“end”: 36, “label”: “PER”, “start”: 31, “text”: “CHINA”}], “text”: “SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRISE DEFEAT . “}
{”_input_hash”: -1517997846, “_task_hash”: 806001464, “no_missing”: true, “spans”: [{“end”: 11, “label”: “PER”, “start”: 0, “text”: “Nadim Ladki”}], “text”: “Nadim Ladki “}
{”_input_hash”: -701145227, “_task_hash”: 533024259, “no_missing”: true, “spans”: [{“end”: 6, “label”: “LOC”, “start”: 0, “text”: “AL-AIN”}, {“end”: 29, “label”: “LOC”, “start”: 9, “text”: “United Arab Emirates”}], “text”: "AL-AIN , United Arab Emirates 1996-12-06 "}

Output for File 2:
{"_input_hash": -376269529, “_task_hash”: 367841107, “no_missing”: true, “spans”: [{“end”: 14, “label”: “C”, “start”: 9, “text”: “JAPAN”}, {“end”: 36, “label”: “R”, “start”: 31, “text”: “CHINA”}], “text”: “SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRISE DEFEAT . “}
{”_input_hash”: -1517997846, “_task_hash”: -2079071972, “no_missing”: true, “spans”: [{“end”: 11, “label”: “R”, “start”: 0, “text”: “Nadim Ladki”}], “text”: “Nadim Ladki “}
{”_input_hash”: -701145227, “_task_hash”: 1798522814, “no_missing”: true, “spans”: [{“end”: 6, “label”: “C”, “start”: 0, “text”: “AL-AIN”}, {“end”: 29, “label”: “C”, “start”: 9, “text”: “United Arab Emirates”}], “text”: "AL-AIN , United Arab Emirates 1996-12-06 "}

In the output for File 2, you can see that the labels come out as "C" for LOC, "R" for PER, etc. It was with this kind of file that I got the issue mentioned above.
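I suspect the conversion simply strips the first two characters of every non-O tag, assuming a B-/I- prefix is always there, so a plain LOC gets chopped down to C and PER to R. Something like this (just my guess, not Prodigy’s actual code):

def strip_iob_prefix(tag):
    # Assumes every non-O tag looks like "B-XXX" or "I-XXX" and drops the
    # two-character prefix; this is a guess, not Prodigy's implementation
    return tag if tag == "O" else tag[2:]

print(strip_iob_prefix("B-LOC"))  # "LOC" (as intended)
print(strip_iob_prefix("LOC"))    # "C"   (mangled, matches the output above)
print(strip_iob_prefix("PER"))    # "R"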