Prodigy with spacy-llm ner.llm.correct - not showing the text to be annotated in the UI

Hello all,
I'm back with an issue I couldn't sort out using Prodigy. I'm annotating a very large set of texts for NER with 50 long and complex labels. I run this command:
dotenv run -- python3 -m prodigy ner.llm.correct annotated-xxx config.cfg examples.jsonl
It loaded the labels and opened the Prodigy UI. In the UI I can see the labels, the "Show prompt sent to the LLM" and "Show response from the LLM" panels, and the Accept and Reject buttons, but I don't see the text to be annotated, so I can't manually correct the annotation when I find a mistake. It looks like I'm missing something here. With a very small dataset it works fine as expected, but with a large dataset like mine the text to be annotated doesn't show up in the UI. I appreciate any help on this.
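For reference, my config.cfg follows the standard spacy-llm layout, roughly along these lines (the label names here are just placeholders for the real 50, and the exact task/model versions may differ):

[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"
# keep prompts/responses so the UI can display them
save_io = true

[components.llm.task]
@llm_tasks = "spacy.NER.v3"
labels = ["LABEL_A", "LABEL_B", "LABEL_C"]

[components.llm.task.examples]
@misc = "spacy.FewShotReader.v1"
path = "ner_example.yaml"

[components.llm.model]
@llm_models = "spacy.GPT-4.v2"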

Thank you

Hi @Fantahun ,

We haven't seen that issue before. Could you share how many examples your dataset has so that I can try to reproduce the issue?
Thanks!

I had the same problem with any other model, but it works perfectly fine with OpenAI's models.

Thanks for your reply @magdaaniol. I'm using 262 texts (lines of text) in my examples.jsonl and fifty labels, most of which are multi-word, and my ner_example.yaml is almost empty. I'm planning to use a couple of tens of thousands of texts in examples.jsonl. BTW, I'm using OpenAI's GPT-4 LLM in the background. If possible and required, I can share screenshots as well.
Thank you again.

Hi @Fantahun

Thanks for the extra info. Are you sure the only factor that changes how things are rendered is the length of the input file? That in principle shouldn't be the case as the input is processed in batches.

Could you try the same input and the same number of labels with the regular ner.manual recipe and see if you experience the same issue? (Just to exclude potential issues with the input being corrupted or empty due to the LLM API.)
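For example, something along these lines (the dataset name is just an example; --label accepts either a comma-separated list of labels or a path to a text file with one label per line):

python3 -m prodigy ner.manual ner-manual-test blank:en examples.jsonl --label labels.txt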
I suspect the labels might be covering up most of the UI, but you should still be able to scroll to see all the elements. It would be helpful if you shared a screenshot of what you're seeing - thanks!
Also, which Prodigy version are you on? In v1.13.2 we introduced a dedicated front-end component for handling LLMs, so knowing the version would help me figure out how your UI is being rendered.

I'm using Prodigy v1.14.12

I need help with this please!!

I'm talking about a bug in the software I purchased. The company is responsible for fixing this. I appreciate any help. Thank you.

Hi @Fantahun,

In order to be able to help you, we really need you to try to answer the questions I asked before (the Prodigy version was only one of them). Otherwise, it will be much harder and take much more time to reproduce your problem and work out whether it is a bug.
If you are not able to engage in this process, the only thing we can offer is a refund.

We try our best to answer questions as quickly and in as much detail as possible, but we're a small team and we're not able to get to everyone's questions immediately, especially not on a weekend. You sent your follow-ups within less than 24 hours, and on a Saturday. This isn't very helpful and makes it a lot harder for us to answer everyone's questions on the forum.

Sorry if my request came across as a little harsh, and I'm not looking for a refund either. I feel there is a bug in the tool, or maybe in the way I'm using it. My back-to-back questions were just by chance. I'm one of the early advocates of spaCy and want to test the Prodigy + spacy-llm combo to its limits. I hope this will benefit the project.
Getting back to the question, I don't think my dataset is suited for ner.manual. I guess I'll have to dig a little deeper to see where the problem really originates. Thanks

Hello again, and sorry to bother you.
I think I've identified the problem; I guess it's a bug in the Prodigy UI. I reduced the number of labels to 12, tried it, and it worked fine as expected, showing the labels, the text and the other parts. Please see the screenshot. Previously I used 50 labels - maybe the UI is not able to accommodate that many together with the text to be annotated. I tried scrolling in case the text was hidden, but that didn't work either.
Thank you

Hi @Fantahun,

The truth is that the UI was not really designed to handle this many labels. But there's a reason for that: it is likely not the best idea to annotate this many labels at the same time.
This would be really taxing for the annotators, as they would need to keep a big data model in mind with every annotation task, and it's not the easiest task for the model either.
This thread by @ines explains very well why you might consider splitting your annotation in steps.

Additionally, such a high number of labels is even less recommended in the context of LLM annotation.
It makes the prompt much bigger and more difficult for the model, and it also slows down inference. The official prompt engineering guide from OpenAI explicitly recommends splitting complex tasks into simpler ones and having one task per prompt.

Finally, if you do need to show a high number of labels in the UI, you could make the label area scrollable with custom CSS via the global_css setting in .prodigy.json:

# .prodigy.json
{
  "global_css": ".prodigy-labels { max-height: 150px; overflow-y: auto; } .prodigy-container { max-width: 950px; }"
}

I'd like to reiterate that this would not be our recommended way of dealing with a high number of labels.

If you're interested in some more NER annotation best-practice tips, this thread has plenty of relevant references on dealing with a high number of labels.

Great. Thank you very much @magdaaniol for your suggestions. I will try to simplify my annotation tasks to a manageable level. As always, keep up the good work at Explosion.
