ner.teach - couple of questions

Hi,

I've trained a model with gold-std data and now I want to improve it by using ner.teach.
The problem is that prodigy seems to show a score of 1.0 for almost all its suggestions, and the vast majority of those suggestions are actually correct. It's also not showing me many of the entities that I actually need to improve the score for.

Should I run ner.teach only with the labels that I want to improve?

Another thing is that the score went up to 90% really fast, and now, at 95%, the model doesn't seem to improve anymore.

I'm assuming that's normal?

I've also noticed that the model is making mistakes in this recipe that it isn't making with ner.correct. I could be wrong about this, but maybe it's the tokenization? I do have a custom tokenizer, but that should be part of the model I'm using.
Also the UI shows me that the lang is en when it should be de (it is in my train.cfg), but I don't see a way to overwrite that.
It's worth noting that I'm also setting --unsegmented. My model doesn't have a sentencizer, and whatever segmentation the recipe was doing was messing up my samples big time :slight_smile:

Any suggestions would be much appreciated.

Hi! One quick note about the support for binary ner.teach annotations in the current nightly: this is the one feature we're still actively working on, so it's currently expected that you may see worse results when training from your binary data compared to v1.10. We'll have a new nightly available that we'll release once spaCy v3.1 is out, which should make this more stable and possibly lead to even better results.

This is expected, because the ner.teach workflow uses the beam parse with all possible interpretations of the given example for its suggestions, and by default it uses uncertainty sampling to choose which suggestions to ask about. So you will see suggestions that aren't necessarily the model's most confident predictions, i.e. the ones it would currently choose when you just run it over your text, which is what you see in a workflow like ner.correct that only shows you the final predictions.
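Just to illustrate the idea (this is not Prodigy's actual implementation, and the example texts and scores are made up): uncertainty sampling prefers suggestions whose score is closest to 0.5, because those are the ones the model is least sure about, while very confident predictions near 0.0 or 1.0 are deprioritized:

```python
# Illustrative sketch of uncertainty sampling, NOT Prodigy's internal code.
# Suggestions are (text, score) pairs; the ones closest to 0.5 come first,
# because the model is least certain about them.

def sort_by_uncertainty(suggestions):
    """Order (text, score) pairs by how close the score is to 0.5."""
    return sorted(suggestions, key=lambda s: abs(s[1] - 0.5))

suggestions = [("ACME Corp", 0.98), ("Berlin", 0.55), ("gold-std", 0.31)]
print(sort_by_uncertainty(suggestions))
# → [('Berlin', 0.55), ('gold-std', 0.31), ('ACME Corp', 0.98)]
```

This is also why a workflow like ner.correct, which just shows the model's final, most confident predictions, can look quite different from what ner.teach asks you about.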

One thing to keep in mind is that when you're training from only binary annotations, the model is also only evaluated on a held-back sample of those examples. So the score here reflects how well the model does on the binary evaluation examples. Depending on how many examples you have in total, this may or may not be very representative. If the score goes up, it certainly shows that the model has learned something – but ideally, you'd still want to be evaluating the resulting model properly in a separate step, and check how its predictions improve on your dedicated, gold-standard evaluation data. (This is typically best done as a separate step and directly with spaCy.)
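For that separate evaluation step, assuming you've converted your gold-standard evaluation data to spaCy's binary `.spacy` format, the command looks roughly like this (both paths here are placeholders):

```shell
# Evaluate the trained pipeline on a dedicated gold-standard set.
# ./output/model-best and ./eval.spacy are placeholder paths.
python -m spacy evaluate ./output/model-best ./eval.spacy
```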

I really appreciate your detailed answer, thank you!

One more question, though: Can I run ner.teach with only one label, despite there being multiple labels in most samples?
I'm a little worried that I might be making the others worse, just in case this isn't a use-case that was intended.
It's only specific labels that I want to improve this way, and since they are underrepresented in my dataset, it takes a while to actually see any of them.

Hi!

That's a good question. If you only train on the one label, you might indeed cause what we call "catastrophic forgetting" where the model starts adjusting too much on the new data, and "forgets" what it learned before.

The prodigy train implementation for Prodigy 1.11 has a good way to remedy this, as you can provide multiple NER datasets to train on. This means that you can mix annotations with all labels with data obtained from ner.teach on just the one label. To obtain the data for all the labels, what you could do is run ner.correct, quickly check whether the suggestions look OK, and hit accept if they do. You could even exclude the one label that is important to you for this part, so you don't have to judge it twice. If you're not too concerned about the other labels, this annotation should hopefully go rather quickly, and it will give the model some nice examples for all the other labels to mix in with your binary annotations.
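A sketch of what that could look like on the command line (the dataset names and output path are placeholders; check `prodigy train --help` for the exact options available in your version):

```shell
# Train on a gold-standard dataset covering all labels, mixed with
# binary annotations from ner.teach on the one underrepresented label.
# Dataset names and the output path are made up for this example.
prodigy train ./output --ner gold_all_labels,teach_one_label
```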

Awesome, thank you!

Just released a new nightly that includes improvements around the ner.teach workflow. Could you try re-running your experiments and see how you go? :slightly_smiling_face: You'll now also be able to mix binary with manual annotations and ner.teach will ask about sentences with no entities in order to improve performance.

I've decided to go with a different scheme for annotating my data so I'm starting my dataset from scratch. If and when I get to the ner.teach phase, I'll let you know how it goes :slight_smile:
