Incredibly poor training results


We are running Prodigy v1.6.1 and have done a relatively simple proof-of-concept example: from a T-shirt description, identify the material.

We trained on about 200 examples with ner.batch-train and even got the following results:

However, when we launch Prodigy and try to continue, absurd entities are still marked (with rather high scores as well):

and so on: question marks, "the", even spaces are counted as correct entities.

And in general, this is how our training pipeline looks:

We were following this guide:

Yes, this definitely looks suspicious! Could you show some examples of the data you collected? And how many instances of the MATERIAL terms from your patterns were in your data? Did they come up a lot? 200 examples is still a pretty small set, so it’s possible that you simply didn’t use enough data.

One quick note on this: When you tried out the model in Prodigy, did you use ner.teach? Because what you see here can potentially be very misleading: Prodigy will get all possible analyses for the sentence and present you the examples the model is most uncertain about, i.e. the ones with predictions closest to 0.5. So those aren’t necessarily the entities with the highest scores or the most “correct” ones.
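The selection logic described above can be sketched in a few lines. This is a toy illustration of uncertainty sampling, not Prodigy's actual implementation; the candidate strings and scores are made up:

```python
# Toy sketch of uncertainty sampling: ner.teach surfaces the analyses whose
# model scores are closest to 0.5, NOT the highest-scoring predictions.
candidates = [
    ("cotton", 0.92),
    ("the", 0.51),       # near-0.5 scores are the most "uncertain"
    ("?", 0.48),
    ("polyester", 0.97),
]

# Sort by distance from 0.5 -- the first item is the most uncertain one,
# which is what you would be shown first in the annotation UI.
by_uncertainty = sorted(candidates, key=lambda c: abs(c[1] - 0.5))
print(by_uncertainty[0])  # ('the', 0.51)
```

So seeing "the" or "?" in the teach interface doesn't mean the model would actually predict them; it means the model is unsure about them.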

If you want to see how the model performs “in real life”, it’d make more sense to load it with spaCy, process a bunch of (unseen) text and look at the MATERIAL entities in doc.ents. Those are the ones that the model will actually predict.
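A minimal sketch of that check is below. In the real workflow you would `spacy.load()` the directory that batch-train wrote your model to; since that path is specific to your machine, this runnable version stands in a blank pipeline with an entity ruler for the trained NER component (API shown is spaCy v3; the v2-era spaCy bundled with Prodigy v1.6 constructs the pipe slightly differently):

```python
import spacy

# In practice: nlp = spacy.load("/path/to/your/trained/model")
# Stand-in so the snippet runs without the trained model on disk:
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "MATERIAL", "pattern": "cotton"}])

# Process unseen text and inspect what the pipeline would actually predict.
doc = nlp("Soft T-shirt made from 100% organic cotton.")
materials = [(ent.text, ent.label_) for ent in doc.ents if ent.label_ == "MATERIAL"]
print(materials)  # [('cotton', 'MATERIAL')]
```

Whatever shows up in `doc.ents` here is the model's real output, with no uncertainty sampling in between.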

Thank you for the quick response :slight_smile:

OK, so I’ve trained on quite a few more examples:

I ran training:

And when printing the actual stream (I’ve done it with spaCy as well, the results seem similar):

It clearly catches trivial cases, but there are way too many false positives.

The latest screenshot of training you’ve posted still shows a very small dataset. Is that the correct run? Because it shows only 206 training examples, and 14 examples used for evaluation. That’s probably just not enough data to train with.

Another problem is that it looks like you’re training on top of a model that already gets 13/14 correct. It’s better to start with a blank model each time you run batch train. Finally, try setting the batch size lower. When you have very little data, you usually want a low batch size.
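Putting that advice together, the invocation would look something like the following. The dataset name, output path and values are placeholders, and the exact flag names may differ between Prodigy versions, so check `prodigy ner.batch-train --help` in your install:

```shell
# Hypothetical run -- dataset name, paths and values are examples only.
# A small dataset usually wants a low batch size.
prodigy ner.batch-train material_dataset en_core_web_lg \
    --output /tmp/material-model \
    --n-iter 10 \
    --batch-size 4 \
    --eval-split 0.2
```

The key points are the low `--batch-size` and starting from a fresh base model each run rather than retraining on top of a previous output directory.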

Thank you for insights.

Yes, for some reason some of the annotated data was ignored, so the number of training examples was low.

We finally got decent performance just by using en_core_web_lg instead of a blank model. I can’t even remember why the blank model was used in the first place.

I think the reason the number is lower is that a) you might have ignored examples and b) before training, Prodigy will combine all annotations on the same sentence into one training example. So if you’re using ner.teach and you accept / reject multiple entities on the same text, all of these examples become one training example later on.
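That merge step can be illustrated with a toy grouping by text. This is only a sketch of the idea (several accept/reject answers on the same sentence collapsing into one training example), not Prodigy's internal merging code, and the annotations are invented:

```python
# Toy illustration: multiple ner.teach answers on the same sentence
# collapse into a single training example when grouped by text.
annotations = [
    {"text": "Shirt made of cotton.", "span": ("cotton", "MATERIAL"), "answer": "accept"},
    {"text": "Shirt made of cotton.", "span": ("Shirt", "MATERIAL"), "answer": "reject"},
    {"text": "Wool sweater.", "span": ("Wool", "MATERIAL"), "answer": "accept"},
]

merged = {}
for ann in annotations:
    merged.setdefault(ann["text"], []).append((ann["span"], ann["answer"]))

print(len(annotations), "annotations ->", len(merged), "training examples")
# 3 annotations -> 2 training examples
```

This is why the example count reported at training time can be noticeably lower than the number of answers in your dataset.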

I think it might be a good idea to include a message about this in the output of the training recipes. Even just something like “Merging X examples…” or “Merged X annotations into X training examples”.