F1-score doesn't improve for larger annotation sets

koaning · April 13, 2023, 9:54am

Hi there!

Just to make sure we're talking about the same thing: are you annotating two entity types (companies and persons), and do you have 5 to 1000 unique examples of each?

Is there a reason why you're stopping early?

What are you judging this on? Do you have a fixed validation set, or does the validation set also change as you increase the number of annotations?
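In case it helps, here's a rough sketch of what I mean by keeping the validation set fixed. It's plain Python rather than anything Prodigy-specific, and the names (`fixed_split`, `nested_train_subsets`, the placeholder `examples` list) are made up for illustration. The idea is to carve off the evaluation examples once, then only vary the training subset.

```python
import random


def fixed_split(examples, eval_fraction=0.2, seed=0):
    """Shuffle once with a fixed seed and carve off an evaluation set that never changes."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    # First n_eval items are held out; the rest form the training pool.
    return shuffled[n_eval:], shuffled[:n_eval]


def nested_train_subsets(train_pool, sizes):
    """Yield growing training subsets; each larger subset contains the smaller ones."""
    for size in sizes:
        yield train_pool[: min(size, len(train_pool))]


# Placeholder data standing in for annotated examples (hypothetical).
examples = [f"example-{i}" for i in range(1000)]
train_pool, eval_set = fixed_split(examples)
subsets = list(nested_train_subsets(train_pool, [5, 50, 500]))

for subset in subsets:
    # Train on `subset`, then score on the same `eval_set` every time.
    print(len(subset), "training examples,", len(eval_set), "evaluation examples")
```

If the scores are computed against the same `eval_set` at every size, any change in F1 comes from the training data and not from a shifting benchmark.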

It's hard to say for sure, but it could be that by increasing the number of annotations you're also increasing the diversity of the ML task. Maybe the first few entities are much easier to detect? Do you have examples of situations where the model gets it correct and where it gets it wrong?

It's a phenomenon that I've stumbled upon a few times. This PyData talk gives one such example related to detecting programming languages in text.

A final thing that comes to mind: have you annotated this data yourself manually, or with a group? Could it be that there are label errors, or annotators who disagree?
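If more than one person labelled the same examples, one quick way to check for disagreement is Cohen's kappa over token-level labels. The sketch below is just an illustration in plain Python: the `cohen_kappa` helper and the two tiny label lists are invented for the example, and it assumes both annotators labelled exactly the same tokens in the same order.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of items.")
    n = len(labels_a)
    # Fraction of items where the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up token-level labels ("O" means no entity).
annotator_a = ["ORG", "O", "PERSON", "O", "O", "ORG"]
annotator_b = ["ORG", "O", "O", "O", "PERSON", "ORG"]
kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}")
```

A value near 1 means the annotators mostly agree beyond chance; a low value suggests the labelling guidelines may be ambiguous, which could put a ceiling on the F1-score no matter how much data you add.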

Let me know!
