Which number of training labels should I trust?

hi @nvasil!

Have you seen this related post?

I suspect you either have duplicates, or you have several annotations on the same text whose entity spans were merged. In the second case, if you’ve accepted/rejected several entities on the same text, those will be combined into one example, so the number of training examples ends up lower than the number of annotations you made.
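To make the merging case concrete, here is a minimal sketch of why the counts can differ. This is not Prodigy's actual implementation: the merge_by_text helper and the sample annotations are made up for illustration, and grouping by the raw text is an assumption standing in for however Prodigy identifies "the same input" internally.

```python
def merge_by_text(examples):
    """Hypothetical helper: combine accepted spans of annotations that share a text."""
    merged = {}
    for eg in examples:
        # One merged entry per unique text, no matter how many annotations it has.
        spans = merged.setdefault(eg["text"], [])
        if eg.get("answer") == "accept":
            spans.extend(eg.get("spans", []))
    return [{"text": text, "spans": spans} for text, spans in merged.items()]


# Made-up annotations: two separate decisions on the same sentence, one on another.
annotations = [
    {"text": "Apple hired Tim Cook", "answer": "accept",
     "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"text": "Apple hired Tim Cook", "answer": "accept",
     "spans": [{"start": 12, "end": 20, "label": "PERSON"}]},
    {"text": "Berlin is nice", "answer": "accept",
     "spans": [{"start": 0, "end": 6, "label": "GPE"}]},
]

merged = merge_by_text(annotations)
print(len(annotations), "annotations ->", len(merged), "merged examples")
```

Three annotations collapse into two examples here, which is the kind of gap you'd see between the raw annotation count and the count reported at training time.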

Be sure to turn on logging with PRODIGY_LOGGING=basic, which should show the dedup step explicitly.

The final number is the one you should go with (if you're comfortable with Prodigy's default behavior of deduping, merging entities, etc.).

Let me know if this helps!
