Annotating correctly using the ner.correct recipe

NNN · January 17, 2022, 5:58pm

Hi,

I'm wondering how I should annotate correctly when using the ner.correct recipe. We're using the pre-trained en_core_web_trf model with some known NER labels (such as PERSON, PRODUCT, ORG) while also adding a few custom entities of our own (e.g., ADDRESS).

Prodigy's guidelines suggest you should reject partial NER classifications and be strict with it. So, if my sentence was "The new iPhone X is expensive", but only "iPhone" was marked by the model as PRODUCT, I should be strict and hit reject. I'm wondering, is it also possible to simply change the marking in the UI such that it includes "X" inside and then hit accept? Would it be the same?

How about sentences where a span gets the wrong entity label? For example, suppose "Siri" was labeled as PERSON instead of PRODUCT. Should I reject, or remove the PERSON label, mark it as PRODUCT in the UI, and then accept? What would be the difference?

Additionally, how should I treat sentences with more than one named entity where some are correct and others are not? Should I accept/reject or change it myself in the UI?

Many thanks!!
Nadav

ljvmiranda921 · January 19, 2022, 1:00am

Hi @NNN

For all cases you've mentioned, it is advisable to correct the mistake first and hit ACCEPT. The only time we should hit REJECT is when we cannot verify if the entities are correct or not (maybe the tokenization is weird, maybe the data is corrupted, etc.).

After correcting your samples, you can train a model using prodigy train.

If you hit REJECT on a sample that can still be corrected, then you are losing valuable data for model training.
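
To make the "correct first, then accept" advice concrete, here is a minimal sketch of what the corrected annotation for the "iPhone X" sentence might look like as a Prodigy-style record (a dict with "text", "spans" and "answer"). The `make_span` helper is hypothetical and only there to compute character offsets; it is not part of Prodigy.

```python
# Sketch only: plain-Python illustration of a Prodigy-style NER record.
# make_span is a hypothetical helper, not a Prodigy function.

def make_span(text, phrase, label):
    """Return a character-offset span for the first occurrence of phrase."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return {"start": start, "end": start + len(phrase), "label": label}

text = "The new iPhone X is expensive"

# What the model suggested: only "iPhone" is marked as PRODUCT.
suggested = {"text": text, "spans": [make_span(text, "iPhone", "PRODUCT")]}

# What the annotator saves after extending the span in the UI and hitting accept.
corrected = {
    "text": text,
    "spans": [make_span(text, "iPhone X", "PRODUCT")],
    "answer": "accept",
}

for span in corrected["spans"]:
    print(text[span["start"]:span["end"]], span["label"])
```

The same idea applies to the "Siri" case: the saved record would carry a PRODUCT span instead of the PERSON one, again with the answer set to accept.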

NNN · January 20, 2022, 9:33am

Great, many thanks!!

NNN · January 20, 2022, 10:56am

So, in the following example - should I reject or rather remove the label and hit accept?
I tend towards the latter.

Thanks

ines · January 20, 2022, 11:00am

Yes, that's correct – in this case, you would remove the label ORG and accept the example. The annotation you're creating here will then tell the model during training that this example contains no entities, which is what you want.

(Unless this example contains broken markup or is an example that you don't want to include because it's not representative. You can then hit reject or ignore.)
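
As a rough illustration of the difference, the sketch below contrasts an accepted example with no spans (a useful "no entities here" signal) with a rejected one. It assumes, as the replies above suggest, that only accepted examples contribute to training; the texts and the `usable_for_training` helper are hypothetical and not part of Prodigy.

```python
# Sketch only: hypothetical records in a Prodigy-style format.
examples = [
    # Wrong ORG label removed, then accepted: says "this text has no entities".
    {"text": "A sentence without any entities.", "spans": [], "answer": "accept"},
    # Broken or unrepresentative text: rejected, so it carries no training signal.
    {"text": "<div>brok en mar kup", "spans": [], "answer": "reject"},
]

def usable_for_training(records):
    """Keep only accepted records (assumption: rejected ones are skipped)."""
    return [eg for eg in records if eg.get("answer") == "accept"]

kept = usable_for_training(examples)
print(len(kept), "of", len(examples), "examples kept")
```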

NNN · January 20, 2022, 11:29am

Thanks so much for the super quick reply!
