Hard limit on consecutive tokens in NER annotations

kabir · April 23, 2020, 4:52pm

I'm working with clients on getting them upskilled in annotation. They actually do annotation work on a custom-built system, and I'm getting them set up with Prodigy to get better-quality annotations moving forward.

One main issue they have is over-annotating (e.g. annotating 10 words in a row to capture the full context when the core thing they are annotating is only 3 words long).

Would it be possible to add a strict token length constraint to annotations? Would love to hear your thoughts on this.

ines · April 23, 2020, 5:21pm

Ha, this is probably the best-timed enhancement proposal because I just implemented something for this today. So v1.10 should have you covered here: you'll be able to provide a validate_answer callback that's called on every answer when the annotator hits "accept" or "reject", and that lets you raise custom errors that are then shown as alerts in the UI.

So you could do something like this:

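def validate_answer(eg):
    # Check every span the annotator highlighted in this example
    for span in eg.get("spans", []):
        # token_end is inclusive, so add 1 to get the number of tokens
        span_len = span["token_end"] - span["token_start"] + 1
        span_text = eg["text"][span["start"]:span["end"]]
        if span_len >= 10:
            # The message is shown verbatim to the annotator in the UI
            raise ValueError(f"Selected span is 10 tokens or longer: {span_text}")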

You can either raise an error or use asserts with a message. The user will see the verbatim text of the error message, so you can use that to provide more info or explanation if needed.
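For comparison, here is a minimal sketch of the assert-based variant, run against hand-built task dicts so it can be tried without starting Prodigy. The MAX_SPAN_TOKENS constant, its value of 3, and the example tasks are assumptions made up for illustration; only the validate_answer name and the span keys come from the callback above.

```python
MAX_SPAN_TOKENS = 3  # hypothetical limit, not a Prodigy setting


def validate_answer(eg):
    """Fail with a readable message if any span exceeds MAX_SPAN_TOKENS."""
    for span in eg.get("spans", []):
        # token_end is treated as inclusive, as in the callback above
        span_len = span["token_end"] - span["token_start"] + 1
        span_text = eg["text"][span["start"]:span["end"]]
        assert span_len <= MAX_SPAN_TOKENS, (
            f"Span has {span_len} tokens, maximum is {MAX_SPAN_TOKENS}: {span_text}"
        )


# Hand-built tasks in the shape the callback expects (illustrative data only)
text = "Apple is opening a new office in London"
ok_task = {
    "text": text,
    "spans": [{"start": 0, "end": 5, "token_start": 0, "token_end": 0, "label": "ORG"}],
}
too_long_task = {
    "text": text,
    "spans": [{"start": 0, "end": 29, "token_start": 0, "token_end": 5, "label": "ORG"}],
}

validate_answer(ok_task)  # passes silently

try:
    validate_answer(too_long_task)
except AssertionError as err:
    # This is the text the annotator would see in the alert
    print(err)
```

One caveat with asserts: Python skips them entirely when run with the -O flag, so raising ValueError is the safer choice if there's any chance the server is started that way.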

kabir · April 23, 2020, 6:31pm

That's incredibly great to hear! And a super powerful generic feature. Thanks!! Always impressed.

ines · June 17, 2020, 4:58pm

Just released Prodigy v1.10, which includes the validate_answer callback! See here for details and examples.
