NER, additional checking after highlighting spans

Hi Prodigy Team. Is it possible to run an additional check when the annotator highlights spans? We have an annotation task similar to NER, where the annotator highlights phrases in a review (which contains many sentences) and labels them. One requirement is that the annotator must not highlight phrases from two different sentences.

Thanks :'D

Hi! It sounds like this is a good use case for the validate_answer recipe callback: https://prodi.gy/docs/custom-recipes#validate_answer

You'll be able to define a Python function that receives the annotated example when it's submitted and raises an error if the annotations are considered invalid. The error is then shown to the user as an alert so they can fix the annotations, and they'll only be able to submit if the checks pass.
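As a rough sketch of the shape such a callback takes: the function receives the submitted example as a dict and raises when a rule is broken. The rule below (an accepted example needs at least one span) is only a placeholder for illustration, not something your task necessarily requires.

```python
def validate_answer(eg):
    """Raise if the submitted example is invalid; the message is shown to the annotator."""
    # Placeholder rule for illustration only: an accepted example must contain
    # at least one highlighted span.
    if eg.get("answer") == "accept" and not eg.get("spans"):
        raise ValueError("Please highlight at least one phrase before accepting.")


# In a custom recipe, the callback is returned alongside the other components,
# for example:
# return {
#     "dataset": dataset,
#     "stream": stream,
#     "view_id": "ner_manual",
#     "validate_answer": validate_answer,
# }
```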

For efficiency, you probably want to do the sentence segmentation when you generate the examples you send out for annotation, and store the information with the JSON, so you don't have to do it during validation (which can potentially take longer). One simple option would be to just store the character offsets of the sentence starts in ascending order, e.g. "sentence_starts": [0, 50, 125, 290]. For each span, you'll have the start and end character offsets, so you can easily calculate which sentence it belongs to based on its "start". You can then raise an error if the example contains spans from two different sentences.
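Here's a minimal sketch of how that could look, assuming the offsets are stored under a "sentence_starts" key in ascending order. The helper names and the regex-based splitter are hypothetical stand-ins (in practice you'd probably take the offsets from a proper sentence segmenter, e.g. spaCy's `sent.start_char`). The check that a single span doesn't cross a sentence boundary is an extra assumption on my part, so drop it if you don't need it.

```python
import re
from bisect import bisect_right


def add_sentence_starts(eg):
    """Store sentence start offsets on the example before it is sent out.

    Naive stand-in splitter: a new sentence starts after ., ! or ? plus whitespace.
    """
    text = eg["text"]
    eg["sentence_starts"] = [0] + [m.end() for m in re.finditer(r"[.!?]\s+", text)]
    return eg


def sentence_index(starts, offset):
    """Index of the sentence containing a character offset (starts must be sorted)."""
    return bisect_right(starts, offset) - 1


def validate_answer(eg):
    starts = eg.get("sentence_starts", [])
    seen = set()
    for span in eg.get("spans", []):
        first = sentence_index(starts, span["start"])
        # "end" is exclusive, so the last character of the span is at end - 1.
        last = sentence_index(starts, span["end"] - 1)
        if first != last:
            raise ValueError("A highlighted phrase crosses a sentence boundary.")
        seen.add(first)
    if len(seen) > 1:
        raise ValueError("All highlighted phrases must come from the same sentence.")
```

You'd call add_sentence_starts on each example while building the stream, so the validation itself only needs a quick lookup per span.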

Great! I will try this. Thank you for your support :"D
