Review NER + Relation annotations jointly

Hi,

I am working on biomedical IE and have used Prodigy to annotate ~4K sentences, labelling entities and relations jointly. My recipe considers 6 entity types and 3 relation types, and I have used something similar to your example:

prodigy rel.manual rel_bio en_core_web_sm ./bio_events.jsonl --label Theme,Cause --span-label GGP,Gene_Expr,Transcr,Prot_Cat,Phosph,Loc,Bind,Reg,Reg+,Reg- --wrap 

Each sentence has been annotated by at least two different annotators, and now I would like to ask an expert to review and resolve disagreements at the entity and relation level. For this, it would be useful to display, for each sentence, how each annotator has labelled it, and give the expert the option to modify spans and relations in the interface. I have tried:

 prodigy review reviewed_rel_bio rel_bio_an1,rel_bio_an2 --view-id relations --label Theme,Cause --span-label GGP,Gene_Expr,Transcr,Prot_Cat,Phosph,Loc,Bind,Reg,Reg+,Reg-

However, with --view-id relations the recipe doesn't recognise the --span-label argument. Given the nature of this task, it's very important to be able to annotate both entities and relations simultaneously, and to easily review/fix the cases where there are disagreements. Is there any workaround for this?

Ah yeah, that makes sense! The easiest solution would be to put a prodigy.json in your current working directory with a "relations_span_labels": [] setting containing your span labels. This should override the default and make your span labels available to the interface.
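For example, a minimal prodigy.json using the span labels from your rel.manual command could look like this:

{
  "relations_span_labels": ["GGP", "Gene_Expr", "Transcr", "Prot_Cat", "Phosph", "Loc", "Bind", "Reg", "Reg+", "Reg-"]
}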

(I'll need to think of a way to handle these extra settings in the review UI going forward!)

Thank you so much for this, Ines! Your solution worked well and allowed us to have entities and relations in the same review interface.

Something I noticed is that when annotations mismatch/disagree at the entity or relation level, you don't see what each annotator labelled in a single task/interface, as you do when reviewing NER or text classification annotations. Instead, you see the same sentence repeated in separate (consecutive) tasks, each showing one annotator's labels. That means you have to "resolve" the same sentence as many times as there are input datasets. However, if two annotators agree on a task, you only see that example once.

Is this the expected behaviour? It seems like a big limitation not to be able to visualise/compare labels across datasets in a single task.

This is definitely not expected and must somehow come down to the hashes that are generated, because these are used to decide whether two annotations refer to the same input example (i.e. are different versions of the same annotation) or whether they're different tasks.

If you look at the JSON data for the examples generated in the review workflow (especially disagreements on the same example), do they end up with different _input_hash values?
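For instance, one quick way to compare the hashes (just a sketch, assuming you've exported the two datasets to JSONL with db-out; the file names below are placeholders):

import json

def input_hashes(path):
    # Collect the _input_hash of every example in a db-out JSONL export
    with open(path, encoding="utf8") as f:
        return [json.loads(line)["_input_hash"] for line in f if line.strip()]

hashes_1 = input_hashes("rel_bio_an1.jsonl")  # placeholder file names
hashes_2 = input_hashes("rel_bio_an2.jsonl")
print(set(hashes_1) == set(hashes_2))  # True if both datasets cover the same inputs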

Hi Ines, thanks a lot for the answer.

I looked at the JSON entries from each annotator dataset and they have the same sequence of _input_hash values. I re-tried with a very small data sample, annotating 2 datasets with rel.manual, but when I launch the review recipe, each version of the same annotation still appears in a different task. These are the sample datasets I was using in JSONL format: labels_2.jsonl (11.4 KB) labels_1.jsonl (9.5 KB).

Thanks, this is very helpful! I think what might be happening here is that annotations from the relations UI are not correctly interpreted as "manual" annotations, so the review recipe doesn't merge all annotations with the same input hash into one example.

Basically, there are 2 different ways to interpret examples for review: by input (= consider all annotations on the same input hash as different versions of one example, e.g. manual annotations) or by task (= consider only different answers on the exact same annotation as different versions, e.g. binary feedback, where you'd have different suggestions about the same text and want to review/compare the accept/reject answers separately).
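To illustrate with made-up hash values: two manual annotations of the same sentence share the _input_hash but get different _task_hash values (because the annotations differ), so reviewing by input merges them into one task, while reviewing by task would keep them separate:

{"text": "Example sentence.", "_input_hash": 101, "_task_hash": 501, "answer": "accept"}
{"text": "Example sentence.", "_input_hash": 101, "_task_hash": 502, "answer": "accept"}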

So here's one quick thing to try before I look into this deeper: if you run prodigy stats, you'll find the location of your Prodigy installation. If you open the file recipes/review.py, you should find the following line:

is_manual = global_view_id and global_view_id.endswith(("_manual", "choice"))

Try adding "relations" to that list and see if that solves th problem!

Thanks, Ines, that's very good to know. I included "relations" in that list and now all the versions are grouped into the same task, and each example appears only once. However, the interface displays only one default annotation (I think it's the one from the first input dataset), and you still can't see what the other annotators labelled, so it's not possible to know whether all annotators agreed or disagreed (and why they disagreed) on that example. I'm attaching a toy example of how an example that has been labelled differently by two annotators looks in my UI.

Ah, damn, I thought we could just hack this in, but I noticed there's a corresponding change we also need to make in the web app (so the web app also understands that relations is indeed a manual UI). I'll add this to my list and we'll include it in the next version! :slightly_smiling_face:


Thank you so much!