Calculating Inter-Annotator Agreement (IAA) for relationships

Hi,

I have relationships annotated with a custom recipe that used the relations interface to annotate NER + relations at the same time. How can I calculate the IAA for the relationship annotations?

I am not sure how to use Prodigy's iaa commands for relationships, or if that's a possibility.

Any advice you can provide me would be of great help.

Thanks!

Hi @ale,

The built-in IAA recipes currently consider annotations from either span/token or text/audio/image classification recipes. The relations interface is not supported yet.

You would need a custom script to compute IAA for your dataset. The tricky bit about calculating IAA on joint NER+relations annotations is that you need to consider both token-level annotations (for NER) and document-level annotations (for relations), and only compute relation agreement for examples where the child and head spans of the relations are agreed upon.
For this reason, I would recommend computing IAA for spans and relation labels separately.
This will also help you understand where most disagreements come from, i.e.:

  1. compute agreement on NER spans - I recommend using a pairwise F1 score for this. You will need to decide whether you accept only strict agreement or allow for some span boundary errors (see the first sketch below the list).
  2. compute agreement on relation labels for agreed-upon spans - You can use Fleiss' Kappa (Cohen's Kappa if you only work with two annotators) or Krippendorff's Alpha (Python package); see the second sketch below.
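
For step 1, here is a minimal sketch of what a pairwise, strict-match span F1 could look like. It assumes you have already grouped each annotator's spans by example (e.g. by the example's `_input_hash`) into sets of `(start, end, label)` tuples - that data layout is an assumption for illustration, not something Prodigy outputs directly:

```python
# Minimal sketch: pairwise span-level F1 between two annotators.
# Each annotator's data is assumed to be a dict mapping an example id
# to a set of (start, end, label) tuples -- this layout is illustrative.

def pairwise_span_f1(annotator_a, annotator_b):
    """Strict-match F1: a span only counts as agreed if start, end and label all match."""
    tp = fp = fn = 0
    shared_ids = set(annotator_a) & set(annotator_b)
    for example_id in shared_ids:
        spans_a = annotator_a[example_id]
        spans_b = annotator_b[example_id]
        tp += len(spans_a & spans_b)   # spans both annotators produced
        fp += len(spans_a - spans_b)   # only annotator A
        fn += len(spans_b - spans_a)   # only annotator B
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    ann_a = {"ex1": {(0, 5, "PERSON"), (10, 15, "ORG")}}
    ann_b = {"ex1": {(0, 5, "PERSON"), (12, 15, "ORG")}}
    print(pairwise_span_f1(ann_a, ann_b))  # 0.5 with strict matching
```

If you want to allow for boundary errors, you would relax the strict set intersection into a custom matching function (e.g. accept overlapping spans with the same label).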

You can of course combine the two agreements in some way to get the final score, but it's probably best to report the two numbers separately.
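
For step 2, here is a similarly rough sketch using scikit-learn's `cohen_kappa_score` for the two-annotator case (with more annotators you would swap in a Fleiss' Kappa or Krippendorff's Alpha implementation). Keying relations by example id plus head/child offsets is again an assumption about how you would restructure your exported data, not something Prodigy produces directly:

```python
# Sketch for step 2: Cohen's Kappa on relation labels, restricted to
# relations whose head and child spans both annotators agreed on.

from sklearn.metrics import cohen_kappa_score


def relation_kappa(relations_a, relations_b):
    """relations_a/b: dicts mapping
    (example_id, head_start, head_end, child_start, child_end) -> relation label.
    Only keys present in both dicts (i.e. agreed-upon span pairs) contribute;
    you could also add a NO_RELATION label for span pairs one annotator left
    unconnected, depending on how strict you want to be."""
    shared_keys = sorted(set(relations_a) & set(relations_b))
    labels_a = [relations_a[key] for key in shared_keys]
    labels_b = [relations_b[key] for key in shared_keys]
    return cohen_kappa_score(labels_a, labels_b)


if __name__ == "__main__":
    rels_a = {("ex1", 0, 5, 10, 15): "WORKS_FOR", ("ex1", 0, 5, 20, 25): "LIVES_IN"}
    rels_b = {("ex1", 0, 5, 10, 15): "WORKS_FOR", ("ex1", 0, 5, 20, 25): "BORN_IN"}
    print(relation_kappa(rels_a, rels_b))
```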

Thanks for your feedback, @magdaaniol!

Indeed, I will keep IAA metrics for NER and relations separate. I will work on a custom script for the relations part.

On a related note, I used the metric.iaa.span recipe for the span IAA. Does this recipe take into account all examples that were either accepted, skipped, or rejected, or does it only focus on accepted examples?

Also, quick feedback: the documentation for metric.iaa.span under Built-in Recipes does not list the partial flag; it is only mentioned under Annotation Metrics. It may be good to add it to the Built-in Recipes page too.

Thanks!

Hi @ale,

> Does this recipe take into account all examples that were either accepted, skipped, or rejected, or does it only focus on accepted examples?

The metric.iaa.span recipe takes into account only accepted examples. The reject and ignore cases are interpreted as if the annotator didn't provide an annotation for the given example at all.
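
If you want to reproduce or sanity-check that behaviour in your own script, you can filter the exported examples by their answer field. A small sketch on a hypothetical JSONL export (e.g. from `prodigy db-out your_dataset > annotations.jsonl`; the filename is illustrative):

```python
# Keep only accepted examples, mirroring how metric.iaa.span treats answers.
import json

accepted = []
with open("annotations.jsonl", encoding="utf8") as f:
    for line in f:
        example = json.loads(line)
        # "reject" and "ignore" answers are treated as if no annotation was given
        if example.get("answer") == "accept":
            accepted.append(example)

print(f"{len(accepted)} accepted examples")
```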

Thanks so much for the feedback on the missing partial flag documentation! It's added now.