Hey everyone - I thought I'd share some recipes I made for calculating Inter-Annotator Agreement (also called Inter-Rater Reliability) in Prodigy. They handle a variety of annotation setups: you can have any number of annotators, and the annotations can overlap incompletely (not every annotator has to label every example). Currently they support classification tasks (binary/multiclass/multilabel), but not span, NER, audio, or image tasks.
Here's a link to the repo: GitHub - pmbaumgartner/prodigy-iaa
Everything is bundled as a Python package, so you should be able to install it and try things out. The installation instructions and more details are in the README. If you have any feedback, comment here, and if you find any bugs, please open an issue on the repo.
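
For anyone wondering what "any number of annotators and incomplete overlap" looks like in practice, here's a rough sketch of one agreement metric that copes with both: Krippendorff's alpha for nominal labels. To be clear, this is not code from the package, and it makes no claim about how the recipes compute their numbers. The function name, the annotator names, and the input format (example id mapped to annotator/label pairs) are all made up for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a hypothetical input format: {example_id: {annotator: label}}.
    Annotators may skip examples; examples with fewer than two labels
    carry no agreement information and are ignored.
    """
    # Coincidence counts: every ordered pair of labels given to the same
    # example by different annotators, weighted by 1 / (m - 1), where m is
    # the number of labels that example received.
    coincidences = Counter()
    for labels_by_annotator in units.values():
        labels = list(labels_by_annotator.values())
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label, and the total number of pairable labels.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        return None  # nothing to compare

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return None  # only one label was ever used, alpha is undefined
    return 1 - observed / expected


# Toy data: three annotators, none of whom labeled every example.
toy = {
    "ex1": {"alice": "POS", "bob": "POS"},
    "ex2": {"alice": "NEG", "bob": "POS", "carol": "NEG"},
    "ex3": {"bob": "NEG", "carol": "NEG"},
    "ex4": {"carol": "POS"},  # only one label, so it is skipped
}
print(krippendorff_alpha_nominal(toy))  # 0.5
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. The sketch only covers single-label classification; multilabel tasks would need a different treatment.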