Error in metric.iaa.doc: missing view_id in task_hash

We are working on a web version of Prodigy deployed in a Kubernetes environment. We started a multiclass task with feed_overlap=true for multiple annotators. When I try to get the IAA metric with python -m prodigy metric.iaa.doc dataset:dataset_name multiclass, it gives the attached error.
[screenshot of the error]
We are not using any custom recipes. We are using textcat.manual with two labels.

Can you please help with this issue?

Welcome to the forum @Sagnik_Bhattacharyya :wave:

This error tells us that the input data, specifically the example with _task_hash -964377616 (though it could just be the first example), doesn't have the expected format. We should verify it. Could you inspect the dataset that you're passing to metric.iaa.doc and see what dictionary keys are there?
You can inspect the dataset using the Prodigy db-out command.
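For example, assuming you've exported the dataset with db-out to ./dataset_name.jsonl (the file and dataset names here are just placeholders), a small script like this would print the keys present on each example:

import srsly

# Print the task hash and the available keys for every exported example
for eg in srsly.read_jsonl("./dataset_name.jsonl"):
    print(eg.get("_task_hash"), sorted(eg.keys()))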


Hi @magdaaniol, thanks for the quick response! I looked into the dataset as you suggested, and the example with the specific task hash from the error is indeed missing keys.
[screenshot of the example's key-value pairs]
As you can see in the screenshot, these are the only key-value pairs in the data for this specific task. It has answer=accept, yet there is no _annotator_id, _view_id, options, etc.

Hi @Sagnik_Bhattacharyya ,

If all your dataset entries look like the one you've shared, the data must have been collected using a custom recipe, or the dataset was otherwise post-processed. Prodigy's built-in recipes always save the _view_id, _annotator_id and _session_id attributes.
In order to use the inter-annotator agreement recipes out of the box, you'd have to modify your dataset to add this information. After all, you need to know the ID of the annotator who annotated each example to be able to compute the agreement.
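For reference, an entry saved by a built-in recipe would look roughly like this (the values are made up for illustration; the important part is the presence of the _view_id, _annotator_id and _session_id keys):

{"text": "some example text", "answer": "accept", "_input_hash": 123456789, "_task_hash": -987654321, "_view_id": "classification", "_annotator_id": "my_dataset-annotator1", "_session_id": "my_dataset-annotator1"}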
From your screenshot it looks like you are storing a custom annotator id field under meta. Is annon_id the annotator id perhaps?
Also, the data entry you shared does not come from textcat.manual with two labels. It is a binary annotation, so the designated IAA recipe would be metric.iaa.binary.

Either way, as mentioned above, your dataset will need the _view_id and _annotator_id keys. If you have that information (I'm assuming you would be able to use the current annon_id key), you can iterate through the examples and add these keys in a Python script.
So, first you might want to store your dataset to disk:

python -m prodigy db-out your-dataset > ./your-dataset.jsonl

Then you'd add the missing information in a Python script:

import copy

import srsly
from wasabi import msg

# Load the dataset exported with db-out above
dataset = srsly.read_jsonl("./your-dataset.jsonl")
new_dataset = []

# Set this to the interface the data was annotated with,
# e.g. "classification" for binary annotations or "choice" for multiple choice
view_id = "choice"
for eg in dataset:
    eg_copy = copy.deepcopy(eg)
    task_hash = eg_copy.get("_task_hash")
    annotator_id = eg_copy.get("meta", {}).get("annon_id")
    if annotator_id is None:
        msg.warn(
            f"No annon_id available in task with _task_hash {task_hash}. Skipping this example."
        )
    else:
        eg_copy["_view_id"] = view_id
        eg_copy["_annotator_id"] = annotator_id
        new_dataset.append(eg_copy)

srsly.write_jsonl("./your-dataset-modified.jsonl", new_dataset)

The modified dataset could then be used with the IAA recipes. As mentioned above, double-check which IAA recipe is the right choice for your data, because the sample you provided looks like a binary annotation while you mentioned you've been annotating with two labels.
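For example, you could load the modified file into a fresh dataset with db-in and run the binary recipe on it (the dataset name below is just a placeholder; check the recipe's --help for the exact arguments it expects):

python -m prodigy db-in your-dataset-iaa ./your-dataset-modified.jsonl
python -m prodigy metric.iaa.binary dataset:your-dataset-iaa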


Hi @magdaaniol, thanks for the support. We will keep this in mind for future experiments. However, in my case I discovered the root of the bug: I had used the same dataset for storing both the source examples and the annotations. So when invoking metric.iaa, it did not find the _view_id in the raw data examples. Ideally these should be two separate datasets. I isolated the annotated data, re-uploaded it into another dataset, and metric.iaa worked fine.

Thanks for the help. We can mark this issue as closed.

Regards,
Sagnik Bhattacharyya
