metric.iaa.doc fails with an exception if option ids are integers.

Hello!
I'm running multiclass classification tasks and using "view_id": "choice". I've defined "options" for each task as

[
    {"id": 0, "text": "A"},
    {"id": 1, "text": "B"},
    {"id": -1, "text": "C"}
]

as suggested in the Computer Vision docs on the Prodigy site.

Everything seems to work.

Then prodigy metric.iaa.doc dataset:<dataset> multiclass fails with

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/venv/lib/python3.11/site-packages/prodigy/__main__.py", line 50, in <module>
    main()
  File "/venv/lib/python3.11/site-packages/prodigy/__main__.py", line 44, in main
    controller = run_recipe(run_args)
                 ^^^^^^^^^^^^^^^^^^^^
  File "cython_src/prodigy/cli.pyx", line 123, in prodigy.cli.run_recipe
  File "cython_src/prodigy/cli.pyx", line 124, in prodigy.cli.run_recipe
  File "/venv/lib/python3.11/site-packages/prodigy/recipes/metric.py", line 137, in metric_iaa_doc
    m.measure(stream)
  File "cython_src/prodigy/components/metrics/iaa_doc.pyx", line 92, in prodigy.components.metrics.iaa_doc.IaaDoc.measure
  File "cython_src/prodigy/components/metrics/_util.pyx", line 53, in prodigy.components.metrics._util._validate_dataset
  File "cython_src/prodigy/components/metrics/_util.pyx", line 143, in prodigy.components.metrics._util._validate_labels
TypeError: sequence item 0: expected str instance, int found
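
The failure at the bottom of the traceback is just plain Python: str.join requires every item to be a string. It's easy to reproduce in isolation (I'm guessing the metrics code joins the option ids somewhere, since the internals are compiled):

```python
# str.join raises as soon as it hits a non-string item, which is
# exactly the error in the traceback above.
ids = [0, 1, -1]  # integer option ids, as in my task config
try:
    ", ".join(ids)
except TypeError as e:
    print(e)  # sequence item 0: expected str instance, int found
```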

After some time I figured out that I had to change the task options to

[
    {"id": "a", "text": "A"},
    {"id": "b", "text": "B"},
    {"id": "c", "text": "C"}
]

and metric.iaa.doc started working properly.
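
In case anyone else hits this with annotations already saved under integer ids, here's a rough sketch of a workaround: export the dataset, rewrite the ids as strings, and re-import into a fresh dataset before running metric.iaa.doc. (The `options` and `accept` fields are standard for choice tasks; the script itself is mine, not a Prodigy API.)

```python
import json

def stringify_ids(task: dict) -> dict:
    """Convert integer option ids (and accepted answers) to strings in place."""
    for opt in task.get("options", []):
        opt["id"] = str(opt["id"])
    if "accept" in task:
        task["accept"] = [str(a) for a in task["accept"]]
    return task

# Example: one annotated choice task, shaped like `prodigy db-out` output.
task = {
    "text": "example",
    "options": [{"id": 0, "text": "A"}, {"id": 1, "text": "B"}],
    "accept": [0],
}
fixed = stringify_ids(json.loads(json.dumps(task)))
print(fixed["options"][0]["id"], fixed["accept"])  # prints: 0 ['0']
```

Applied line by line to a JSONL export from `prodigy db-out`, then round-tripped back in with `prodigy db-in`, this gave me a dataset the metrics recipe could handle.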

I assume this is a bug but maybe I misunderstand something.

  1. It would be great if either the examples stopped using integer ids (if they're unsupported), or metric.iaa.doc started working with them.
  2. I think Prodigy would really benefit from user-input validation (or at least typing) with something like Pydantic, so there's never an inconsistency between what the user can enter and what Prodigy expects and accepts.

Thanks!

Hi @egor,

Thanks so much for the report. That is indeed a bug. It's actually just a print statement inside the metrics code that complains about the ints. Integer ids are allowed, which is why the validation check didn't fire (this particular recipe uses internal validation functions, not the Pydantic models that are currently used for the recipes that interface with the UI). We'll fix it asap.
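
For context, the class of fix is just coercing to str before joining, along these lines (a simplified illustration of the pattern, not the exact source):

```python
# Coercing each id to str before joining handles int, str and
# negative ids alike. (Illustrative pattern only.)
def format_labels(ids) -> str:
    return ", ".join(str(i) for i in ids)

print(format_labels([0, 1, -1]))  # prints: 0, 1, -1
```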

That's obviously a very good point about data validation. At the moment Prodigy uses Pydantic models for validating the data structures sent to the UI, but we have plans (and work under way) to extend this validation to the input stream and the different annotation tasks, including custom types (a structured stream feature to be released in Prodigy 2.0).
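
To sketch the spirit of that stream-level validation, here's a minimal stdlib version that normalises ids at the boundary (illustrative only; the names are made up and the real implementation will differ):

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Option:
    id: Union[int, str]
    text: str

    def __post_init__(self):
        # Normalise ids to str on the way in, so downstream string
        # operations (like joins in metrics) can never fail on ints.
        self.id = str(self.id)
        if not isinstance(self.text, str):
            raise TypeError("option text must be a string")

@dataclass
class ChoiceTask:
    text: str
    options: List[Option]

task = ChoiceTask(text="example", options=[Option(id=0, text="A")])
print(task.options[0].id)  # prints: 0  (now the string "0")
```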
