metric.iaa.doc fails with an exception if option ids are integers.

Hello!
I'm running multiclass classification tasks and using "view_id": "choice". I've defined "options" for each task as

[
    {"id": 0, "text": "A"},
    {"id": 1, "text": "B"},
    {"id": -1, "text": "C"}
]

as suggested in the Computer Vision docs on the Prodigy site.

Everything seems to work.

Then prodigy metric.iaa.doc dataset:<dataset> multiclass fails with

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/venv/lib/python3.11/site-packages/prodigy/__main__.py", line 50, in <module>
    main()
  File "/venv/lib/python3.11/site-packages/prodigy/__main__.py", line 44, in main
    controller = run_recipe(run_args)
                 ^^^^^^^^^^^^^^^^^^^^
  File "cython_src/prodigy/cli.pyx", line 123, in prodigy.cli.run_recipe
  File "cython_src/prodigy/cli.pyx", line 124, in prodigy.cli.run_recipe
  File "/venv/lib/python3.11/site-packages/prodigy/recipes/metric.py", line 137, in metric_iaa_doc
    m.measure(stream)
  File "cython_src/prodigy/components/metrics/iaa_doc.pyx", line 92, in prodigy.components.metrics.iaa_doc.IaaDoc.measure
  File "cython_src/prodigy/components/metrics/_util.pyx", line 53, in prodigy.components.metrics._util._validate_dataset
  File "cython_src/prodigy/components/metrics/_util.pyx", line 143, in prodigy.components.metrics._util._validate_labels
TypeError: sequence item 0: expected str instance, int found
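
The failure at the bottom of the traceback is just plain Python: str.join requires every item to be a string. It's easy to reproduce in isolation (I'm guessing the metrics code joins the option ids somewhere, since the internals are compiled):

```python
# str.join raises as soon as it hits a non-string item, which is
# exactly the error in the traceback above.
ids = [0, 1, -1]  # integer option ids, as in my task config
try:
    ", ".join(ids)
except TypeError as e:
    print(e)  # sequence item 0: expected str instance, int found
```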

After some time I figured out that I had to change the task options to

[
    {"id": "a", "text": "A"},
    {"id": "b", "text": "B"},
    {"id": "c", "text": "C"}
]

and metric.iaa.doc started working properly.
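
In case anyone else hits this with annotations already saved under integer ids, here's a rough sketch of a workaround: export the dataset, rewrite the ids as strings, and re-import into a fresh dataset before running metric.iaa.doc. (The `options` and `accept` fields are standard for choice tasks; the script itself is mine, not a Prodigy API.)

```python
import json

def stringify_ids(task: dict) -> dict:
    """Convert integer option ids (and accepted answers) to strings in place."""
    for opt in task.get("options", []):
        opt["id"] = str(opt["id"])
    if "accept" in task:
        task["accept"] = [str(a) for a in task["accept"]]
    return task

# Example: one annotated choice task, shaped like `prodigy db-out` output.
task = {
    "text": "example",
    "options": [{"id": 0, "text": "A"}, {"id": 1, "text": "B"}],
    "accept": [0],
}
fixed = stringify_ids(json.loads(json.dumps(task)))
print(fixed["options"][0]["id"], fixed["accept"])  # prints: 0 ['0']
```

Applied line by line to a JSONL export from `prodigy db-out`, then round-tripped back in with `prodigy db-in`, this gave me a dataset the metrics recipe could handle.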

I assume this is a bug but maybe I misunderstand something.

  1. It would be great if either the examples stopped using integer ids (if they're unsupported), or metric.iaa.doc started working with them.
  2. I think Prodigy would really benefit from user-input validation (or at least typing) with something like Pydantic, so there's never an inconsistency between what the user can enter and what Prodigy expects and accepts.

Thanks!

Hi @egor,

Thanks so much for the report. That is indeed a bug. It's actually just a print statement inside the metrics code that complains about the ints. Integer ids are allowed, which is why the validation check didn't fire (this particular recipe uses internal validation functions, not the Pydantic models that are currently used for the recipes that interface with the UI). We'll fix it asap.
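
For context, the class of fix is just coercing to str before joining, along these lines (a simplified illustration of the pattern, not the exact source):

```python
# Coercing each id to str before joining handles int, str and
# negative ids alike. (Illustrative pattern only.)
def format_labels(ids) -> str:
    return ", ".join(str(i) for i in ids)

print(format_labels([0, 1, -1]))  # prints: 0, 1, -1
```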

That's obviously a very good point about data validation. At the moment Prodigy uses Pydantic models for validating the data structures sent to the UI, but we have plans (and work under way) to extend this validation to the input stream and the different annotation tasks, including custom types (a structured stream feature to be released in Prodigy 2.0).
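
To sketch the spirit of that stream-level validation, here's a minimal stdlib version that normalises ids at the boundary (illustrative only; the names are made up and the real implementation will differ):

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Option:
    id: Union[int, str]
    text: str

    def __post_init__(self):
        # Normalise ids to str on the way in, so downstream string
        # operations (like joins in metrics) can never fail on ints.
        self.id = str(self.id)
        if not isinstance(self.text, str):
            raise TypeError("option text must be a string")

@dataclass
class ChoiceTask:
    text: str
    options: List[Option]

task = ChoiceTask(text="example", options=[Option(id=0, text="A")])
print(task.options[0].id)  # prints: 0  (now the string "0")
```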
