Correcting textcat.manual

Hi,

I have a couple questions about correcting a text categorization model!

This is how I ran the annotation session:

prodigy textcat.manual annotated-data ./input-data.jsonl --label cat1, cat2, cat3

In my text categorizations, it is possible that sometimes two categories apply to a given text. So when I run the following code, the model will sometimes predict two categories:

prodigy textcat.correct corrected-data ./textcat-model ./inputdata.jsonl --label cat1, cat2, cat3 --exclude annotated-data

In some cases, however, the model predicts two categories, but only one of them is actually correct. I don't seem to be able to correct that: I can't "de-select" one category and accept the other.

Should I just reject or ignore the annotation?

Can you explain what happens to rejected and ignored annotations when I retrain the model?

Thank you!

Perhaps a nitpick, but are you sure that line would run? I don't think we support whitespace between label names.

> prodigy textcat.manual annotated-data ./issue-6064/examples.jsonl --label cat1, cat2, cat3
Using 1 label(s): cat1
usage: prodigy textcat.manual [-h] [-lo None] [-l None] [-E] [-e None] dataset source
prodigy textcat.manual: error: unrecognized arguments: cat2, cat3

This would work though.

prodigy textcat.manual annotated-data ./issue-6064/examples.jsonl --label cat1,cat2,cat3
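For anyone wondering why the spaces matter: the shell splits the command on whitespace before Prodigy ever sees it, so only the first token after --label is treated as the label value. A minimal sketch of that splitting, using Python's shlex module to mimic the shell (the file name ./examples.jsonl and the helper label_args are just placeholders for illustration):

```python
import shlex

# Hypothetical commands mirroring the two invocations above.
with_spaces = "prodigy textcat.manual annotated-data ./examples.jsonl --label cat1, cat2, cat3"
without_spaces = "prodigy textcat.manual annotated-data ./examples.jsonl --label cat1,cat2,cat3"


def label_args(command):
    """Return every token that follows --label after shell-style splitting."""
    tokens = shlex.split(command)
    return tokens[tokens.index("--label") + 1:]


print(label_args(with_spaces))     # three separate tokens
print(label_args(without_spaces))  # one comma-separated token
```

With spaces, the labels arrive as three separate arguments, which matches the "unrecognized arguments: cat2, cat3" error above. Without spaces, they arrive as a single comma-separated value.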

Just to check though, are you seeing an interface that looks like this when you run textcat.correct? If not, could you share a screenshot?

Thanks for your response, apologies for the delay in replying!

You are correct: I ran the command without whitespace between the label names, and I made a mistake when writing up the example!

My interface doesn't look exactly like yours, however. I can only select one category. In this example there are two categories (incorrectly) labeled in grey. I was assuming this meant that they are both selected, but now I am not sure. If my interface looked more like your example, that would be more useful.

Thank you!

Interesting.

Just to check, do you have a custom prodigy.json file defined? If so, does it have the choice_style setting set to "single"? It might be good to manually set it to "multiple". The docs explain the reasoning behind this setting here.

We do have a prodigy.json file, but it doesn't specify any choice_style settings!

{
    "db": "postgresql",
    "db_settings": {
        "postgresql": {
            "dbname": "prodigy"
        }
    }
}
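In case more than one config file is in play, here is a rough sketch for checking whether any of them sets choice_style. It assumes the usual locations, a prodigy.json in the Prodigy home directory (PRODIGY_HOME, or ~/.prodigy by default) and one in the current working directory; find_setting is a hypothetical helper, not part of Prodigy.

```python
import json
import os
from pathlib import Path


def find_setting(key, paths):
    """Return {path: value} for each existing JSON config file that defines key."""
    found = {}
    for path in paths:
        path = Path(path)
        if not path.is_file():
            continue  # skip locations where no config file exists
        with path.open(encoding="utf8") as f:
            config = json.load(f)
        if key in config:
            found[str(path)] = config[key]
    return found


# Assumed locations: the Prodigy home directory and the working directory.
prodigy_home = Path(os.environ.get("PRODIGY_HOME", Path.home() / ".prodigy"))
candidates = [prodigy_home / "prodigy.json", Path.cwd() / "prodigy.json"]

print(find_setting("choice_style", candidates) or "choice_style is not set in any file found")
```

If this reports a value in a file other than the one shown above, that file may be the one affecting the interface.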

Are there any environment variables that might influence the situation?

You can print them in a Python script via:

import os

for name, value in os.environ.items():
    print(f"{name}: {value}")
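The loop above prints everything, which can be a lot to read through (and may include secrets you don't want to paste into a forum). A narrower sketch, assuming the variables of interest share the PRODIGY_ prefix, as PRODIGY_HOME, PRODIGY_LOGGING and PRODIGY_CONFIG_OVERRIDES do; prodigy_env is a hypothetical helper name:

```python
import os


def prodigy_env(environ=None, prefix="PRODIGY_"):
    """Return the environment variables whose names start with prefix, sorted by name."""
    environ = os.environ if environ is None else environ
    return {name: environ[name] for name in sorted(environ) if name.startswith(prefix)}


for name, value in prodigy_env().items():
    print(f"{name}: {value}")
```

If nothing is printed, no variables with that prefix are set in the current environment.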

If it's not that, is there a minimum viable example that you can share? Maybe a single item from your input-data.jsonl file that causes the issue?