What is the syntax for textcat.teach for multi-class?


I can label manually with:

prodigy textcat.manual email ./subject-text-stratified-990.jsonl --exclusive --label x,y,z

What is the equivalent for teach?

prodigy textcat.teach email blank:en ./subject-text-stratified-990.jsonl --label x,y,z

I get this when I try the above.

Using 3 label(s): x, y, z
Added dataset email to database SQLite.
Traceback (most recent call last):
  File "/home/aia/.conda/envs/prodigy/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/aia/.conda/envs/prodigy/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/aia/.conda/envs/prodigy/lib/python3.9/site-packages/prodigy/__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 374, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 63, in prodigy.core.Controller.from_components
  File "cython_src/prodigy/core.pyx", line 160, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 104, in prodigy.components.feeds.Feed.__init__
  File "cython_src/prodigy/components/feeds.pyx", line 150, in prodigy.components.feeds.Feed._init_stream
  File "cython_src/prodigy/components/stream.pyx", line 107, in prodigy.components.stream.Stream.__init__
  File "cython_src/prodigy/components/stream.pyx", line 58, in prodigy.components.stream.validate_stream
  File "cython_src/prodigy/components/sorters.pyx", line 104, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 14, in genexpr
  File "cython_src/prodigy/models/textcat.pyx", line 78, in predict
TypeError: 'str' object does not support item assignment

Any help appreciated.

Hello @jdewsnip,

Thank you for your question, and welcome to the Prodigy forum!

Could you post an example from your .jsonl file? I suspect that the "meta" key in your data holds a string, whereas Prodigy expects a dict there. During prediction, the recipe tries to add a score to the meta dict, which could cause the error you see.
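
To illustrate the suspected cause, here is a minimal sketch in plain Python. The add_score function is a hypothetical stand-in for what the recipe might do during prediction, not Prodigy's actual code, and the task contents are made up for the example.

```python
import json

# Hypothetical task as it might be loaded from the JSONL file:
# "meta" holds a JSON-encoded string instead of a dict.
task = {"text": "RE: la la la", "meta": json.dumps({"topic_name": "foo"})}


def add_score(task, score):
    # Assumed behaviour: write the model score into the task's meta.
    task["meta"]["score"] = score
    return task


try:
    add_score(task, 0.5)
    error_message = None
except TypeError as err:
    # Item assignment on a str raises the same TypeError as in the traceback.
    error_message = str(err)

print(error_message)
```

With a dict as the "meta" value, the same assignment would succeed.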

Ah ok that makes sense.

"meta":"{\"uuid\":\"<xxx@mail.gmail.com>\",\"subject\":\"RE: la la la\",\"date\":\"2020-03-07 09:21:24\",\"to\":\"someguy@somewhere.com\",\"from\":\"a@b.com\",\"n_attachments\":4,\"topic_name\":\"foo\",\"topic_num\":6}"

I have "validate": false in prodigy.json, and I guess the meta format cannot be ignored for teach because it updates the meta?

So it's bad dict encoding?

I have "validate": false. Would that explain why manual works and teach does not?

Sorry @jdewsnip, our system thought that your replies were spam, which is why they were hidden. I marked one of them as non-spam, so hopefully it is no longer hidden.

textcat.manual does not access the "meta" key, which is why the error does not occur there.

You could change your meta value to a dictionary with a key like "additional_information" or "mail_header" that holds the string as its value:

"meta": {"additional_information": "{\"uuid\":\"<xxx@mail.gmail.com>\",\"subject\":\"RE: la la la\",\"date\":\"2020-03-07 09:21:24\",\"to\":\"someguy@somewhere.com\",\"from\":\"a@b.com\",\"n_attachments\":4,\"topic_name\":\"foo\",\"topic_num\":6}"}

Or you could encode the metadata directly as a dictionary, similar to this:

"meta": {"uuid": "<xxx@mail.gmail.com>","subject":"RE: la la la","date":"2020-03-07 09:21:24","to": "someguy@somewhere.com", "from":"a@b.com","n_attachments":4,"topic_name":"foo","topic_num":6}

I hope one of the two approaches works for you. Please let me know if not or if you have any further questions.
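
If the file already contains many string-valued "meta" entries, the conversion could be done in bulk. The following is only a sketch using the Python standard library: fix_meta and convert_file are hypothetical helper names, and the fallback key "additional_information" is just the example key suggested above. It parses the string into a dict where possible (second approach) and otherwise wraps it in a dict (first approach).

```python
import json


def fix_meta(task):
    """Return a copy of the task whose "meta" value is a dict."""
    meta = task.get("meta")
    if isinstance(meta, str):
        try:
            parsed = json.loads(meta)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict):
            # The string was an encoded dict: use it directly.
            meta = parsed
        else:
            # Not decodable as a dict: keep the string under a wrapper key.
            meta = {"additional_information": meta}
        task = {**task, "meta": meta}
    return task


def convert_file(src_path, dst_path):
    """Rewrite a JSONL file line by line with dict-valued "meta" entries."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if line.strip():
                dst.write(json.dumps(fix_meta(json.loads(line))) + "\n")
```

For example, convert_file("input.jsonl", "fixed.jsonl") (both file names are placeholders) would write a new file that can then be passed to textcat.teach in place of the original.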