TypeError: 'str' object does not support item assignment

Hi, I am trying to test pattern matching with the standard email token attribute LIKE_EMAIL.

I am running the following recipe command:

python -m prodigy match email_NG_data en_core_web_md ./NG_data.jsonl --patterns ./patterns.jsonl --label EMAIL

I get the following error message:

Using 1 label(s): EMAIL
Added dataset email_NG_data to database SQLite.
Traceback (most recent call last):
  File "C:\Users\xxx\Miniconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\xxx\Miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\xxx\Miniconda3\lib\site-packages\prodigy\__main__.py", line 61, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 337, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src\prodigy\core.pyx", line 364, in prodigy.core._components_to_ctrl
  File "cython_src\prodigy\core.pyx", line 125, in prodigy.core.Controller.__init__
  File "cython_src\prodigy\components\feeds.pyx", line 168, in prodigy.components.feeds.Feed.__init__
  File "cython_src\prodigy\components\stream.pyx", line 107, in prodigy.components.stream.Stream.__init__
  File "cython_src\prodigy\components\stream.pyx", line 58, in prodigy.components.stream.validate_stream
  File "C:\Users\xxx\Miniconda3\lib\site-packages\prodigy\recipes\generic.py", line 121, in <genexpr>
    stream = (eg for _, eg in matcher(stream))
  File "cython_src\prodigy\models\matcher.pyx", line 258, in __call__
TypeError: 'str' object does not support item assignment

My environment:

============================== ✨  Prodigy Stats ==============================

Version          1.11.2
Location         C:\Users\xxx\Miniconda3\lib\site-packages\prodigy
Prodigy Home     C:\Users\xxx\.prodigy
Platform         Windows-10-10.0.18362-SP0
Python Version   3.8.3
Database Name    SQLite
Database Id      sqlite
Total Datasets   3
Total Sessions   4

patterns.jsonl:

{"label":"EMAIL","pattern":[{"LIKE_EMAIL":true}]}

First rows NG_data.jsonl:

{"text":"From: mlee@eng.sdsu.edu (Mike Lee)\nSubject: MPEG for x-windows MONO needed.\n\nHello, and thank you for reading this request.  I have a Mpeg viewer for x-windows and it did not run because I was running it on a monochrome monitor.  I need the mono-driver for mpeg_play.   \n\nPlease post the location of the file or better yet, e-mail me at mlee@eng.sdsu.edu.\n\n","meta":"NG"}
{"text":"From: ab245@cleveland.Freenet.Edu (Sam Latonia)\nSubject: Re: Monitors - Nanao?\n\n\nThere is a good report list on most all of the good monitors in this\nmonths issue of Computer Shoppers magazine, with their phone munbers\nand all (April issue) $2.99....Sam\n-- \nGosh..I think I just installed a virus..It was called MS DOS6...\nDon't copy that floppy..BURN IT...I just love Windows...CRASH...\n","meta":"NG"}
{"text":"From: pmy@vivaldi.acc.virginia.edu (Pete Yadlowsky)\nSubject: Re: Who's next?  Mormons and Jews?\n\nCOCHRANE,JAMES SHAPLEIGH writes\n\n>it wouldn't be the first time a group has committed suicide to avoid the \n>shame of capture and persecution.\n\nThis group killed itself to fulfill its interpretation of prophecy\nand to book a suite in Paradise, taking innocent kids along for the\nride. I hardly think the feds were motivated by persecution. If they\nwere, all Koresh would have had to do was surrender quietly to the\nauthorities, without firing a shot, to get the American people behind\nhim and put the feds in the hot seat. But no, God told him to play\nthe tough guy. There's great strength in yielding, but few appreciate\nthis. \n\n--\nPeter M. Yadlowsky              |  Wake! The sky is light!\nAcademic Computing Center       | Let us to the Net again...\nUniversity of Virginia          |    Companion keyboard.\npmy@Virginia.EDU                |                      - after Basho\n","meta":"NG"}
{"text":"From: weverett@jarthur.claremont.edu (William M. Everett)\nSubject: Re: The earth also pollutes......\n\nIn article <1993Apr21.090638.6253@titan.ksc.nasa.gov> rodger-scoggin@ksc.nasa.gov (Rodger C. Scoggin) writes:\n>In article <DZVB3B6w164w@cellar.org>, techie@cellar.org (William A Bacon) says:\n>>\n\n>The Earth may spew alot of substances into the atmosphere, but the quality \n>of your toxic output can easily make up for the lack of quantity. \n\tExcuse me? Quality? As in grade A CO2 and grade B CO2? I may not have\nthis quite right but I was under the impression that CO2 was CO2.\n\n Furthermore, \n>the planet is a system of carbon, sulfur and other chemicals which have been\n>acting for billions of years, we are but newcomers to the system - we must adapt\n>and control in order to bring about stability.  Also, two wrongs do not make a right, \n>so continuing our practices despite overwhelming data is just ignorance in (non)action.\n\n\tA) There is no reason to believe this system is inherently stable- \nThe Ice ages occured without any help from humans.\n\n\tB) The point was that the human contribution of CO2 and other \ngreenhouse gasses is insignificant and it won't really make a difference if\nwe make more or less.\n\n\tC) What overwhelming data? I see lots of 'projections' of the future,\nwhich is fascinating, considering they can't predict the weather two weeks\nin advance.\n\n\t*********************************************************\n\t*  William Everett\t\tTan, Rested, and ready  *\n\t*  Harvey Mudd College\t\t     NIXON in '96       *\n\t*                                                       *   \n\t*  These opinions are mine- you can't have them         *   \n\t*********************************************************\n\n\n","meta":"NG"}

Could you please give me a hint where the error is coming from?

Thanks
Alfred

Hi! I just tried it and it looks like the problem here is that your data includes a plain string as the value of "meta", whereas Prodigy expects "meta" to be a dict:

"meta":"NG"

The matcher tries to add its score to the meta, assuming it's a dict, and that's what causes the error you see. We should add a workaround for this to prevent the error and convert the meta to a dict automatically, but in the meantime, you can fix this by making the meta a dictionary with keys, e.g. like this:

"meta": {"id": "NG"}

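If the file already exists, the same fix can be applied in bulk. The following is only a minimal sketch using the standard library, not something Prodigy provides: the helper names normalize_meta and fix_jsonl and the default key "id" are assumptions made for illustration.

```python
import json


def normalize_meta(example, key="id"):
    """Return a copy of the example whose "meta" value is a dict.

    A non-dict meta such as "NG" is wrapped as {key: "NG"}.
    The key name "id" is only an illustrative default.
    """
    fixed = dict(example)
    if "meta" in fixed and not isinstance(fixed["meta"], dict):
        fixed["meta"] = {key: fixed["meta"]}
    return fixed


def fix_jsonl(in_path, out_path, key="id"):
    """Rewrite a JSONL file line by line, wrapping string metas in a dict."""
    with open(in_path, encoding="utf8") as src, \
            open(out_path, "w", encoding="utf8") as dst:
        for line in src:
            if line.strip():  # skip empty lines
                example = normalize_meta(json.loads(line), key)
                dst.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Something like fix_jsonl("./NG_data.jsonl", "./NG_data_fixed.jsonl") would then produce a file to pass to the recipe instead of the original; the output file name is hypothetical.
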
Hi @ines, Thank you so much for the quick reply and your help! It works great!
Is there a clean and efficient way to create a well-formed jsonl file from a dataframe, with the meta in the correct dictionary format?

Especially in the meta field I always get the wrong format:

,"meta":"{\"source\": \"NG\"}"}

I tried it both with to_json on the df and with srsly via a dictionary built from the df.
My solution now is a bit messy :) but works: search and replace in the jsonl file. Sorry, I am not the most experienced programmer.

Thanks Alfred

I think the problem here is that you end up with the string {\"source\": \"NG\"} as a cell value in your dataframe, which is essentially just a table, so it gets written out as an escaped string. Instead, you either want an actual dictionary there as the value, or you convert your dataframe to a dictionary and add the "meta" dict afterwards.

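A rough sketch of the second option, assuming the rows come from something like pandas' df.to_dict(orient="records") and that the dataframe has a "text" column and a "source" column. Those column names and the helpers row_to_task and write_jsonl are hypothetical, and only the standard library is used for writing:

```python
import json


def row_to_task(row, text_column="text", meta_columns=("source",)):
    """Build one task with a real "meta" dict from a flat row.

    `row` is a plain dict, e.g. one item of df.to_dict(orient="records").
    The column names are assumptions; adjust them to your dataframe.
    """
    return {
        "text": row[text_column],
        "meta": {col: row[col] for col in meta_columns},
    }


def write_jsonl(path, tasks):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task, ensure_ascii=False) + "\n")


# Stand-in for df.to_dict(orient="records")
records = [{"text": "From: mlee@eng.sdsu.edu (Mike Lee)", "source": "NG"}]
tasks = [row_to_task(row) for row in records]
```

Because the meta is built as a Python dict and serialized only once, it is written as a nested object rather than as an escaped string. Calling write_jsonl("./NG_data.jsonl", tasks) would then write the file.
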
Thank you for taking care of even the basic questions, @ines! I think I have found my solution here:

Best Alfred