Hi Ines! I did as you said and just ran the next command, and Prodigy automatically created the dataset for me. But when I run your next command to load the Reddit comments, I get an error saying it doesn't recognize the dataset I'm trying to pass in. Not sure if I'm using a deprecated command again?
When I run these commands, it seems to work and I am able to save the annotation:
!python -m prodigy terms.teach insult_seeds en_core_web_lg --seeds insults.txt
then to check:
!python -m prodigy db-out insult_seeds
which outputs:
{"text":"dick","meta":{"score":0.8098309199},"_input_hash":-690856415,"_task_hash":-1277424855,"_session_id":null,"_view_id":"text","answer":"accept"}
{"text":"fuck","meta":{"score":0.8087529978},"_input_hash":-192289499,"_task_hash":1698676519,"_session_id":null,"_view_id":"text","answer":"reject"}
{"text":"pussy","meta":{"score":0.8068196275},"_input_hash":436531074,"_task_hash":1495846518,"_session_id":null,"_view_id":"text","answer":"accept"}
{"text":"fucking","meta":{"score":0.8057290128},"_input_hash":481507976,"_task_hash":-1327770953,"_session_id":null,"_view_id":"text","answer":"reject"}
{"text":"fucker","meta":{"score":0.8030490282},"_input_hash":-1821531370,"_task_hash":2079464020,"_session_id":null,"_view_id":"text","answer":"accept"}
{"text":"cock","meta":{"score":0.7993727761},"_input_hash":1199091593,"_task_hash":-1521700463,"_session_id":null,"_view_id":"text","answer":"reject"}
{"text":"whore","meta":{"score":0.7981011512},"_input_hash":1483481241,"_task_hash":179717541,"_session_id":null,"_view_id":"text","answer":"accept"}.............
But then when I try to run the command that uses this dataset with the Reddit comments, this is what I get:
!python -m prodigy textcat.teach insults en_core_web_sm RC_2015-01.bz2 --loader reddit --label INSULT --patterns insult_seeds
output:
Using 1 label(s): INSULT
Traceback (most recent call last):
  File "C:\Users\t724614\Anaconda3\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\t724614\Anaconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\t724614\Anaconda3\lib\site-packages\prodigy\__main__.py", line 54, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "C:\Users\t724614\Anaconda3\lib\site-packages\plac_core.py", line 367, in call
    cmd, result = parser.consume(arglist)
  File "C:\Users\t724614\Anaconda3\lib\site-packages\plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Users\t724614\Anaconda3\lib\site-packages\prodigy\recipes\textcat.py", line 71, in teach
    matcher = matcher.from_disk(patterns)
  File "cython_src\prodigy\models\matcher.pyx", line 260, in prodigy.models.matcher.PatternMatcher.from_disk
ValueError: Can't find patterns file: insult_seeds
The dataset insult_seeds should be in the same folder I am working in (I assume), so I haven't added a ./ prefix. That being said, I tested it with a ./ as well and got the same error.
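One thing I notice in the traceback: the value of --patterns is handed to PatternMatcher.from_disk, so the recipe seems to want a path to a file on disk rather than the name of a dataset. In case it helps narrow things down, here is a minimal sketch of how the accepted terms from the db-out output could be turned into pattern entries. The entry shape (a "label" plus a list of token dicts keyed by "lower") is my assumption about what a patterns file looks like, and accepted_to_patterns is just a hypothetical helper name, not part of Prodigy.

```python
import json


def accepted_to_patterns(jsonl_lines, label):
    """Build pattern entries from db-out style JSONL lines, keeping accepted terms only."""
    patterns = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("answer") != "accept":
            continue  # rejected or ignored terms are left out
        # One token dict per whitespace-separated word, matched on lowercase form.
        # ASSUMPTION: this entry shape is what the patterns loader expects.
        tokens = [{"lower": word.lower()} for word in record["text"].split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns


# Two records from the db-out output above, trimmed to the fields used here.
sample = [
    '{"text":"dick","meta":{"score":0.8098309199},"answer":"accept"}',
    '{"text":"fuck","meta":{"score":0.8087529978},"answer":"reject"}',
]

for entry in accepted_to_patterns(sample, "INSULT"):
    print(json.dumps(entry))
```

If that is the right shape, writing one entry per line to a file (say insult_patterns.jsonl, a made-up name) and passing that path to --patterns might get past the ValueError. I believe Prodigy also has a built-in terms.to-patterns recipe for this conversion, which would be the better route, but I haven't checked its exact arguments.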