Hi there
I am trying to build a text classification model that identifies sentences about History. I have done the following steps:
- Built a terminology list (seed terms) with Prodigy using the following commands:
python -m prodigy dataset History_seed "Collect seeds for History"
python -m prodigy terms.teach History_seed en_core_web_lg --seeds history.txt
- Exported the seeds collected in the previous step to a JSONL file using the following command:
python -m prodigy db-out History_seed > History_terms.jsonl
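For reference, the db-out export contains every reviewed seed term, rejected ones included. A small sketch like the one below groups the exported records by their `answer` value. It assumes only that each line is a JSON object with `text` and `answer` keys, as in the History_terms.jsonl excerpt further down; the function name `group_by_answer` is hypothetical and not part of Prodigy.

```python
import json
from collections import defaultdict


def group_by_answer(lines):
    """Group the "text" of each db-out record under its "answer" value."""
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. at the end of the file
        record = json.loads(line)
        # Records without an "answer" key are collected under "none"
        groups[record.get("answer", "none")].append(record["text"])
    return dict(groups)


# Two lines copied from History_terms.jsonl
sample = [
    '{"text":"history","answer":"accept","_input_hash":-718773650,"_task_hash":-52608476}',
    '{"text":"education","meta":{"score":0.7611488893},"_input_hash":389341424,"_task_hash":-303253017,"answer":"reject"}',
]
print(group_by_answer(sample))  # {'accept': ['history'], 'reject': ['education']}
```

To run it on the real export, pass an open file object: `with open("History_terms.jsonl", encoding="utf8") as f: print(group_by_answer(f))`.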
- Annotated sentences that belong to the class, using the following commands:
python -m prodigy dataset History_anno "collect annotations History"
python -m prodigy textcat.teach History_anno en_core_web_lg history_train.jsonl --label HISTORY --patterns History_terms.jsonl
(history_train.jsonl is the JSONL file containing the training set I created earlier.)
Unfortunately, I get an error when running the last command. It seems to suggest that the patterns in the terminology JSONL file are not recognised. This is the error:
D:\usersp\admin\Desktop\Prodigy\History>python -m prodigy textcat.teach History_anno en_core_web_lg history_train.jsonl --label HISTORY --patterns History_terms.jsonl
Using 1 labels: HISTORY
Traceback (most recent call last):
  File "D:\usersp\admin\Anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "D:\usersp\admin\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\usersp\admin\Anaconda3\lib\site-packages\prodigy\__main__.py", line 259, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 253, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "D:\usersp\admin\Anaconda3\lib\site-packages\plac_core.py", line 328, in call
    cmd, result = parser.consume(arglist)
  File "D:\usersp\admin\Anaconda3\lib\site-packages\plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "D:\usersp\admin\Anaconda3\lib\site-packages\prodigy\recipes\textcat.py", line 58, in teach
    matcher = matcher.from_disk(patterns)
  File "cython_src\prodigy\models\matcher.pyx", line 192, in prodigy.models.matcher.PatternMatcher.from_disk
  File "cython_src\prodigy\models\matcher.pyx", line 118, in prodigy.models.matcher.PatternMatcher.add_patterns
  File "cython_src\prodigy\models\matcher.pyx", line 55, in prodigy.models.matcher.create_matchers
  File "cython_src\prodigy\models\matcher.pyx", line 29, in prodigy.models.matcher.parse_patterns
ValueError: Invalid pattern: {'text': 'pmhx', 'answer': 'accept', '_input_hash': -1857482619, '_task_hash': -1534345529}
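Judging by the message, the record that failed to parse is a plain annotation task (`text`, `answer` and hashes) rather than a pattern entry. Assuming that Prodigy patterns files expect one JSON object per line with a `label` and a `pattern` key, the hypothetical helper below (`find_non_patterns` is not a Prodigy function) flags lines that lack those keys. It is a quick sanity check under that assumption, not a reimplementation of Prodigy's own validation.

```python
import json


def find_non_patterns(lines):
    """Return (line_number, sorted_keys) for lines missing "label" or "pattern"."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # blank lines are skipped, but still counted in numbering
        record = json.loads(line)
        if "label" not in record or "pattern" not in record:
            problems.append((number, sorted(record)))
    return problems


sample = [
    # The record quoted in the ValueError above
    '{"text":"pmhx","answer":"accept","_input_hash":-1857482619,"_task_hash":-1534345529}',
    # An entry in the assumed patterns format
    '{"label":"HISTORY","pattern":[{"lower":"history"}]}',
]
print(find_non_patterns(sample))  # [(1, ['_input_hash', '_task_hash', 'answer', 'text'])]
```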
I have previously done text classification in the same way and was able to proceed with the annotation. Could you tell me what is causing this issue?
Here is an excerpt from the History_terms.jsonl file:
{"text":"years","answer":"accept","_input_hash":-766644989,"_task_hash":-303167857}
{"text":"history","answer":"accept","_input_hash":-718773650,"_task_hash":-52608476}
{"text":"medical history","answer":"accept","_input_hash":-1321831807,"_task_hash":-2121037943}
{"text":"university","meta":{"score":0.7690472063},"_input_hash":-968060743,"_task_hash":1617902080,"answer":"accept"}
{"text":"education","meta":{"score":0.7611488893},"_input_hash":389341424,"_task_hash":-303253017,"answer":"reject"}
{"text":"student","meta":{"score":0.7539709622},"_input_hash":1449487300,"_task_hash":1808011982,"answer":"accept"}
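If the records above do need converting, one option is a minimal sketch like the following, which turns accepted terms into token-based entries of the form `{"label": ..., "pattern": [{"lower": ...}, ...]}`. That target format and the `lower` token attribute are assumptions based on spaCy-style token patterns; `terms_to_patterns` is a hypothetical name. Splitting on whitespace is a simplification and may not match spaCy's tokenization for terms containing punctuation or hyphens.

```python
import json


def terms_to_patterns(lines, label):
    """Turn accepted seed terms into token-based pattern entries for one label."""
    patterns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("answer") != "accept":
            continue  # rejected terms should not become patterns
        # One {"lower": ...} dict per whitespace-separated word, so
        # "medical history" matches case-insensitively as two tokens.
        tokens = [{"lower": word.lower()} for word in record["text"].split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns


sample = [
    '{"text":"medical history","answer":"accept","_input_hash":-1321831807,"_task_hash":-2121037943}',
    '{"text":"education","meta":{"score":0.7611488893},"_input_hash":389341424,"_task_hash":-303253017,"answer":"reject"}',
]
for entry in terms_to_patterns(sample, "HISTORY"):
    print(json.dumps(entry))
# {"label": "HISTORY", "pattern": [{"lower": "medical"}, {"lower": "history"}]}
```

The label passed in should match the one given to `--label` (HISTORY here). Prodigy also ships a terms.to-patterns recipe intended for this conversion; its arguments may differ between versions, so `python -m prodigy terms.to-patterns --help` is the place to check.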
Thanks in advance!
I'm using Prodigy version 1.6.1.