Hi there
I am trying to match a list of patterns to text. My pattern file looks like this:
{"label": "Action", "pattern": [{"lower": "RIH"}]}
{"label": "Equipment", "pattern": [{"lower": "ps-21"}, {"lower": "slips"}]}
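Here is a quick stdlib-only sanity check on the patterns file. It is only a sketch. It assumes the file is plain JSONL and relies on spaCy comparing a "lower" value against the token's lowercased text, so a value containing uppercase (like "RIH") can never match. It also flags lines that are not valid JSON, for example if the file really contains curly quotes. `PATTERN_LINES` and `find_pattern_problems` are hypothetical names used for illustration.

```python
import json

# Lines from the patterns file, written with straight ASCII quotes as JSON requires.
PATTERN_LINES = [
    '{"label": "Action", "pattern": [{"lower": "RIH"}]}',
    '{"label": "Equipment", "pattern": [{"lower": "ps-21"}, {"lower": "slips"}]}',
]


def find_pattern_problems(lines):
    """Return human-readable problems found in JSONL pattern lines."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as err:
            # Curly quotes such as “ ” are not valid JSON string delimiters.
            problems.append(f"line {lineno}: not valid JSON ({err.msg})")
            continue
        pattern = entry.get("pattern", [])
        if isinstance(pattern, str):
            continue  # phrase (string) patterns have no token attributes to check
        for token in pattern:
            value = token.get("lower")
            # "lower" is compared against the lowercased token text, so a value
            # containing uppercase characters can never match anything.
            if isinstance(value, str) and value != value.lower():
                problems.append(
                    f"line {lineno}: lower={value!r} contains uppercase and "
                    f"can never match (try {value.lower()!r})"
                )
    return problems


for problem in find_pattern_problems(PATTERN_LINES):
    print(problem)
```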
My input CSV looks like this:
id,text
65ff62f85f222e98ac292682c4f7eee8,Installed PS21 slips. RU auto pipe handling system.
50fb94effa05495d39636894121086bc,Installed RST toolstring into lubricator.
466dd1bf0dd8ed7661ff4bcb4cf5fde9,Installed Riser centralizer in tension deck.
cabadddd5a88eaf89ba6f134f52fcca2,Installed Slick joint and Diverter.
256fae0b8ee0410824d15f9d62deb3f4,"Installed Spider. Time Wind speed Wind direction Sea Wave direction Knots deg m deg 21:30 4/8 160° 1,7/2,7 350°"
093e2f523b1b01d54a731cf725ec5b19,"Installed PS-21 slips. Continued RIH w/ 7"" liner on 5½"" DP from 487m to 3036m, filling every 5th stand. Entered 9 5/8"" liner @ 2404m without any obstructions."
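To rule out the CSV loader, here is a sketch that converts the CSV to JSONL with Python's csv module. It assumes Prodigy's JSONL input format, where each line carries a "text" key (the "meta" field is optional). Python's csv module only treats straight double quotes as quote characters. A field wrapped in curly quotes (“ ”) would therefore be split at every comma inside it. `SAMPLE_CSV` and `csv_to_jsonl` are hypothetical names.

```python
import csv
import io
import json

# Two rows from the CSV above, with straight ASCII quotes; embedded quotes are
# escaped by doubling them ("") as standard CSV requires.
SAMPLE_CSV = '''id,text
65ff62f85f222e98ac292682c4f7eee8,Installed PS21 slips. RU auto pipe handling system.
093e2f523b1b01d54a731cf725ec5b19,"Installed PS-21 slips. Continued RIH w/ 7"" liner on 5½"" DP from 487m to 3036m, filling every 5th stand."
'''


def csv_to_jsonl(csv_text):
    """Turn CSV rows with a 'text' column into JSONL lines: {"text": ..., "meta": {"id": ...}}."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        text = (row.get("text") or "").strip()
        if not text:
            continue  # skip rows with an empty or missing text column
        record = {"text": text, "meta": {"id": row.get("id")}}
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


for line in csv_to_jsonl(SAMPLE_CSV):
    print(line)
```

The resulting lines could be written to a .jsonl file and passed to ner.match in place of the CSV, which would show whether the CSV parsing is involved at all.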
I used this command:
prodigy ner.match sample_dataset en_core_web_sm my_csv.csv --patterns sample_patterns.jsonl
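Here is a rough, stdlib-only way to see which rows could produce a match at all. It is an approximation. `rough_tokenize` is a crude regex stand-in for spaCy's tokenizer, not the real thing. The sketch also assumes that ner.match only queues texts where at least one pattern matched. All names here are hypothetical.

```python
import re

# Patterns copied from the patterns file above: (label, list of token specs).
PATTERNS = [
    ("Action", [{"lower": "RIH"}]),
    ("Equipment", [{"lower": "ps-21"}, {"lower": "slips"}]),
]

# Shortened versions of the input texts above.
TEXTS = [
    "Installed PS21 slips. RU auto pipe handling system.",
    "Installed RST toolstring into lubricator.",
    "Installed Riser centralizer in tension deck.",
    "Installed Slick joint and Diverter.",
    'Installed PS-21 slips. Continued RIH w/ 7" liner on 5½" DP from 487m to 3036m.',
]


def rough_tokenize(text):
    # Crude stand-in for spaCy's tokenizer: word-like runs (keeping inner
    # hyphens) or single punctuation characters.
    return re.findall(r"\w[\w-]*|[^\w\s]", text)


def find_matches(text, patterns):
    tokens = rough_tokenize(text)
    matches = []
    for label, pattern in patterns:
        n = len(pattern)
        for start in range(len(tokens) - n + 1):
            window = tokens[start:start + n]
            # Mimics a "lower" attribute: lowercased token text vs. pattern value.
            if all(tok.lower() == spec["lower"] for tok, spec in zip(window, pattern)):
                matches.append((label, " ".join(window)))
    return matches


for text in TEXTS:
    print(find_matches(text, PATTERNS), "<-", text[:40])
```

Under these assumptions, only the PS-21 row yields a candidate ("PS-21 slips"), and the RIH pattern never fires because "rih" is compared against "RIH". That would be consistent with seeing a single task.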
I end up with these problems:
- It starts with the last input sentence and only tags "PS-21 slips" as Equipment. RIH is not identified, even though it appears in the same text. (I also tried changing the case, and it still doesn't come up.)
- It finishes after that one sentence, and I get "No tasks available". If I delete that line from the file and run it again, I get an error. The error message looks like this:
Traceback (most recent call last):
  File "cython_src/prodigy/core.pyx", line 55, in prodigy.core.Controller.__init__
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/toolz/itertoolz.py", line 368, in first
    return next(iter(seq))
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/prodigy/__main__.py", line 259, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src/prodigy/core.pyx", line 178, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "cython_src/prodigy/core.pyx", line 60, in prodigy.core.Controller.__init__
ValueError: Error while validating stream: no first batch. This likely means that your stream is empty.
I get the same error when I try any other file. I'm stuck here. Could you help me figure out how to proceed?