Hi everyone!
I have a problem doing NER for two-word entities, for example 'crude oil'. I read the following discussion
Then I wrote a Python script to transform a list of terms (text file) into patterns (JSONL file). I don't know whether it is right or not, but anyway, my code is below:
'----------------------------Input Parameter-------------------------------------'
#import argparse
#parser = argparse.ArgumentParser(description='Input the Parameter')
filename = input('Please input the text file name,example:stock.txt ')
LABEL=input('Please input the label,example:STOCK ')
patterns_name=input('Please input the name of jsonl file,example:stock_patterns.jsonl ')
#filename = 'stock.txt'
#LABEL='STOCK'
#patterns_name='stock_patterns.jsonl'
'------------------Read the text file and Lower case the letter----------------'
result=[]
with open(filename,"r",encoding="utf-8",errors="ignore") as file:
    for line in file:#read line by line
        term=line.strip().lower()#lowercase, because the patterns match on the 'lower' attribute
        if term:#skip blank lines
            result.append(term)
'--------------------build the dictionary--------------------------------------'
final=[]
dictionary={}
item_list=[]
pattern_list=[]
for item in result:
    if ' ' in item:
        item_list=item.split()#split on any whitespace, so extra or trailing spaces give no empty tokens
        for element in item_list:
            element=element.strip('\n')
            pattern_list.append({'lower':element})
        dictionary={'label':LABEL,'pattern':pattern_list}
        item_list=[]
        pattern_list=[]
        final.append(dictionary)
    else:
        item=item.strip('\n')
        dictionary={'label':LABEL,'pattern':[{'lower':item}]}
        final.append(dictionary)
'--------Transform the file to JSON--------------------------------------------'
import json
with open(patterns_name, 'w') as f:
    for item in final:
        json.dump(item, f)
        f.write('\n')
'------------------------------------------------------------------------------------------'
This script will give me a result similar to the prodigy command:
prodigy terms.to-patterns
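To make the intended output concrete, here is a minimal sketch of the same idea as a single function. The function name `terms_to_patterns`, the label `COMMODITY` and the sample terms are only assumptions for illustration. Each term is lowercased and split on whitespace, so a two-word term becomes one pattern with two token dictionaries.

```python
import json


def terms_to_patterns(terms, label):
    """Turn a list of terms into token patterns, one token dict per word."""
    patterns = []
    for term in terms:
        # Lowercase the term, because the 'lower' attribute is compared
        # against the lowercase form of each token.
        words = term.lower().split()
        if not words:
            continue  # skip blank lines
        patterns.append({"label": label, "pattern": [{"lower": w} for w in words]})
    return patterns


# Hypothetical sample terms and label, for illustration only.
example = terms_to_patterns(["Crude Oil", "gold", ""], "COMMODITY")
for entry in example:
    print(json.dumps(entry))
# {"label": "COMMODITY", "pattern": [{"lower": "crude"}, {"lower": "oil"}]}
# {"label": "COMMODITY", "pattern": [{"lower": "gold"}]}
```

Note that splitting on whitespace is only an approximation of how the model tokenizes text, so terms containing hyphens or punctuation may need more than one token dictionary per word.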
But when I use the patterns with ner.match, using the command:
prodigy ner.match commodities_ner en_core_web_sm commodities_dataset.jsonl --patterns commodities_patterns.jsonl
I received the following error:
Traceback (most recent call last):
File "cython_src/prodigy/components/feeds.pyx", line 130, in prodigy.components.feeds.SessionFeed.get_session_stream
File "/home/ec2-user/.local/lib/python3.7/site-packages/toolz/itertoolz.py", line 368, in first
return next(iter(seq))
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/waitress/channel.py", line 338, in service
task.service()
File "/usr/local/lib/python3.7/site-packages/waitress/task.py", line 169, in service
self.execute()
File "/usr/local/lib/python3.7/site-packages/waitress/task.py", line 399, in execute
app_iter = self.channel.server.application(env, start_response)
File "/usr/local/lib/python3.7/site-packages/hug/api.py", line 423, in api_auto_instantiate
return module.__hug_wsgi__(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/falcon/api.py", line 244, in __call__
responder(req, resp, **params)
File "/usr/local/lib/python3.7/site-packages/hug/interface.py", line 793, in __call__
raise exception
File "/usr/local/lib/python3.7/site-packages/hug/interface.py", line 766, in __call__
self.render_content(self.call_function(input_parameters), context, request, response, **kwargs)
File "/usr/local/lib/python3.7/site-packages/hug/interface.py", line 703, in call_function
return self.interface(**parameters)
File "/usr/local/lib/python3.7/site-packages/hug/interface.py", line 100, in __call__
return __hug_internal_self._function(*args, **kwargs)
File "/usr/local/lib64/python3.7/site-packages/prodigy/app.py", line 105, in get_questions
tasks = controller.get_questions()
File "cython_src/prodigy/core.pyx", line 109, in prodigy.core.Controller.get_questions
File "cython_src/prodigy/components/feeds.pyx", line 56, in prodigy.components.feeds.SharedFeed.get_questions
File "cython_src/prodigy/components/feeds.pyx", line 61, in prodigy.components.feeds.SharedFeed.get_next_batch
File "cython_src/prodigy/components/feeds.pyx", line 137, in prodigy.components.feeds.SessionFeed.get_session_stream
ValueError: Error while validating stream: no first example. This likely means that your stream is empty.
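Since the error says the stream is empty, one thing worth checking is whether the input file contains usable examples at all, and whether any of them contain a term from the patterns file. The sketch below is only a rough check under some assumptions: each line of the dataset is a JSON object with a "text" key, every token dictionary in the patterns file uses the 'lower' key (as in the script above), and a plain lowercase substring test stands in for real token matching. The function name `check_stream` is hypothetical.

```python
import json


def check_stream(dataset_path, patterns_path):
    """Return (examples with text, examples containing at least one term)."""
    # Rebuild each term as a plain lowercase string from its token patterns.
    terms = []
    with open(patterns_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            entry = json.loads(line)
            term = " ".join(tok["lower"] for tok in entry["pattern"])
            if term:
                terms.append(term)

    with_text = 0
    with_term = 0
    with open(dataset_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            text = json.loads(line).get("text", "")
            if text:
                with_text += 1
                # Substring test only; real matching works on tokens.
                if any(term in text.lower() for term in terms):
                    with_term += 1
    return with_text, with_term
```

Calling `check_stream("commodities_dataset.jsonl", "commodities_patterns.jsonl")` returns two counts. If the first is 0, the file has no examples with a "text" field; if the second is 0, none of the terms appear in the texts, which could also leave nothing to show.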
I tried changing the dataset. Then there is no error, but the web page stays at LOADING....
May I ask if I am doing something wrong? If this method is not feasible, how can I deal with two-word entities?
Thanks!!