I am trying to find out whether I can add a pipeline component that re-labels entities and then evaluate the final entity results in Prodigy.
For example, I want to re-label GOE_Other and Facility_Other as Org and evaluate them in Prodigy.
Question 1. With the steps below, Prodigy does not recognize the relabeled entities.
Please let me know whether my technical approach is correct.
Question 2. If I want to evaluate the final relabeled labels in Prodigy, which recipe should I use?
The procedure is as follows.
- Add a pipeline component that re-labels entities with spaCy (Google Colab)
setup.py
from setuptools import setup

setup(
    name="extend_module",
    entry_points={
        "spacy_factories": [
            "relabeling = extend_module:relabeling",
            "relabeling_factory = extend_module:RelabelongFactory",
        ],
    },
)
extend_module.py
from spacy.language import Language
from spacy.tokens import Doc, Span
from spacy.util import filter_spans


def delete_old_entities(doc, new_ents):
    # Keep only the existing entities that do not start at the same
    # character offset as one of the new entities.
    doc_ents = []
    for doc_ent in doc.ents:
        # print("doc_ent.text:",doc_ent.text)
        is_exist = False
        for ent in new_ents:
            if doc_ent.start_char == ent.start_char:
                is_exist = True
        if is_exist is True:
            continue
        else:
            doc_ents.append(doc_ent)
    return doc_ents


def relabeling(nlp, doc):
    new_ents = []
    for ent in doc.ents:
        if ent.label_ in ["GOE_Other", "Facility_Other"]:
            # Re-label the target entities as Org.
            ent.label_ = "Org"
            new_ent = Span(doc, ent.start, ent.end, label=nlp.vocab.strings['Org'])
            new_ents.append(new_ent)
        else:
            # Every other label is replaced with a single space.
            ent.label_ = " "
            new_ents.append(ent)
    doc_ents = delete_old_entities(doc, new_ents)
    filtered = filter_spans(doc_ents + new_ents)  # THIS DOES THE TRICK
    doc.ents = filtered
    return doc


@Language.factory("relabeling")
class RelabelongFactory:
    def __init__(self, nlp: Language, name: str):
        self.nlp = nlp

    def __call__(self, doc: Doc) -> Doc:
        doc = relabeling(self.nlp, doc)
        return doc
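For comparison, here is a minimal sketch of the same relabeling idea written as a stateless function component. It is not the code from this post: the component name relabel_sketch, the RELABEL_MAP dictionary, the English example sentence, and the blank "en" pipeline with hand-set entities are all assumptions, chosen so the snippet runs without the ja_ginza_electra model. Unlike relabeling above, it leaves every other label unchanged instead of replacing it with a space.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc, Span

# Labels to merge into "Org", taken from the example in this post.
RELABEL_MAP = {"GOE_Other": "Org", "Facility_Other": "Org"}


@Language.component("relabel_sketch")  # hypothetical component name
def relabel_sketch(doc: Doc) -> Doc:
    new_ents = []
    for ent in doc.ents:
        # Use the mapped label if there is one, otherwise keep the original.
        label = RELABEL_MAP.get(ent.label_, ent.label_)
        new_ents.append(Span(doc, ent.start, ent.end, label=label))
    # The spans are rebuilt from doc.ents, so they cannot overlap.
    doc.ents = new_ents
    return doc


# Stand-in for the real model (assumption): a blank pipeline with
# entities set by hand instead of predicted by an NER component.
nlp = spacy.blank("en")
nlp.add_pipe("relabel_sketch")
doc = nlp.make_doc("Tokyo Station is near the Cabinet Office")
doc.ents = [
    Span(doc, 0, 2, label="Facility_Other"),  # "Tokyo Station"
    Span(doc, 5, 7, label="GOE_Other"),       # "Cabinet Office"
]
doc = nlp.get_pipe("relabel_sketch")(doc)
relabeled = [(ent.text, ent.label_) for ent in doc.ents]
print(relabeled)
```

Passing the label as a string to Span means the label does not have to be added to nlp.vocab.strings beforehand in this sketch.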
config.cfg
[nlp]
lang = "ja"
pipeline = ["relabeling"]
[components]
[components.relabeling]
factory = "relabeling"
- Run the setup module (Google Colab)
!python setup.py develop
- Add labels and pipeline (Google Colab)
def add_labels(nlp):
    nlp.vocab.strings.add('Org')
    return nlp


def add_piplines_other(nlp):
    add_piplines_list = ["relabeling"]
    print("nlp.pipe_names:", nlp.pipe_names)
    for pipline in add_piplines_list:
        if pipline in nlp.pipe_names:
            nlp.remove_pipe(pipline)
        nlp.add_pipe(pipline)
    return nlp
- Check the behavior of the added pipeline component (Google Colab)
Check in displaCy that the entities are relabeled as Org.
import spacy
nlp1 = spacy.load("ja_ginza_electra")
nlp1 = add_labels(nlp1)
nlp1 = add_piplines_other(nlp1)
text = "XXXXXXXXXXXXXXXXX"
doc = nlp1(text)
spacy.displacy.render(doc, jupyter=True, style="ent")
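The same check can also be done as plain text, without displaCy, by printing the pipeline order and the entity labels. The sketch below is only an illustration under stated assumptions: a blank "en" pipeline with an entity_ruler stands in for the NER of ja_ginza_electra, and the component name relabel_check, the pattern, and the example sentence are hypothetical.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc, Span

TARGET_LABELS = {"GOE_Other", "Facility_Other"}


@Language.component("relabel_check")  # hypothetical name, not the post's factory
def relabel_check(doc: Doc) -> Doc:
    new_ents = []
    for ent in doc.ents:
        label = "Org" if ent.label_ in TARGET_LABELS else ent.label_
        new_ents.append(Span(doc, ent.start, ent.end, label=label))
    doc.ents = new_ents
    return doc


# Stand-in pipeline (assumption): an entity_ruler plays the role of the NER.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "GOE_Other", "pattern": "Cabinet Office"}])
# The relabeling step has to run after the component that sets doc.ents.
nlp.add_pipe("relabel_check", last=True)

doc = nlp("She visited the Cabinet Office yesterday")
pipe_names = nlp.pipe_names
ent_labels = [(ent.text, ent.label_) for ent in doc.ents]
print("nlp.pipe_names:", pipe_names)
print("entities:", ent_labels)
```

Printing nlp.pipe_names next to the entities makes it easy to confirm that the relabeling component is present and comes last.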
- Generate the pip package (Google Colab)
nlp1.to_disk("output/serialize")
!python -m spacy package --force output/serialize output/pip
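One thing worth keeping in mind at this step: nlp.to_disk writes the pipeline config and component data, not the Python source of custom components, so the process that later loads the directory needs the component registered first. The following sketch illustrates that round trip with assumed names: relabel_roundtrip is a hypothetical placeholder component, and a blank "en" pipeline and a temporary directory replace the real model and output/serialize.

```python
import tempfile

import spacy
from spacy.language import Language
from spacy.tokens import Doc


@Language.component("relabel_roundtrip")  # hypothetical placeholder component
def relabel_roundtrip(doc: Doc) -> Doc:
    # Placeholder body; the real relabeling logic would go here.
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("relabel_roundtrip")

with tempfile.TemporaryDirectory() as tmp_dir:
    # Only the config and component data are written, not this source file.
    nlp.to_disk(tmp_dir)
    # Loading works here because "relabel_roundtrip" is already registered
    # in this Python process via the decorator above.
    reloaded = spacy.load(tmp_dir)

reloaded_pipes = reloaded.pipe_names
print(reloaded_pipes)
```

In a fresh process where the decorated function has not been imported, the same spacy.load call would not be able to resolve the component name.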
- Transfer the following files by FTP to the Prodigy trial VM environment
setup.py
extend_module.py
config.cfg
ja_ginza_electra-5.0.0.tar.gz
- Install ja_ginza_electra-5.0.0.tar.gz in the Prodigy trial VM environment
pip install ja_ginza_electra-5.0.0.tar.gz
- Run setup.py in the Prodigy trial VM environment
python setup.py develop
- Run the Prodigy startup command
prodigy ner.correct example_dataset ja_ginza_electra news_headlines.jsonl --label Org
- Result: I cannot see the Org labels on the Prodigy screen.
Thank you very much for your time and help with the above.
