How to evaluate the relabeled results with Prodigy

I am trying to find out whether I can add a pipeline component that relabels entities and then evaluate the final entity results with Prodigy.
For example, I want to relabel GOE_Other and Facility_Other as Org and evaluate them with Prodigy.

Question 1. With the steps below, Prodigy does not recognize the relabeled entities.
Could you check whether my technical approach is correct?

Question 2. If I want to evaluate the final relabeled labels in Prodigy, which recipe should I use?

The procedure is as follows:

  1. Add a pipeline component that relabels entities with spaCy (Google Colab)

setup.py

from setuptools import setup

setup(
    name="extend_module",
    entry_points={
        "spacy_factories": [
            "relabeling = extend_module:relabeling",
            "relabeling_factory = extend_module:RelabelongFactory",
        ],
    },
)

extend_module.py


from spacy.language import Language
from spacy.tokens import Doc, Span
from spacy.util import filter_spans


def delete_old_entities(doc, new_ents):
    # Keep only the existing entities that do not start at the same
    # character offset as one of the new entities.
    new_starts = {ent.start_char for ent in new_ents}
    return [ent for ent in doc.ents if ent.start_char not in new_starts]
  
def relabeling(nlp, doc):
    new_ents = []
    for ent in doc.ents:
        if ent.label_ in ["GOE_Other", "Facility_Other"]:
            # Replace the entity with a new span labeled "Org".
            ent.label_ = "Org"
            new_ent = Span(doc, ent.start, ent.end, label=nlp.vocab.strings["Org"])
            new_ents.append(new_ent)
        else:
            # Every other entity has its label blanked out.
            ent.label_ = " "
            new_ents.append(ent)

    doc_ents = delete_old_entities(doc, new_ents)
    filtered = filter_spans(doc_ents + new_ents)  # THIS DOES THE TRICK
    doc.ents = filtered
    return doc

@Language.factory("relabeling")
class RelabelongFactory:
    def __init__(self, nlp: Language, name: str):
        self.nlp = nlp

    def __call__(self, doc: Doc) -> Doc:
        return relabeling(self.nlp, doc)
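
To make the two steps inside `relabeling` easier to follow (map the source labels to Org, then resolve overlapping spans), here is a minimal sketch on plain tuples instead of spaCy objects. Assumptions: the `(start, end, label)` tuples and the helper names `relabel` and `drop_overlaps` are hypothetical, `drop_overlaps` only approximates what `filter_spans` is documented to do (prefer the longest span), and unlike the component above it leaves all other labels unchanged.

```python
# Plain (start, end, label) tuples stand in for spaCy spans here (an assumption
# for illustration); offsets are token indices with an exclusive end.
SOURCE_LABELS = {"GOE_Other", "Facility_Other"}
TARGET_LABEL = "Org"


def relabel(spans):
    """Map every source label to the target label, leaving other labels alone."""
    return [
        (start, end, TARGET_LABEL if label in SOURCE_LABELS else label)
        for start, end, label in spans
    ]


def drop_overlaps(spans):
    """Keep the longest span of each overlapping group; on ties, the earliest."""
    kept, used = [], set()
    # Longest first; among equal lengths, the span that starts first.
    for start, end, label in sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0])):
        if used.isdisjoint(range(start, end)):
            kept.append((start, end, label))
            used.update(range(start, end))
    return sorted(kept)


spans = [(0, 2, "GOE_Other"), (1, 2, "Person"), (4, 5, "Facility_Other")]
print(drop_overlaps(relabel(spans)))  # [(0, 2, 'Org'), (4, 5, 'Org')]
```

The shorter Person span is dropped because it overlaps the longer relabeled span, which is the kind of conflict the `filter_spans` call is there to resolve.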

config.cfg

[nlp]

lang = "ja"

pipeline = ["relabeling"]

[components]

[components.relabeling]
factory = "relabeling"
  1. Run the setup module (Google Colab)
!python setup.py develop
  1. Add the label and the pipeline component (Google Colab)
def add_labels(nlp):
  # Register the new label string in the vocab's string store.
  nlp.vocab.strings.add('Org')
  return nlp

def add_piplines_other(nlp):
  # (Re-)add the custom components so that each one appears exactly once.
  pipe_names_to_add = ["relabeling"]
  print("nlp.pipe_names:", nlp.pipe_names)
  for pipe_name in pipe_names_to_add:
    if pipe_name in nlp.pipe_names:
      nlp.remove_pipe(pipe_name)
    nlp.add_pipe(pipe_name)

  return nlp

  1. Check the behavior of the added pipeline component (Google Colab)
    Check in displaCy that the entities are relabeled as Org.
import spacy
nlp1 = spacy.load("ja_ginza_electra")
nlp1 = add_labels(nlp1)
nlp1 = add_piplines_other(nlp1)
text = "XXXXXXXXXXXXXXXXX"
doc = nlp1(text)
spacy.displacy.render(doc, jupyter=True, style="ent")
  1. Build a pip-installable package (Google Colab)
nlp1.to_disk("output/serialize") 
!python -m spacy package --force output/serialize output/pip
  1. Transfer the following files to the Prodigy trial VM environment via FTP
setup.py
extend_module.py
config.cfg
ja_ginza_electra-5.0.0.tar.gz
  1. Install ja_ginza_electra-5.0.0.tar.gz in the Prodigy trial VM environment
pip install ja_ginza_electra-5.0.0.tar.gz
  1. Run setup.py in the Prodigy trial VM environment
python setup.py develop
  1. Run the Prodigy startup command
prodigy ner.correct example_dataset ja_ginza_electra news_headlines.jsonl --label Org
  1. Result: the Org labels do not appear on the Prodigy screen.
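
For Question 2, one option that does not depend on how the model package is installed would be to export the relabeled predictions as pre-annotated tasks and inspect those. A minimal sketch, assuming Prodigy's JSONL task format (a `text` field plus `spans` with `start`, `end` and `label` as character offsets); the helper name `to_prodigy_task` and the sample text are hypothetical, and with a spaCy doc the tuples would come from `ent.start_char`, `ent.end_char` and `ent.label_`. It does not settle which recipe is the right one for evaluation.

```python
import json


def to_prodigy_task(text, char_spans):
    """Build one task dict from (start_char, end_char, label) tuples."""
    return {
        "text": text,
        "spans": [
            {"start": start, "end": end, "label": label}
            for start, end, label in char_spans
        ],
    }


# Hypothetical example text and offsets, for illustration only.
task = to_prodigy_task("Acme opened a new plant.", [(0, 4, "Org")])

# One JSON object per line; ensure_ascii=False keeps Japanese text readable.
print(json.dumps(task, ensure_ascii=False))
```

Writing one such line per document gives a JSONL file in which the Org spans are visible as data, so a missing label can be traced to the pipeline rather than to the UI.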

Thank you very much for your time and help with the above.

This issue has been resolved.
I ran the above procedure once more after resetting the Prodigy VM to a clean state, and it worked fine.
The problem was probably in the pip reinstallation step.
I will close this thread.
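
For anyone who hits the same symptom: since the cause seemed to be the pip reinstallation, one way to check whether the install step actually registered the factories is to list the installed `spacy_factories` entry points. A minimal sketch using only the standard library; the helper name `list_entry_points` is hypothetical, and with the setup.py above the names `relabeling` and `relabeling_factory` would be expected to show up.

```python
from importlib.metadata import entry_points


def list_entry_points(group="spacy_factories"):
    """Return {name: target} for every installed entry point in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}


if __name__ == "__main__":
    for name, target in sorted(list_entry_points().items()):
        print(f"{name} -> {target}")
```

If the expected names are missing in the environment where Prodigy runs, re-running the install step there would be the first thing to try.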


Thanks for the update, glad it got solved :blush:
