Two EntityRecognizers

I just started working with Prodigy. Based on the entity linking example, it looks like there are two classes with the same name: prodigy.models.ner.EntityRecognizer and spacy.pipeline.EntityRecognizer.

Should all these classes under Prodigy be under spaCy instead? How are these packages different? Thanks in advance.

Hi there!

Could you link the example that you're referring to? There are a bunch of examples out there these days, and it helps me to understand which one you are using.

With regard to the implementation: yes, there are differences between spaCy pipeline components and model classes in Prodigy. The goal of the Prodigy models is to eventually have a JSON representation that the Prodigy front-end can render nicely. The Prodigy model classes all assume spaCy under the hood, but it's typically the Prodigy models that you'll want to use to get the items to render nicely.

Let me know if you'd appreciate more context!
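To make the "JSON representation" concrete, here is a minimal sketch of the kind of task dictionary an NER annotation interface works with: the raw text plus character-offset spans. The example text and label are made up for illustration, and the helper span_text is hypothetical, not part of Prodigy or spaCy.

```python
import json

# A task in the general shape an NER annotation front-end renders:
# raw text plus character-offset spans. Text and label are illustrative.
task = {
    "text": "Emerson was born in Boston.",
    "spans": [{"start": 20, "end": 26, "label": "GPE"}],
}


def span_text(task, span):
    """Hypothetical helper: return the substring that a span points at."""
    return task["text"][span["start"]:span["end"]]


# The task is plain data, so it serializes to JSON without any spaCy objects.
print(json.dumps(task, indent=2))
print(span_text(task, task["spans"][0]))
```

The point is only that the front-end sees plain dictionaries like this one, not spaCy Doc objects.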

I am referring to https://github.com/explosion/projects/blob/v3/tutorials/nel_emerson/scripts/el_recipe.py, based on v3, and I have pasted the relevant section below. As you can see, it uses "from prodigy.models.ner import EntityRecognizer", but the spaCy API documentation also lists an EntityRecognizer, under spacy.pipeline.ner. The EntityRecognizer from both packages appears to work. I see the example was developed with v2.x and has an upgraded version for v3.x. Please let me know. Thanks,

import spacy
from spacy.kb import KnowledgeBase

import prodigy
from prodigy.models.ner import EntityRecognizer
from prodigy.components.loaders import TXT
from prodigy.util import set_hashes
from prodigy.components.filters import filter_duplicates

import csv
from pathlib import Path

@prodigy.recipe(
    "entity_linker.manual",
    dataset=("The dataset to use", "positional", None, str),
    source=("The source data as a .txt file", "positional", None, Path),
    nlp_dir=("Path to the NLP model with a pretrained NER component", "positional", None, Path),
    kb_loc=("Path to the KB", "positional", None, Path),
    entity_loc=("Path to the file with additional information about the entities", "positional", None, Path),
)
def entity_linker_manual(dataset, source, nlp_dir, kb_loc, entity_loc):
    # Load the NLP and KB objects from file
    nlp = spacy.load(nlp_dir)
    kb = KnowledgeBase(vocab=nlp.vocab, entity_vector_length=1)
    kb.load_bulk(kb_loc)
    model = EntityRecognizer(nlp)

    # Read the pre-defined CSV file into dictionaries mapping QIDs to the full names and descriptions
    id_dict = dict()
    with entity_loc.open("r", encoding="utf8") as csvfile:
        csvreader = csv.reader(csvfile, delimiter=",")
        for row in csvreader:
            id_dict[row[0]] = (row[1], row[2])

    # Initialize the Prodigy stream by running the NER model
    stream = TXT(source)
    stream = [set_hashes(eg) for eg in stream]
    stream = (eg for score, eg in model(stream))

The entity recognizer from spaCy that you link to is a spaCy pipeline component. That is a different object from the entity recognizer in Prodigy, as mentioned above.

I can understand the confusion because they have the same name, but the spaCy class is meant to be used inside a spaCy nlp pipeline object, while the EntityRecognizer from Prodigy is meant to add entity information to a stream of examples to be used for annotation in Prodigy.
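As a rough illustration of the "adds entity information to a stream" side, here is a toy sketch in plain Python. ToyStreamRecognizer is a hypothetical stand-in, not Prodigy's implementation: it only mimics the calling convention seen in the recipe, where model(stream) yields (score, example) pairs. The string matching and the fixed score are placeholders for a real NER model.

```python
import re


class ToyStreamRecognizer:
    """Hypothetical stand-in for a stream-level model wrapper.

    It takes an iterable of task dicts and yields (score, task) pairs,
    with entity spans added to each task as character offsets.
    """

    def __init__(self, patterns):
        # patterns: mapping of label -> list of literal strings to match
        self.patterns = patterns

    def __call__(self, stream):
        for eg in stream:
            spans = []
            for label, terms in self.patterns.items():
                for term in terms:
                    for match in re.finditer(re.escape(term), eg["text"]):
                        spans.append(
                            {"start": match.start(), "end": match.end(), "label": label}
                        )
            task = dict(eg)  # shallow copy so the input example is not mutated
            task["spans"] = sorted(spans, key=lambda s: s["start"])
            yield 0.5, task  # placeholder score; a real model would compute one


stream = [{"text": "Emerson lived in Concord."}]
model = ToyStreamRecognizer({"PERSON": ["Emerson"], "GPE": ["Concord"]})
# Same unpacking pattern as in the recipe: drop the score, keep the example.
tasks = [eg for score, eg in model(stream)]
print(tasks)
```

A spaCy pipeline component, by contrast, is called by the nlp object on Doc objects and sets the entities on the Doc; it does not produce annotation tasks on its own.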

Does this help? If not, feel free to elaborate where the confusion is.

So these are different. That is a very confusing approach from one company. Thanks,