NER Training for Corporate Names

From this gold-standard data, is it possible to do something like to-patterns to extend our patterns.jsonl file? Likewise, if I see that the model has incorrectly predicted a term, is it possible to mark that term as "definitely not" the entity in question?

Right now, I find myself randomly paging through predictions like so in a Jupyter Notebook:

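import numpy as np
import spacy

# nlp and text_json are loaded earlier in the notebook
# Lower bound is 0 so the first record can be sampled too
random_index = np.random.randint(0, len(text_json))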
doc = nlp(text_json[random_index]['text'])

for ent in doc.ents:
    print(ent.text, ent._.entity_norm, ent.label_)

# Visualise the entities
colors = {"BRAND": "linear-gradient(90deg, #aa9cfc, #fc9ce7)"}
options = {"ents": ["BRAND"], "colors": colors}
spacy.displacy.render(doc, style='ent', jupyter=True, options=options)

When I find a brand name that doesn't exist in my entity normalization dict, I manually add it to the spreadsheet that my patterns.jsonl is created from. (The normalization code is inspired by this post, except that the entity norm outputs N/A if the entity cannot be normalized, meaning we don't have a record of it yet.)
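To make that manual step a bit more concrete, here is a rough sketch of how the unknown brands could be tallied across many docs instead of paging through them one at a time. It assumes each entity has already been reduced to a (text, norm, label) tuple; the function name `collect_unknown_brands` and the sample data are made up for illustration and are not part of spaCy or Prodigy.

```python
from collections import Counter


def collect_unknown_brands(ents, label="BRAND", missing="N/A"):
    """Count entity texts with the given label whose normalized form is `missing`.

    `ents` is an iterable of (text, norm, label) tuples, e.g. built in the
    notebook from (ent.text, ent._.entity_norm, ent.label_) for each entity.
    """
    counts = Counter()
    for text, norm, ent_label in ents:
        if ent_label == label and norm == missing:
            counts[text.strip()] += 1
    # Most frequent candidates first, so the common gaps get reviewed first
    return counts.most_common()


# Hypothetical sample standing in for real predictions
sample = [
    ("Acme", "N/A", "BRAND"),
    ("Nike", "Nike Inc.", "BRAND"),
    ("Acme", "N/A", "BRAND"),
    ("Globex", "N/A", "BRAND"),
]
unknown = collect_unknown_brands(sample)
print(unknown)
```

The ranked list would then be what gets reviewed and added to the spreadsheet, rather than whatever happens to come up in a random draw.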

As I've been sitting here doing that, I assume this must be a task I can manage more effectively with the Prodigy methodology somehow.

Is it possible to annotate only the examples that were not caught by the EntityRuler, and then export those annotations to patterns.jsonl?
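To illustrate the kind of export I have in mind, here is a minimal sketch. It assumes the annotations come out as dicts with "text", "spans" (character offsets plus a label) and an "answer" field, and that each line of patterns.jsonl is a simple {"label": ..., "pattern": ...} phrase pattern. The helper names `annotations_to_patterns` and `to_jsonl` are hypothetical, not an existing recipe.

```python
import json


def annotations_to_patterns(examples, known_terms, label="BRAND"):
    """Turn accepted span annotations into phrase patterns, skipping known terms."""
    known = {term.lower() for term in known_terms}
    patterns = []
    for eg in examples:
        # Only accepted examples should become patterns
        if eg.get("answer") != "accept":
            continue
        for span in eg.get("spans", []):
            if span.get("label") != label:
                continue
            term = eg["text"][span["start"]:span["end"]]
            if term.lower() in known:
                continue  # already covered by the existing patterns
            known.add(term.lower())
            patterns.append({"label": label, "pattern": term})
    return patterns


def to_jsonl(patterns):
    """Serialize patterns as one JSON object per line."""
    return "\n".join(json.dumps(p) for p in patterns)


# Hypothetical annotations standing in for a real export
examples = [
    {"text": "I bought Acme shoes", "answer": "accept",
     "spans": [{"start": 9, "end": 13, "label": "BRAND"}]},
    {"text": "Nike is great", "answer": "accept",
     "spans": [{"start": 0, "end": 4, "label": "BRAND"}]},
    {"text": "Apple pie recipe", "answer": "reject",
     "spans": [{"start": 0, "end": 5, "label": "BRAND"}]},
]
new_patterns = annotations_to_patterns(examples, known_terms=["Nike"])
print(to_jsonl(new_patterns))
```

The output could then be appended to the existing patterns.jsonl. Rejected examples are simply dropped here; whether they can be used as "definitely not" signals is the other half of my question.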

I can't thank you enough, @ines, for your ongoing support, and apologies for hijacking this thread -- although I do think it's still quite relevant to the original topic :sweat_smile: