From this gold-standard data, is it then possible to do something like `terms.to-patterns` in order to extend our `patterns.jsonl` file? Likewise, is it possible to mark a term as "definitely not" the entity in question when I see that the model has predicted it incorrectly?
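For reference, this is roughly what I'd expect the exported lines to look like, one JSON object per line in the `EntityRuler` token-pattern format ("acme" is just a made-up brand for illustration):

```json
{"label": "BRAND", "pattern": [{"lower": "acme"}]}
{"label": "BRAND", "pattern": [{"lower": "acme"}, {"lower": "labs"}]}
```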
Right now, I find myself randomly paging through predictions in a Jupyter notebook like so:
```python
import numpy as np
import spacy

# Pick a random example (start at 0 so the first record is included too)
random_index = np.random.randint(0, len(text_json))
doc = nlp(text_json[random_index]['text'])
for ent in doc.ents:
    print(ent.text, ent._.entity_norm, ent.label_)

# Visualise the entities
colors = {"BRAND": "linear-gradient(90deg, #aa9cfc, #fc9ce7)"}
options = {"ents": ["BRAND"], "colors": colors}
spacy.displacy.render(doc, style='ent', jupyter=True, options=options)
```
And when I find a brand name that doesn't exist in my entity normalization dict (using code inspired by this post, except that the entity norm outputs "N/A" when it cannot normalize, meaning we don't have a record of the entity yet), I manually add it to the spreadsheet that my `patterns.jsonl` is created from.
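To cut down on the manual spreadsheet step, one idea would be to collect every un-normalized surface form in a single pass over the corpus and turn it straight into pattern lines. A rough sketch, assuming the "N/A" convention above (the function name and the case-folding choice are mine, not from the original code):

```python
import json

def brands_to_patterns(brands, label="BRAND"):
    """Turn brand surface forms into EntityRuler token patterns, one JSONL line each."""
    lines = []
    for brand in sorted(set(brands)):
        # One token pattern per whitespace-separated token, matched case-insensitively
        pattern = [{"lower": tok} for tok in brand.lower().split()]
        lines.append(json.dumps({"label": label, "pattern": pattern}))
    return lines

# Collecting the candidates would then be one loop instead of random paging, e.g.:
#   unknown = {ent.text for doc in nlp.pipe(texts)
#              for ent in doc.ents
#              if ent.label_ == "BRAND" and ent._.entity_norm == "N/A"}
#   with open("new_patterns.jsonl", "w", encoding="utf-8") as f:
#       f.write("\n".join(brands_to_patterns(unknown)) + "\n")
```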
As I've been sitting here doing that, I assume this must be a task I could manage more effectively with the Prodigy workflow somehow.
Is it possible to annotate only the examples that were not caught by the `EntityRuler`, and then export those annotations to `patterns.jsonl`?
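In case it clarifies what I'm after: once you have both the ruler's matches and the model's predictions as character spans, the filtering itself seems mechanical. This is just my sketch of the idea, not an existing Prodigy recipe:

```python
def needs_review(model_spans, ruler_spans):
    """True if the model predicted any span the EntityRuler did not already cover.

    Both arguments are iterables of (start_char, end_char, label) tuples.
    """
    return bool(set(model_spans) - set(ruler_spans))

# Only queue texts where the model found something new, e.g.:
#   to_annotate = [ex for ex in examples
#                  if needs_review(model_spans_for(ex), ruler_spans_for(ex))]
```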
Can't thank you enough for your ongoing support, @ines, and apologies for hijacking this thread -- although I do think it's still quite relevant to the original topic.