From this gold-standard data, is it then possible to do something like `terms.to-patterns` in order to extend our `patterns.jsonl` file? Likewise, is it possible to mark a term as "definitely not" the entity in question when I see that the model has predicted it incorrectly?
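For reference, this is roughly what I'd expect the exported lines to look like, one JSON object per line in the `EntityRuler` token-pattern format ("acme" is just a made-up brand for illustration):

```json
{"label": "BRAND", "pattern": [{"lower": "acme"}]}
{"label": "BRAND", "pattern": [{"lower": "acme"}, {"lower": "labs"}]}
```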
Right now, I find myself randomly paging through predictions in a Jupyter notebook like so:
```python
import numpy as np
import spacy

# Pick a random example (start at 0 so the first record is included too)
random_index = np.random.randint(0, len(text_json))
doc = nlp(text_json[random_index]['text'])
for ent in doc.ents:
    print(ent.text, ent._.entity_norm, ent.label_)

# Visualise the entities
colors = {"BRAND": "linear-gradient(90deg, #aa9cfc, #fc9ce7)"}
options = {"ents": ["BRAND"], "colors": colors}
spacy.displacy.render(doc, style='ent', jupyter=True, options=options)
```
And when I find a brand name that doesn't exist in my entity normalization dict (using code inspired by this post, except that the entity norm outputs "N/A" when it cannot normalize, meaning we don't have a record of the entity yet), I manually add it to the spreadsheet that my `patterns.jsonl` is created from.
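To cut down on the manual spreadsheet step, one idea would be to collect every un-normalized surface form in a single pass over the corpus and turn it straight into pattern lines. A rough sketch, assuming the "N/A" convention above (the function name and the case-folding choice are mine, not from the original code):

```python
import json

def brands_to_patterns(brands, label="BRAND"):
    """Turn brand surface forms into EntityRuler token patterns, one JSONL line each."""
    lines = []
    for brand in sorted(set(brands)):
        # One token pattern per whitespace-separated token, matched case-insensitively
        pattern = [{"lower": tok} for tok in brand.lower().split()]
        lines.append(json.dumps({"label": label, "pattern": pattern}))
    return lines

# Collecting the candidates would then be one loop instead of random paging, e.g.:
#   unknown = {ent.text for doc in nlp.pipe(texts)
#              for ent in doc.ents
#              if ent.label_ == "BRAND" and ent._.entity_norm == "N/A"}
#   with open("new_patterns.jsonl", "w", encoding="utf-8") as f:
#       f.write("\n".join(brands_to_patterns(unknown)) + "\n")
```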
As I've been sitting here doing that, I assume this must be a task I could manage more effectively with the Prodigy workflow somehow.
Is it possible to annotate only the examples that were not caught by the `EntityRuler`, and then export those annotations to `patterns.jsonl`?
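In case it clarifies what I'm after: once you have both the ruler's matches and the model's predictions as character spans, the filtering itself seems mechanical. This is just my sketch of the idea, not an existing Prodigy recipe:

```python
def needs_review(model_spans, ruler_spans):
    """True if the model predicted any span the EntityRuler did not already cover.

    Both arguments are iterables of (start_char, end_char, label) tuples.
    """
    return bool(set(model_spans) - set(ruler_spans))

# Only queue texts where the model found something new, e.g.:
#   to_annotate = [ex for ex in examples
#                  if needs_review(model_spans_for(ex), ruler_spans_for(ex))]
```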
Can't thank you enough for your ongoing support, @ines, and apologies for hijacking this thread -- although I do think it's still quite relevant to the original topic.