How to automatically perform NER annotation based on patterns?

I would like to perform NER annotation based on patterns (and store it in a Prodigy dataset) without manual review.

Any how-to suggestions?

Hi! In that case, you can just go directly via spaCy, for example, using the entity ruler: https://spacy.io/usage/rule-based-matching#entityruler
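
For example, a minimal setup could look something like this (a sketch assuming spaCy v3's add_pipe API; the labels and patterns are just placeholders to replace with your own):

import spacy

# Start from a blank English pipeline (or load an existing one) and add an entity ruler
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Placeholder patterns: each entry maps a label to a phrase or token pattern
patterns = [
    {"label": "ORG", "pattern": "Prodigy"},
    {"label": "GPE", "pattern": [{"LOWER": "new"}, {"LOWER": "york"}]},
]
ruler.add_patterns(patterns)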

The ruler adds all matches to doc.ents, just like a trained entity recognizer. You can then use that nlp object to process your texts and extract the pattern-based NER annotations. In theory, you don't even have to go through Prodigy at all; you could just export the data and train with spaCy directly. But if you want to mix these annotations with annotations you've created manually, you can create data in Prodigy's format pretty easily from the processed doc:

doc = nlp("This is a text")
# ent.label_ is the string label (ent.label would give the integer hash)
spans = [{"start": ent.start_char, "end": ent.end_char, "label": ent.label_} for ent in doc.ents]
example = {"text": doc.text, "spans": spans}
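
If you have many texts, you can run them through nlp.pipe and collect the examples in a batch. A rough sketch, reusing the nlp object from above (the file name and texts are placeholders, and srsly ships with spaCy):

import srsly

texts = ["First example text", "Second example text"]  # your own raw texts
examples = []
for doc in nlp.pipe(texts):
    spans = [{"start": ent.start_char, "end": ent.end_char, "label": ent.label_} for ent in doc.ents]
    examples.append({"text": doc.text, "spans": spans})

# Write JSONL you can import later with db-in, or use directly for training
srsly.write_jsonl("pattern_annotations.jsonl", examples)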

You can then add the examples to a dataset using Prodigy's database API (https://prodi.gy/docs/api-database#database), or alternatively using the db-in command. I'd recommend setting up a separate dataset for your automatically generated annotations – if there's a bug in your code or a pattern you want to improve, you can just drop that dataset and re-add the data, which is much harder if you've mixed it into the same set as your manual annotations.
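
Via the Python API, that could look roughly like this (a sketch; the dataset name is just an example, and set_hashes adds the hashes Prodigy uses to identify examples):

from prodigy import set_hashes
from prodigy.components.db import connect

db = connect()  # connects to the database configured in your prodigy.json
db.add_dataset("ner_patterns_auto")  # separate dataset for the automatic annotations

# Add Prodigy's hashes, then store the examples in the dataset
examples = [set_hashes(eg) for eg in examples]
db.add_examples(examples, datasets=["ner_patterns_auto"])

Or, using the JSONL file from the sketch above: prodigy db-in ner_patterns_auto pattern_annotations.jsonl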