Passing additional information to NER model

Algis · November 29, 2018, 8:43am

Semi-structured data usually has several fields, for instance name, description, comment, etc.
I would like to pass this meta information to the NER model.
I can format the string like “<NAME> name text <DESC> description text <COMM> comment text”. In this case, as I understand it, I need to add <NAME>, <DESC> and <COMM> to the vocabulary as special words and teach the tokenizer to keep each of them as a single token.

Does it make sense to incorporate this information about field boundaries into the input for the NER model? I mean for short texts of 1-10 words.
Could you suggest the best way to do it with minimal customization of the default ner.teach / ner.batch-train recipes?

P.S.: Thanks for the great Prodigy tool.

honnibal · November 29, 2018, 1:14pm

That’s a reasonable idea. I’d like for it to be easier to add features to the NER model, but currently we don’t have a good solution for that. So, what you’re suggesting makes sense. I think you can just craft your tokens so that the tokenizer naturally keeps them together, something like μCOMMμ should work. The tokenizer should keep that together, and it should be unambiguous enough.

If you don’t want to change the recipes, you could put the data generation or manipulation code into a separate script that writes to stdout. Most of the scripts accept input from stdin, so you can just pipe data from your generator script forward into the recipe.
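A minimal sketch of such a generator script, under a few assumptions: the field names, the `FIELD_MARKERS` mapping, the helper names and the sample records are all made up for illustration, and the only thing taken from Prodigy's input format is that each output line is a JSON object with a "text" key. The markers follow the μCOMMμ style suggested above; it is worth checking on your own data that the tokenizer really keeps them as single tokens.

```python
import json
import sys

# Hypothetical mapping from field name to marker token.
# Dict order decides the order of the fields in the output text.
FIELD_MARKERS = {
    "name": "μNAMEμ",
    "description": "μDESCμ",
    "comment": "μCOMMμ",
}

# Made-up sample records; replace with your own data loading.
SAMPLE_RECORDS = [
    {"name": "Acme drill", "description": "cordless 18V", "comment": "bulk order"},
    {"name": "Steel bolts", "description": "", "comment": "M8 only"},
]


def build_text(record):
    """Join the non-empty fields into one string, each preceded by its marker."""
    parts = []
    for field, marker in FIELD_MARKERS.items():
        value = (record.get(field) or "").strip()
        if value:
            parts.append(f"{marker} {value}")
    return " ".join(parts)


def to_task(record):
    """Wrap the marked-up text in a task dict with a "text" key."""
    return {"text": build_text(record)}


def write_tasks(records, out=sys.stdout):
    """Write one JSON object per line (JSONL) to the given stream."""
    for record in records:
        out.write(json.dumps(to_task(record), ensure_ascii=False) + "\n")


if __name__ == "__main__":
    write_tasks(SAMPLE_RECORDS)
```

Empty fields are skipped so the model never sees a marker with nothing after it. The script only prints to stdout, so its output can be piped into a recipe that reads from stdin, and the recipes themselves stay untouched.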
