Semi-structured data usually has several fields, for instance name, description, comment, etc.
I would like to pass this meta-information to the NER model.
I can format the input as a string like “<NAME> name text <DESC> description text <COMM> comment text”. In that case, as I understand it, I need to add <NAME>, <DESC> and <COMM> to the vocabulary as special words and teach the tokenizer to keep each of them as a single token.
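To make the idea concrete, here is a rough sketch of what I have in mind. It assumes a blank English spaCy pipeline and uses the tokenizer's `add_special_case` method to keep each marker whole; the helper names (`format_record`, `build_pipeline`) and the sample record are made up for illustration, and I have not checked how this interacts with the Prodigy recipes.

```python
# Hypothetical mapping from field name to marker; the markers are the ones
# from the example string above.
FIELD_MARKERS = {"name": "<NAME>", "description": "<DESC>", "comment": "<COMM>"}


def format_record(record):
    """Join the non-empty fields into one string, each prefixed by its marker."""
    parts = []
    for field, marker in FIELD_MARKERS.items():
        text = record.get(field, "").strip()
        if text:
            parts.append(f"{marker} {text}")
    return " ".join(parts)


def build_pipeline():
    """Return a blank English pipeline whose tokenizer keeps markers whole.

    spaCy is imported here so that format_record can be used without it.
    """
    import spacy
    from spacy.symbols import ORTH

    nlp = spacy.blank("en")
    for marker in FIELD_MARKERS.values():
        # Without a special case, "<NAME>" would be split on the brackets.
        nlp.tokenizer.add_special_case(marker, [{ORTH: marker}])
    return nlp


# Made-up sample record, for illustration only.
record = {
    "name": "Acme pump",
    "description": "centrifugal water pump",
    "comment": "needs repair",
}
text = format_record(record)
```

With spaCy installed, `[t.text for t in build_pipeline()(text)]` should then list each marker as one token, as long as the markers are separated from the field text by whitespace.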
Does it make sense to include this information about the field boundaries in the input to the NER model? I am asking about short texts of 1-10 words.
Could you suggest the best way to do this with minimal customization of the default ner.teach / ner.batch-train recipes?
P.S.: Thanks for the great Prodigy tool.