Trying to teach NER from blank model for Russian language

ines · August 8, 2018, 11:38am

Nice, glad to hear it works now!

This link points to a very old thread, so you probably want to look at more recent discussion, or the docs instead.

You always want to be annotating drug names in context – the model needs to see the full text, not just single words. This thread explains some more of the reasoning behind this, plus possible strategies. For example, you could create a patterns.jsonl file that looks like this:

{"label": "DRUG", "pattern": [{"lower": "aspirin"}]}
{"label": "DRUG", "pattern": [{"lower": "aspirin"}, {"lower": "c"}]}

When you run ner.teach, you can then stream in all of your data and set --patterns patterns.jsonl, to tell Prodigy to select examples in your data that match the patterns (so you can say yes or no to them).
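If you have a longer list of drug names, you probably don't want to write the patterns file by hand. Here's a minimal sketch of how you could generate it. The helper names `make_patterns` and `write_jsonl` are made up for this example, and it assumes that splitting on whitespace is close enough to how the tokenizer splits your drug names, which may not hold for names containing hyphens or punctuation.

```python
import json


def make_patterns(names, label="DRUG"):
    """Build one token pattern per name, matching each token case-insensitively."""
    patterns = []
    for name in names:
        # Assumption: whitespace splitting approximates the tokenizer's output.
        tokens = [{"lower": tok.lower()} for tok in name.split()]
        if tokens:
            patterns.append({"label": label, "pattern": tokens})
    return patterns


def write_jsonl(path, records):
    """Write one JSON object per line, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    write_jsonl("patterns.jsonl", make_patterns(["aspirin", "Aspirin C"]))
```

Running this should give you a file with the same two lines as the example above.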

Another suggestion: If possible, try to make sure that your data includes a lot of other non-Cyrillic spans that are not DRUG entities. You don't want your model to learn that "every span consisting of Latin characters is a drug".
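One rough way to check this is to count the Latin-script strings in your data that fall outside any annotated span. This is only a sketch: the function name `latin_spans_outside_entities` and the sample sentence are made up, and the regular expression only covers plain ASCII letters.

```python
import re

# Assumption: "Latin characters" means unaccented ASCII letters only.
LATIN_TOKEN = re.compile(r"[A-Za-z]+")


def latin_spans_outside_entities(example):
    """Return Latin-script substrings that do not overlap any annotated span."""
    spans = example.get("spans", [])
    found = []
    for match in LATIN_TOKEN.finditer(example["text"]):
        overlaps = any(
            match.start() < span["end"] and span["start"] < match.end()
            for span in spans
        )
        if not overlaps:
            found.append(match.group())
    return found


example = {
    "text": "Купил aspirin в магазине IKEA",
    "spans": [{"start": 6, "end": 13, "label": "DRUG"}],
}
non_drug_latin = latin_spans_outside_entities(example)
```

If this comes back empty for most of your examples, the model sees very few Latin-script spans that aren't drugs.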

Where do these examples come from? Did you create them manually? Because entity spans are usually annotated as character offsets ("start" and "end"), so the first example here labels the character "э", instead of the full token "эднит".
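To illustrate the offsets: "start" is the index of the entity's first character and "end" is the index just past its last character, so slicing the text with them should give back the whole token. Here's a small sketch. The helper `make_span` and the sample sentence are made up, and it only finds the first occurrence of the entity string.

```python
def make_span(text, entity, label="DRUG"):
    """Return a span dict with character offsets covering the full entity."""
    start = text.find(entity)  # first occurrence only
    if start == -1:
        raise ValueError(f"{entity!r} not found in text")
    return {"start": start, "end": start + len(entity), "label": label}


text = "Принимаю эднит каждый день"
example = {"text": text, "spans": [make_span(text, "эднит")]}

# Sanity check: the offsets should cover the whole token, not one character.
span = example["spans"][0]
assert text[span["start"]:span["end"]] == "эднит"
```

A span that only covers "э" would have an "end" of "start" + 1, which is a quick way to spot the problem in your data.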

If you're running ner.teach and the model suggests only partial spans, you should hit reject. This way, you're telling the model "nope, try again!". If you want your model to learn that the correct entity is "aspirin c forte", this is pretty important. Here's some more background on this:
