Improving NER detection at the end of a string


I found that the model predicts entities at the end of a string less accurately than entities that are surrounded by other words.

My task:
Extract all units of measure, including their values, from a product description string:
text: метилэтилпиридинол 10 мг/мл раствор инфузи 1мл ампул №10
Correct entities: (10 мг/мл, 1мл, №10)
Problem: ‘№10’ is not detected reliably in other, similar products. Sometimes the model returns ‘’, ‘№’, or ‘10’ instead.

My dataset (I also did ner-gold):
Annotations: 10282
Accept: 5564
Reject: 4703
Ignore: 15

Possible solutions:

  • Are there any special symbols, such as BOS or EOS, that tell the model to read the end of the string correctly?
  • Or do I have to append something like ‘ thisisspecialwordtoindicateendofstring’?
  • Or should I just train on more items in the database?

Adding more data fixed my problem. Please delete this topic.


Incidentally, I’ve worried before that there might be end-of-data effects in spaCy. It’s hard to make sure the gradient for the convolutions is correct at the beginnings and ends of the input. I think it’s correct, but problems are always possible.

If you have doubts, you can always manually pad the string with symbols such as -BOS- and -EOS-. I’d be interested to hear if this helps.
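
In case it's useful, here is a rough sketch of what that padding could look like. It is plain Python with no spaCy calls, and the helper names (`pad_example`, `unpad_spans`) and the `UNIT` label are made up for illustration. The main thing to get right is that the character offsets of the annotations have to be shifted by the length of the prefix, and that the same padding should be applied at both training and prediction time.

```python
# Hypothetical marker strings; any tokens that never occur in the real data would do.
BOS = "-BOS-"
EOS = "-EOS-"


def pad_example(text, spans):
    """Wrap text in boundary markers and shift character offsets to match.

    spans is a list of (start, end, label) tuples indexing into text.
    """
    prefix = BOS + " "
    padded = prefix + text + " " + EOS
    shift = len(prefix)
    shifted = [(start + shift, end + shift, label) for start, end, label in spans]
    return padded, shifted


def unpad_spans(spans, text):
    """Map spans found on the padded text back onto the original text."""
    shift = len(BOS) + 1
    restored = []
    for start, end, label in spans:
        start, end = start - shift, end - shift
        # Drop anything that overlaps the markers themselves.
        if start >= 0 and end <= len(text):
            restored.append((start, end, label))
    return restored


text = "метилэтилпиридинол 10 мг/мл раствор инфузи 1мл ампул №10"
start = text.rfind("№10")
spans = [(start, start + len("№10"), "UNIT")]  # "UNIT" is a made-up label

padded, shifted = pad_example(text, spans)
print(padded)
print([padded[s:e] for s, e, _ in shifted])
```

With this, ‘№10’ is no longer the last token the model sees, so if the errors really are an edge effect they should shrink. If they don't, more training data is probably the better fix.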
