Improving NER detection at the end of string

Denis · August 15, 2018, 12:22pm

Hi,

I found that the model predicts entities at the end of a string less accurately than entities that are surrounded by other words.

My task:
Extract all units of measure, together with their values, from a product description string:
text: метилэтилпиридинол 10 мг/мл раствор инфузи 1мл ампул №10
correct entities: (10 мг/мл, 1мл, №10)
problem: ‘№10’ is not detected reliably in other, similar products; sometimes the model returns ‘’, ‘№’, or ‘10’ instead.
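
To make the target concrete, the expected entities can be written as character offsets into the text. This is only a minimal sketch in plain Python: the label name `UNIT` and the helper `find_spans` are hypothetical and do not come from the original post. It assumes the values appear in the text in the order listed.

```python
# Example text and expected entity strings, taken from the post above.
text = "метилэтилпиридинол 10 мг/мл раствор инфузи 1мл ампул №10"
expected = ["10 мг/мл", "1мл", "№10"]


def find_spans(text, values, label="UNIT"):
    """Locate each value left to right and return (start, end, label) tuples.

    `label` is a made-up name used only for illustration.
    """
    spans = []
    cursor = 0  # search position, so repeated values map to later occurrences
    for value in values:
        start = text.find(value, cursor)
        if start == -1:
            raise ValueError(f"value not found in text: {value!r}")
        end = start + len(value)
        spans.append((start, end, label))
        cursor = end
    return spans


spans = find_spans(text, expected)
for start, end, label in spans:
    print(start, end, label, repr(text[start:end]))
```

The last span ends exactly at `len(text)`, which is the end-of-string case the question is about.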

My dataset (I also did ner-gold):
Annotations 10282
Accept 5564
Reject 4703
Ignore 15

Possible solutions:

Are there any special symbols, such as BOS or EOS, that tell the model where the string begins and ends?
Or do I have to append something like ‘ thisisspecialwordtoindicateendofstring’?
Or should I just annotate more examples in the database?

Denis · August 17, 2018, 10:42pm

Adding more data fixed my problem. Please delete this topic.

honnibal · August 20, 2018, 12:16pm

Thanks!

Incidentally, I’ve worried before that there are end-of-data effects in spaCy. It’s hard to make sure the gradient for the convolutions is correct at the beginnings and ends of the input. I think it’s correct, but problems are always possible.

If you have doubts, you can always manually pad the string with symbols such as -BOS- and -EOS-. I’d be interested to hear if this helps.
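
A minimal sketch of that manual padding, in plain Python with no spaCy calls. The helper names `pad_text` and `shift_spans` and the label `UNIT` are hypothetical. The same padding would have to be applied to both the training texts and the texts passed to the model, and the character offsets of the annotations shifted to match. How the tokenizer splits the `-BOS-` and `-EOS-` markers is not checked here.

```python
BOS = "-BOS-"
EOS = "-EOS-"


def pad_text(text, bos=BOS, eos=EOS):
    """Return the padded text and the number of characters added in front."""
    prefix = bos + " "
    suffix = " " + eos
    return prefix + text + suffix, len(prefix)


def shift_spans(spans, offset):
    """Shift (start, end, label) character spans by `offset` characters."""
    return [(start + offset, end + offset, label) for start, end, label in spans]


# Example text from the original question; "UNIT" is a made-up label.
text = "метилэтилпиридинол 10 мг/мл раствор инфузи 1мл ампул №10"
start = text.rfind("№10")
spans = [(start, start + len("№10"), "UNIT")]

padded, offset = pad_text(text)
padded_spans = shift_spans(spans, offset)

# Spans predicted on the padded text map back to the original text
# by shifting in the opposite direction.
restored = shift_spans(padded_spans, -offset)
```

With the padding in place, the last real token is no longer the final token of the input, so comparing accuracy with and without it would show whether an end-of-string effect is present.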
