I've been working on training new entities for food items.
My model is performing fairly well, except when the entity appears as the final token of the input. If I append any token to the input example, I get the right result. Here's the same example phrase copied from the ner.print-stream output, first without and then with a trailing token:
Hello! Can I please place a take away order for 4 QUANTITY special PASTA meatballs SAUCE , 5 QUANTITY marinara SAUCE gnocchi

vs

Hello! Can I please place a take away order for 4 QUANTITY special PASTA meatballs SAUCE , 5 QUANTITY marinara SAUCE gnocchi PASTA .
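The same comparison can be reproduced outside of the recipe by loading the trained model directly in spaCy, something like the sketch below (the model path is just a placeholder for wherever the batch-trained model was saved):

import spacy

# Placeholder path for the batch-trained model
nlp = spacy.load('models/ner_food')

text = "Hello! Can I please place a take away order for 4 special meatballs, 5 marinara gnocchi"

for variant in (text, text + " ."):
    doc = nlp(variant)
    # The final "gnocchi" only gets a PASTA label in the second variant
    print([(ent.text, ent.label_) for ent in doc.ents])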
At first I thought it was a problem with my training data. I had ~900 example phrases: 500 of them generated using ner.make-gold and the rest from additional ner.teach sessions. The data included several examples of the form "Can I order {items}?". Thinking the model might have overfit that form, I added an equal number of make-gold examples without the trailing ?, but that still led to the same result.
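Building the variant source for make-gold only took something along these lines (the file names here are just placeholders):

import json

# Strip the trailing "?" from the original phrases and write a new
# JSONL source to annotate with ner.make-gold (file names are placeholders)
with open('phrases.jsonl') as f_in, open('phrases_no_question.jsonl', 'w') as f_out:
    for line in f_in:
        text = json.loads(line)['text'].rstrip()
        if text.endswith('?'):
            text = text[:-1].rstrip()
        f_out.write(json.dumps({'text': text}) + '\n')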
The next thing I tried was completely pruning the training examples that had trailing punctuation and retraining from scratch. Still the same result. Here's the pruning script and the retrain command:
from prodigy.components.db import connect

# Connect to Prodigy's database and load the original gold dataset
DB = connect()
orig_dataset = DB.get_dataset('ner_gold')

# Keep only the examples that don't end in punctuation
pruned_dataset = []
for entry in orig_dataset:
    if entry['text'].endswith(('?', '.')):
        continue
    pruned_dataset.append(entry)

# Save the pruned examples to a new dataset
DB.add_dataset('ner_gold_no_punct', {'description': '', 'author': ''})
DB.add_examples(pruned_dataset, ['ner_gold_no_punct'])
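A quick sanity check (just reusing the same DB calls as above) shows what's left after pruning:

# How many examples survived, and did the new dataset get written?
print(len(pruned_dataset), 'of', len(orig_dataset), 'examples kept')
print(len(DB.get_dataset('ner_gold_no_punct')), 'examples in the new dataset')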
prodigy ner.batch-train ner_gold_no_punct en_core_web_lg -o models/ner_no_punct -l PASTA,SAUCE,...
Is there something I'm missing? Have others run into anything similar?