Error while using ner.correct

vahuja4 · January 17, 2020, 9:23am

Here is the process that I followed:

step1: Using ner.manual, I created an annotated dataset.

step2: Then, I used Prodigy to train a spaCy model for the custom entities in my dataset.

prodigy train ner attributes blank:en -o ~/Desktop -TE

The model got saved in ~/Desktop/ner

step3: Improving the model using

prodigy ner.correct evalattributes ~/Desktop ~/Downloads/desc_data.jsonl --label fit,length,neckline,occasion,style,occasion,fit

The error I got is as follows: The model you're using isn't setting sentence boundaries (e.g. via the parser or sentencizer). This means that incoming examples won't be split into sentences.

And the Prodigy UI shows "No tasks available".

Can you please tell me what I am missing here?

ines · January 17, 2020, 11:59am

Hi! This is not an error and just a warning that you see when sentence segmentation is enabled but the model can't segment sentences (because it doesn't have a rule-based component or a parser). So this shouldn't matter, unless you want sentence segmentation.

The "No tasks available" message typically means that there are no valid examples in the data that haven't been annotated yet. What's in your desc_data.jsonl file? And what's in your evalattributes dataset? Are any of the examples already in that dataset?

vahuja4 · January 17, 2020, 12:06pm

Hi Ines, thank you for the quick reply! Okay, so I can forget about the warning. In desc_data.jsonl, I have text which hasn't been annotated. It could be that there is some duplication, but certainly not all the text has been annotated already. Based on the documentation, I understood that I had to add desc_data.jsonl to the database as well and I named that as evalattributes.

Here is the terminal output confirming that not all of the data has been annotated:
Warning: filtered 76% of entries because they were duplicates. Only 410 items were shown out of 1681. You may want to deduplicate your dataset ahead of time to get a better understanding of your dataset size.
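
The deduplication that the warning suggests could be sketched in plain Python roughly as follows. This is only an approximation under stated assumptions: it treats two records as duplicates when their "text" values are identical, whereas Prodigy filters on hashes it assigns to each example, so the counts may not match the warning exactly. The function name and the sample records are hypothetical.

```python
import json


def dedupe_by_text(lines):
    """Keep the first record for each distinct "text" value."""
    seen = set()
    unique = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL input
        record = json.loads(line)
        text = record.get("text")
        if text in seen:
            continue  # an identical text was already kept
        seen.add(text)
        unique.append(record)
    return unique


# Hypothetical sample standing in for the lines of desc_data.jsonl
sample = [
    '{"text": "Slim fit cotton shirt"}',
    '{"text": "Slim fit cotton shirt"}',
    '{"text": "Maxi dress with V neckline"}',
]
unique = dedupe_by_text(sample)
print(f"{len(unique)} unique out of {len(sample)}")
```

For a real file, the lines could be read with `open(path, encoding="utf8")` and the unique records written back out one JSON object per line.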

vahuja4 · January 17, 2020, 1:36pm

In the command below, what exactly does dataset refer to?
prodigy ner.correct dataset spacy_model source --loader --label --exclude --unsegmented

ines · January 19, 2020, 12:35pm

Yes, this means that there are a lot of duplicates in the data, but 410 examples were not duplicates. However, it's still possible that those 410 examples are all in the annotated dataset already and are therefore skipped.

The dataset is the name of the dataset to save the annotations to.
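
One way to check whether the remaining examples are already in the annotated dataset is to compare the two sets of records directly. The sketch below is a rough illustration, not how Prodigy does it internally: it matches on identical "text" values, and the function name and sample records are hypothetical. In practice the annotated records might come from an export of the dataset (for example via `prodigy db-out evalattributes`) and the source records from desc_data.jsonl.

```python
def split_by_annotation_status(source_records, annotated_records):
    """Split source records by whether their text already appears in the annotated set."""
    annotated_texts = {record["text"] for record in annotated_records}
    already_done = [r for r in source_records if r["text"] in annotated_texts]
    remaining = [r for r in source_records if r["text"] not in annotated_texts]
    return already_done, remaining


# Hypothetical stand-ins for the source file and the exported dataset
source = [
    {"text": "Slim fit cotton shirt"},
    {"text": "Maxi dress with V neckline"},
]
annotated = [{"text": "Slim fit cotton shirt", "answer": "accept"}]

already_done, remaining = split_by_annotation_status(source, annotated)
print(f"{len(already_done)} already annotated, {len(remaining)} still to annotate")
```

If `remaining` comes back empty, every incoming example is already in the dataset the annotations are being saved to, which would be consistent with the "No tasks available" message.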
