Error while using ner.correct

Here is the process that I followed:

Step 1: Using ner.manual, I created an annotated dataset.

Step 2: Then I used Prodigy to train a spaCy model for the custom entities in my dataset:

prodigy train ner attributes blank:en -o ~/Desktop -TE

The model got saved in ~/Desktop/ner

Step 3: I tried to improve the model using:

prodigy ner.correct evalattributes ~/Desktop ~/Downloads/desc_data.jsonl --label fit,length,neckline,occasion,style,occasion,fit

The error I got is as follows: "The model you're using isn't setting sentence boundaries (e.g. via the parser or sentencizer). This means that incoming examples won't be split into sentences."

Also, the Prodigy UI shows "No tasks available".

Can you please tell me what I am missing here?

Hi! This is not an error, just a warning that you see when sentence segmentation is enabled but the model can't segment sentences (because it has neither a rule-based segmentation component nor a parser). So this shouldn't matter, unless you want sentence segmentation.
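If you do want sentence segmentation, one option is to add spaCy's rule-based sentencizer to the saved pipeline and write it back to disk. A minimal sketch, assuming spaCy v3 (here using a blank pipeline as a stand-in for loading the trained model from ~/Desktop/ner):

```python
# Sketch: add a rule-based sentencizer so that ner.correct can split
# incoming examples into sentences. Assumes spaCy v3; the blank pipeline
# below stands in for spacy.load("/path/to/Desktop/ner").
import spacy

nlp = spacy.blank("en")
if "sentencizer" not in nlp.pipe_names:
    nlp.add_pipe("sentencizer", first=True)

doc = nlp("First sentence. Second sentence.")
print([sent.text for sent in doc.sents])

# nlp.to_disk("/path/to/Desktop/ner") would persist the change
```

Alternatively, ner.correct also takes an --unsegmented flag (see the recipe signature quoted below in this thread) to skip sentence segmentation entirely.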

The "No tasks available" message typically means that there are no valid examples left in the data that haven't been annotated yet. What's in your desc_data.jsonl file? And what's in your evalattributes dataset? Are any of the examples already in that dataset?

Hi Ines, thank you for the quick reply! Okay, so I can forget about the warning. In desc_data.jsonl, I have text that hasn't been annotated. There may be some duplication, but certainly not all of the text has been annotated already. Based on the documentation, I understood that I had to add desc_data.jsonl to the database as well, and I named that dataset evalattributes.

Here is the terminal output confirming that not all of the data has been annotated:
Warning: filtered 76% of entries because they were duplicates. Only 410 items were shown out of 1681. You may want to deduplicate your dataset ahead of time to get a better understanding of your dataset size.
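As the warning suggests, the source can be deduplicated ahead of time. A minimal sketch using only the standard library, keyed on the "text" field of each JSONL line (a simplified stand-in for Prodigy's hash-based filtering; the file name desc_data.jsonl is from this thread):

```python
# Sketch: drop duplicate examples from a JSONL source before loading it
# into Prodigy. Keys on the raw "text" field, which is simpler than
# Prodigy's own input-hash deduplication but catches exact duplicates.
import json

def dedupe_jsonl(in_path, out_path):
    seen = set()
    kept = 0
    with open(in_path, encoding="utf8") as fin, \
         open(out_path, "w", encoding="utf8") as fout:
        for line in fin:
            eg = json.loads(line)
            if eg["text"] in seen:
                continue  # exact duplicate, skip it
            seen.add(eg["text"])
            fout.write(json.dumps(eg) + "\n")
            kept += 1
    return kept  # number of unique examples written
```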

In the command below, what exactly does dataset refer to?
prodigy ner.correct dataset spacy_model source --loader --label --exclude --unsegmented

Yes, this means that there are a lot of duplicates in the data, but 410 examples were not duplicates. However, it could still be that all 410 of those examples are already in the annotated dataset and are therefore skipped.
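One way to check this is to export the annotated dataset (e.g. with prodigy db-out evalattributes > evalattributes.jsonl) and compare its texts against the source file. A rough sketch, assuming both files are JSONL with a "text" field (the file names are illustrative):

```python
# Sketch: find source examples whose text is not yet in the annotated
# dataset. Compares raw "text" fields, which approximates Prodigy's
# hash-based exclusion but is easy to run by hand.
import json

def load_texts(path):
    with open(path, encoding="utf8") as f:
        return {json.loads(line)["text"] for line in f}

def unannotated(source_path, dataset_path):
    done = load_texts(dataset_path)
    remaining = []
    with open(source_path, encoding="utf8") as f:
        for line in f:
            text = json.loads(line)["text"]
            if text not in done:
                remaining.append(text)
    return remaining  # texts Prodigy would still queue up
```

If this returns an empty list, every remaining example is already annotated, which would explain the "No tasks available" screen.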

The dataset argument is the name of the dataset that the annotations are saved to. In your command, that's evalattributes.