The error I got is as follows: The model you're using isn't setting sentence boundaries (e.g. via the parser or sentencizer). This means that incoming examples won't be split into sentences.
Hi! This is not an error, just a warning that you see when sentence segmentation is enabled but the model can't segment sentences (because it has neither a parser nor a rule-based component). So it shouldn't matter unless you want sentence segmentation.
This typically means that there are no valid examples in the data that haven't been annotated yet. What's in your desc_data.jsonl file? And what's in your evalattributes dataset? Are any of the examples already in that dataset?
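One rough way to answer that last question yourself is to compare the source file against an export of the dataset. The sketch below is plain Python and makes a few assumptions: that the dataset has been exported to a JSONL file (for example with prodigy db-out), that every record has a "text" key, and that comparing exact text is good enough. Prodigy itself compares hashes rather than raw text, so treat the result as an approximation. The function names and file names are made up for illustration.

```python
import json


def load_texts(lines):
    """Collect the "text" value of every non-empty JSONL line into a set."""
    texts = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        text = record.get("text")
        if text is not None:
            texts.add(text)
    return texts


def unannotated_texts(source_lines, annotated_lines):
    """Return the source texts that do not appear among the annotated texts."""
    return load_texts(source_lines) - load_texts(annotated_lines)


# Usage sketch (file names are assumptions, the second one is a
# hypothetical export of the evalattributes dataset):
#
# with open("desc_data.jsonl", encoding="utf8") as src, \
#         open("evalattributes_export.jsonl", encoding="utf8") as ann:
#     remaining = unannotated_texts(src, ann)
# print(len(remaining), "unique texts are not in the dataset yet")
```

If the result is empty, every text in the file is already in the dataset, which would explain why nothing is left to annotate.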
Hi Ines, thank you for the quick reply! Okay, so I can forget about the warning. In desc_data.jsonl, I have text which hasn't been annotated. There could be some duplication, but certainly not all of the text has been annotated already. Based on the documentation, I understood that I had to add desc_data.jsonl to the database as well, and I named that dataset evalattributes.
Here is the terminal output confirming that not all of the data has been annotated: Warning: filtered 76% of entries because they were duplicates. Only 410 items were shown out of 1681. You may want to deduplicate your dataset ahead of time to get a better understanding of your dataset size.
Yes, this means that there are a lot of duplicates in the data, but 410 examples were not duplicates. However, it's still possible that those 410 examples are all in the annotated dataset already and are therefore skipped.
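To see the real size of the data before loading it, the file can be deduplicated ahead of time, as the warning suggests. Here is a minimal sketch in plain Python, assuming each line is a JSON object with a "text" key and that two records count as duplicates when their text is identical. Prodigy's own filtering is hash-based, so the count may not match the 410 figure exactly. The function name dedupe_jsonl and the output file name are hypothetical.

```python
import json


def dedupe_jsonl(lines, key="text"):
    """Yield parsed records, keeping only the first record for each key value."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        value = record.get(key)
        if value in seen:
            continue  # a record with the same text was already kept
        seen.add(value)
        yield record


# Usage sketch (the output file name is an assumption):
#
# with open("desc_data.jsonl", encoding="utf8") as src, \
#         open("desc_data_dedup.jsonl", "w", encoding="utf8") as out:
#     for record in dedupe_jsonl(src):
#         out.write(json.dumps(record) + "\n")
```

Keeping the first occurrence preserves the original order of the file, so the deduplicated examples come up in the same sequence as before.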
The dataset is the name of the dataset to save the annotations to.