Annotation tasks finish even when more samples remain in the JSONL dataset

Hi, I'm facing issues with an annotation dataset where Prodigy shows no more tasks available even though the JSONL file I'm feeding it still has tasks. I'm using ner.manual, so there should not be an active learning component. I've also checked that the remaining data points have not already been annotated. I have to keep restarting the Prodigy server to get around this. I'd appreciate any help on this.

Hi @shaikh58, this might be hard to debug without a sample of the data, but one possible culprit is the validity of the JSONL file: are there any special characters in the latter half of the file that might be preventing it from being read? Are there weird newlines? Does Prodigy stop only at those samples?
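A quick way to check the file's validity is to parse it line by line. The snippet below is only a minimal sketch using the Python standard library, not anything Prodigy-specific; the function name `find_invalid_jsonl_lines` and the file path in the usage comment are made up for illustration.

```python
import json

def find_invalid_jsonl_lines(lines):
    """Return (line_number, reason) pairs for lines that are not valid JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            # Stray blank lines are a common side effect of odd newlines.
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(stripped)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(record, dict):
            # Each JSONL task is expected to be an object, e.g. {"text": "..."}.
            problems.append((number, "not a JSON object"))
    return problems

# Usage (the path is hypothetical):
# with open("data.jsonl", encoding="utf-8") as f:
#     for number, reason in find_invalid_jsonl_lines(f):
#         print(number, reason)
```

If this reports nothing, the file itself is probably fine and the cause is more likely elsewhere.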

Thanks LJ for your quick reply. Good point about the special characters; I'll look into that, newlines, etc. We haven't faced this issue with any of our other datasets so far, so it definitely could be that.

Hi LJ, we're running into this issue with other datasets too. Can I somehow share a snippet of the data with you to take a look? If you have an email or some other way to share it rather than on the forum, that would be great.

Thanks

How are you handling the datasets of collected annotations? Another thing to check would be whether the examples you're annotating are already in the dataset. If they are, Prodigy will skip them so you're only ever asked the same question once.

Hi Ines, thanks for your response. Apologies for the late reply. After looking into it more, I found that some of our datasets contained duplicate examples, so Prodigy (rightly) ignored them! Thanks for your help.
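For anyone hitting the same problem, a rough check for duplicates in the source file might look like the sketch below. It assumes each task stores its input as a string under a `"text"` key and compares those values directly; Prodigy's own deduplication is based on hashes it computes per task, so this is only an approximation. The function name `find_duplicate_texts` and the file path are hypothetical.

```python
import json
from collections import defaultdict

def find_duplicate_texts(lines, key="text"):
    """Map each repeated value of `key` to the line numbers where it occurs."""
    seen = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        record = json.loads(line)
        if key not in record:
            continue  # skip tasks without the assumed key
        seen[record[key]].append(number)
    # Keep only values that appear on more than one line.
    return {value: numbers for value, numbers in seen.items() if len(numbers) > 1}

# Usage (the path is hypothetical):
# with open("data.jsonl", encoding="utf-8") as f:
#     for text, numbers in find_duplicate_texts(f).items():
#         print(numbers, text[:60])
```

Any text listed here appears more than once in the file, which would explain the task count in the UI being lower than the number of lines.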