Annotated jsonl as source

Dear support,
I’m trying to use the annotated JSONL as the source for manual NER, but I’m not sure I’m following the right flow. For example, I used:

prodigy dataset my_it9
prodigy ner.manual my_it9 it_core_news_sm covered_warrant.txt -l FIN
prodigy db-out my_it9 export2 -a accept

Then I review the annotations and change them partially:

prodigy ner.manual my_it9 it_core_news_sm ./export2/my_it9.jsonl -l FIN

Now I want to review the annotations again:

prodigy db-out my_it9 export2 -a accept
prodigy ner.manual my_it9 it_core_news_sm ./export2/my_it9.jsonl -l FIN

But at this step I’m not able to re-use the annotated JSONL as the source: I don’t see the most recently marked entities.

Maybe this flow is wrong?

Thanks in advance for any suggestions.


I’ve seen a solution here (overwriting annotations), but I’m wondering whether there is a good flow to accomplish this task as well.


Yes, I think what's going on here is that you're adding your new, reviewed annotations to the same dataset. So your dataset now contains the old annotations, as well as the new ones. By default, Prodigy is designed to always keep a record of each individual annotation decision so you can always reproduce it – that’s also why it doesn’t just silently overwrite existing records in your dataset.
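
If you want to keep working from a single export, one option is to filter the exported file so that only the most recent annotation for each text remains before you load it again. Here is a minimal sketch in plain Python, not an official recipe. It assumes that each exported record carries the "_input_hash" key Prodigy assigns to the input text, and that the export lists records in the order they were saved, so the last record for a given text is the latest review. The function name keep_latest is made up for illustration.

```python
import json


def keep_latest(lines):
    """Return one record per input text, keeping the last one seen.

    Assumption: records appear in the order they were annotated, so a
    later record for the same input is the more recent review.
    """
    latest = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        # Assumption: "_input_hash" identifies the original text.
        # Fall back to the raw text if the key is missing.
        key = record.get("_input_hash", record.get("text"))
        latest[key] = record  # later records overwrite earlier ones
    return list(latest.values())


if __name__ == "__main__":
    # Path taken from the commands above; the output name is hypothetical.
    with open("./export2/my_it9.jsonl", encoding="utf8") as f:
        records = keep_latest(f)
    with open("./export2/my_it9_latest.jsonl", "w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

The filtered file can then be used as the source for the next ner.manual pass. Saving each review pass to a fresh dataset (for example a hypothetical my_it9_review) instead of my_it9 avoids mixing old and new versions in the first place.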

You might find this thread interesting, which discusses a very similar workflow. I've also posted a more advanced recipe to automate the reviewing process and take random samples.