Hi @cgreco,
I posted information about a fix for the next release that addresses duplicates while using force_stream_order: Refresh browser fix with force_stream_order - #11 by justindujardin
cgreco:
Ok, I ran into a second issue that may or may not be related. If you drop a dataset then reload it, the samples that were previously annotated in the dataset are still treated as labeled by the annotator and are not sent again.
Example: (With a 100 sample dataset)
prodigy textcat.manual my_dataset data/my_dataset.jsonl --label='MyLable'
Annotator proceeds to annotate 50 examples
An issue is discovered that requires re-annotation of the data
prodigy drop my_dataset
- This should clear out the old data
prodigy textcat.manual my_dataset data/my_dataset.jsonl --label='MyLable'
The total reads as 0 in the Prodigy UI; however, the annotator is only able to annotate 50 samples, not the full 100.
I think this is a separate issue related to the drop command. Can you confirm that this happens when you are using named sessions, but not if you exclude the ?session=something argument?
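To make the comparison concrete, here is a small sketch of the two URLs to try after dropping and reloading the dataset. It assumes Prodigy is serving on its default host and port (localhost:8080), which may differ in your setup, and "something" is only the placeholder session name from above.

```shell
# Assumption: Prodigy is running on its default host/port. Adjust if yours differs.
BASE_URL="http://localhost:8080"

# Named session: the case where previously annotated samples may be skipped.
NAMED_URL="${BASE_URL}/?session=something"

# No session argument: the case to compare against.
PLAIN_URL="${BASE_URL}/"

echo "With named session:    ${NAMED_URL}"
echo "Without named session: ${PLAIN_URL}"
```

If all 100 samples come back at the second URL but only 50 at the first, that would point to the named-session handling rather than the drop itself.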