Duplicates in ner.correct using 1.11.0a8

daqieq · August 4, 2021, 10:03pm

I experimented with this some more today. In my last test I made 130 annotations (the stream started looping after the first 26 examples), and they were saved to the SQLite database. However, when I exported the session using db-out, the JSONL file contained only 30 annotation examples.
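For anyone wanting to check the same thing on their own export, here is a minimal sketch that counts rows and repeated examples in a db-out JSONL file. It assumes each exported row carries a "_task_hash" key (falling back to "text" if it does not); the function name count_task_hashes and the file name export.jsonl are just placeholders for illustration.

```python
import json
from collections import Counter


def count_task_hashes(jsonl_lines):
    """Return (total_rows, unique_keys, repeated) for exported annotation rows.

    Assumption: each row is a JSON object with a "_task_hash" key.
    When that key is missing, the "text" value is used as the key instead.
    """
    counts = Counter()
    total = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        total += 1
        example = json.loads(line)
        key = example.get("_task_hash", example.get("text"))
        counts[key] += 1
    # Keep only keys that occur more than once
    repeated = {key: n for key, n in counts.items() if n > 1}
    return total, len(counts), repeated


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_hashes.py export.jsonl
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as f:
            total, unique, repeated = count_task_hashes(f)
        print(f"rows: {total}, unique: {unique}, repeated keys: {len(repeated)}")
```

If the number of unique keys is much smaller than the number of annotations made in the UI, that would suggest the looped examples are being collapsed somewhere between saving and export, though this sketch alone cannot say where.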

This issue seems similar to another one reported earlier this year: "ner.correct: Only 31 annotations to database no matter how many actually annotated everytime"

I'm starting to suspect the TXT loader, but I can't examine the loaders.pyd file. I'll try to set up a test using TXT and JSONL loaders to see if I can replicate using different input files.

Edit for Update:
Well, with my tests below, I confirmed that it wasn't a 'direct' issue with the TXT data loader in ner.correct:

prodigy ner.correct test_ner .\Aug2-Sess1-model\model-best\ Jan_2021_Data_random.jsonl --label 2021_07_16_NER_labels2.txt --exclude 'alit_ner3,test_ner'
prodigy ner.correct test_ner2 .\Aug2-Sess1-model\model-best\ Jan_2021_Data_random.txt --loader txt --label 2021_07_16_NER_labels2.txt --exclude 'alit_ner3,test_ner2'

These are new data files and new datasets, and I ran each command twice to check the --exclude logic, once with the datasets empty and once with some data in them. I could not replicate the issue with the new files and datasets.

However, the problem persists with the original TXT file and datasets. I tried running ner.correct with a different model path and got the same result. I'm starting to suspect that the existing datasets, or the comparison against the datasets used to exclude examples, might be the cause.
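One rough way to test that suspicion outside of Prodigy is to compare the lines of the original TXT file against the texts in an export of the excluded datasets. The sketch below does a plain text comparison only, which is an approximation: Prodigy's own exclusion works on hashes, not on this function. The name split_seen_unseen and the "text" key lookup are assumptions for illustration.

```python
import json


def split_seen_unseen(txt_lines, exported_jsonl_lines):
    """Split input texts into those already present in an export and those not.

    Assumption: each exported row is a JSON object with a "text" key.
    This compares stripped text only and does not reproduce Prodigy's hashing.
    """
    seen_texts = set()
    for line in exported_jsonl_lines:
        line = line.strip()
        if line:
            seen_texts.add(json.loads(line).get("text", "").strip())

    seen, unseen = [], []
    for raw in txt_lines:
        text = raw.strip()
        if not text:
            continue  # ignore blank input lines
        if text in seen_texts:
            seen.append(text)
        else:
            unseen.append(text)
    return seen, unseen
```

If nearly every line of the TXT file lands in the "seen" list, the short looping stream would be consistent with most examples being excluded; if most lines are "unseen", the exclusion step is less likely to be the explanation.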

I'm going to 'start fresh' so that I can keep moving with my tagging and annotate more than 25 examples at a time.
