hi @sudarshan85!
Sorry for the delay. We're trying to close out old tickets.
By default, ner.correct does sentence segmentation (unlike ner.manual). You can turn it off by adding --unsegmented.
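To illustrate why segmentation changes the counts you see, here is a rough sketch in plain Python. It does not call Prodigy or spaCy: the regex splitter and the to_tasks function are hypothetical stand-ins for the model's sentence segmentation, only meant to show that one input record can turn into several annotation tasks unless segmentation is switched off.

```python
import re

def to_tasks(records, segment=True):
    """Yield one task per sentence (segment=True) or one task per record."""
    for record in records:
        text = record["text"]
        if not segment:
            yield {"text": text}
            continue
        # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
            if sentence:
                yield {"text": sentence}

records = [
    {"text": "Paris is in France. Berlin is in Germany."},
    {"text": "Tokyo is in Japan."},
]

segmented = list(to_tasks(records, segment=True))
unsegmented = list(to_tasks(records, segment=False))
print(len(segmented), len(unsegmented))
```

So with segmentation on, "10 annotations" may not correspond to "10 records" in the source file.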
That's tough to confirm. Let me go through a reproducible example of what should happen.
Start with this source file:
nyt_text_dedup.jsonl (18.5 KB)
Step 1: Label 10 records into dataset ner_correct1
python -m prodigy ner.correct ner_correct1 en_core_web_sm nyt_text_dedup.jsonl --label LOC
I then labeled the first 10 records. You can see them by running:
python -m prodigy print-dataset ner_correct1
Step 2: Rerun but use --exclude to exclude records in ner_correct1
python3 -m prodigy ner.correct ner_correct2 en_core_web_sm data/nyt_text_dedup.jsonl --exclude ner_correct1 --label LOC
Notice it starts on record 10 (see metadata in bottom right). Therefore, it skipped the first 10 records.
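Conceptually, the exclusion works something like the sketch below. This is a simplified illustration, not Prodigy's actual code: Prodigy assigns its own hashes to each task, whereas here text_hash and exclude_seen are hypothetical helpers that hash the raw text with hashlib, and the sample records are made up.

```python
import hashlib

def text_hash(record):
    """Stable hash of a record's text (stand-in for Prodigy's own task hashes)."""
    return hashlib.sha1(record["text"].encode("utf-8")).hexdigest()

def exclude_seen(stream, annotated):
    """Yield records from stream, in order, whose text is not already annotated."""
    seen = {text_hash(r) for r in annotated}
    for record in stream:
        if text_hash(record) not in seen:
            yield record

# Made-up data: 15 source records, the first 10 already labeled.
source = [{"text": f"Article {i}"} for i in range(15)]
already_labeled = source[:10]

remaining = list(exclude_seen(source, already_labeled))
print(remaining[0]["text"], len(remaining))
```

The first record that comes through is the one at index 10, which matches what you see in the UI after rerunning with --exclude.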
Yes. ner.manual and ner.correct present examples in the order of the documents in the source file. This is different from ner.teach, which uses active learning and will alter the order of the documents based on uncertainty scoring.
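Here is a toy comparison of the two behaviours. It is only a sketch under assumptions: the scores are invented, and a full sort by distance from 0.5 is a simplification of uncertainty-based selection (the real recipe scores a stream with the model rather than sorting a whole list). The names in_file_order and by_uncertainty are hypothetical.

```python
def in_file_order(records):
    """Keep records exactly as they appear in the source file."""
    return list(records)

def by_uncertainty(records):
    """Put the most uncertain records first (score closest to 0.5)."""
    return sorted(records, key=lambda r: abs(r["score"] - 0.5))

# Invented model scores, for illustration only.
records = [
    {"text": "A", "score": 0.95},
    {"text": "B", "score": 0.52},
    {"text": "C", "score": 0.10},
]

file_order = [r["text"] for r in in_file_order(records)]
uncertain_order = [r["text"] for r in by_uncertainty(records)]
print(file_order, uncertain_order)
```

With ordering like the first list, skipping the first N records is easy to verify by eye; with the second, the position of a record in the file tells you little about when it will be shown.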
Let us know if you have any other questions!