Annotated Dataset and NER task with Prodigy

ryanwesslen · February 3, 2023, 4:37pm

Sorry for the delay. We're trying to close out old tickets.

By default, ner.correct does sentence segmentation (unlike ner.manual). You can turn it off by adding --unsegmented.
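To see why this matters when you compare record counts, here's a rough sketch of what segmentation does to a stream. The regex splitter is a naive stand-in I'm using for illustration (the recipe relies on the loaded spaCy model for sentence boundaries), and `segment` is a hypothetical helper, not a Prodigy function.

```python
import re


def segment(records):
    """Yield one task per sentence, roughly like a segmented stream would."""
    for record in records:
        # Naive split after sentence-final punctuation followed by whitespace.
        # This only approximates real sentence segmentation.
        for sentence in re.split(r"(?<=[.!?])\s+", record["text"].strip()):
            if sentence:
                yield {"text": sentence}


records = [
    {"text": "The storm hit Florida on Monday. It then moved north toward Georgia."},
    {"text": "Officials in Texas issued a warning."},
]

segmented = list(segment(records))
print(len(records), "records ->", len(segmented), "tasks")
```

So two source records can turn into three annotation tasks, which is why the number of saved annotations may not match the number of lines in your file unless you pass --unsegmented.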

That's tough to confirm. Let me go through a reproducible example of what should happen.

Start with this source file:
nyt_text_dedup.jsonl (18.5 KB)

Step 1: Label 10 records into dataset `ner_correct1`

python -m prodigy ner.correct ner_correct1 en_core_web_sm nyt_text_dedup.jsonl --label LOC

I then labeled the first 10 records. You can see them by running:

python -m prodigy print-dataset ner_correct1

Step 2: Rerun but use `--exclude` to exclude records in `ner_correct1`

python -m prodigy ner.correct ner_correct2 en_core_web_sm nyt_text_dedup.jsonl --exclude ner_correct1 --label LOC

Notice it starts on record 10 (see metadata in bottom right). Therefore, it skipped the first 10 records.
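Conceptually, `--exclude` filters the incoming stream against what's already saved in the named dataset. The sketch below is a simplified, stdlib-only illustration of that idea, not Prodigy's actual implementation: it hashes the raw text with SHA-1, whereas Prodigy assigns its own hashes, and `text_hash` and `exclude_seen` are hypothetical helper names.

```python
import hashlib


def text_hash(record):
    """Hash a record's text (a stand-in for Prodigy's own hashing)."""
    return hashlib.sha1(record["text"].encode("utf-8")).hexdigest()


def exclude_seen(stream, annotated):
    """Yield only records whose hash is not among the annotated ones."""
    seen = {text_hash(r) for r in annotated}
    for record in stream:
        if text_hash(record) not in seen:
            yield record


# Made-up data: 15 source records, the first 10 already annotated.
source = [{"text": f"Example sentence number {i}."} for i in range(15)]
annotated = source[:10]  # pretend these are saved in ner_correct1

remaining = list(exclude_seen(source, annotated))
print(len(remaining), "records left; first one:", remaining[0]["text"])
```

Because the match is on a hash of the content, it only works if the second run sees the same text as the first, so keep the segmentation setting the same across both runs.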

Yes. ner.manual and ner.correct present examples in the order they appear in the source file. This is different from ner.teach, which uses active learning and reorders the documents based on uncertainty scoring.
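As a rough illustration of the difference, the sketch below compares serving records in file order with serving the most uncertain ones first. The scores are made up, and sorting the whole list is a simplification I'm assuming for clarity; ner.teach works on a stream and doesn't sort everything up front.

```python
# Hypothetical records with made-up model scores between 0 and 1.
records = [
    {"text": "doc A", "score": 0.95},
    {"text": "doc B", "score": 0.52},
    {"text": "doc C", "score": 0.10},
    {"text": "doc D", "score": 0.45},
]

# ner.manual / ner.correct style: keep the source order.
in_order = [r["text"] for r in records]

# ner.teach style (simplified): scores closest to 0.5 come first.
by_uncertainty = [
    r["text"] for r in sorted(records, key=lambda r: abs(r["score"] - 0.5))
]

print(in_order)
print(by_uncertainty)
```

With ordered recipes you can reason about "the first 10 records", but with ner.teach the position in the file tells you little about what you'll see next.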

Let us know if you have any other questions!
