Editing approved NER dataset

Although there are some similar posts, the problem I am facing is a bit different.

Task:

  1. Manually edit annotations that have already been accepted.
  2. Add a new label to the NER annotations with ner.manual.

Problem:

  1. ner.manual shows 'No tasks available' when I use the annotated dataset.
  2. ner.manual shows the same message after a db-out and a db-in into a different dataset. The Prodigy db-in documentation (https://prodi.gy/docs/recipes#db-in) doesn't mention how to import annotated examples while changing "answer": "accept" to something else, like 'False' or 'ignore'.

Output from importing the export of the annotated dataset:

$ prodigy db-in dataset2 ./data/output/v01.jsonl
✔ Created dataset 'dataset2' in database SQLite
✔ Imported 49 annotations to 'dataset2' (session 2020-04-29_14-31-13) in database SQLite
Found and keeping existing "answer" in 49 examples

Another try that didn't work:

  • Delete ,"answer":"accept" from the end of one annotated example, then load the file as another dataset.
$ prodigy db-in dataset3 ./data/output/v01.jsonl
✔ Created dataset 'dataset3' in database SQLite
✔ Imported 49 annotations to 'dataset3' (session 2020-04-29_14-31-13) in database SQLite
Found and keeping existing "answer" in 48 examples

The only change is the count in the last line: 48 (vs. 49 before the deletion). However, when I run $ prodigy ner.manual dataset3 prodigy/models/ ./data/output/v01.jsonl --loader jsonl, it still shows 'No tasks available.'

Any help is much appreciated!

Hi! In general, re-annotating existing examples shouldn't be a problem and you shouldn't have to modify the "answer" or re-import anything. Prodigy's exported input and output formats are the same, so you can just use your exported JSONL as the source when you start your Prodigy server again. The "answer" key will be overwritten. For example:

prodigy db-out your_dataset > ./your_data.jsonl
prodigy ner.manual your_new_dataset blank:en ./your_data.jsonl --label LABEL1,LABEL2

When you re-annotate the examples, just make sure you're using a different dataset name. Otherwise, Prodigy will skip annotations that are already in the data (i.e. all of them), which can lead to the "No tasks available" message you're seeing. You also don't want to be mixing in new annotations with old annotations.
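
To illustrate why the target dataset name matters, here is a rough sketch of the skipping behaviour described above. It is not Prodigy's actual code: it assumes each exported example carries a `_task_hash` field and that exclusion is a simple lookup against the hashes already stored in the target dataset. The `pending_tasks` helper and the sample records are made up for the illustration.

```python
import json


def pending_tasks(source_lines, existing_hashes):
    """Yield tasks from JSONL lines whose _task_hash is not already stored.

    Hypothetical helper: a simplified stand-in for the filtering step,
    assuming exclusion is keyed on the "_task_hash" field.
    """
    for line in source_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSONL file
        task = json.loads(line)
        if task.get("_task_hash") in existing_hashes:
            continue  # already annotated in the target dataset, so skipped
        yield task


# Made-up stand-in for an exported file such as ./your_data.jsonl
exported = [
    json.dumps({"text": "Apple hires in Berlin", "_task_hash": 101, "answer": "accept"}),
    json.dumps({"text": "Google opens an office", "_task_hash": 102, "answer": "accept"}),
]

# Re-annotating into the dataset the export came from: every hash is known.
same_dataset_queue = list(pending_tasks(exported, existing_hashes={101, 102}))

# Re-annotating into a fresh, empty dataset: nothing is excluded.
new_dataset_queue = list(pending_tasks(exported, existing_hashes=set()))

print(len(same_dataset_queue), len(new_dataset_queue))
```

With the original dataset as the target, the queue comes out empty, which corresponds to the "No tasks available" message. With a fresh dataset name, every example is queued again regardless of its existing "answer" value.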
