Using an existing dataset as the input source for annotation

Can I load annotated data to do a correction?

Hi @PaulJay1990!

Thanks for your question and welcome to the Prodigy community :wave:

Yes!

The title of your post mentions "Existing dataset as the input", which suggests you already have a Prodigy dataset containing the annotations.

Example: Correct from a Prodigy dataset with annotations

As mentioned in the docs, you can use the dataset: prefix in your source:

The dataset: syntax lets you specify an existing dataset as the input source. Prodigy will then load the annotations from the dataset and stream them in again. Annotation interfaces respect pre-defined annotations and will pre-select them in the UI. This is useful if you want to re-annotate a dataset to correct it, or if you want to add new information with a different interface. The following command will stream in annotations from the dataset ner_data and save the resulting reannotated data in a new dataset ner_data_new:

Example: review all dataset annotations

prodigy ner.manual ner_data_new blank:en dataset:ner_data --label PERSON,ORG

Optionally, you can append another : plus an answer value if you only want to load examples with a specific answer, such as "accept" or "ignore".

Example: review only accepted

prodigy ner.manual ner_data_new blank:en dataset:ner_data:accept --label PERSON,ORG
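Under the hood, the dataset:name:accept syntax just filters the stream by each example's "answer" field. If you've exported a dataset to JSONL (for example with prodigy db-out), you can do the same filtering yourself. Here's a minimal, hypothetical sketch (the function name and file path are just examples, not part of Prodigy's API):

```python
import json

def load_examples(path, answer=None):
    """Stream examples from a JSONL export, optionally keeping
    only those whose "answer" field matches the given value."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            eg = json.loads(line)
            # Keep everything, or only e.g. "accept" / "ignore"
            if answer is None or eg.get("answer") == answer:
                examples.append(eg)
    return examples
```

So load_examples("ner_data.jsonl", answer="accept") would mirror what dataset:ner_data:accept does on the Prodigy side.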

Correct from a file with annotations

Also, I'm not sure if you're also asking how to annotate an existing file that already contains annotations. If the annotations are in the correct format (see Annotation interfaces for examples of the format for each annotation interface), then the annotations should show up automatically.

Example

Let's say you have labeled NER data:
annotated_news_headlines.jsonl (252.9 KB)

Here's an example:

{
  "text": "How Silicon Valley Pushed Coding Into American Classrooms",
  "meta": {
    "source": "The New York Times"
  },
  "_input_hash": 1842734674,
  "_task_hash": 636683182,
  "tokens": [
    {
      "text": "How",
      "start": 0,
      "end": 3,
      "id": 0
    },
    {
      "text": "Silicon",
      "start": 4,
      "end": 11,
      "id": 1
    },
    {
      "text": "Valley",
      "start": 12,
      "end": 18,
      "id": 2
    },
    {
      "text": "Pushed",
      "start": 19,
      "end": 25,
      "id": 3
    },
    {
      "text": "Coding",
      "start": 26,
      "end": 32,
      "id": 4
    },
    {
      "text": "Into",
      "start": 33,
      "end": 37,
      "id": 5
    },
    {
      "text": "American",
      "start": 38,
      "end": 46,
      "id": 6
    },
    {
      "text": "Classrooms",
      "start": 47,
      "end": 57,
      "id": 7
    }
  ],
  "_session_id": null,
  "_view_id": "ner_manual",
  "spans": [
    {
      "start": 4,
      "end": 18,
      "token_start": 1,
      "token_end": 2,
      "label": "LOCATION"
    }
  ],
  "answer": "accept"
}

You can then run this input data as you would unannotated data:

python -m prodigy ner.manual issue-6489 blank:en data/annotated_news_headlines.jsonl --label PERSON,ORG,LOCATION
Using 3 label(s): PERSON, ORG, LOCATION
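If pre-annotated data doesn't show up as expected, a common cause is character offsets in "spans" that don't line up with token boundaries. Here's a hypothetical sanity check (not part of Prodigy itself) you could run over each example before loading, assuming the format shown above:

```python
def check_span_alignment(example):
    """Return True if every span's character offsets match the
    start/end of the tokens it claims via token_start/token_end."""
    tokens = {tok["id"]: tok for tok in example.get("tokens", [])}
    for span in example.get("spans", []):
        first = tokens.get(span["token_start"])
        last = tokens.get(span["token_end"])
        if first is None or last is None:
            return False  # span points at a token id that doesn't exist
        if span["start"] != first["start"] or span["end"] != last["end"]:
            return False  # character offsets don't match token boundaries
    return True
```

For the example above, the "Silicon Valley" span (start 4, end 18) matches tokens 1 and 2 exactly, so the check passes.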

Hope this helps!

Thank you.