Make Prodigy "forget" the answers on data import

I have annotated a dataset but made quite a few errors that I'd like to fix. Revising my old annotations should be easier than starting from scratch. I thought that exporting the data and re-loading it into a new dataset might do the trick, letting me go through the already annotated texts again.

Then I realized that there is an "answer" field, and thought that erasing it or setting all values to null might help. It did not: on running ner.manual, I still start from the document where I left off, even though on import I see the message: Found and keeping existing "answer" in 0 examples.

Is there any way to go through the documents I already annotated while preserving the old annotations?

Hi! If you just want to re-annotate existing data, you don't even have to re-import it – you can just use the exported JSON data as the input source when you start ner.manual and save the results to a new dataset. In Prodigy v1.10+, you can also load input data from existing datasets using the dataset: syntax on the CLI. So instead of the path to a file, you can put dataset:your_dataset_name.

When you re-annotate an example, the "answer" field will be overwritten.
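
As a rough sketch of the two options above, the commands might look like the following. The dataset names, the file name, the label set and the blank:en tokenizer are placeholders chosen for illustration, so substitute whatever you used in your original session. Run one option or the other, not both.

```shell
# Hypothetical names; replace them with your own datasets and labels.
OLD_DATASET="ner_old"        # dataset holding the annotations to revise
NEW_DATASET="ner_revised"    # fresh dataset that will receive the corrected answers
LABELS="PERSON,ORG"          # assumed label set; use the labels from your original session

# Option 1: export the old annotations and use the file as the input source.
prodigy db-out "$OLD_DATASET" > old_annotations.jsonl
prodigy ner.manual "$NEW_DATASET" blank:en old_annotations.jsonl --label "$LABELS"

# Option 2 (Prodigy v1.10+): read straight from the old dataset, no export needed.
prodigy ner.manual "$NEW_DATASET" blank:en "dataset:$OLD_DATASET" --label "$LABELS"
```

In both cases the results go to a new dataset, so the original annotations stay untouched in the old one.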

I somehow overlooked the ability to load data from a dataset just as I would from a normal JSONL file. Thank you very much!
