Ambiguous NER annotation decisions

Thanks so much! We are aware that Prodigy introduces a lot of new (and sometimes quite surprising) concepts, and that users might have a lot of questions around the usage and best practices. So we're trying our best to provide as much information as possible :blush:

Do you mean changing annotations collected in a previous session or dataset? You could export the existing dataset to a file using the db-out command, and then re-annotate it using the mark recipe. This disables any active learning logic and simply asks you for feedback on the exact examples, in order. You can then store the result in a new dataset:

```bash
prodigy db-out my_bad_dataset /tmp  # save dataset to a file
prodigy mark new_dataset /tmp/my_bad_dataset.jsonl  # reannotate exact data
```
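Before re-annotating, it can help to check what's actually in the export. Since db-out writes one JSON object per line, a few lines of Python are enough to tally the answers. This is a minimal sketch that assumes each exported example has an "answer" key ("accept", "reject" or "ignore"); count_answers is a hypothetical helper, not part of Prodigy.

```python
import json
from collections import Counter


def count_answers(lines):
    """Tally the "answer" field across JSONL lines (e.g. an open export file)."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, such as a trailing newline
        example = json.loads(line)
        # Assumption: answers are "accept", "reject" or "ignore";
        # examples without an "answer" key are counted as "missing".
        counts[example.get("answer", "missing")] += 1
    return counts
```

Because it takes any iterable of lines, you can pass it an open file directly, e.g. `with open("/tmp/my_bad_dataset.jsonl", encoding="utf8") as f: print(count_answers(f))`.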

If you've added multiple sessions to the same dataset (i.e. started the Prodigy server multiple times), each annotation session will also be stored as a separate session dataset, using the timestamp as its name – for example, 2017-11-21_03-33-39.

The session ID is printed after you exit the server. You can also find all session IDs by running prodigy stats with the -ls flag. A nice way to preview a session and check whether it's the one you're looking for is the ner.print-dataset recipe, which pretty-prints the examples with color-highlighted output:

```bash
prodigy stats -ls  # show stats and list all datasets and session names
# pretty-print the session dataset to preview it (use the -r flag to preserve the colors)
prodigy ner.print-dataset "2017-11-21_03-33-39" | less -r
```
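If you'd rather pick out session datasets programmatically, for instance to find the most recent one, the timestamp names can be parsed and sorted. The sketch below assumes the names follow the pattern of the example 2017-11-21_03-33-39 (year-month-day_hour-minute-second) and that you've copied them from the prodigy stats -ls output into a list yourself; parse_session and sessions_in_order are hypothetical helpers.

```python
from datetime import datetime

# Assumed format of session dataset names, based on "2017-11-21_03-33-39".
SESSION_FORMAT = "%Y-%m-%d_%H-%M-%S"


def parse_session(name):
    """Return the session's start time, or None if the name isn't a timestamp."""
    try:
        return datetime.strptime(name, SESSION_FORMAT)
    except ValueError:
        return None


def sessions_in_order(names):
    """Keep only timestamp-style names and sort them oldest first."""
    stamped = [(parse_session(n), n) for n in names]
    return [n for ts, n in sorted(s for s in stamped if s[0] is not None)]
```

With that, `sessions_in_order(names)[-1]` gives you the latest session name, which you can then pass to ner.print-dataset.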

If you only want to re-annotate specific labels or examples, you may need to pre-process the exported file to extract just the parts you want to redo, and then merge everything back together afterwards (there's also a db-in command that lets you import files into a dataset).
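As an example of that pre-processing step, here's a minimal sketch that splits an exported NER file into the examples you want to redo and the ones you want to keep. It assumes examples store their entities as a "spans" list of dicts with a "label" key; split_by_labels and write_jsonl are hypothetical helpers.

```python
import json


def split_by_labels(lines, labels):
    """Split JSONL lines into (to_redo, to_keep) lists of example dicts.

    An example goes into to_redo if any of its spans carries one of the
    given labels; everything else (including examples without spans) is kept.
    """
    labels = set(labels)
    to_redo, to_keep = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        example = json.loads(line)
        # Assumption: entities live in a "spans" list of {"label": ...} dicts.
        span_labels = {span.get("label") for span in example.get("spans", [])}
        (to_redo if span_labels & labels else to_keep).append(example)
    return to_redo, to_keep


def write_jsonl(path, examples):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf8") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")
```

You could then write the first list to a file such as /tmp/redo.jsonl with write_jsonl and re-annotate only that file with the mark recipe, keeping the second list aside for merging later.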

The nice thing about JSONL is that it can be read in line by line, and is generally very easy to work with in Python. So you can write your own functions and scripts to structure your workflow however you like. (This is also part of the Prodigy philosophy btw – instead of giving you a parallel language and complex, arbitrary configuration API that you need to remember, Prodigy covers the basics and lets you plug in your own code and Python functions as custom recipes. Matt's comment on this thread has some more details on this.)
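For instance, the merge step could be a small function that swaps the re-annotated examples in for the originals and keeps everything else. This is a sketch under the assumption that re-annotated examples keep the "_task_hash" of the example they came from; if that doesn't hold for your data, pass a different key (e.g. "_input_hash", or an ID field you added yourself). merge_examples is a hypothetical helper.

```python
def merge_examples(original, corrected, key="_task_hash"):
    """Replace examples in `original` with their re-annotated versions."""
    # Index the corrected examples by the matching key (assumed field name).
    fixed = {ex[key]: ex for ex in corrected if key in ex}
    merged, used = [], set()
    for ex in original:
        k = ex.get(key)
        if k is not None and k in fixed:
            merged.append(fixed[k])  # swap in the re-annotated version
            used.add(k)
        else:
            merged.append(ex)  # untouched example, keep as-is
    # Corrected examples that matched nothing (or have no key) go at the end.
    merged.extend(ex for ex in corrected if ex.get(key) not in used)
    return merged
```

The merged list can be written back to a JSONL file and imported into a fresh dataset with db-in, so the original dataset stays untouched in case you need to go back to it.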
