Skip Functionality

Prodigy intentionally keeps interaction with the database limited, as explained here, because it can easily get messy: if users are able to change annotations, you probably also need a way to track who made what change and when.

So instead, here's how I've dealt with this in the past. I make two datasets, say ner_v1 and ner_v2. When I start annotating, everything goes into ner_v1. I'm fully aware that this v1 data will be a first draft: many annotations will be correct, but some might need to change later, once I understand the problem better.

Then, once a few examples have been flagged, or once some bad labels have been detected, I re-label the relevant candidates and save those corrected annotations to ner_v2.
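
For illustration, here's roughly what collecting those candidates can look like. This is a minimal sketch, assuming the v1 data has been exported with `prodigy db-out ner_v1 > ner_v1.jsonl` and that the flagged examples carry Prodigy's `flagged` field; the file names are just placeholders, not my actual script.

```python
# Sketch: collect flagged v1 examples into a re-annotation queue.
# Assumes ner_v1 was exported with `prodigy db-out ner_v1 > ner_v1.jsonl`.
import srsly

examples = srsly.read_jsonl("ner_v1.jsonl")

# Keep only the examples that were flagged during annotation.
to_relabel = [eg for eg in examples if eg.get("flagged")]

# Write them out so they can be re-annotated into ner_v2, e.g. with
# `prodigy ner.manual ner_v2 en_core_web_sm relabel_queue.jsonl --label ...`
srsly.write_jsonl("relabel_queue.jsonl", to_relabel)
```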

When it's time to train a model, I have a custom script that collects the examples from ner_v1 and ner_v2. If an example appears in both datasets, I always prefer the annotation from ner_v2. This gives me a final dataset that can be used to train a model.
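
The merge script itself can be very small. Here's a minimal sketch of the idea, assuming both datasets have been exported to JSONL with `prodigy db-out` and that the examples carry the usual `_task_hash` key to identify the same task; it's not the exact script I use, just the shape of it.

```python
# Sketch: merge ner_v1 and ner_v2, preferring v2 when both annotated the same task.
# Assumes exports via `prodigy db-out ner_v1 > ner_v1.jsonl` (and likewise for v2),
# and that each example carries Prodigy's `_task_hash`.
import srsly

v1 = list(srsly.read_jsonl("ner_v1.jsonl"))
v2 = list(srsly.read_jsonl("ner_v2.jsonl"))

# Index both sets by task hash; inserting v2 second means it overwrites v1
# for any task that appears in both datasets.
merged = {}
for eg in v1 + v2:
    merged[eg["_task_hash"]] = eg

final = list(merged.values())
srsly.write_jsonl("ner_final.jsonl", final)
print(f"{len(final)} examples in the merged dataset")
```

From there, the merged file can be loaded back into a fresh dataset with `prodigy db-in` or fed into whatever training setup you use.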

Other people might have another way to handle their data, but for my projects, this approach has worked quite well.
