I was wondering if there is a way to build a skip option in addition to, or in place of, ignore.
Ignore, afaik, doesn't present skipped documents to the annotator again, and instead allows me (the admin) to gather the ignored documents and create a new task.
Use case: we have annotators who have questions about tricky cases, and since our annotators often work off-hours, there may be a delay in getting a response back to them. It would be great if they could skip those cases and be presented with them again at the end.
If examples are tricky but deserve another look, wouldn't it be best to allow users to flag examples? Flagged examples can be queried with the
db-out command for re-use by setting the
There's a Prodigy Short that explains how to set this up too.
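As a rough sketch of the re-use step, the snippet below filters flagged examples out of an exported JSONL file. It assumes the export has one JSON object per line and that flagged examples carry a truthy "flagged" key; the function name flagged_examples and the sample records are made up for illustration, so check them against your own export.

```python
import json


def flagged_examples(jsonl_lines):
    """Yield examples whose "flagged" field is truthy (assumed key name)."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        example = json.loads(line)
        if example.get("flagged"):
            yield example


# Hypothetical export contents, one JSON object per line.
exported = [
    '{"text": "Tricky case", "answer": "accept", "flagged": true}',
    '{"text": "Easy case", "answer": "accept"}',
]
flagged = list(flagged_examples(exported))
```

The resulting list could then be written back to a JSONL file and used as the source for a follow-up annotation task.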
We already flag them and discuss them, but I wanted annotators to be able to revisit them later because sometimes there is a clear answer by the end of the task.
Prodigy doesn't allow too much interaction with the database, as explained here, because it easily gets messy. If users are able to make changes to annotations, you probably also need a way to track who made what change and when.
So instead, here's how I've dealt with this in the past. I make two datasets, say ner_v1 and ner_v2. When I start annotating, everything goes into ner_v1. I'm fully aware that this v1 data will be a first draft. Many annotations are correct, but some might need to change later, once I understand the problem better.
Then, once there are a few flagged examples, or when some bad labels have been detected, I re-label the relevant candidates and save these annotations to ner_v2.
Then, when it's time to train a model, I have a custom script that gets the examples from ner_v1 and ner_v2. If an example appears in both sets, I always prefer the annotation from ner_v2. This gives me a final dataset that can be used to train a model.
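Roughly, the merge step could look like the sketch below. This is not my actual script, just a minimal illustration: it assumes both datasets have already been exported as lists of dicts and that examples can be matched on the "_input_hash" field. The function name merge_prefer_v2 and the sample records are hypothetical.

```python
def merge_prefer_v2(v1_examples, v2_examples, key="_input_hash"):
    """Combine two annotation sets, letting v2 win when both contain an example."""
    # Start with the first-draft annotations, indexed by the matching key.
    merged = {example[key]: example for example in v1_examples}
    # Overwrite with the re-labelled annotations wherever the key matches,
    # and add any v2 examples that never appeared in v1.
    for example in v2_examples:
        merged[example[key]] = example
    return list(merged.values())


# Hypothetical sample data standing in for the two exported datasets.
ner_v1 = [
    {"_input_hash": 1, "text": "Apple opened a store", "label": "FRUIT"},
    {"_input_hash": 2, "text": "Berlin is large", "label": "GPE"},
]
ner_v2 = [
    {"_input_hash": 1, "text": "Apple opened a store", "label": "ORG"},
    {"_input_hash": 3, "text": "Ines joined", "label": "PERSON"},
]
final = merge_prefer_v2(ner_v1, ner_v2)
```

Matching on the input hash means a corrected annotation replaces the draft for the same text, while everything that was never re-labelled passes through unchanged.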
Other people might have another way to handle their data, but for my projects, this approach has worked quite well.