Yes, I definitely see the problem here, hmm. One option would be to start by doing the 75% with only one relation first, and then deal with the remaining 25% with multiple relations separately.
In general, Prodigy avoids anything that lets the annotator modify the actual examples to annotate (e.g. by duplicating examples), because that's something that should typically be decided at the development level. But this case is kind of an exception. One option could be to set `"instant_submit": true`, which will send annotated examples back immediately. In your recipe, you could then keep sending the same example (e.g. with a `while True` loop) and only break and move on to the next one once you've received that example back with a `"reject"` answer. So once there are no more relations left to add in the example, you reject it without any relations and the server moves on to the next one. (Just make sure to use distinct `"_task_hash"` values for the duplicates to prevent Prodigy from filtering them out.)
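A minimal sketch of that loop, written in plain Python so it doesn't depend on Prodigy's internals. The class name `RepeatUntilReject`, the hashing scheme for the duplicates, and the assumption that each answer reaches the update callback before the next copy is requested are all hypothetical, not Prodigy's own API:

```python
import copy
from typing import Dict, Iterable, Iterator, List


class RepeatUntilReject:
    """Hypothetical helper: keeps re-sending each example until it comes back rejected."""

    def __init__(self, examples: Iterable[Dict]):
        self.examples = examples
        self.rejected = set()  # _input_hash values of examples the annotator rejected
        self.counter = 0  # used to make every duplicate's _task_hash unique

    def stream(self) -> Iterator[Dict]:
        for eg in self.examples:
            input_hash = eg["_input_hash"]
            # Keep yielding copies of this example until update() has seen a "reject" for it
            while input_hash not in self.rejected:
                self.counter += 1
                task = copy.deepcopy(eg)
                # Assumed scheme: a distinct _task_hash so duplicates aren't filtered as already seen
                task["_task_hash"] = hash((input_hash, self.counter))
                yield task

    def update(self, answers: List[Dict]) -> None:
        # With instant_submit, this should receive each answer as soon as it's submitted
        for answer in answers:
            if answer.get("answer") == "reject":
                self.rejected.add(answer["_input_hash"])
```

In a custom recipe, you'd then presumably return `helper.stream()` as the stream and `helper.update` as the update callback, and set `"instant_submit": True` in the config. Setting `"batch_size": 1` is probably also a good idea, since otherwise Prodigy may request several copies of the same example before your first answer comes back.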
Also, on a related note: this thread actually inspired me to do some experiments to better support this type of relation annotation while keeping the workflow efficient. It's still super experimental, but here's the one screenshot I remembered to take, complete with messy test data:
The idea would be to allow streaming in pre-tokenized data with merged entities and phrases (and optionally to disable all other tokens, so only the relevant spans are selectable). I don't have an ETA for the beta yet, but I'll definitely share it on the forum once it's ready for testing.
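To illustrate what that pre-tokenized input could look like, here's a rough sketch that turns each entity or phrase span into a single token and splits the rest of the text on whitespace. The token keys (`text`, `start`, `end`, `id`, `ws`) follow Prodigy's usual token format, but the `pretokenize` helper and the `disabled` flag are assumptions for illustration, not a confirmed part of the upcoming feature. The whitespace splitting is also deliberately naive (punctuation stays attached); in practice you'd more likely tokenize with spaCy and merge the spans there.

```python
import re
from typing import Dict, List, Tuple


def pretokenize(text: str, spans: List[Dict], disable_others: bool = True) -> List[Dict]:
    """Hypothetical helper: one token per (non-overlapping) span, whitespace-split tokens elsewhere."""
    pieces: List[Tuple[int, int, bool]] = []  # (start, end, is_span)
    pos = 0

    def add_gap(gap_start: int, gap_end: int) -> None:
        # Naive whitespace tokenization of the text outside the spans
        for m in re.finditer(r"\S+", text[gap_start:gap_end]):
            pieces.append((gap_start + m.start(), gap_start + m.end(), False))

    for span in sorted(spans, key=lambda s: s["start"]):
        add_gap(pos, span["start"])
        pieces.append((span["start"], span["end"], True))  # merged span becomes one token
        pos = span["end"]
    add_gap(pos, len(text))

    tokens = []
    for i, (start, end, is_span) in enumerate(pieces):
        token = {
            "text": text[start:end],
            "start": start,
            "end": end,
            "id": i,
            "ws": end < len(text) and text[end].isspace(),
        }
        if disable_others and not is_span:
            token["disabled"] = True  # assumed flag: only the merged spans stay selectable
        tokens.append(token)
    return tokens
```

For "Apple hired Tim Cook in 1998." with spans over "Apple" and "Tim Cook", this yields five tokens, and only the two merged span tokens stay enabled.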