Hi, I'm currently working on a custom NER recipe that uses regex pattern matching to automatically highlight spans for the annotator.
For example, if there is a pattern match, the label is assigned and the span is highlighted during annotation.
I also want to add functionality to accept examples without explicitly clicking the accept button during annotation.
Any example that matches the regex pattern should be accepted directly and added to the database; I don't want to look at those examples during annotation.
How do I do the automatic accept/reject functionality?
Thanks for your question. Sorry for the delay; our team has been pretty busy.
Our team has been thinking about this because we had a similar question a few weeks ago.
Some initial thoughts: if you know the criteria for when to auto-accept (e.g., based on metadata) and have the rules/patterns, could you just have a script that takes your input file and partitions it into two files?
File 1: what should be annotated (perhaps passing that through a standard loader)
File 2: what would be "auto accepted", outputted as a .jsonl file.
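Here's a rough sketch of what that partitioning script could look like. It only uses the standard library; the pattern, the sample texts, and the helper names (partition, write_jsonl) are made up for illustration, so swap in whatever your recipe actually matches on.

```python
import json
import re
from pathlib import Path

# Hypothetical pattern; replace it with the regex your recipe uses.
PATTERN = re.compile(r"\b[A-Z]{2}\d{4}\b")


def partition(lines, pattern=PATTERN):
    """Split JSONL lines into (to_annotate, auto_accept) lists of dicts."""
    to_annotate, auto_accept = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        task = json.loads(line)
        if pattern.search(task.get("text", "")):
            auto_accept.append(task)  # File 2: matches the pattern
        else:
            to_annotate.append(task)  # File 1: still needs a human
    return to_annotate, auto_accept


def write_jsonl(path, tasks):
    """Write one JSON object per line."""
    with Path(path).open("w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


# Small in-memory example; with a real file you would pass the open file handle.
sample = ['{"text": "Order AB1234 shipped"}', '{"text": "No code here"}']
to_annotate, auto_accept = partition(sample)
```

You'd then call write_jsonl once per list and point your annotation recipe at the first file only.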
You could then have a script that appends the appropriate metadata (e.g., view_id, accept keys) and adds those examples to the database:
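from prodigy.components.db import connect

examples = [{"text": "hello world", "_task_hash": 123, "_input_hash": 456}]
db = connect()  # uses settings from prodigy.json
if "test_dataset" not in db:
    db.add_dataset("test_dataset")  # create the dataset first if it doesn't exist yet
db.add_examples(examples, ["test_dataset"])  # add the examples to the dataset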
But the key is that you'd also need to run set_hashes and add_tokens on those examples.
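As a rough illustration of the metadata step, something like the sketch below could turn each matching example into an already-accepted task. It's plain Python with no Prodigy import, so it doesn't hash or tokenize anything. The pattern, the label, and the mark_auto_accepted helper are hypothetical, and I'm assuming the "answer" and "_view_id" keys with "ner_manual" as the interface, so double-check those against what your recipe saves.

```python
import copy
import re

PATTERN = re.compile(r"\b[A-Z]{2}\d{4}\b")  # hypothetical pattern
LABEL = "ORDER_ID"  # hypothetical label


def mark_auto_accepted(task, pattern=PATTERN, label=LABEL, view_id="ner_manual"):
    """Return a copy of the task with regex spans and accept metadata attached."""
    task = copy.deepcopy(task)  # leave the caller's dict untouched
    task["spans"] = [
        {"start": m.start(), "end": m.end(), "label": label}
        for m in pattern.finditer(task["text"])
    ]
    task["answer"] = "accept"  # assumed key for the annotation decision
    task["_view_id"] = view_id  # assumed key for the interface used
    return task


example = mark_auto_accepted({"text": "Order AB1234 shipped"})
```

The resulting dicts would still need to go through set_hashes and add_tokens before being passed to db.add_examples.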
Long story short, I agree there could be a better way to do this. Let me talk with a few teammates about whether we can get started on a cleaner option.
If this is a blocker, in the short term I'd recommend removing the examples you want "auto-accepted" first so you can keep labeling. Then we can hopefully put together a script that takes the "auto-accept" .jsonl file and saves those examples as accepted.