Is there a way I can train a binary accept/reject textcat model that finds relevant sentences (the relevant category that we are looking for is rare in the whole dataset)? Could the model then propose accepts that I can pipe into ner.manual for more detailed annotation? I think this way I would be able to get more (relevant) annotations faster.
Something comparable was asked here: Span annotation with ner.manual -- how to make use of ner.teach
Thanks in advance!
That's a nice idea and should be possible! Assuming you've trained your text classification model, you could write a custom recipe with a stream like this:
    import spacy
    from prodigy.components.loaders import JSONL
    from prodigy.components.preprocess import add_tokens

    # in your recipe:
    nlp = spacy.load("./your_textcat_model")

    def get_suggestions_from_textcat():
        # "source" is the path to your JSONL file, e.g. a recipe argument
        stream = JSONL(source)
        for eg in stream:
            doc = nlp(eg["text"])
            # Use the doc.cats and their scores to decide whether
            # you want to send out the example or not
            if doc.cats["RELEVANT"] > 0.5:
                yield eg

    stream = get_suggestions_from_textcat()
    # Add "tokens" to each example so it can be rendered in the
    # "ner_manual" interface
    stream = add_tokens(nlp, stream)
In the example above, it's just checking whether the RELEVANT text category has a score of over 0.5. But depending on the text classifier you've trained, you could also come up with more sophisticated logic here. In the annotation UI, you're now only going to see the texts you selected based on the text categories.
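As a rough idea of what slightly more sophisticated logic could look like, here's a minimal sketch that makes the label and threshold configurable and stores the score in the example's "meta" so you can see it while annotating. The names select_relevant, get_cats and toy_cats are made up for illustration and aren't part of Prodigy or spaCy. The toy scorer only stands in for your trained model, so the snippet runs without one.

```python
def select_relevant(stream, get_cats, label="RELEVANT", threshold=0.5):
    """Yield only the examples whose score for `label` exceeds `threshold`."""
    for eg in stream:
        cats = get_cats(eg["text"])
        score = cats.get(label, 0.0)
        if score > threshold:
            # Copy the example and its meta so the incoming data isn't mutated
            eg = dict(eg)
            meta = dict(eg.get("meta", {}))
            meta["score"] = round(score, 3)
            eg["meta"] = meta
            yield eg


# Hypothetical stand-in scorer, for illustration only. With a trained
# pipeline you would pass something like: lambda text: nlp(text).cats
def toy_cats(text):
    return {"RELEVANT": 0.9 if "refund" in text.lower() else 0.1}


examples = [{"text": "I want a refund."}, {"text": "Nice weather today."}]
selected = list(select_relevant(examples, toy_cats, threshold=0.7))
```

In a recipe, you'd wrap your loaded stream with this generator before calling add_tokens. Since the relevant category is rare in your data, it might be worth starting with a lower threshold and raising it once you see how many false positives come through.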