Is there a way I can train a binary accept/reject textcat model that finds relevant sentences (the relevant category that we are looking for is rare in the whole dataset)?
Could the model then propose accepts that I can pipe into ner.manual for more detailed annotation? This way, I think I would be able to get more (relevant) annotations faster.
That's a nice idea and should be possible! Assuming you've trained your text classification model, you could write a custom recipe with a stream like this:
import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens

# in your recipe:
nlp = spacy.load("./your_textcat_model")

def get_suggestions_from_textcat():
    # "source" is the path to your JSONL file, passed in as a recipe argument
    stream = JSONL(source)
    for eg in stream:
        doc = nlp(eg["text"])
        # Use the doc.cats and their scores to decide whether
        # you want to send out the example or not
        if doc.cats["RELEVANT"] > 0.5:
            yield eg

stream = get_suggestions_from_textcat()
# Add "tokens" to each example so it can be rendered in the
# "ner_manual" interface
stream = add_tokens(nlp, stream)
The example above only checks whether the RELEVANT text category has a score above 0.5. Depending on the text classifier you've trained, you could also come up with more sophisticated logic here. In the annotation UI, you'll then only see the texts that were selected based on their text categories.
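If you want to experiment with the selection logic on its own, here is a minimal sketch that separates the filter from the model. The names filter_relevant, get_cats and fake_cats are hypothetical and only for illustration. The sketch assumes get_cats returns a dict of label scores in the same shape as doc.cats, and it stores the score under the example's "meta" key, on the assumption that you'd like to see it next to the text while annotating.

```python
from typing import Callable, Dict, Iterable, Iterator


def filter_relevant(
    stream: Iterable[dict],
    get_cats: Callable[[str], Dict[str, float]],
    label: str = "RELEVANT",
    threshold: float = 0.5,
) -> Iterator[dict]:
    """Yield only the examples whose score for `label` is above `threshold`."""
    for eg in stream:
        cats = get_cats(eg["text"])
        score = cats.get(label, 0.0)
        if score > threshold:
            # Copy the example so the incoming dict is not mutated
            out = dict(eg)
            out["meta"] = {**eg.get("meta", {}), "score": round(score, 3)}
            yield out


# Stand-in scorer for demonstration only (hypothetical). With a trained
# model you would pass something like: lambda text: nlp(text).cats
def fake_cats(text: str) -> Dict[str, float]:
    return {"RELEVANT": 0.9 if "refund" in text.lower() else 0.1}


examples = [
    {"text": "I would like a refund for my order."},
    {"text": "The weather is nice today."},
]
selected = list(filter_relevant(examples, fake_cats, threshold=0.5))
print([eg["text"] for eg in selected])
```

Because the relevant category is rare in your data, it may be worth trying a lower threshold at first: you'd see more irrelevant texts, but you'd be less likely to miss relevant ones that the classifier scores conservatively.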