Binary "pre-model" for faster annotation


Is there a way I can train a binary accept/reject textcat model that finds relevant sentences (the relevant category that we are looking for is rare in the whole dataset)?
Could the model then propose accepts that I can pipe into ner.manual for more detailed annotation? I think, this way, I would be able to get more (relevant) annotations faster.

Something comparable was asked here: Span annotation with ner.manual -- how to make use of ner.teach

Thanks in advance!

That's a nice idea and should be possible! :slightly_smiling_face: Assuming you've trained your text classification model, you could write a custom recipe with a stream like this:

import spacy
from prodigy.components.loaders import JSONL
from prodigy.components.preprocess import add_tokens

# in your recipe:
nlp = spacy.load("./your_textcat_model")

def get_suggestions_from_textcat(source):
    stream = JSONL(source)
    for eg in stream:
        doc = nlp(eg["text"])
        # Use doc.cats and their scores to decide whether
        # you want to send out the example or not
        if doc.cats["RELEVANT"] > 0.5:
            yield eg

stream = get_suggestions_from_textcat(source)  # source = path to your input data
# Add "tokens" to each example so it can be rendered in the
# "ner_manual" interface
stream = add_tokens(nlp, stream)

In the example above, it's just checking whether the RELEVANT text category scored above 0.5. But depending on the text classifier you've trained, you could also come up with more sophisticated logic here. In the annotation UI, you'll then only see the texts you selected based on the text categories.
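If you want to experiment with the threshold or the label, it can help to factor the filtering into a small reusable helper. Here's a minimal sketch of that idea – the names `filter_relevant` and `score_fn` are just made up for illustration, and the dummy scorer stands in for calling your spaCy model (in a real recipe you'd pass something like `lambda text: nlp(text).cats`):

```python
def filter_relevant(stream, score_fn, label="RELEVANT", threshold=0.5):
    """Yield only examples whose predicted score for `label` clears the threshold."""
    for eg in stream:
        cats = score_fn(eg["text"])  # dict of category -> score
        score = cats.get(label, 0.0)
        if score >= threshold:
            # Keep the score in the task meta so it's visible in the UI
            eg.setdefault("meta", {})["score"] = round(score, 2)
            yield eg

# Dummy scorer for illustration only – replace with your textcat model
dummy = lambda text: {"RELEVANT": 0.9 if "error" in text else 0.1}
examples = [{"text": "error in pipeline"}, {"text": "all good"}]
print(list(filter_relevant(examples, dummy)))
# → [{'text': 'error in pipeline', 'meta': {'score': 0.9}}]
```

Storing the score in `eg["meta"]` is optional, but it's a nice way to sanity-check your classifier while you annotate, since the meta is displayed in the bottom right corner of the annotation card.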