Best way to annotate rare labels for classification

Ah, sorry about that. Here's the source for combine_models, it's pretty straightforward:

from toolz.itertoolz import interleave, partition_all

def combine_models(one, two, batch_size=32):
    """Combine two models and return a predict and update function. Predictions
    of both models are combined using the toolz.interleave function. Mostly
    used to combine an EntityRecognizer with a PatternMatcher.
    one (callable): First model. Requires a `__call__` and `update` method.
    two (callable): Second model. Requires a `__call__` and `update` method.
    batch_size (int): The batch size to use for predicting the stream.
    RETURNS (tuple): A `(predict, update)` tuple of the respective functions.
    """

    def predict(stream):
        for batch in partition_all(batch_size, stream):
            batch = list(batch)
            stream1 = one(batch)
            stream2 = two(batch)
            yield from interleave((stream1, stream2))

    def update(examples):
        loss = one.update(examples) + two.update(examples)
        return loss

    return predict, update
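To show how the combined `(predict, update)` pair behaves, here's a runnable sketch. `ToyModel` and the stdlib re-implementations of `partition_all` and `interleave` are just stand-ins so the example runs without `toolz` or Prodigy installed; the real code should use `toolz.itertoolz` as imported above.

```python
from itertools import islice

# Stdlib stand-ins for toolz.partition_all and toolz.interleave.
def partition_all(n, seq):
    it = iter(seq)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

def interleave(seqs):
    # Round-robin over the iterators until all are exhausted.
    iters = [iter(s) for s in seqs]
    while iters:
        alive = []
        for it in iters:
            try:
                item = next(it)
            except StopIteration:
                continue
            yield item
            alive.append(it)
        iters = alive

def combine_models(one, two, batch_size=32):
    def predict(stream):
        for batch in partition_all(batch_size, stream):
            batch = list(batch)
            yield from interleave((one(batch), two(batch)))

    def update(examples):
        return one.update(examples) + two.update(examples)

    return predict, update

# Toy models standing in for an EntityRecognizer and a PatternMatcher:
# callable on a batch of texts, with an `update` method returning a loss.
class ToyModel:
    def __init__(self, name, score):
        self.name = name
        self.score = score

    def __call__(self, batch):
        return ((self.score, {"text": t, "model": self.name}) for t in batch)

    def update(self, examples):
        return 0.0

predict, update = combine_models(ToyModel("ner", 0.9), ToyModel("matcher", 0.4),
                                 batch_size=2)
scored = list(predict(["a", "b", "c"]))
# Predictions alternate between the two models within each batch.
```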

Yep, your sorter function is pretty much exactly how it should be.

Sorters are functions that take a stream of (score, example) tuples (as produced by Prodigy's built-in models) and yield examples. The built-in sorters like prefer_uncertain and prefer_high_scores use an exponential moving average to decide whether to yield an example or not. This prevents the sorter from getting stuck if the scores aren't evenly distributed.
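The EMA idea can be sketched like this. Note this is an illustration of the technique, not Prodigy's actual internals; the `smoothing` value and the uncertainty formula are made up for the example.

```python
def prefer_uncertain_sketch(scored_stream, smoothing=0.9):
    """Yield examples whose uncertainty beats a running average.

    Uncertainty is highest for scores near 0.5 and lowest near 0 or 1.
    The exponential moving average adapts to the stream, so the sorter
    keeps yielding even if all scores are skewed high or low.
    """
    ema = 0.5
    for score, example in scored_stream:
        uncertainty = 1.0 - abs(score - 0.5) * 2  # 1.0 at 0.5, 0.0 at extremes
        ema = smoothing * ema + (1 - smoothing) * uncertainty
        if uncertainty >= ema:
            yield example

stream = [(0.5, {"text": "a"}), (0.99, {"text": "b"}), (0.5, {"text": "c"})]
picked = list(prefer_uncertain_sketch(stream))
# The confident prediction (0.99) is skipped; the uncertain ones are kept.
```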

But you can also incorporate your own, more specific logic. For example, you could check how many examples have already been annotated and use that to calibrate the bias. You could also check for custom metadata you've added to example["meta"] or other properties (maybe you want to prioritise examples with longer text over examples with shorter text, or something like that).
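A custom sorter along those lines might look like this. The thresholds (`min_chars`, `high_score`) are invented for the example; the only contract is that the function consumes (score, example) tuples and yields examples.

```python
def prefer_long_texts(scored_stream, min_chars=50, high_score=0.8):
    """Yield long texts right away; only yield short texts if the model
    is confident about them. Thresholds are illustrative, not built in."""
    for score, example in scored_stream:
        text = example.get("text", "")
        if len(text) >= min_chars or score >= high_score:
            yield example

stream = [
    (0.3, {"text": "short"}),
    (0.9, {"text": "short but confident"}),
    (0.2, {"text": "a much longer example text that easily clears the fifty character bar"}),
]
kept = list(prefer_long_texts(stream))
# "short" is dropped: it's under min_chars and its score is low.
```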
