training on a regression task

ryanwesslen · December 19, 2022, 7:00pm

Thanks for your questions, and welcome to the Prodigy community!

I would recommend creating a custom Prodigy recipe with your existing PyTorch workflow. The good news is there's a text classification template with docs that describe how to do this. I've tried to answer both of your questions directly below.

There are two parts to the question: the UI/interface (i.e., a way to capture the continuous value) and a matching spaCy component to train your model.

UI/Interface

It is possible to create a custom Prodigy interface that allows continuous annotation. You can do this with blocks, which let you combine different interfaces.

For example, you can create a slider like this:

Here's a more detailed example of a slider:

@tannonk also has a helpful GitHub repo with recipes:

https://github.com/tannonk/prodigy_human_evaluation/tree/master/examples

Some HTML/JavaScript customization may be needed, but this at least lets you collect users' input as a continuous value.
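As a rough sketch of the blocks idea, a custom recipe could return something like the dictionary below. It assumes Prodigy's `blocks` interface with an `html` block, the `html_template` and `javascript` settings, and the `window.prodigy.update` helper for writing a value back into the task. The `score` field name, the 0 to 1 range and the `slider_components` helper are made up for illustration. In a real recipe this function would be decorated with `@prodigy.recipe` and the stream would come from a loader.

```python
from typing import Dict, Iterable, List

# Hypothetical slider markup: the task text with a 0-1 range input below it.
SLIDER_HTML = """
<p>{{text}}</p>
<input type="range" min="0" max="1" step="0.01" value="0.5"
       oninput="updateScore(this.value)" />
<span id="score-value">0.5</span>
"""

# Copies the slider position into the task under "score" (an assumed field name).
SLIDER_JS = """
function updateScore(value) {
  document.getElementById('score-value').textContent = value;
  window.prodigy.update({ score: parseFloat(value) });
}
"""


def slider_components(dataset: str, texts: Iterable[str]) -> Dict:
    """Build the dictionary of components a custom recipe would return."""
    # Each task starts with a default score so the field is always present.
    stream: List[Dict] = [{"text": text, "score": 0.5} for text in texts]
    return {
        "dataset": dataset,
        "view_id": "blocks",
        "stream": stream,
        "config": {
            "blocks": [{"view_id": "html", "html_template": SLIDER_HTML}],
            "javascript": SLIDER_JS,
        },
    }
```

Note that the slider will probably need extra JavaScript to reset between tasks; this sketch does not handle that.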

Training / Model

This is more of a challenge. Out of the box, textcat.manual is for training models using spaCy's TextCategorizer components, textcat or textcat_multilabel. The problem is that, to my knowledge, neither of those components supports regression training.

Therefore, if you wanted to train your model with spaCy, you'd need to create a custom spaCy component to handle training of a continuous value (regression).

An alternative: use your existing PyTorch model/setup

Given you have a PyTorch PoC, I would recommend skipping spaCy and using Prodigy with your own PyTorch workflow.

There's a section in the Text Classification documentation on how to create a custom Prodigy recipe for a different model workflow.

As linked in those docs, I'd recommend starting with this script:

https://github.com/explosion/prodigy-recipes/blob/master/textcat/textcat_custom_model.py

import prodigy
from prodigy.components.loaders import JSONL
from prodigy.components.sorters import prefer_uncertain
from prodigy.util import split_string
import random
from typing import List, Iterable


class DummyModel(object):
    # This is a dummy model to help illustrate how to use Prodigy with a model
    # in the loop. It currently "predicts" random numbers – but you can swap
    # it out for any model of your choice, for example a text classification
    # model implementation using PyTorch, TensorFlow or scikit-learn.

    def __init__(self, labels: List[str]):
        # The model can keep arbitrary state – let's use a simple random float
        # to represent the current weights
        self.weights = random.random()
        self.labels = labels

(This file has been truncated; see the linked script for the full recipe.)

Yes! See the sub-section in the docs. You can use one of Prodigy's sorters to control how active learning reorders your records for annotation:

from prodigy.components.sorters import prefer_uncertain

model = Model()
stream = model(stream)
stream = prefer_uncertain(stream)
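The snippet above relies on the model yielding `(score, example)` tuples, which is what Prodigy's sorters consume. `prefer_uncertain` favours scores near 0.5, which suits classifier probabilities better than raw regression outputs. One possible adaptation, sketched below, is to turn disagreement between several predictors into a 0 to 1 uncertainty score; such a stream could then be passed to `prefer_high_scores` from the same module. The `EnsembleRegressor` class and its `max_spread` parameter are hypothetical, and the predictors are plain callables standing in for your PyTorch models.

```python
from statistics import pstdev
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


class EnsembleRegressor:
    """Hypothetical wrapper that scores examples by predictor disagreement."""

    def __init__(
        self,
        predictors: List[Callable[[str], float]],
        max_spread: float = 0.5,
    ):
        self.predictors = predictors
        # For targets in [0, 1], the population standard deviation of the
        # predictions cannot exceed 0.5, so it is used to normalise the score.
        self.max_spread = max_spread

    def __call__(self, stream: Iterable[Dict]) -> Iterator[Tuple[float, Dict]]:
        for eg in stream:
            preds = [predict(eg["text"]) for predict in self.predictors]
            spread = pstdev(preds)
            # Higher score = more disagreement = more worth annotating.
            score = min(spread / self.max_spread, 1.0)
            yield score, eg
```

Other uncertainty estimates (for example, Monte Carlo dropout) would fit the same shape, as long as the model yields one score per example.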

After implementing this workflow, you'll likely still need to use the slider example above so that annotators can provide their continuous annotation value.
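To connect the two pieces, the recipe's `update` callback can read the slider value from each answered task. Here is a minimal sketch, assuming the slider stores its value under a `score` key (a name chosen here, not something Prodigy defines) and that only accepted answers should be used for training. The `answers_to_pairs` helper is hypothetical; the pairs it returns would go to your own PyTorch training step.

```python
from typing import Dict, List, Tuple


def answers_to_pairs(
    answers: List[Dict], key: str = "score"
) -> List[Tuple[str, float]]:
    """Convert answered tasks into (text, target) pairs for regression."""
    pairs: List[Tuple[str, float]] = []
    for eg in answers:
        # Skip rejected/ignored tasks and tasks without a slider value.
        if eg.get("answer") != "accept" or key not in eg:
            continue
        pairs.append((eg["text"], float(eg[key])))
    return pairs
```

Inside a recipe, an `update(answers)` function could call this helper and pass the resulting pairs to the model's training routine.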

I hope this helps. Let us know if you have questions, and please post back if you make progress!
