training on a regression task

hi @LucySkywalker!

Thanks for your questions and welcome to the Prodigy community :wave:

I would recommend creating a custom Prodigy recipe with your existing PyTorch workflow. The good news is there's a text classification template with docs that describe how to do this. I've tried to answer both of your questions directly below.

So there are two parts to the question: the UI/interface (i.e., creating a way to capture the continuous output) and the corresponding spaCy component for training your model.


It is possible to create a custom Prodigy interface that captures a continuous annotation. This is where you can use the concept of blocks to combine different interfaces.

For example, you can create a slider like this:

Here's a more detailed example of a slider:

@tannonk also has a helpful GitHub repo with the recipes.

Some HTML/JavaScript customization may be needed, but this should at least let you collect users' input in a continuous format.
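To give a rough idea of how blocks could combine the text with a slider, here is a minimal sketch of the components a custom recipe might return. It assumes the `html` block and the `window.prodigy.update` function from the custom interface docs. The `score` key, the `updateScore` helper, the 0 to 1 range and the `slider_components` name are all made up for illustration, and the HTML/JavaScript will likely need tweaking. In a real recipe, you'd return this dict from a function decorated with `@prodigy.recipe`.

```python
# Sketch only: builds the dict of components a custom recipe would return.
# "score", updateScore and slider_components are hypothetical names.

SLIDER_HTML = """
<input type="range" min="0" max="1" step="0.01" value="0.5"
       oninput="updateScore(this.value)" />
"""

SLIDER_JS = """
function updateScore(value) {
    // Store the slider value on the current task so it is saved with the answer
    window.prodigy.update({ score: parseFloat(value) });
}
"""


def slider_components(dataset, stream):
    """Combine a text block and an HTML slider block into one interface."""
    blocks = [
        {"view_id": "text"},                                # renders task["text"]
        {"view_id": "html", "html_template": SLIDER_HTML},  # renders the slider
    ]
    return {
        "dataset": dataset,      # dataset to save the annotations to
        "stream": stream,        # iterable of tasks, e.g. {"text": "..."}
        "view_id": "blocks",     # use the blocks interface
        "config": {"blocks": blocks, "javascript": SLIDER_JS},
    }


components = slider_components("regression_data", [{"text": "An example text"}])
```

With something like this, each saved annotation would carry the slider value under `score` next to the usual `answer` field.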

Training / Model

This is more of a challenge. Out of the box, textcat.manual is for training models using spaCy's TextCategorizer components, textcat or textcat_multilabel. The problem is that, to my knowledge, neither of those components supports regression training:

Therefore, if you wanted to train your model with spaCy, you'd need to create a custom spaCy component to handle training of a continuous value (regression).

An alternative: use your existing PyTorch model/setup

Given you have a PyTorch PoC, I would recommend skipping spaCy and using Prodigy to create your own PyTorch workflow.

There's a section in the Text Classification documentation on how to create a custom Prodigy recipe for a different model workflow.
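As a rough sketch of what the model side of such a recipe could look like: a recipe's `update` callback receives a list of answered examples, so you can turn those into (text, target) pairs for your own training step. Here `train_step` is a stand-in for your existing PyTorch code, and the `score` key is a hypothetical field that your interface would need to write onto each task.

```python
# Sketch only: train_step and the "score" key are hypothetical placeholders.

def answers_to_pairs(answers, key="score"):
    """Extract (text, target) regression pairs from accepted annotations."""
    pairs = []
    for eg in answers:
        # Skip rejected/ignored examples and tasks without a continuous value
        if eg.get("answer") != "accept" or key not in eg:
            continue
        pairs.append((eg["text"], float(eg[key])))
    return pairs


def make_update(train_step):
    """Wrap a training function so it can be used as a recipe's update callback."""
    def update(answers):
        pairs = answers_to_pairs(answers)
        if pairs:
            train_step(pairs)  # e.g. one optimizer step on this batch
    return update


# Usage with a dummy training step that just records what it received
seen = []
update = make_update(seen.extend)
update([
    {"text": "great product", "score": 0.9, "answer": "accept"},
    {"text": "unclear", "score": 0.5, "answer": "ignore"},
    {"text": "no value yet", "answer": "accept"},
])
```

In a recipe, the returned components would then include `"update": update` so the model is updated as batches of answers come back.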

As linked in those docs, I'd recommend starting with this script:

Yes! See the sub-section in the docs for details, but in short, you can use one of Prodigy's sorters to specify how Prodigy should use active learning to reorder your records for annotation:

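from prodigy.components.sorters import prefer_uncertain

# Model is a placeholder for your own model wrapper (e.g., around your PyTorch model)
model = Model()
# Calling the model on the stream should yield (score, example) tuples
stream = model(stream)
# Prioritize the examples with the most uncertain scores
stream = prefer_uncertain(stream)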
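The sorters expect a stream of (score, example) tuples, so the model step mostly comes down to a generator like the one sketched below. `score_fn` is a hypothetical callable around your own model. As far as I know, `prefer_uncertain` favors scores close to 0.5, so for a regression model you'd have to decide how to map its output (or some uncertainty estimate) onto a 0 to 1 score; that mapping is an assumption you'd need to design yourself.

```python
# Sketch only: score_fn is a hypothetical wrapper around your own model.

def score_stream(stream, score_fn):
    """Yield (score, example) tuples in the format the sorters consume."""
    for eg in stream:
        score = float(score_fn(eg["text"]))
        yield (score, eg)


# Usage with a toy scoring function standing in for a real model
examples = [{"text": "short"}, {"text": "a much longer example text"}]
toy_score = lambda text: min(len(text) / 50, 1.0)
scored = list(score_stream(examples, toy_score))
```

The resulting generator is what you'd pass to the sorter, e.g. `prefer_uncertain(score_stream(stream, score_fn))`.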

After implementing this workflow, you'll likely still need to use the slider example above so that annotators can provide their continuous annotation value.

I hope this helps and let us know if you have questions (or please post back if you make progress!).
