I have a batch of text I’d like to annotate for text classification, and I’m hoping to incorporate behind-the-scenes features (i.e. topic scores for each text, pre-computed using a separate model) as features of the text. Is there a way to incorporate these hidden features/scores with Prodigy? Ideally, I’d want Prodigy to show only the text during annotation, but still take the (hidden) topic scores into account when updating the model.
Hi! What model are you looking to train? Do you want to use spaCy’s text classifier, or a different implementation?
If you’re using spaCy’s text classifier, there’s currently no easy way to add custom features like that, especially not numeric features. But if you’re bringing your own text classification model, you can definitely do this with Prodigy. (If you haven’t seen it yet, check out this example recipe.)
Prodigy lets you add any custom data to the incoming example dictionary and will just pass it through as you annotate. For example, you could add a "topic_score" field to each task.
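For a classification task, a task carrying that extra field might look like the dict below. The text, label and score are invented for illustration, and "topic_score" is just the custom key suggested here, not a name Prodigy expects.

```python
import json

# Hypothetical task: "text" (and "label") are what the annotator works with,
# while "topic_score" is a custom key that simply travels along with the task.
task = {
    "text": "The central bank raised interest rates again this quarter.",
    "label": "FINANCE",
    "topic_score": 0.87,
}

# The same task written as one line of a JSONL input file
line = json.dumps(task)
print(line)
```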
You could compute this beforehand and then export the data, or do it all in a custom recipe and assign the scores (and whatever else you need) as the text streams in. Here’s some pseudocode:
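from prodigy.components.loaders import JSONL

def get_stream():
    stream = JSONL("your_data.jsonl")
    for eg in stream:
        # get_topic_score_from_model is a placeholder for your separate topic model
        eg["topic_score"] = get_topic_score_from_model(eg["text"])
        yield eg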
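If you want to try the stream logic without Prodigy installed, the same pattern can be written with only the standard library. This is a sketch under a few assumptions: `read_jsonl` stands in for Prodigy’s `JSONL` loader, and `fake_topic_score` is a hypothetical toy scorer taking the place of your real topic model.

```python
import io
import json
from typing import Dict, Iterable, Iterator, TextIO


def read_jsonl(f: TextIO) -> Iterator[Dict]:
    """Yield one dict per non-empty line (stand-in for Prodigy's JSONL loader)."""
    for line in f:
        line = line.strip()
        if line:
            yield json.loads(line)


def fake_topic_score(text: str) -> float:
    """Hypothetical topic model: share of words that look finance-related."""
    finance_words = {"bank", "rates", "interest", "stocks", "market"}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in finance_words for w in words) / len(words)


def add_topic_scores(stream: Iterable[Dict]) -> Iterator[Dict]:
    """Attach a topic score to each task as it streams in."""
    for eg in stream:
        eg["topic_score"] = fake_topic_score(eg["text"])
        yield eg


data = io.StringIO('{"text": "Bank raises interest rates"}\n{"text": "Cats sleep a lot"}\n')
for eg in add_topic_scores(read_jsonl(data)):
    print(eg)
```

Because `add_topic_scores` is a generator, each score is only computed when that task is pulled from the stream.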
When you annotate the example, you’ll receive back the same object with the annotation and the previously assigned custom data. In your update callback, you can then update the model with it:
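def update(answers):
    # answers: annotated task dicts, each with "answer" set to "accept", "reject" or "ignore"
    for eg in answers:
        if eg["answer"] == "accept":
            # update_your_model is a placeholder for your own training step
            update_your_model(eg["text"], eg["label"], eg["topic_score"])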
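What `update_your_model` does depends entirely on your model. As one hypothetical illustration, here is a tiny online logistic regression (plain SGD, one independent binary classifier per label) where the hidden topic score is simply appended as one extra numeric feature next to hashed word counts. The class name, the hashing scheme and the learning rate are all assumptions made for this sketch; none of it is part of Prodigy.

```python
import math
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple

N_BUCKETS = 2 ** 18  # size of the hashed word-feature space (arbitrary choice)


def featurize(text: str, topic_score: float) -> List[Tuple[int, float]]:
    """Hashed word counts, plus the hidden topic score as one extra feature."""
    counts: DefaultDict[int, float] = defaultdict(float)
    for word in text.lower().split():
        # crc32 is stable across runs, unlike Python's salted hash() for str
        counts[zlib.crc32(word.encode("utf8")) % N_BUCKETS] += 1.0
    features = list(counts.items())
    features.append((N_BUCKETS, topic_score))  # index reserved for the topic score
    return features


def sigmoid(z: float) -> float:
    # Split by sign so math.exp never overflows
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TopicAwareClassifier:
    """Hypothetical per-label logistic regression, updated with plain SGD."""

    def __init__(self, learning_rate: float = 0.1) -> None:
        self.lr = learning_rate
        # weights[label][feature_index] -> weight; unseen features default to 0.0
        self.weights: DefaultDict[str, DefaultDict[int, float]] = defaultdict(
            lambda: defaultdict(float)
        )
        self.bias: DefaultDict[str, float] = defaultdict(float)

    def predict(self, text: str, label: str, topic_score: float) -> float:
        w = self.weights[label]
        z = self.bias[label] + sum(w[i] * v for i, v in featurize(text, topic_score))
        return sigmoid(z)

    def update(self, text: str, label: str, topic_score: float, accepted: bool = True) -> None:
        target = 1.0 if accepted else 0.0
        # For log loss, d(loss)/dz = prediction - target
        error = self.predict(text, label, topic_score) - target
        w = self.weights[label]
        for i, v in featurize(text, topic_score):
            w[i] -= self.lr * error * v
        self.bias[label] -= self.lr * error


clf = TopicAwareClassifier()
for _ in range(20):
    clf.update("bank raises rates", "FINANCE", 0.9, accepted=True)
    clf.update("cats sleep a lot", "FINANCE", 0.1, accepted=False)
print(clf.predict("bank raises rates", "FINANCE", 0.9))
```

In the callback, `update_your_model(...)` would then become `clf.update(eg["text"], eg["label"], eg["topic_score"])`. Passing rejected answers with `accepted=False` instead of skipping them would also give the model negative examples, and if the topic score lives on a very different scale than the word counts, normalizing it first is worth considering.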
If you do want to see the scores and other metadata during annotation, you can add them to the "meta" property (e.g. "meta": {"topic_score": 0}), and they’ll be shown in the bottom right corner of the annotation card in the UI. This can be useful during development. You can always set "hide_meta": true in your config later if you don’t want annotators to see the meta.
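For that development setup, a small stream wrapper could copy the score into "meta" so it shows up on the card. The helper name `add_score_to_meta` and the rounding are hypothetical choices for this sketch; only the "meta" key itself comes from the description above.

```python
from typing import Dict, Iterable, Iterator


def add_score_to_meta(stream: Iterable[Dict], show_scores: bool = True) -> Iterator[Dict]:
    """Copy each task's topic_score into "meta" so it appears on the annotation card."""
    for eg in stream:
        if show_scores and "topic_score" in eg:
            # setdefault keeps any meta the task already has
            eg.setdefault("meta", {})["topic_score"] = round(eg["topic_score"], 3)
        yield eg


tasks = [{"text": "Bank raises interest rates", "topic_score": 0.75123}]
for eg in add_score_to_meta(tasks):
    print(eg["meta"])
```

The model update still reads the unrounded top-level "topic_score", so the rounding only affects what annotators see.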