How to write a recipe comparing two strings?

I want to write a recipe that compares two strings to determine whether they are semantically the same. What's the best way of doing this? And can it be done?

Thanks for the question! Here are some ideas and details for a possible annotation strategy:

Simple solution for collecting annotations

Since you want to collect direct feedback on whether two texts are the same, the annotation interface should probably show the two texts, and ask for accept (texts are semantically similar / the same) or reject. So ideally, your annotation task should consist of two text keys, which are then displayed together in one annotation card. The best and most elegant way to get this across would be to create a custom HTML template.

Let’s assume your input is a JSONL file that looks like this:

{"text1": "I love pizza", "text2": "pizza is great"}
{"text1": "I love pizza", "text2": "fast food sucks"}

In your recipe, you can use the built-in JSONL loader to transform the file into a stream of annotation tasks, and then add a html_template to the config. All task properties are available as Mustache variables – e.g. if your task contains a "text1" key, you can refer to it as the template variable {{text1}}. For simplicity, I’m just adding two line breaks between the texts – but you can obviously choose any formatting you want.

import prodigy
from prodigy.components.loaders import JSONL

@prodigy.recipe('compare-strings')
def compare_strings(dataset, input_file):
    stream = JSONL(input_file)  # load the JSONL file as a stream of tasks
    html_template = '{{text1}}<br/><br/>{{text2}}'  # format this however you want

    return {
        'dataset': dataset,
        'stream': stream,
        'view_id': 'html',
        'config': {'html_template': html_template}
    }

You can then run the recipe as follows, where recipe.py is the path to the file containing your recipe:

prodigy compare-strings my_dataset input_file.jsonl -F recipe.py

The annotated tasks will be saved to your dataset, and when you’re done, you can export them to a file, and use them to train your model however you like:

prodigy db-out my_dataset /output/directory
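The exported annotations are plain JSONL, so you can process them with standard Python. Here's a minimal sketch of filtering for the accepted pairs – the "answer" key holds the accept/reject decision Prodigy stores with each task (the inline data stands in for the contents of the exported file):

```python
import json

# Each line of the db-out export is one JSON object; "answer" holds
# the accept/reject decision recorded for the task.
exported = [
    '{"text1": "I love pizza", "text2": "pizza is great", "answer": "accept"}',
    '{"text1": "I love pizza", "text2": "fast food sucks", "answer": "reject"}',
]

examples = [json.loads(line) for line in exported]

# Keep only the pairs annotated as semantically similar
similar_pairs = [
    (eg["text1"], eg["text2"]) for eg in examples if eg["answer"] == "accept"
]
print(similar_pairs)  # [('I love pizza', 'pizza is great')]
```

In a real script you'd read the lines from the exported file instead of a hard-coded list, but the filtering logic stays the same.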

Annotating with a model in the loop

Prodigy currently doesn’t come with a built-in model to do exactly what you want, so you’d either have to plug in your own, or collect the annotations and then use them in your own training process afterwards.

To make use of the active learning component, you’ll need:

  1. A model that comes with a method to rank a stream of examples and output a score for each example.
  2. An update function that receives a list of annotated examples and updates your model.

You can use any format and machine learning library of your choice – you just need to be able to load your model in Python. You can then use one of the built-in sorters in prodigy.components.sorters, for example prefer_uncertain or prefer_high_scores. The sorters take a stream of (score, example) tuples and return a stream of annotation tasks that can be consumed by Prodigy. The components returned by your recipe could then look like this:

return {
    'dataset': dataset,
    'stream': prefer_uncertain(model(stream)),
    'update': model.update,
    'view_id': 'html',
    'config': {'html_template': html_template}
}

Every time Prodigy receives a new batch of annotated tasks, it will call the update function with the list of examples. The updated model will then re-rank the stream and propose relevant examples for annotation whenever a new batch of tasks is requested from the stream.
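To illustrate the interface described above, here's a minimal dummy model. The class name is made up and the scores are random – a real model would compute actual similarity scores and update its weights in update – but it shows the two pieces the sorters and the update loop expect: a callable yielding (score, example) tuples, and an update method receiving annotated examples:

```python
import random

class DummySimilarityModel:
    """Stand-in for a real similarity model (hypothetical example)."""

    def __call__(self, stream):
        # Yield (score, example) tuples, as expected by the sorters
        for eg in stream:
            score = random.random()  # replace with a real similarity score
            yield score, eg

    def update(self, answers):
        # Called with each batch of annotated examples; a real model
        # would update its weights here based on the accept/reject answers
        accepted = [eg for eg in answers if eg.get("answer") == "accept"]
        print(f"Updating on {len(accepted)} accepted examples")

model = DummySimilarityModel()
stream = [{"text1": "I love pizza", "text2": "pizza is great"}]
scored = list(model(stream))  # a list of (score, example) tuples
```

Wrapping your model like this means the rest of the recipe doesn't need to know anything about the underlying machine learning library.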

Other ideas to explore

If your data contains a lot of overlapping or very subtle semantic similarities, another approach could be to use the choice annotation interface and annotate multiple options compared to one text at a time. This would let you shuffle the texts, and annotate them in relation to a base text. For example, your input could look like this:

{
    "text": "i love pizza",
    "options": [
        {"id": 1, "text": "pizza is great"},
        {"id": 2, "text": "fast food sucks"},
        {"id": 3, "text": "i'm quite fond of pizzas"}
    ]
}

In your recipe, you’d simply stream in the examples and use the choice interface with 'choice_style': 'multiple' to allow multiple selections.

def compare_string_options(dataset, input_file):
    stream = JSONL(input_file)
    return {
        'dataset': dataset,
        'stream': stream,
        'view_id': 'choice',
        'config': {'choice_style': 'multiple'}
    }

As you annotate, Prodigy will add a selected key containing the IDs of the selected options to your annotation task – for example "selected": [1, 3].
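When you process the exported annotations later, you can map the selected IDs back to the option texts. A small sketch, using a hand-written task in the format shown above:

```python
# An annotated task as described above, with the "selected" key added
# based on which options were picked during annotation
task = {
    "text": "i love pizza",
    "options": [
        {"id": 1, "text": "pizza is great"},
        {"id": 2, "text": "fast food sucks"},
        {"id": 3, "text": "i'm quite fond of pizzas"},
    ],
    "selected": [1, 3],
}

# Resolve the selected option IDs to their texts
selected_texts = [
    opt["text"] for opt in task["options"] if opt["id"] in task["selected"]
]
print(selected_texts)  # ['pizza is great', "i'm quite fond of pizzas"]
```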

Ultimately, the best approach to solve this problem depends on your data, and the best way to find out what works best is to run a few annotation experiments using different interfaces and ideas.


You might also want to check out the A/B evaluation workflow for details on how to evaluate the model(s) you train on the data. Since similarity is always subjective, you definitely want some feedback on how your model is performing, and how its predictions compare to a human’s.

I am trying to compare a Prodigy/spaCy-trained text classification model against manually established user sentences (collected outside of Prodigy). We have implemented a version of the compare-strings recipe that reads a JSONL file containing the spaCy model's results and the manual user values.

Once the annotations are complete, what is the best way to calculate precision, recall, and accuracy? Or would it be better to implement this comparison as an A/B evaluation?