Is the A/B evaluation workflow gone?

The link to the A/B evaluation workflow now appears to redirect to . Is this feature still supported?

I’m hoping to use the A/B feature for human evaluation: that is, to do pairwise evaluation of model outputs and rank the models using the TrueSkill algorithm. This would presumably work well in conjunction with the active learning / sorting feature.

Yes, absolutely – in fact, we think of A/B evaluation as a super important and often very underrated feature. (Since the original workflow page was actually more of a feature description anyway, we ended up merging it with the feature page.)

Your idea sounds cool, so if you end up implementing this, definitely let us know how you go!

Prodigy ships with built-in workflows like ner.eval-ab, but you can also put together your own recipes using the compare interface (see here for an example). It takes data in the following format:

    {
        "id": 1,
        "input": { "text": "NLP" },
        "accept": { "text": "Natural Language Processing" },
        "reject": { "text": "Neuro-Linguistic Programming" },
        "mapping": { "A": "accept", "B": "reject" }
    }

"input" is the original input (optional) and "accept" / "reject" are the outputs displayed in green and red, respectively. Instead of "text", they could also specify "html" or an "image". The "mapping" lets you resolve the two randomised outputs back to the A and B models.
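
To make the randomisation concrete, here is a minimal sketch of how you might build one such task from two model outputs. It's plain Python and doesn't call any Prodigy API; the function name make_compare_task and its arguments are made up for illustration, and only the dictionary keys come from the format above.

```python
import random


def make_compare_task(task_id, input_text, output_a, output_b, rng=random):
    """Build one task dict in the compare format (hypothetical helper)."""
    # Randomly decide which model's output lands in the "accept" slot,
    # so the annotator can't tell the models apart by position or colour.
    if rng.random() < 0.5:
        accept, reject = output_a, output_b
        mapping = {"A": "accept", "B": "reject"}
    else:
        accept, reject = output_b, output_a
        mapping = {"A": "reject", "B": "accept"}
    return {
        "id": task_id,
        "input": {"text": input_text},
        "accept": {"text": accept},
        "reject": {"text": reject},
        "mapping": mapping,
    }
```

After annotation, task[task["mapping"]["A"]] gives you back model A's output regardless of which slot it was shown in.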

In your custom recipe function, you can create this data however you like and return it as the "stream" setting. To use Prodigy’s built-in sorter components like prefer_uncertain, your ranking function needs to yield (score, example) tuples. You can then wrap it in the sorter function:

    from prodigy.components.sorters import prefer_uncertain

    stream = get_your_ranked_stream()
    stream = prefer_uncertain(stream)
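
As a rough idea of what the ranking function could look like, here's a sketch that scores each comparison by the estimated probability that model A beats model B. Note the assumptions: it uses a simple Elo-style formula as a stand-in rather than TrueSkill, the (model_a, model_b, task) triples and the ratings dict are a hypothetical layout, and it relies on the sorter treating scores near 0.5 as the most uncertain.

```python
def win_probability(rating_a, rating_b, scale=400.0):
    """Elo-style estimate that A beats B (a stand-in, not TrueSkill)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def get_ranked_stream(pairs, ratings):
    """Yield (score, example) tuples for a sorter to consume.

    pairs:   iterable of (model_a, model_b, task) triples (hypothetical layout)
    ratings: dict mapping each model name to its current rating
    """
    for model_a, model_b, task in pairs:
        # Closely matched models give a score near 0.5, i.e. the outcome
        # of the comparison is hard to predict from the current ratings.
        score = win_probability(ratings[model_a], ratings[model_b])
        yield (score, task)
```

With something like this, comparisons between models with similar ratings would be the ones surfaced first, which is usually where an extra human judgement tells you the most.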

You can also find more info and API docs in your PRODIGY_README.html, available for download with Prodigy.

Perfect! Thanks for the response. Will let you know how it gets on. Thanks as well for the academic licence!
