text quality estimation


I'm building a parallel corpus where I want to compare German and English sentences.

Basically, I am trying to reproduce the "Diff" example from the demo page: https://prodi.gy/demo

As far as I understand, Diff only works with the mark and review recipes, or with a custom recipe.

Which one do I need to use for my use-case?

I have tried the mark recipe with diff as the view_id, but it does not seem to be the right recipe.


Edit: I think I have figured it out. I assume the "Diff" example uses some kind of translation model and the concept of blocks, right?

Hi! From what you describe, it sounds like you might be looking for something like the compare recipe? https://prodi.gy/docs/recipes#compare

It lets you pass in two files with the two outputs you want to compare, and lets you set a --diff flag to show a visual diff. You can also configure whether the A/B mapping should be randomised so you can perform an actual A/B evaluation.
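In case it helps, here is a rough sketch of how you could prepare the two input files. It assumes each line is a JSON record with an "id", an "input" and an "output" key, which is the layout I recall from the compare docs, so please double-check it there. The function names and the example sentences are made up for illustration.

```python
import json


def to_compare_records(rows):
    """Turn (source, output_a, output_b) triples into two record lists.

    Assumption: compare matches records across the two files by "id" and
    reads the text to show from "input" and "output".
    """
    records_a, records_b = [], []
    for i, (source, out_a, out_b) in enumerate(rows):
        records_a.append({"id": i, "input": {"text": source}, "output": {"text": out_a}})
        records_b.append({"id": i, "input": {"text": source}, "output": {"text": out_b}})
    return records_a, records_b


def write_jsonl(path, records):
    """Write one JSON object per line, keeping umlauts readable."""
    with open(path, "w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


# Hypothetical example: one German source with two English candidates.
rows = [("Das Haus ist klein.", "The house is small.", "The house is little.")]
records_a, records_b = to_compare_records(rows)
```

You would then call write_jsonl once per list and pass the two resulting files to compare together with the --diff flag.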

Alternatively, you can also write a custom recipe and use the diff interface. You can see an example of the expected format here, which is what your stream needs to yield: https://prodi.gy/docs/api-interfaces#diff

You can also put together a custom combined interface, e.g. using diff plus choice for multiple choice options, or whatever else you need: https://prodi.gy/docs/custom-interfaces
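For the custom recipe route, the stream can be a plain generator. This is only a minimal sketch: it assumes the diff interface reads the two texts from "removed" and "added" keys, as in the example on the interfaces page, and make_diff_stream and the sentence pair are hypothetical.

```python
def make_diff_stream(pairs):
    """Yield one diff task per (old_text, new_text) pair.

    Assumption: the diff interface expects the original text under
    "removed" and the revised text under "added".
    """
    for old_text, new_text in pairs:
        yield {"removed": old_text, "added": new_text}


# Hypothetical example: two versions of the same English sentence.
pairs = [("The house is little.", "The house is small.")]
tasks = list(make_diff_stream(pairs))
```

In a custom recipe, you would return this generator as the "stream" and set "view_id" to "diff" in the dictionary the recipe returns.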