DPR annotation


I'm developing a pet project in which I need to fine-tune a DPR model, and I was wondering whether I could use Prodigy as the annotation tool to label my data. I would have to build a custom Prodigy recipe that shows a question and uses a span categorizer to label spans as answers, positive contexts, or negative contexts. (I think this part is easy, since you can create a new recipe.) My question is: is it possible to customize the output JSON format?

To make it clearer what I need, this is the output format I would like to end up with:

	{
		"question": "....",
		"answers": ["...", "...", "..."],
		"positive_ctxs": ["...."],
		"negative_ctxs": ["..."],
		"hard_negative_ctxs": ["..."]
	}

For the same question, each span would be appended to the corresponding list.



Hi @joaomsimoes, welcome to Prodigy!

For the annotation part, you might want to check out the spans.manual recipe, or base your custom recipe on it.

As for the output format, the best way to get it is to export your annotations with the db-out command and write your own transformation step on the JSONL output. From there you can sort the spans into the different lists based on their labels.
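A minimal sketch of what that transformation step could look like, in case it helps. It assumes each exported example has the usual "text", "spans" (with "start", "end" and "label") and "answer" fields. It also assumes your custom recipe stores the question in a top-level "question" field, and that your labels are named ANSWER, POSITIVE_CTX, NEGATIVE_CTX and HARD_NEGATIVE_CTX. Those field and label names are placeholders of mine, not something Prodigy defines, so adjust them to whatever your recipe actually produces.

```python
import json

# Hypothetical label names mapped to the keys of the desired output format.
# Change these to match the labels used in your own recipe.
LABEL_TO_KEY = {
    "ANSWER": "answers",
    "POSITIVE_CTX": "positive_ctxs",
    "NEGATIVE_CTX": "negative_ctxs",
    "HARD_NEGATIVE_CTX": "hard_negative_ctxs",
}


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def to_dpr(examples):
    """Group accepted span annotations by question.

    Assumes each example carries its question in a "question" field
    (an assumption about the custom recipe, not a Prodigy default).
    """
    grouped = {}
    for eg in examples:
        # Skip rejected and ignored examples.
        if eg.get("answer") != "accept":
            continue
        question = eg["question"]
        record = grouped.setdefault(
            question,
            {"question": question, **{key: [] for key in LABEL_TO_KEY.values()}},
        )
        for span in eg.get("spans", []):
            key = LABEL_TO_KEY.get(span["label"])
            if key is None:
                continue  # label not part of the mapping above
            span_text = eg["text"][span["start"]:span["end"]]
            # Append each span to the list for its label, without duplicates.
            if span_text not in record[key]:
                record[key].append(span_text)
    return list(grouped.values())
```

You would export with something like `prodigy db-out your_dataset > annotations.jsonl` (the dataset and file names here are placeholders), then call `to_dpr(read_jsonl("annotations.jsonl"))` and write the result out with `json.dump`.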

