DPR annotation

joaomsimoes · November 14, 2021, 7:43am

Hi!

I'm developing a pet project where I need to fine-tune a DPR model, and I was wondering if I could use Prodigy as the annotation tool to label my data. I would have to build a custom Prodigy recipe that shows a question and uses a span categorizer to mark answers and positive or negative contexts. (I think this part is easy, since you can create a new recipe.) My question is: is it possible to customize the output JSON format?

To make it clearer what I need, this is the output format I'm hoping to get:

[
  {
    "question": "....",
    "answers": ["...", "...", "..."],
    "positive_ctxs": ["...."],
    "negative_ctxs": ["..."],
    "hard_negative_ctxs": ["..."]
  },
  ...
]

For the same question, each span would be appended to the corresponding list.

Best regards,

João

ljvmiranda921 · November 15, 2021, 1:22am

Hi @joaomsimoes, welcome to Prodigy!

For this, you might also want to check the spans.manual recipe, or base your custom recipe on it.

The best way to achieve that is to use the db-out command and write your own transformation step on the JSONL output. From there you can sort the spans into different categories based on the label.
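
As a rough illustration, here is a minimal sketch of such a transformation step. It rests on several assumptions that are not confirmed anywhere in this thread: that your custom recipe stores the question under a "question" key on each task, that the span labels are named ANSWER, POSITIVE_CTX, NEGATIVE_CTX and HARD_NEGATIVE_CTX, and that only accepted examples should be exported. The helper names (to_dpr, convert) are made up for this example, so adjust everything to match your own recipe.

```python
import json

# Hypothetical span labels mapped to the keys of the desired output format.
LABEL_TO_KEY = {
    "ANSWER": "answers",
    "POSITIVE_CTX": "positive_ctxs",
    "NEGATIVE_CTX": "negative_ctxs",
    "HARD_NEGATIVE_CTX": "hard_negative_ctxs",
}


def to_dpr(examples):
    """Group labelled spans by question into DPR-style records."""
    records = {}
    for eg in examples:
        # Assumption: only accepted annotations should be exported.
        if eg.get("answer") != "accept":
            continue
        # Assumption: the recipe stored the question under "question".
        question = eg.get("question")
        if question is None:
            continue
        if question not in records:
            records[question] = {"question": question}
            records[question].update({key: [] for key in LABEL_TO_KEY.values()})
        for span in eg.get("spans", []):
            key = LABEL_TO_KEY.get(span.get("label"))
            if key is None:
                continue  # skip labels this sketch does not know about
            # Spans carry character offsets into the task text.
            records[question][key].append(eg["text"][span["start"]:span["end"]])
    return list(records.values())


def convert(in_path, out_path):
    """Read a db-out JSONL export and write a single JSON array."""
    with open(in_path, encoding="utf8") as f:
        examples = [json.loads(line) for line in f if line.strip()]
    with open(out_path, "w", encoding="utf8") as f:
        json.dump(to_dpr(examples), f, indent=2, ensure_ascii=False)
```

You would first export the annotations with something like `prodigy db-out your_dataset > annotations.jsonl` (the dataset name is a placeholder) and then call `convert("annotations.jsonl", "dpr_train.json")`. Spans for the same question are appended to the same record, in the order they appear in the export.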
