Hi all,
I'm trying to use Prodigy to create a dataset to train an extractive question answering system. As in SQuAD, each sample contains a question, a context (i.e. a paragraph in natural language, related to the question and likely to contain a possible answer) and the specific answer, extracted directly from the context. Note that the answer is a substring of the context.
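To make the format concrete, here is a minimal sketch of one sample and the substring constraint. The field names (`question`, `context`, `answer`), the example text and the helper `answer_offsets` are only illustrative assumptions, not a fixed schema.

```python
# Hypothetical sample; the field names are assumptions made for illustration.
sample = {
    "question": "Which tool is used to annotate the answers?",
    "context": "The annotators use Prodigy to mark the answer span in each paragraph.",
    "answer": "Prodigy",
}


def answer_offsets(context, answer):
    """Return (start, end) character offsets of the first occurrence of answer in context."""
    start = context.find(answer)
    if start == -1:
        raise ValueError("the answer must be a substring of the context")
    return start, start + len(answer)


start, end = answer_offsets(sample["context"], sample["answer"])
# Slicing the context with the offsets gives back exactly the answer text.
assert sample["context"][start:end] == sample["answer"]
```

Storing character offsets rather than only the answer text avoids ambiguity when the same string appears more than once in the context.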
I have already collected a set of question-context pairs. I need human annotators to select the span of the context that corresponds to the answer, if any.
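As a rough sketch of the output I'm after: assuming each annotated task comes back as a dict holding the context under `text`, the question under a custom `question` key, and the selected span as character offsets in a `spans` list (the layout I'd expect from manual span annotation, but treat it as an assumption), it could be converted to a SQuAD-style record like this. The function name `span_to_answer` is hypothetical.

```python
def span_to_answer(task):
    """Convert one annotated task into a SQuAD-style record.

    Assumed task layout: {"text": ..., "question": ..., "spans": [{"start": int, "end": int}]}.
    A task without spans is treated as having no answer in the context.
    """
    context = task["text"]
    spans = task.get("spans") or []
    record = {"question": task["question"], "context": context}
    if not spans:
        # No span selected: mark the question as unanswerable for this context.
        record["answers"] = []
        record["is_impossible"] = True
        return record
    span = spans[0]  # this sketch keeps only the first selected span
    record["answers"] = [
        {"text": context[span["start"]:span["end"]], "answer_start": span["start"]}
    ]
    record["is_impossible"] = False
    return record


# Hypothetical annotated task, for illustration only.
annotated = {
    "question": "Where was the treaty signed?",
    "text": "The treaty was signed in Paris in 1898.",
    "spans": [{"start": 25, "end": 30, "label": "ANSWER"}],
}
record = span_to_answer(annotated)
```

The "if any" case is covered by the empty `answers` list, similar to how SQuAD 2.0 marks unanswerable questions.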
So, my initial idea was to combine the functionality of ner.manual, which would tokenize the context and let annotators select the answer span, with some kind of custom HTML view that shows both the question and the context for each sample. Is there a better approach? Any hints on how to tackle this? Thanks in advance.
EDIT: I've found this custom question answering recipe but it doesn't fit my needs.