I missed the cat facts example, and that sorts me out; thanks much @ines.
Just to share where I got to and the use case, in case there are better ways or this helps someone else. Here is what I have:
My Recipe:
import jsonlines
import prodigy
import spacy
from spacy import displacy

# TEXT, TOKENS, OPTIONS, BUY, V1, ... are string constants defined elsewhere (not shown here).


@prodigy.recipe(
    'srl',
    dataset=("Dataset target", "positional", None, str),
    model=('Language model', 'positional', None, str),
    source=("path to data", "positional", None, str)
)
def srl(dataset, model, source):
    nlp = spacy.load(model)

    def get_tasks(source):
        # Each input record already carries its text and tokens.
        with jsonlines.open(source, 'r') as rdr:
            for eg in rdr:
                yield {
                    TEXT: eg[TEXT],
                    TOKENS: eg[TOKENS],
                    OPTIONS: [
                        {ID: BUY, TEXT: BUY},
                        {ID: SELL, TEXT: SELL}
                    ],
                    # Dependency parse rendered as HTML for the third block.
                    HTML: displacy.render(nlp(eg[TEXT]), style='dep', page=True)
                }

    stream = get_tasks(source)
    return {
        DATASET: dataset,
        VIEW_ID: BLOCKS,
        STREAM: stream,
        CONFIG: {
            BLOCKS: [
                {VIEW_ID: NER_MANUAL},
                {VIEW_ID: CHOICE},
                {VIEW_ID: HTML}
            ],
            LABELS: [
                V1, ARG1,
                V2, ARG2,
                V3, ARG3,
                ARG12, ARG13, ARG23,
                V12, V13, V23
            ],
        }
    }
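The recipe uses uppercase names (TEXT, VIEW_ID, V1 and so on) whose definitions are not shown in the post. One plausible reading is that they are module-level string constants. The sketch below is a guess at what they could look like: the task and config keys and the interface names are the ones Prodigy expects, while the label and option strings are assumed to simply match their names.

```python
# Task and config keys: these are the key names Prodigy reads.
TEXT, TOKENS, OPTIONS, ID, HTML = "text", "tokens", "options", "id", "html"
DATASET, VIEW_ID, STREAM, CONFIG = "dataset", "view_id", "stream", "config"
BLOCKS, LABELS = "blocks", "labels"

# Interface names used inside the blocks list ("html" and "blocks" are
# reused from the keys above, since the strings are identical).
NER_MANUAL, CHOICE = "ner_manual", "choice"

# ASSUMPTION: option and label strings mirror their constant names.
BUY, SELL = "BUY", "SELL"
V1, V2, V3 = "V1", "V2", "V3"
ARG1, ARG2, ARG3 = "ARG1", "ARG2", "ARG3"
ARG12, ARG13, ARG23 = "ARG12", "ARG13", "ARG23"
V12, V13, V23 = "V12", "V13", "V23"

# One choice option as it would appear in a task.
example_option = {ID: BUY, TEXT: BUY}
```

With these in the same module as the recipe, the dictionaries it builds come out with plain string keys.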
A screenshot of the resulting interface:
One small remaining issue: if I am using keyboard shortcuts (the only way to fly!) and hit 1, both V1 in the NER labels and BUY in the choice options are selected. This is not really a big deal; if I do my NER tagging before my choice tagging, everything works fine. I just wanted to point it out in case there is a better way to implement this.
Also, just to share the full use case: I am using Prodigy to harvest data on semantic dependencies. I will use the dataset with the spaCy parser model (your chat semantics recipe), as well as some other implementations, as I don't think the spaCy dependency parser can represent some of the nuances I am encountering. The V* labels are predicate heads, and the ARG* labels are predicate arguments. The reason I have ARG12 is for elliptical cases where a single argument is acting in two structures. For example, here is a typical sentence in my data:
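One possible way around the shortcut clash, offered as an untested sketch: as far as I know, Prodigy v1.10 and later accept a `keymap_by_label` config setting that maps label names or choice option IDs to custom keys. Please check the docs for your version before relying on it. The helper name `build_keymap_by_label`, the option IDs "BUY" and "SELL", and the keys "b" and "s" are all assumptions made for illustration.

```python
def build_keymap_by_label(option_ids, keys):
    """Pair each choice option ID with its own shortcut key."""
    if len(option_ids) != len(keys):
        raise ValueError("need exactly one key per option")
    if len(set(keys)) != len(keys):
        raise ValueError("shortcut keys must be unique")
    return dict(zip(option_ids, keys))


# ASSUMPTION: the option IDs are the strings "BUY" and "SELL".
choice_keymap = build_keymap_by_label(["BUY", "SELL"], ["b", "s"])

# Fragment to merge into the recipe's config dictionary.
config_fragment = {"keymap_by_label": choice_keymap}
```

If the setting behaves as described, the number keys would stay with the NER labels and the choice options would move to letter keys.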
"I paid $50 for the Ken Griffey Jr Card ... I am offered now at $65"
Here the "Ken Griffey Jr Card" is an argument for both the predicate "paid" and the predicate "offered". So I would annotate "paid" as V1 and "offered" as V2, while the chunk "Ken Griffey Jr Card" gets ARG12, meaning it's an argument for both V1 and V2. I can translate that into CoNLL format later, which can represent overlapping spans where the spaCy parser cannot (I could be wrong about this?).
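To make the labelling scheme concrete, here is a minimal sketch of how the combined labels could be unpacked before any CoNLL conversion. It assumes every label is "V" or "ARG" followed by one digit per structure it belongs to (so ARG12 belongs to structures 1 and 2), and that spans carry Prodigy-style "start", "end" and "label" keys. The function name `expand_spans` and the output layout are hypothetical, not part of Prodigy or spaCy.

```python
import re
from collections import defaultdict

# "V" or "ARG", then one digit per predicate structure.
LABEL_RE = re.compile(r"^(V|ARG)(\d+)$")


def expand_spans(spans):
    """Group spans by structure index, copying shared spans into each one."""
    structures = defaultdict(lambda: {"predicates": [], "arguments": []})
    for span in spans:
        match = LABEL_RE.match(span["label"])
        if match is None:
            continue  # skip labels outside the V*/ARG* scheme
        role, digits = match.groups()
        key = "predicates" if role == "V" else "arguments"
        for digit in digits:
            structures[int(digit)][key].append((span["start"], span["end"]))
    return dict(structures)


text = "I paid $50 for the Ken Griffey Jr Card ... I am offered now at $65"


def make_span(phrase, label):
    """Build a span from the first occurrence of phrase in text."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


spans = [
    make_span("paid", "V1"),
    make_span("offered", "V2"),
    make_span("Ken Griffey Jr Card", "ARG12"),
]
result = expand_spans(spans)

for index in sorted(result):
    heads = [text[s:e] for s, e in result[index]["predicates"]]
    args = [text[s:e] for s, e in result[index]["arguments"]]
    print(index, heads, args)
```

Each structure ends up with its own copy of the shared argument, which is the overlapping-span information a per-predicate CoNLL column would need.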