Hi,
I would like to label a dataset for extractive summarization, where the input JSONL file to be annotated looks like this:
{"document": ["line 1", "summary line 2", "line 3"], "meta": {...}}
{"document": ["summary x 1", "x 2", "x 3", "summary x 4"], "meta": {...}}
I am imagining an interface with a checkbox beside each sentence: ticking the checkbox marks that sentence as a positive label, and leaving it unchecked marks it as negative. Ultimately, I would like the annotated output to look like this:
{"document": ["line 1", "summary line 2", "line 3"], "labels": [0, 1, 0], "meta": {...}}
{"document": ["summary x 1", "x 2", "x 3", "summary x 4"], "labels": [1, 0, 0, 1], "meta": {...}}
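One way to picture the round trip is a minimal sketch, assuming a multiple-choice style interface where each sentence is an option whose `id` is its index, and assuming the ticked option ids come back in an `"accept"` list on the answered task. The function names and the prompt text are hypothetical; only the input/output record shapes come from the examples above.

```python
import json


def to_choice_task(record):
    """Turn one input record into a multiple-choice task (one option per sentence)."""
    return {
        "text": "Select the sentences that belong in the summary",  # hypothetical prompt
        "options": [{"id": i, "text": sent} for i, sent in enumerate(record["document"])],
        "document": record["document"],
        "meta": record.get("meta", {}),
    }


def to_labelled_record(task):
    """Convert an answered task back into {"document", "labels", "meta"}.

    Assumes the selected option ids are stored in task["accept"].
    """
    selected = set(task.get("accept", []))
    labels = [1 if i in selected else 0 for i in range(len(task["document"]))]
    return {"document": task["document"], "labels": labels, "meta": task.get("meta", {})}


record = {"document": ["line 1", "summary line 2", "line 3"], "meta": {}}
task = to_choice_task(record)
task["accept"] = [1]  # the annotator ticked the second sentence
print(json.dumps(to_labelled_record(task)))
# {"document": ["line 1", "summary line 2", "line 3"], "labels": [0, 1, 0], "meta": {}}
```

Keying options by sentence index keeps the conversion back to a 0/1 vector trivial, even if two sentences happen to have identical text.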
Q1) What would be the easiest way to do this in Prodigy (v1.10.4)?
Q2) Would it be possible to do active learning and have an underlying model make predictions for each sentence to speed up labelling?
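For the "model makes predictions" half of Q2, a minimal sketch could pre-tick the sentences the model already considers summary-worthy, so the annotator only corrects mistakes. This assumes the interface pre-selects options whose ids appear in `"accept"`; `predict_proba` is a hypothetical stand-in for whatever sentence classifier you use, and updating that model from the answers (the "active learning" loop itself) is not shown.

```python
from typing import Callable, Dict, List


def add_predictions(
    task: Dict,
    predict_proba: Callable[[List[str]], List[float]],
    threshold: float = 0.5,
) -> Dict:
    """Pre-select the sentences a (hypothetical) model predicts as summary sentences."""
    probs = predict_proba(task["document"])
    task = dict(task)  # shallow copy so the caller's task is not mutated
    task["accept"] = [i for i, p in enumerate(probs) if p >= threshold]
    # Keep the raw scores around for inspection later
    task["meta"] = {**task.get("meta", {}), "scores": [round(p, 3) for p in probs]}
    return task


def keyword_model(sentences: List[str]) -> List[float]:
    # Toy stand-in: sentences containing "summary" get a high score
    return [0.9 if "summary" in s else 0.1 for s in sentences]


task = {"document": ["summary x 1", "x 2", "x 3", "summary x 4"], "meta": {}}
print(add_predictions(task, keyword_model)["accept"])  # [0, 3]
```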
Q3) Would it be possible to do uncertainty sampling to pick out good samples for labelling?
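Uncertainty sampling for this setup could be sketched as follows, assuming the same hypothetical per-sentence `predict_proba` model. Each document is scored by its most uncertain sentence (1.0 when some probability is exactly 0.5, 0.0 when every prediction is 0 or 1), and the stream is re-ordered in fixed-size batches so it works on large or endless inputs. This is a plain-Python illustration of the technique, not Prodigy's built-in sorting.

```python
from typing import Callable, Dict, Iterable, Iterator, List


def uncertainty(probs: List[float]) -> float:
    """Max over sentences of 1 - 2|p - 0.5|: 1.0 at p == 0.5, 0.0 at p in {0, 1}."""
    return max((1.0 - 2.0 * abs(p - 0.5) for p in probs), default=0.0)


def most_uncertain_first(
    records: Iterable[Dict],
    predict_proba: Callable[[List[str]], List[float]],
    batch_size: int = 100,
) -> Iterator[Dict]:
    """Yield records batch by batch, most uncertain document first within each batch."""

    def rank(batch: List[Dict]) -> List[Dict]:
        return sorted(batch, key=lambda r: uncertainty(predict_proba(r["document"])), reverse=True)

    batch: List[Dict] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:
            yield from rank(batch)
            batch = []
    if batch:  # flush the final partial batch
        yield from rank(batch)


def toy_model(sentences: List[str]) -> List[float]:
    # Hypothetical model: unsure about "c", confident about everything else
    return [0.5 if s == "c" else 0.99 for s in sentences]


docs = [{"document": ["a", "b"]}, {"document": ["c"]}]
print([r["document"] for r in most_uncertain_first(docs, toy_model)])  # [['c'], ['a', 'b']]
```

Using the maximum surfaces documents with even one borderline sentence; averaging the per-sentence scores instead would favour documents that are uncertain throughout.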
Thank you!