Display meta info in compare view

Hi all, I’ve made several attempts to display meta info in the compare view, as well as in a custom recipe based on the compare recipe, without success. I feel like I’m missing something trivial. Is there an easy way to display meta info in the compare view?

Could you post an example of the task JSON?

The compare interface definitely allows meta information (see here in the demo). It should be a dictionary "meta" at the top level of the JSON object. Maybe you accidentally added the meta in the "input"?

The following format should work:

{
    "id": 1,
    "input": { "text": "NLP" },
    "accept": { "text": "Natural Language Processing" },
    "reject": { "text": "Neuro-Linguistic Programming" },
    "meta": {
        "source": "some source",
        "something_else": "hello"
    }
}

Hi @ines, thanks so much for your reply. I’ll give this format a try. We’re using Prodigy for several tasks and getting great mileage out of it. We’re currently finalising a large gold set after two rounds of manual NER annotation. In anticipation of Scale becoming available, we’ve hacked the ner.eval-ab recipe to take two rounds of manual NER annotation, identify examples that differ, and then use the compare view to select one as the ‘gold’ annotation (by either accepting or rejecting). Ideally we’d also like to offer the curators a manual-style view to edit spans when neither round (a or b) is judged ‘correct’, but that proved a little difficult, so we’re just flagging those examples for subsequent manual editing.

That being said, the two input files contain ‘standard’ format JSONL as output by ner.manual. Here’s an example of the input data for one example from one round.

  "_task_hash": -1393059365,
  "label": "REQUIREMENT",
  "_input_hash": 1265722636,
  "text": "Have broad people management skills",
  "meta": {
    "sidx": 21,
    "jobid": 36918006
  },
  "spans": [
    {
      "token_start": 2,
      "label": "OA",
      "start": 11,
      "end": 28,
      "token_end": 3
    }
  ],
  "answer": "accept"
...
}

The recipe takes two files, round 1 and round 2, and compares the input hashes to identify common examples. Where the spans differ, it adds the examples to the stream, shuffles them, and presents the two rounds of annotation in the compare view. The idea is that the chosen answer (accept or reject) corresponds to the ‘correct’ annotation that constitutes the gold set.

def ner_gold(dataset, afile, bfile,
             loader=None, label=None, exclude=None, unsegmented=False):
    """
    Compare output from two rounds of annotation and select the preferred version.
    """
    log("RECIPE: Starting recipe ner.gold", locals())

    a_stream = list()
    for qn in get_stream(afile):
        task = dict()
        task['id'] = qn.get('_input_hash')
        task['input'] = { 'text': qn.get('text') }
        task['output'] = { 'text': qn.get('text'), 'spans': qn.get('spans') }
        task['meta'] = qn.get('meta')
        a_stream.append(task)
    b_stream = list()
    for qn in get_stream(bfile):
        task = dict()
        task['id'] = qn.get('_input_hash')
        task['input'] = { 'text': qn.get('text') }
        task['output'] = { 'text': qn.get('text'), 'spans': qn.get('spans') }
        task['meta'] = qn.get('meta')
        b_stream.append(task)

    stream = list(get_compare_questions(a_stream, b_stream, True))

    return {
        'view_id': 'compare',
        'dataset': dataset,
        'stream': stream,
        'on_exit': printers.get_compare_printer('round1', 'round2'),
        'exclude': exclude,
    }
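
For what it’s worth, the selection of differing examples isn’t shown in the snippet above; conceptually that step boils down to something like the following sketch, which pairs the raw ner.manual examples on _input_hash and keeps only the pairs whose spans differ (the function name here is just for illustration):

def differing_examples(a_examples, b_examples):
    # Index round 2 by input hash so the two rounds can be paired up.
    b_by_hash = {eg["_input_hash"]: eg for eg in b_examples}
    for a in a_examples:
        b = b_by_hash.get(a["_input_hash"])
        if b is None:
            continue  # only annotated in one of the two rounds
        # Compare the span annotations and keep the pair only if they disagree.
        a_spans = {(s["start"], s["end"], s["label"]) for s in a.get("spans") or []}
        b_spans = {(s["start"], s["end"], s["label"]) for s in b.get("spans") or []}
        if a_spans != b_spans:
            yield a, b

Each pair that survives this check is then converted to the compare format shown in the recipe.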

All seems to work well except that the meta information isn’t displaying. I’ve even tried adding 'meta': qn.get('meta') to both the task object and the task output, but no joy. For some reason the meta just isn’t being picked up.

Perhaps there’s an entirely better way to achieve what we’re trying to do, but this is what we’ve come up with so far. Any pointers would be greatly appreciated.

Thanks for the details – I always like reading about the custom workflows people put together :blush:

Ahh, I think I know what might be happening here: check out the get_compare_questions function (which is get_questions from compare.py). It takes the input and output from the task, creates the randomised mapping and reconstructs the annotation task using the data. So the meta in the original task isn't actually used.

So you might want to copy that function and edit it to make sure the meta is included. For example, before you yield out the example, set example["meta"] to a["meta"] (or b["meta"]; it shouldn't matter, because both will be qn.get('meta')).
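
As a rough sketch (not the exact code from compare.py), you could also re-attach the meta after the fact with a thin wrapper instead of copying the whole function, assuming the combined tasks keep the top-level "id" like in the example format above:

def add_meta(stream, meta_by_id):
    # Re-attach the original meta to each combined compare task by its id.
    for eg in stream:
        eg["meta"] = meta_by_id.get(eg["id"], {})
        yield eg

# In your recipe, after building a_stream and b_stream:
meta_by_id = {task["id"]: task["meta"] for task in a_stream}
stream = list(add_meta(get_compare_questions(a_stream, b_stream, True), meta_by_id))

Either way, the important bit is that the "meta" ends up at the top level of each task that’s actually sent out to the web app.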

Thanks so much @ines, you guys are absolute champions, love your work!
