Ambiguous NER annotation decisions

Ah okay, sorry! The ner.teach recipe works especially well if you're looking to correct the entity predictions more generally, i.e. with the goal of having your application make fewer errors overall.

If you want to make more passes over the data and suggest analyses "until Prodigy gets it right", check out the ner.make-gold recipe (see here for details). The recipe helps you create progressively more correct, gold-standard annotations by looping over the data, and suggesting different analyses based on the constraints defined by your previous annotations.

So, in an ideal case, the sequence would look something like this, expressed in the BILUO scheme:

  • "Bill Jerome Holmes", (B-PERSON, L-PERSON, O) REJECT
    (model: "Damn, could have sworn this was a person!")
  • "Bill Jerome Holmes", (U-PERSON, O, O) REJECT
    (model: "Okay, fair enough... how about this?")
  • "Bill Jerome Holmes", (B-PERSON, I-PERSON, L-PERSON) ACCEPT 🎉
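To make the three analyses above concrete, here is a minimal sketch of how token-level entity spans map to BILUO tags. The helper name `biluo_tags` and the (start, end, label) token-index convention are assumptions made for illustration; this is not how Prodigy generates its suggestions internally.

```python
def biluo_tags(n_tokens, entities):
    """Return one BILUO tag per token.

    entities: list of (start_token, end_token_exclusive, label) tuples.
    This is a hypothetical helper for illustration only.
    """
    tags = ["O"] * n_tokens
    for start, end, label in entities:
        if end - start == 1:
            # Single-token entity: U (unit)
            tags[start] = f"U-{label}"
        else:
            # Multi-token entity: B (begin), I (inside), L (last)
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"L-{label}"
    return tags


tokens = ["Bill", "Jerome", "Holmes"]
analyses = [
    biluo_tags(len(tokens), [(0, 2, "PERSON")]),  # "Bill Jerome" only
    biluo_tags(len(tokens), [(0, 1, "PERSON")]),  # "Bill" only
    biluo_tags(len(tokens), [(0, 3, "PERSON")]),  # the full name
]
for tags in analyses:
    print(list(zip(tokens, tags)))
```

Each rejected analysis rules out one candidate tag sequence, which is what narrows the remaining options down to the full three-token span.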

Btw, in case you haven't seen it yet, a good way to find out how your model is performing is to use the ner.eval or ner.eval-ab recipes. The examples you see during ner.teach are not always representative, because Prodigy tries to prioritise the ones it's most unsure about, plus the ones that stand out, based on the already collected annotations. (This means it may skip examples with very confident predictions, especially those confirmed by previous annotations.)

The binary interface is pretty important to the Prodigy experience and workflow, which is why there's no feature to manually create entity spans and boundaries (for example, by clicking and dragging). So you'd have to do this manually – for example, by adding the correct annotation to your dataset:

{"text": "Bill Jerome Holmes is a person", "spans": [{"start": 0, "end": 18, "label": "PERSON", "text": "Bill Jerome Holmes"}], "answer": "accept"}
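Counting character offsets by hand is error-prone, so a small script can compute them for you. The sketch below assumes the entity text occurs exactly once in the sentence; `make_task` is a hypothetical helper, not part of Prodigy.

```python
import json


def make_task(text, entity_text, label, answer="accept"):
    """Build one annotation record with character offsets for a single span.

    Hypothetical helper: assumes entity_text appears once in text
    (str.index raises ValueError if it is missing).
    """
    start = text.index(entity_text)
    end = start + len(entity_text)  # end offset is exclusive
    span = {"start": start, "end": end, "label": label, "text": entity_text}
    return {"text": text, "spans": [span], "answer": answer}


task = make_task("Bill Jerome Holmes is a person", "Bill Jerome Holmes", "PERSON")
line = json.dumps(task)  # one JSON object per line gives you JSONL
print(line)
```

Writing one such line per example to a .jsonl file should give you something you can import into your dataset.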

If you're looking for a tool that lets you click/drag/highlight/select, check out Brat. It's more complex, but it'll let you create exact entity spans and boundaries by selecting them. I don't remember what the output format looks like in detail, but you should be able to easily convert it to Prodigy's JSONL format, and add it to your data.
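As a rough starting point for that conversion, here is a sketch that assumes Brat's standoff .ann format, where each text-bound annotation is a tab-separated line such as `T1<TAB>PERSON 0 18<TAB>Bill Jerome Holmes`. The function name `brat_to_task` is made up for this example, discontinuous spans are simply skipped, and you should check the output against your own Brat files before relying on it.

```python
import json


def brat_to_task(text, ann):
    """Convert Brat standoff annotations for one document into one record.

    Assumption: text-bound annotation lines look like
    "T1<TAB>LABEL START END<TAB>surface text". Other line types
    (relations, events, notes) are ignored.
    """
    spans = []
    for raw in ann.splitlines():
        if not raw.startswith("T"):
            continue  # only text-bound annotations carry entity spans
        parts = raw.split("\t")
        if len(parts) < 2:
            continue  # malformed line
        if ";" in parts[1]:
            continue  # discontinuous span, not handled in this sketch
        label, start, end = parts[1].split(" ")
        start, end = int(start), int(end)
        spans.append(
            {"start": start, "end": end, "label": label, "text": text[start:end]}
        )
    return {"text": text, "spans": spans, "answer": "accept"}


doc_text = "Bill Jerome Holmes is a person"
doc_ann = "T1\tPERSON 0 18\tBill Jerome Holmes\n"
converted = brat_to_task(doc_text, doc_ann)
print(json.dumps(converted))
```

Note that the labels come straight from your Brat configuration, so you may need to map them to the label names your model uses.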
