Annotating standard form for entities?

I could not figure out how to display, and ideally also annotate (correct), standard forms for entities.

Our entities have a span, a type and a standard form (e.g. United States of America for USA, hour for h, etc.). We managed to set up Prodigy for annotating the span and type, but we had no luck handling standard forms (at least displaying them for each entity, possibly marking them as incorrect, ideally being able to correct them).

Is this possible at all?


Hi! I think this mostly comes down to deciding how you want to display that information in the interface, and how to present it in a smart way that also makes it easy for the annotator to answer. Here's an overview of the main annotation interfaces, and the PRODIGY_README.html has more details on the exact data format they expect.

For example, your data could have the following format:

    {
        "text": "The USA did stuff",
        "spans": [{"start": 4, "end": 7, "label": "GPE"}],
        "label": "United States of America"
    }

Using this input, you could use the classification interface, which renders the "label" on top (in this case, the standard form). You could also build your own fully custom HTML interface that shows the entity and the proposed standardisation. (Above? Below? Maybe inline? This really depends on what's easiest to read and makes most sense for your data.) If there are several options, the choice interface could also be useful to ask about the correct one.
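To make that concrete, here's a rough sketch of how you could turn your existing entity records into tasks in the format above, one task per entity. The input keys ("type", "standard_form") and the function name are just assumptions for illustration, so adjust them to however your data is actually stored.

```python
import json


def make_tasks(text, entities):
    """Yield one task per entity, with the proposed standard form as "label".

    The keys "type" and "standard_form" are hypothetical names for your own
    entity records, not something Prodigy defines.
    """
    for ent in entities:
        yield {
            "text": text,
            "spans": [
                {"start": ent["start"], "end": ent["end"], "label": ent["type"]}
            ],
            "label": ent["standard_form"],
        }


entities = [
    {"start": 4, "end": 7, "type": "GPE", "standard_form": "United States of America"}
]
tasks = list(make_tasks("The USA did stuff", entities))

# One JSON object per line, so the result can be saved as a JSONL file
for task in tasks:
    print(json.dumps(task))
```

Creating one task per entity (rather than one per text) means each accept / reject decision is about exactly one standard form.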

You might also want to try pre-sorting your stream and grouping the same / similar entities together. It's much easier to do, say, 20 "USA" annotations in a row than doing them spread out across the whole data. This also makes it easier to spot potential problems and patterns in the data that indicate problems – because you definitely want to find out about that stuff as early as possible.
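A minimal sketch of the pre-sorting, assuming each task has exactly one span and follows the format shown earlier. The helper names are made up, and note that this loads the whole stream into memory, which is fine for moderately sized data but not for a huge stream.

```python
def entity_text(task):
    """Return the lowercased surface form of the task's first span."""
    span = task["spans"][0]
    return task["text"][span["start"]:span["end"]].lower()


def sort_by_entity(tasks):
    """Group tasks with the same entity text together.

    sorted() is stable, so tasks with the same entity keep their
    original relative order.
    """
    return sorted(tasks, key=entity_text)


stream = [
    {"text": "The USA did stuff", "spans": [{"start": 4, "end": 7, "label": "GPE"}],
     "label": "United States of America"},
    {"text": "It took 3 h", "spans": [{"start": 10, "end": 11, "label": "UNIT"}],
     "label": "hour"},
    {"text": "In the USA", "spans": [{"start": 7, "end": 10, "label": "GPE"}],
     "label": "United States of America"},
]
sorted_stream = sort_by_entity(stream)
```

If you'd rather group by the proposed standard form than by the surface text, you could sort on `task["label"]` instead.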

I'd also recommend splitting the binary feedback and corrections into several steps. First, only ask if the proposed standard form is correct and get the easy ones out of the way. This might already eliminate a majority of the examples. Next, export the dataset, extract the examples that were rejected, review them (to make sure there's no obvious issue) and ask about those examples again, e.g. in the choice interface.
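Here's a sketch of that second step, assuming the exported dataset is JSONL where each line has an "answer" of "accept", "reject" or "ignore", and that the choice interface reads a list of "options" with an "id" and "text" each. The `candidates` lookup is purely hypothetical: it stands in for wherever your alternative standard forms come from.

```python
import json


def rejected(examples):
    """Keep only the examples the annotator rejected in the first pass."""
    return [eg for eg in examples if eg.get("answer") == "reject"]


def to_choice_task(eg, candidates):
    """Build a multiple-choice task offering candidate standard forms."""
    return {
        "text": eg["text"],
        "spans": eg["spans"],
        "options": [{"id": c, "text": c} for c in candidates],
    }


# Stand-in for the contents of an exported JSONL file
exported = """\
{"text": "The USA did stuff", "spans": [{"start": 4, "end": 7, "label": "GPE"}], "label": "United States of America", "answer": "accept"}
{"text": "It took 3 h", "spans": [{"start": 10, "end": 11, "label": "UNIT"}], "label": "height", "answer": "reject"}
"""
examples = [json.loads(line) for line in exported.splitlines() if line.strip()]

# Hypothetical lookup of alternative standard forms per entity text
candidates = {"h": ["hour", "height", "Planck constant"]}

choice_tasks = []
for eg in rejected(examples):
    span = eg["spans"][0]
    surface = eg["text"][span["start"]:span["end"]]
    choice_tasks.append(to_choice_task(eg, candidates.get(surface, [])))
```

Dropping the original "label" from the choice tasks is deliberate here, so the rejected suggestion doesn't bias the annotator. You could also keep it around under a different key if you want to compare later.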

Btw, speaking of standard forms, you might also find this thread interesting, which discusses a similar task and how to add and access the metadata in spaCy later on: