Annotating standard form for entities?

jiri · January 16, 2019, 10:00am

I have not figured out how to display, and possibly also annotate (correct), standard forms for entities.

Our entities have a span, a type and a standard form (e.g. United States of America for USA, hour for h, etc.). We managed to set up Prodigy for annotating the span and type, but we had no luck handling standard forms (at least displaying them for each entity, possibly marking them as incorrect, ideally being able to correct them).

Is this possible at all?

Thanks

ines · January 16, 2019, 12:17pm

Hi! I think this mostly comes down to deciding how you want to display that information in the interface, and how to present it in a smart way that also makes it easy for the annotator to answer. Here's an overview of the main annotation interfaces, and the PRODIGY_README.html has more details on the exact data format they expect.

For example, your data could have the following format:

{
    "text": "The USA did stuff",
    "spans": [{"start": 4, "end": 7, "label": "GPE"}],
    "label": "United States of America"
}

Using this input, you could use the classification interface, which renders the "label" on top (in this case, the standard form). You could also build your own fully custom HTML interface that shows the entity and the proposed standardisation. (Above? Below? Maybe inline? This really depends on what's easiest to read and makes most sense for your data.) If there are several options, the choice interface could also be useful to ask about the correct one.
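If your entities already live in some other structure, a small script can convert them into that format. This is only a rough sketch: the helper names (make_task, to_jsonl) and the "UNIT" label are made up for illustration, and it assumes you know the character offsets of each entity.

```python
import json


def make_task(text, start, end, ent_label, standard_form):
    """Build one task: the entity span, plus the proposed standard form as "label"."""
    return {
        "text": text,
        "spans": [{"start": start, "end": end, "label": ent_label}],
        "label": standard_form,
    }


def to_jsonl(tasks):
    """Serialize tasks as newline-delimited JSON, one task per line."""
    return "\n".join(json.dumps(task) for task in tasks)


# Hypothetical examples; "UNIT" is an invented entity type.
tasks = [
    make_task("The USA did stuff", 4, 7, "GPE", "United States of America"),
    make_task("It took 3 h to finish", 10, 11, "UNIT", "hour"),
]
jsonl = to_jsonl(tasks)
```

Writing the resulting string to a .jsonl file gives you something you can load as the input source.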

You might also want to try pre-sorting your stream and grouping the same / similar entities together. It's much easier to do, say, 20 "USA" annotations in a row than doing them spread out across the whole data. This also makes it easier to spot potential problems and patterns in the data that indicate problems – because you definitely want to find out about that stuff as early as possible.
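A minimal sketch of that pre-sorting, assuming each task has exactly one span and the whole stream fits in memory (the function names are hypothetical):

```python
def entity_key(task):
    """Return the lowercased surface text of the task's first span."""
    span = task["spans"][0]
    return task["text"][span["start"]:span["end"]].lower()


def group_by_entity(tasks):
    """Sort tasks so that examples with the same entity text end up next to each other."""
    # sorted() is stable, so examples with the same key keep their original order.
    return sorted(tasks, key=entity_key)


examples = [
    {"text": "The USA did stuff", "spans": [{"start": 4, "end": 7, "label": "GPE"}]},
    {"text": "It took 3 h", "spans": [{"start": 10, "end": 11, "label": "UNIT"}]},
    {"text": "In the USA", "spans": [{"start": 7, "end": 10, "label": "GPE"}]},
]
grouped = group_by_entity(examples)
```

Sorting means consuming the full stream up front, so for very large datasets you may prefer to sort the file once offline rather than at load time.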

I'd also recommend splitting the binary feedback and corrections into several steps. First, only ask if the proposed standard form is correct and get the easy ones out of the way. This might already eliminate a majority of the examples. Next, export the dataset, extract the examples that were rejected, review them (to make sure there's no obvious issue) and ask about those examples again, e.g. in the choice interface.
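The second step could look roughly like the sketch below. It assumes the exported examples carry an "answer" key with "accept", "reject" or "ignore", and that the choice interface reads an "options" list of dicts with "id" and "text". The candidate lookup is purely hypothetical and would come from your own resources.

```python
def rejected_examples(examples):
    """Keep only the examples the annotator rejected in the first pass."""
    return [eg for eg in examples if eg.get("answer") == "reject"]


def to_choice_task(eg, candidates):
    """Turn a rejected example into a multiple-choice task over candidate standard forms."""
    return {
        "text": eg["text"],
        "spans": eg["spans"],
        "options": [{"id": form, "text": form} for form in candidates],
    }


# Hypothetical export from the first (binary) pass.
exported = [
    {"text": "The USA did stuff", "spans": [{"start": 4, "end": 7, "label": "GPE"}],
     "label": "United States of America", "answer": "accept"},
    {"text": "It took 3 h", "spans": [{"start": 10, "end": 11, "label": "UNIT"}],
     "label": "henry", "answer": "reject"},
]

# Hypothetical lookup of alternative standard forms per entity text.
candidates_for = {"h": ["hour", "henry", "height"]}

choice_tasks = []
for eg in rejected_examples(exported):
    span = eg["spans"][0]
    surface = eg["text"][span["start"]:span["end"]]
    choice_tasks.append(to_choice_task(eg, candidates_for.get(surface, [])))
```

Dropping the old "label" from the choice tasks keeps the rejected suggestion from being shown on top again.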

Btw, speaking of standard forms, you might also find this thread interesting, which discusses a similar task and how to add and access the metadata in spaCy later on:
