Annotate for NER and classification at the same time

Hi,
I am working on a project where we use NER to identify names in text. To do this, we annotate with Prodigy and train a model on the annotations. Once we have those names, we want to classify them into 17 different categories, which means spending additional time re-annotating the data to train a second model for the entity categorization. My question is this:
Is it possible with Prodigy to do both annotations at the same time:

  • Highlight entities for the NER
  • Specify to which category these entities belong

The two models will be trained separately, but simultaneous annotation would save us a lot of time. Do you have any ideas?

Hi! If you have one top-level label with 17 sub-categories, one option would be to express this with a flat label scheme: you'd have labels like PERSON:SUBCAT and label both the entity and its category in one go.
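Since the two models are trained separately, the flat labels would need to be split again after annotation. Here's a rough sketch of how that could look. It assumes the annotations are dicts with a "text" and a list of "spans" (each with "start", "end" and "label"), and that ":" is the separator. The function names and the shape of the categorization records are made up for illustration and aren't part of Prodigy.

```python
def split_flat_label(label, sep=":"):
    """Split a flat label like 'PERSON:POLITICIAN' into ('PERSON', 'POLITICIAN').

    A label without a separator gets None as its sub-category.
    """
    top, _, sub = label.partition(sep)
    return top, (sub or None)


def split_example(example, sep=":"):
    """Turn one flat-labelled example into NER data and categorization data.

    Assumes the example is a dict with "text" and "spans", where each span
    has "start", "end" and "label". The output shapes are illustrative.
    """
    text = example["text"]
    ner_spans = []
    cat_records = []
    for span in example.get("spans", []):
        top, sub = split_flat_label(span["label"], sep)
        # The NER model only sees the top-level label.
        ner_spans.append({"start": span["start"], "end": span["end"], "label": top})
        # The categorization model sees the entity text and its sub-category.
        if sub is not None:
            cat_records.append({"text": text[span["start"]:span["end"]], "label": sub})
    return {"text": text, "spans": ner_spans}, cat_records


example = {
    "text": "Barack Obama met Angela Merkel.",
    "spans": [
        {"start": 0, "end": 12, "label": "PERSON:POLITICIAN"},
        {"start": 17, "end": 30, "label": "PERSON:POLITICIAN"},
    ],
}
ner_example, cat_records = split_example(example)
print(ner_example["spans"])
print(cat_records)
```

If the second model needs the surrounding sentence as context and not just the entity text, you'd keep the full text and offsets in the categorization records instead.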

That said, one disadvantage of this approach is that it's harder to automate things: you have to make the decision for every single entity and can't easily group things together. For example, let's say you're annotating person names and want to classify the type of person, so for "Barack Obama PERSON", you want to select POLITICIAN. You know that this will be true for all instances of this entity, so it's quite inefficient to select it every time. Instead, you could do a first pass over the data and just label PERSON entities. In the second pass, you'd group all instances of the same text + label together, sort the groups by frequency (so you do the most frequent ones first), and then select the sub-label once for all instances in a group. Since the most frequent entities typically account for a large share of all mentions, you'll already have most of your data covered after only very few additional annotations.
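To make the second pass more concrete, here's a minimal sketch of the grouping and sorting step in plain Python. It assumes the first-pass annotations are dicts with "text" and "spans" (each span with "start", "end" and "label"). The helper names and the `sublabels` mapping are hypothetical, and how you collect the sub-label decisions (a custom recipe, a spreadsheet, etc.) is up to you.

```python
from collections import Counter


def count_entities(examples):
    """Count how often each (entity text, label) pair occurs."""
    counts = Counter()
    for eg in examples:
        for span in eg.get("spans", []):
            entity_text = eg["text"][span["start"]:span["end"]]
            counts[(entity_text, span["label"])] += 1
    return counts


def apply_sublabels(examples, sublabels):
    """Copy one sub-label decision per (text, label) group to every instance.

    `sublabels` maps (entity text, label) -> sub-category. Instances whose
    group hasn't been decided yet are skipped.
    """
    records = []
    for eg in examples:
        for span in eg.get("spans", []):
            key = (eg["text"][span["start"]:span["end"]], span["label"])
            if key in sublabels:
                records.append({"text": eg["text"], "span": span, "label": sublabels[key]})
    return records


# First-pass annotations: only the top-level PERSON label.
examples = [
    {"text": "Barack Obama spoke.", "spans": [{"start": 0, "end": 12, "label": "PERSON"}]},
    {"text": "Barack Obama met Serena Williams.", "spans": [
        {"start": 0, "end": 12, "label": "PERSON"},
        {"start": 17, "end": 32, "label": "PERSON"},
    ]},
    {"text": "Barack Obama left.", "spans": [{"start": 0, "end": 12, "label": "PERSON"}]},
]

counts = count_entities(examples)
# Most frequent groups first, so each decision covers as many instances as possible.
queue = counts.most_common()

# One decision for the top group...
sublabels = {("Barack Obama", "PERSON"): "POLITICIAN"}
records = apply_sublabels(examples, sublabels)
# ...and the share of all entity instances it covers.
coverage = len(records) / sum(counts.values())
print(queue)
print(coverage)
```

Grouping on the exact text means that "Obama" and "Barack Obama" end up in separate groups, so you may want to normalize the text first. It also only works for entities whose sub-category doesn't depend on context.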