How can I capture type and subtype in email triage using a Prodigy NER model?

Hi @Vishal112,

Thanks for your question.

First, please don't ping specific members of the team on posts. We answer questions round-robin and we try our best to reply as quickly as possible. Pinging a specific teammate won't get them to respond faster.

It sounds like you really want to classify the entire document (email body). Why aren't you considering text classification?

NER is best suited to proper nouns and self-contained expressions, such as person names or products, because it predicts a tag for each token and relies on clear span boundaries. If you're trying to classify the entire email, you'll need text classification instead.
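To make the difference concrete, here is a minimal sketch of how the two kinds of annotation look as data. The email text and the label names (`ORDER_ID`, `BILLING`) are made up for illustration, and the helper `make_span` is hypothetical. The dictionary keys follow the general shape of Prodigy's JSON tasks, but treat the exact layout as an assumption and check it against your own exported data.

```python
# Hypothetical email text; label names are made up for illustration.
text = "My invoice for order A-1042 is wrong, please send a corrected one."


def make_span(text, phrase, label):
    """Build a character-offset span for the first occurrence of `phrase`."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return {"start": start, "end": start + len(phrase), "label": label}


# NER-style annotation: labels attach to short spans with clear boundaries.
ner_example = {"text": text, "spans": [make_span(text, "A-1042", "ORDER_ID")]}

# Text-classification-style annotation: one label for the whole email.
textcat_example = {"text": text, "accept": ["BILLING"], "answer": "accept"}
```

An email "type" such as billing has no natural start and end inside the text, which is why it fits the second shape better than the first.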

As for your question about two levels of labels, a "type" and a "subtype": this is typically framed as a nested or hierarchical labeling problem.

We discuss our suggested approach for textcat hierarchical labels in our docs:

If you’re working on a task that involves more than 10 or 20 labels, it’s often better to break the annotation task up a bit more, so that annotators don’t have to remember the whole annotation scheme. Remembering and applying a complicated annotation scheme can slow annotation down a lot, and lead to much less reliable annotations. Because Prodigy is programmable, you don’t have to approach the annotations the same way you want your models to work. You can break up the work so that it’s easy to perform reliably, and then merge everything back later when it’s time to train your models.

If your annotation scheme is mutually exclusive (that is, texts receive exactly one label), you’ll often want to organize your labels into a hierarchy, grouping similar labels together. For instance, let’s say you’re working on a chat bot that supports 200 different intents. Choosing between all 200 intents will be very difficult, so you should do a first pass where you annotate much more general categories. You’d then take all the texts annotated for some general type, such as information, and set up a new annotation task to sort them into more specific subtypes. This lets the annotators study up on that part of the annotation scheme, so they can make more reliable decisions.
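As a rough sketch of that two-pass workflow, the snippet below groups first-pass annotations by their general type so each group can become its own subtype annotation task, then merges the two passes into a combined label. It uses only the standard library and does not call Prodigy itself. It assumes the annotations were exported as dictionaries with an `"accept"` list and an `"answer"` field. The function names, the label names (`BILLING`, `TECHNICAL`, `REFUND`) and the `TYPE/SUBTYPE` label format are all hypothetical.

```python
import json
from collections import defaultdict


def split_by_top_level(examples):
    """Group accepted first-pass examples by their single top-level label."""
    groups = defaultdict(list)
    for eg in examples:
        accepted = eg.get("accept", [])
        # Keep only examples that were accepted with exactly one label.
        if eg.get("answer") == "accept" and len(accepted) == 1:
            top = accepted[0]
            groups[top].append({"text": eg["text"], "meta": {"type": top}})
    return dict(groups)


def merge_passes(second_pass):
    """Combine the type (stored in meta) with the second-pass subtype."""
    merged = []
    for eg in second_pass:
        accepted = eg.get("accept", [])
        if eg.get("answer") == "accept" and len(accepted) == 1:
            top, sub = eg["meta"]["type"], accepted[0]
            merged.append({"text": eg["text"], "type": top,
                           "subtype": sub, "label": f"{top}/{sub}"})
    return merged


def to_jsonl(rows):
    """Serialize rows as newline-delimited JSON, one task per line."""
    return "\n".join(json.dumps(row) for row in rows)


# Made-up first-pass annotations over general categories.
first_pass = [
    {"text": "Where is my refund?", "accept": ["BILLING"], "answer": "accept"},
    {"text": "The app crashes on login.", "accept": ["TECHNICAL"], "answer": "accept"},
    {"text": "asdf", "accept": [], "answer": "ignore"},
]
groups = split_by_top_level(first_pass)

# Simulated second pass: an annotator picks a subtype for one BILLING email.
second_pass = [{**groups["BILLING"][0], "accept": ["REFUND"], "answer": "accept"}]
merged = merge_passes(second_pass)
```

Each value in `groups` can be written out with `to_jsonl` and used as the input for a separate subtype annotation session. Whether you then train one model on the merged labels or one model per level is a design choice that depends on how many subtypes you have and how much data each one gets.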

There have been multiple questions on this in the past (I've included some for ner and textcat):

Hope this helps!