Annotation Flowchart: Named Entity Recognition

@kevinruder Thank you, that's nice to hear! :blush:

1. Thanks. For us, NER training is tough and is therefore Prodigy's (and spaCy's) central attraction. We want to add our thanks for this flowchart and our strong encouragement for additional ones to support the most critical areas of Prodigy. As both learners and early deployers of NLP-based solutions, we treat the flowchart as the authoritative springboard into the large body of Prodigy (and spaCy) documentation.

2. High-level reality check re DOCUMENTATION. We applaud you for the current release of the documentation set, which is much more coherent than before. Are we generally safe to assume that there are no blatant inconsistencies among the various NER-related API and USAGE segments for both Prodigy and spaCy? Improvements such as the "collapsed" train functionality are not (yet) reflected in the (current) flowchart, but these are only minor. We just hope to continue relying on the generous guidance you've provided with formal definitions, explanations, code examples, warnings, links, and so on. We are repurposing all of these as internal cookbooks to support our development and production.

Thanks! :slightly_smiling_face:

Yes, the recommendations here are still accurate and fully backwards/forwards-compatible. v1.9 just provides a bunch of workflow improvements – e.g. instead of ner.match, you can now use ner.manual with --patterns, or instead of ner.batch-train, you can now also use train.
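To make the ner.manual with --patterns workflow a bit more concrete, here is a minimal sketch of writing a patterns file in JSONL format, one pattern per line. The labels, terms, file names and dataset name are all made up for illustration, and the command in the trailing comment is only an example invocation, so please check the recipe docs for the exact arguments in your version.

```python
import json
from pathlib import Path

# Hypothetical labels and terms, for illustration only.
patterns = [
    # Token-based pattern: one dict per token, matched on the lowercase form.
    {"label": "ORG", "pattern": [{"lower": "explosion"}, {"lower": "ai"}]},
    # Exact string match.
    {"label": "PRODUCT", "pattern": "Prodigy"},
]


def write_patterns(path, patterns):
    """Write one JSON object per line (JSONL) and return the file path."""
    path = Path(path)
    with path.open("w", encoding="utf8") as f:
        for entry in patterns:
            f.write(json.dumps(entry) + "\n")
    return path


patterns_path = write_patterns("patterns.jsonl", patterns)

# Example invocation (dataset name and input file are placeholders):
#   prodigy ner.manual my_dataset en_core_web_sm ./news.jsonl \
#       --label ORG,PRODUCT --patterns ./patterns.jsonl
```

Matches from the patterns are pre-highlighted in the manual interface, so you only need to correct them instead of labelling everything by hand.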

(That said, best practices and strategies can always change over time. Especially now that transfer learning works so well for NLP, we may end up recommending slightly different workflows in the future.)

I'm just looking at Prodigy to help with generating NER labels. The flowchart is super useful!

How do I learn more about workflow improvements like the ner.match to ner.manual with --patterns noted above? Will there be an updated flowchart with the latest best practices and strategies?

The NER docs include more detailed descriptions and examples of the different workflows: Named Entity Recognition · Prodigy · An annotation tool for AI, Machine Learning & NLP

You can also check out the NER recipe documentation here: Built-in Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP

For now, the flowchart is fully backwards-compatible with earlier versions of Prodigy, and you can still use all the same recipes and workflows. The new version just introduces more convenient versions of them. If there are more breaking changes and new recommendations, we'll definitely update the flowchart.

This is awesome. Visual representations help us make decisions wisely. Thank you so much.
We would love to get more flowcharts if possible.

Hey, @ines! Any chance we could get an updated version of this flowchart (and others, if you've got the time!) reflecting the shift from sub-recipe training to the top-level train command? I can follow the changes via the documentation of the recipes, but I'd love to be able to walk through this with a coworker who is slightly newer than me to ML and Prodigy.

@baxtersapp Ah yes, I've had this on my list for a while! Maybe the v1.10 release will be a good opportunity to also update the flowchart :slightly_smiling_face: The main changes would be:

  • ner.batch-train → train
  • ner.make-gold → ner.correct
  • ner.match → ner.manual with patterns / match
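If you're following the older flowchart in the meantime, the renames listed above can be kept as a small lookup table. This is just a convenience sketch: the helper name is made up, and it only maps recipe names, so the arguments for the new recipes still need to be checked against the docs.

```python
# Old recipe name -> newer equivalent, as listed in the post above.
RECIPE_RENAMES = {
    "ner.batch-train": "train",
    "ner.make-gold": "ner.correct",
    "ner.match": "ner.manual",  # used together with patterns
}


def modern_recipe(name):
    """Return the newer recipe name, or the name unchanged if it was not renamed."""
    return RECIPE_RENAMES.get(name, name)


for old_name in ("ner.batch-train", "ner.make-gold", "ner.match", "ner.teach"):
    print(f"{old_name} -> {modern_recipe(old_name)}")
```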

Hi @ines - this is really helpful! I was wondering - do you have a workflow designed for doing Multi-Intent Classification and Slot Tagging in Dialog Conversations simultaneously with Prodigy and active learning? Or anything similar to this? Thanks a lot!

You can definitely put a workflow together for something like this in Prodigy – it just depends on the specifics of the tasks and the structured data you want to collect for it. For intent classification, a simple choice UI should work well, and you can then use span highlighting to annotate the slots and combine the two interfaces. The active learning depends on your model, so you'd have to implement the update callback to update your model in the loop. Here are some relevant docs links:
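As a rough illustration of the combined interface, here is a sketch of the components a custom recipe could return: a blocks view with span highlighting on top and multiple-choice intents below. All names here (the dataset, the intents, the slot label, the helper functions) are hypothetical, the whitespace tokenizer is a stand-in for a real one, and the update callback only collects accepted answers instead of updating a model. In an actual recipe, the function would be registered with the @prodigy.recipe decorator.

```python
accepted_examples = []  # stand-in for "update the model in the loop"


def make_task(text, intents):
    """Build one task with whitespace tokens (assumes single spaces) and intent options."""
    words = text.split(" ")
    tokens, offset = [], 0
    for i, word in enumerate(words):
        tokens.append({
            "text": word,
            "start": offset,
            "end": offset + len(word),
            "id": i,
            "ws": i < len(words) - 1,  # whether whitespace follows the token
        })
        offset += len(word) + 1
    options = [{"id": intent, "text": intent} for intent in intents]
    return {"text": text, "tokens": tokens, "options": options}


def update(answers):
    """Placeholder update callback: a real one would update your model here."""
    accepted_examples.extend(eg for eg in answers if eg.get("answer") == "accept")


def intent_slot_recipe(texts, intents, slot_labels):
    """Return recipe components combining slot highlighting and intent choice."""
    return {
        "dataset": "intent_slots",  # hypothetical dataset name
        "stream": (make_task(text, intents) for text in texts),
        "view_id": "blocks",
        "update": update,
        "config": {
            "labels": slot_labels,
            "choice_style": "multiple",  # allow several intents per example
            "blocks": [
                {"view_id": "ner_manual"},
                {"view_id": "choice", "text": None},  # don't render the text twice
            ],
        },
    }


components = intent_slot_recipe(
    ["book a flight to Berlin"], ["book_flight", "cancel_booking"], ["CITY"]
)
```

The annotated examples would then carry both the highlighted spans and the selected option IDs, so you can train the two components from the same dataset.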

Excellent how-to.
I am just wondering why the flowchart suggests "Train new model from scratch" if the number of new entities is > 3?

Thanks!

This is just a very rough rule of thumb, but the idea is this: if you're starting off with an existing model, its weights will be based on the entity labels it was trained on. If you're now introducing many new labels, you're constantly "fighting" the existing weights, and there's a high chance you end up with conflicts because you're trying to teach the model to suddenly predict something very different from what it learned before, which can potentially even mess up other entity types. You may end up with very confusing results and weights that are difficult to reason about. So if you're looking to introduce many new labels, it's often cleaner and more efficient to train new weights from scratch.
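Written out as code, the rule of thumb looks roughly like this. The threshold of 3 is taken from the question above and is a heuristic, not a hard limit, and the function name and return strings are made up for illustration.

```python
def suggest_strategy(new_labels, max_new_labels_for_update=3):
    """Rough heuristic: many new labels tend to conflict with existing weights."""
    if len(new_labels) > max_new_labels_for_update:
        return "train new model from scratch"
    return "update existing model"


print(suggest_strategy(["GADGET"]))
print(suggest_strategy(["GADGET", "APP", "OS", "CHIP"]))
```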

Thank you for the explanation.

Now do Textcat! :wink:

Hi there, any chance the flowchart can be updated to replace the deprecated items? It's been great having a flowchart to follow, but being new to the software, it's been a challenge trying to understand what I'm doing wrong when it comes to using ner.teach vs. ner.correct etc.

The link is broken :slightly_frowning_face:

Thank you @crtnx! It was an outdated link, and I've updated it.

Also, some good news: be on the lookout for an updated NER workflow very soon. We'll share it on social media and post it back here :grin:

The recommendations will be the same, but the syntax will be updated. Also, we're adding several links from the documentation or Prodigy support issues to provide context for many of the decision nodes.

We also plan to move on to other components like spancat and textcat soon!

We just released the updated version of the NER annotation flowchart! :rocket:

We now have both a light and dark version of the flowchart.

Just noticed that there is a "typo" in the light version.

[Screenshot: screenshot-prodigy-ner-flowchart-v2-0-0-light]

When training a new model from scratch, you basically always end up with the advice that you should stick to rules instead of training a new model =)

So the 2 should be a 1 (export model) as in the dark version.

Thank you Benjamin! The light version has been updated.
