Annotation Flowchart: Named Entity Recognition

Just shared this on Twitter – if you find flowcharts like this useful, I'm happy to make more :smiley:

Updated on October 18, 2022:

New versions of the NER flowcharts! See our Twitter thread for more details.

This is really helpful @ines! Thanks a lot. Would love one for textcat as well.

Really useful. @ines are there any more in this series?

When I try to get the flowchart pdf from the following link:

I get this error:

"Failed to load PDF document."

@Ali What happens when you right click > "Save as"? The file is quite big and depending on your browser, it might not display in the browser.

Just stopping by to say that this is amazing! Really helpful!

@kevinruder Thank you, that's nice to hear! :blush:

1. Thanks. For us, NER training is tough and is therefore Prodigy's (and spaCy's) central attraction. We want to add our thanks for this flowchart and our strong encouragement for additional ones to support the most critical areas of Prodigy. As both learners and early deployers of NLP-based solutions, we treat the flowchart as the authoritative springboard into the large corpora of Prodigy (and spaCy) documentation.

2. High-level reality check re DOCUMENTATION. We applaud you for the current release of the documentation set, which is much more coherent than before. Are we generally safe to assume that there are no blatant inconsistencies among the various NER-related API and USAGE segments for both Prodigy and spaCy? Improvements such as the "collapsed" train functionality are not (yet) reflected in the (current) flowchart, but these are only minor gaps. We just hope to continue relying on the generous guidance you've provided with formal definitions, explanations, code examples, warnings, links, and so on. We are repurposing all of these as internal cookbooks to support our development and production.

Thanks! :slightly_smiling_face:

Yes, the recommendations here are still accurate and fully backwards/forwards-compatible. v1.9 just provides a bunch of workflow improvements – e.g. instead of ner.match, you can now use ner.manual with --patterns, or instead of ner.batch-train, you can now also use train.

(That said, best practices and strategies can always change over time. Especially now that transfer learning works so well for NLP, we may end up recommending slightly different workflows in the future.)
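
As a rough illustration of the ner.manual with --patterns workflow mentioned above, here is a minimal sketch that builds the contents of a patterns file in Python. The labels and terms are made-up placeholders, and the helper name to_jsonl is hypothetical; the sketch assumes the usual JSONL layout with one "label" and one "pattern" entry per line.

```python
import json

# Hypothetical example patterns: the labels and terms are placeholders,
# not taken from the flowchart or the docs.
patterns = [
    # Token-based pattern: one dict per token, matched on the lowercase form.
    {"label": "ORG", "pattern": [{"lower": "explosion"}, {"lower": "ai"}]},
    # String pattern: exact match on the phrase.
    {"label": "PRODUCT", "pattern": "Prodigy"},
]


def to_jsonl(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record) for record in records) + "\n"


patterns_jsonl = to_jsonl(patterns)
print(patterns_jsonl, end="")
```

The printed text can be saved to a file such as patterns.jsonl (a placeholder name) and passed to the recipe via --patterns.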

I'm just looking at Prodigy to help with generating NER labels. The flowchart is super useful!

How do I learn more about workflow improvements like the ner.match to ner.manual with --patterns noted above? Will there be an updated flowchart with the latest best practices and strategies?

The NER docs include more detailed descriptions and examples of the different workflows: Named Entity Recognition · Prodigy · An annotation tool for AI, Machine Learning & NLP

You can also check out the NER recipe documentation here: Built-in Recipes · Prodigy · An annotation tool for AI, Machine Learning & NLP

For now, the flowchart is fully backwards-compatible with earlier versions of Prodigy, and you can still use all the same recipes and workflows. The new version just introduces more convenient versions of them. If there are more breaking changes and new recommendations, we'll definitely update the flowchart.

This is awesome. Visual representations help us make decisions wisely. Thank you so much.
We would love to get more flowcharts if possible.

Hey, @ines! Any chance we could get an updated version of this flowchart (and others, if you've got the time!) reflecting the shift from sub-recipe training to the top-level train command? I can follow the changes via the documentation of the recipes, but I'd love to be able to walk through this with a coworker who is slightly newer than me to ML and Prodigy.

@baxtersapp Ah yes, I've had this on my list for a while! Maybe the v1.10 release will be a good opportunity to also update the flowchart :slightly_smiling_face: The main changes would be:

  • ner.batch-train → train
  • ner.make-gold → ner.correct
  • ner.match → ner.manual with patterns / match
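
For anyone reading the older flowchart next to the newer recipes, the renames listed above can be kept in a small lookup table. This is only a convenience sketch: the names RENAMED_RECIPES and modern_recipe are hypothetical, and the table covers just the three renames mentioned in this thread.

```python
# Renames mentioned in this thread; not an exhaustive or official list.
RENAMED_RECIPES = {
    "ner.batch-train": "train",
    "ner.make-gold": "ner.correct",
    "ner.match": "ner.manual",  # used together with --patterns
}


def modern_recipe(name: str) -> str:
    """Return the newer recipe name, or the name unchanged if it is not listed."""
    return RENAMED_RECIPES.get(name, name)


for old_name in RENAMED_RECIPES:
    print(old_name, "->", modern_recipe(old_name))
```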

Hi @ines - this is really helpful! I was wondering: do you have a workflow designed for doing multi-intent classification and slot tagging in dialog conversations simultaneously in Prodigy, with active learning, or anything similar to this? Thanks a lot!

You can definitely put a workflow together for something like this in Prodigy – it just depends on the specifics of the tasks and the structured data you want to collect for it. For intent classification, a simple choice UI should work well, and you can then use span highlighting to annotate the slots and combine the two interfaces. The active learning depends on your model, so you'd have to implement the update callback to update your model in the loop. Here are some relevant docs links:
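
To make the combined interface more concrete, here is a minimal sketch of the pieces a custom recipe could return. It is plain Python with no Prodigy import, so it only builds the data structures. The intent and slot labels, the dataset name, and all function names are hypothetical, and the sketch assumes the "blocks" interface with a "ner_manual" block for the slots and a "choice" block for the intents. In a real recipe, this dictionary would be returned from a function registered as a Prodigy recipe, and update would call into your own model.

```python
import re

INTENTS = ["book_flight", "cancel_booking", "check_status"]  # hypothetical labels
SLOTS = ["CITY", "DATE"]  # hypothetical labels


def whitespace_tokens(text):
    """Tokenize on whitespace and keep character offsets for span highlighting."""
    return [
        {
            "text": m.group(),
            "start": m.start(),
            "end": m.end(),
            "id": i,
            "ws": m.end() < len(text),  # True if whitespace follows the token
        }
        for i, m in enumerate(re.finditer(r"\S+", text))
    ]


def make_task(text):
    """Build one annotation task with tokens (slots) and options (intents)."""
    return {
        "text": text,
        "tokens": whitespace_tokens(text),
        "options": [{"id": intent, "text": intent} for intent in INTENTS],
    }


def update(answers):
    """Placeholder update callback: a real one would update the model in the loop.

    Here it only counts the accepted answers so the sketch stays runnable.
    """
    accepted = [eg for eg in answers if eg.get("answer") == "accept"]
    return len(accepted)


def recipe_components(texts):
    """Assemble the components dictionary a custom recipe would return."""
    return {
        "dataset": "intents_and_slots",  # hypothetical dataset name
        "stream": (make_task(text) for text in texts),
        "view_id": "blocks",
        "update": update,
        "config": {
            "labels": SLOTS,
            "choice_style": "multiple",  # allow several intents per text
            "blocks": [
                {"view_id": "ner_manual"},
                {"view_id": "choice", "text": None},  # avoid showing the text twice
            ],
        },
    }
```

The whitespace tokenizer is a stand-in to keep the example self-contained; in practice you would probably use the tokenizer of the model you plan to train.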

Excellent how-to.
I am just wondering why the flowchart suggests "Train new model from scratch" if the number of new entities is > 3?