Annotation Flowchart: Named Entity Recognition

Just shared this on Twitter – if you find flowcharts like this useful, I’m happy to make more :smiley:


This is really helpful @ines! Thanks a lot. Would love one for textcat as well.


Really useful. @ines are there any more in this series?

When I try to get the flowchart pdf from the following link:

I get this error:

"Failed to load PDF document."

@Ali What happens when you right-click > "Save as"? The file is quite big and, depending on your browser, it might not display in the browser.

Just stopping by to say that this is amazing! Really helpful!


@kevinruder Thank you, that's nice to hear! :blush:

1. Thanks. For us, NER training is tough and is therefore Prodigy's (and spaCy's) central attraction. We want to add our thanks for this flowchart and our strong encouragement for additional ones to support the most critical areas of Prodigy. As both learners and early deployers of NLP-based solutions, we treat the flowchart as the authoritative springboard into the large body of Prodigy (and spaCy) documentation.

2. High-level reality check re DOCUMENTATION. We applaud you for the current release of the documentation set, which is much more coherent than before. Are we generally safe to assume that there are no blatant inconsistencies among the various NER-related API and USAGE segments for both Prodigy and spaCy? Improvements such as the "collapsed" train functionality are not (yet) reflected in the (current) flowchart, but these are only minor gaps. We just hope to continue relying on the generous guidance you've provided with formal definitions, explanations, code examples, warnings, links, ... We are repurposing all of these as internal cookbooks to support our development and production.

Thanks! :slightly_smiling_face:

Yes, the recommendations here are still accurate and fully backwards/forwards-compatible. v1.9 just provides a bunch of workflow improvements – e.g. instead of ner.match, you can now use ner.manual with --patterns, or instead of ner.batch-train, you can now also use train.

(That said, best practices and strategies can always change over time. Especially now that transfer learning works so well for NLP, we may end up recommending slightly different workflows in the future.)
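For anyone who wants to try the ner.manual with --patterns route mentioned above, here is a minimal sketch of how a patterns file could be generated. It assumes the JSONL layout with one {"label": ..., "pattern": [...]} object per line and token-based patterns. The FRUIT label, the example phrases, the file name and both helper functions are hypothetical and only there for illustration.

```python
import json
from pathlib import Path


def build_patterns(label, phrases):
    """Turn plain phrases into token-based match patterns, one dict per phrase."""
    patterns = []
    for phrase in phrases:
        # One token description per whitespace-separated word,
        # matched case-insensitively via the lowercase form.
        tokens = [{"lower": word.lower()} for word in phrase.split()]
        patterns.append({"label": label, "pattern": tokens})
    return patterns


def write_jsonl(path, records):
    """Write one JSON object per line (JSONL)."""
    with Path(path).open("w", encoding="utf8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Hypothetical label and phrases, for illustration only.
patterns = build_patterns("FRUIT", ["apple", "Granny Smith"])
write_jsonl("fruit_patterns.jsonl", patterns)
```

The resulting file could then be passed to ner.manual via --patterns. The remaining arguments (dataset, model, source, labels) depend on your setup and Prodigy version, so check the recipe documentation for the exact command.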