dataset

Is this video still up to date? If it's no longer valid, shouldn't the video be removed? It's confusing to have incorrect information out there.

python -m prodigy dataset he_people "Seed terms for People"

returns: Can't find recipe or command 'dataset'.

The workflow shown in the video is definitely still valid, but since the video is from 2017, a few details in the usage of the recipes and their arguments have changed. The recipe names are all the same, and the latest usage and available arguments are documented in the recipe docs.

The dataset command has since been deprecated, so you can just skip this step: Prodigy will create the dataset automatically under the hood the first time you use it in a recipe.

In terms of best practices, there are a few additional recommendations for NLP workflows that are now easier than they were in 2017. For example, with transfer learning, it's now much more viable to collect a small dataset of NER annotations manually (say 100-200), pretrain a model that can achieve decent accuracy, and then improve that model further with more annotations, e.g. using a workflow like ner.teach or ner.correct. This is often more convenient than trying to get over the cold-start problem with the model in the loop and patterns.
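As a rough sketch, that workflow could look something like the following. This assumes a recent Prodigy version (v1.11 or later, which uses spaCy v3), so please double-check the arguments against the recipe docs for the version you have installed. The input file texts.jsonl, the PERSON label, the "en" language code and the ./he_people_model output directory are all placeholders I made up for illustration. Only the dataset name he_people is taken from your command.

```shell
# Sketch only: assumes Prodigy v1.11+ and bash. Placeholder values are hypothetical.
DATASET="he_people"            # dataset name from the question, created automatically on first use
SOURCE="./texts.jsonl"         # hypothetical input file, one {"text": "..."} object per line
LABEL="PERSON"                 # hypothetical entity label
SPACY_LANG="en"                # assumed language code for the blank tokenizer
MODEL_DIR="./he_people_model"  # hypothetical output directory for the trained model

# Step 1: annotate a small set (say 100-200 examples) by hand.
# There is no separate "dataset" step before this.
annotate_manually() {
  python -m prodigy ner.manual "$DATASET" "blank:$SPACY_LANG" "$SOURCE" --label "$LABEL"
}

# Step 2: train a first model from those annotations.
train_model() {
  python -m prodigy train "$MODEL_DIR" --ner "$DATASET"
}

# Step 3: let the model suggest entities and correct its predictions.
correct_predictions() {
  python -m prodigy ner.correct "$DATASET" "$MODEL_DIR/model-best" "$SOURCE" --label "$LABEL"
}
```

Each function wraps a single command, so you can paste the block into a terminal and then call annotate_manually, train_model and correct_predictions one at a time. The annotation recipes start the web server and keep running until you stop them, which is why the steps aren't chained together in one script.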

Perhaps you could update the YouTube description to point out the differences? As someone new to this who is blindly following along, it's tricky when I'm second-guessing whether I'm making an error or whether the video is pointing me in the wrong direction and I should skip something.