The patterns stuff all looks correct. I think the problem is the data you're annotating, i.e. the Reddit corpus. In the video, we're loading pre-extracted data from a directory called train, sorry if this was slightly confusing. See this thread for an explanation:
The fourth argument of your command is the data you're loading in for annotation. So where your command currently says train, you need a path to a valid data file. In this case that's a Reddit comments archive, because you're using --loader reddit:
python3 -m prodigy ner.teach skills_ner en_core_web_lg /path/to/data.bz2 --loader reddit --label EDU --patterns skill_patterns.jsonl
If you haven't done so already, you can download a file from the Reddit comments corpus on this page.
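If you want to sanity-check the file before running ner.teach, here's a minimal Python sketch that peeks at the first few comments. It assumes the archive is bz2-compressed with one JSON object per line and the comment text under a "body" key, which is how the public Reddit comment dumps are laid out. peek_comment_bodies is a hypothetical helper for illustration, not part of Prodigy.

```python
import bz2
import json


def peek_comment_bodies(path, limit=5):
    """Return the "body" text of the first `limit` comments in a Reddit dump.

    Assumption: `path` is a bz2-compressed file with one JSON object per
    line, each holding the comment text under a "body" key. Raises an
    OSError if `path` is a directory (e.g. `train`) or not valid bz2 data.
    """
    bodies = []
    # Open in text mode so bz2 decompresses and decodes line by line.
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            if len(bodies) >= limit:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            # Each non-blank line should be one JSON-encoded comment.
            bodies.append(json.loads(line).get("body", ""))
    return bodies
```

For example, peek_comment_bodies("/path/to/data.bz2") should return a short list of comment texts. If it raises an error instead, for example because the path points at a directory like train, that's a good sign the path isn't the archive that --loader reddit expects.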