Errors with pos.teach and pos.batch-train.

The patterns stuff all looks correct. I think the problem is in the data you're annotating, i.e. the Reddit corpus. In the video, we're loading in the pre-extracted data from a directory called train – sorry if this was slightly confusing. See this thread for an explanation:

The fourth argument of your command is the data you're loading in for annotation. So what currently says train needs to be a valid data file. In this case, that means a Reddit comments archive, because you're using --loader reddit:

python3 -m prodigy ner.teach skills_ner en_core_web_lg /path/to/data.bz2 --loader reddit --label EDU --patterns skill_patterns.jsonl

If you haven't done so already, you can download data from the Reddit comments corpus from this page.
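
If you want to sanity-check the file before pointing Prodigy at it, something like the sketch below might help. It's not part of Prodigy: `peek_reddit_archive` is a made-up helper name, and it assumes the archive is bz2-compressed, newline-delimited JSON where each comment stores its text under a "body" key (which is how the Reddit comments dumps are usually laid out).

```python
import bz2
import json
from pathlib import Path


def peek_reddit_archive(path, n=3):
    """Return the text of the first n comments in a .bz2 Reddit archive.

    Hypothetical helper for checking the input file, not a Prodigy API.
    """
    path = Path(path)
    if not path.is_file():
        # A directory name like "train" would end up here.
        raise FileNotFoundError(f"{path} is not a file")
    bodies = []
    with bz2.open(path, "rt", encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            comment = json.loads(line)
            # Assumption: the comment text lives under the "body" key.
            bodies.append(comment["body"])
            if len(bodies) >= n:
                break
    return bodies
```

Calling `peek_reddit_archive("/path/to/data.bz2")` should give you a short list of comment strings. If it raises an error instead, the path is probably a directory, a different format, or not the archive you expected.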
