ines (Ines Montani)
October 10, 2019, 12:24pm
Hi! The argument after the model name is the text source you want to load in and annotate. In the tutorial video, we've used a directory called train containing the Reddit data. From the error, it looks like you're trying to load data from a path train, but that path doesn't exist.
I've explained this in more detail here:
Sorry if this was confusing in the video – I'm copying over my reply from this thread, which asked the same question:
The fourth argument of the command after the dataset and the model is the data source, i.e. the texts you're loading in. So the train in the command above is the path to the data we're loading in – for training, we've created a directory /train containing the data files. (In the video, you'll see that it's underlined, because it points to a directory.) Here's a more explicit version of the command:
ner.teach drugs_ner en_core_web_lg /path/to/reddit/data --loader reddit --label DRUG --patterns drug_patterns.jsonl
The data loaded in is a portion of the Reddit Comments Corpus, which you can download for free. The built-in Reddit loader in Prodigy is available via --loader reddit and can take either a single .bz2 archive (the format the corpus is shipped in) or a directory containing multiple archives, which are then loaded in order.
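In case it helps to see what loading a .bz2 archive of Reddit comments involves, here's a minimal sketch of what a loader like that might do under the hood. This is just an illustration using Python's standard library – Prodigy's built-in --loader reddit handles all of this (including directories of multiple archives) for you, and the exact field handling here is an assumption based on the corpus format:

```python
import bz2
import json

def stream_reddit_comments(path):
    # Yield one annotation task dict per comment from a .bz2 archive
    # of JSON lines (the format the Reddit Comments Corpus ships in).
    with bz2.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            comment = json.loads(line)
            text = comment.get("body", "")
            # Skip deleted/removed comments, which have no usable text.
            if text and text not in ("[deleted]", "[removed]"):
                yield {"text": text}

# Write a tiny sample archive so the sketch is runnable end-to-end.
sample = [{"body": "Has anyone tried switching medications?"},
          {"body": "[deleted]"}]
with bz2.open("sample.bz2", mode="wt", encoding="utf-8") as f:
    for comment in sample:
        f.write(json.dumps(comment) + "\n")

tasks = list(stream_reddit_comments("sample.bz2"))
```

The point is just that each comment's body becomes a {"text": ...} task – the built-in loader takes care of this, so you only need to point it at the archive or directory.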
If you're following the video example, note that we've pre-processed the Reddit data and divided it into a training set, an evaluation set and a test set (to make sure we can actually evaluate the model properly). We've also extracted only comments from /r/opiates. However, if you want to try out a similar approach with a broader category (like slang terms or technology companies or whatever else you come up with), you can also easily stream in texts from all subreddits.
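If you do pre-process your own data, the train/eval/test split might look something like this. The function name and the 80/10/10 ratio are hypothetical – the exact split we used isn't specified above, so this is just one common way to do it:

```python
import random

def split_comments(comments, seed=0):
    # Shuffle deterministically, then split 80/10/10 into
    # train / evaluation / test sets (hypothetical helper).
    rng = random.Random(seed)
    comments = list(comments)
    rng.shuffle(comments)
    n = len(comments)
    n_train = int(n * 0.8)
    n_eval = int(n * 0.1)
    return (comments[:n_train],
            comments[n_train:n_train + n_eval],
            comments[n_train + n_eval:])

train, eval_set, test = split_comments(f"comment {i}" for i in range(100))
```

Holding out the evaluation and test sets before annotating is what makes the final accuracy numbers trustworthy, since the model never sees those texts during training.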
Hope this helps!