OSError: Can't find file path: train

Benoit · May 11, 2018, 4:18pm

Hello,

I am reproducing "TRAINING A NEW ENTITY TYPE with Prodigy – annotation powered by active learning" and everything works perfectly until https://youtu.be/l4scwf8KeIA?t=800

I have an issue at this stage: prodigy ner.teach drugs_ner en_core_web_lg train --loader reddit --label DRUG --patterns drugs_patterns.jsonl
It raises “OSError: Can’t find file path: train” when I refresh localhost:8080

I use Ubuntu 16.04 LTS with prodigy-1.4.2-cp35.cp36-cp35m.cp36m-linux_x86_64.whl

Thanks in advance for any advice

Benoit

ines · May 11, 2018, 4:22pm

Sorry if this was confusing in the video – I'm copying over my reply from this thread, which asked the same question:

Using Loaders

The argument that follows the dataset and the model is the data source, i.e. the texts you're loading in. So the train in the command above is the path to the data we're loading in – for training, we've created a directory /train containing the data files. (In the video, you'll see that it's underlined, because it points to a directory.) If no file or directory called train exists relative to where you run the command, you get the "Can't find file path" error. Here's a more explicit version of the command:
ner.teach drugs_ner en_core_web_lg /path/to/reddit/data --loader reddit --label DRUG --patterns drug_patterns.jsonl
The data loaded in is a portion of the Reddit Comments Corpus, which you can download for free. The built-in Reddit loader in Prodigy is available via --loader reddit and can take either a single .bz2 archive (the format the corpus is shipped in), or a directory containing multiple archives, which are then loaded in order.
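
To make the failure mode concrete, here is a minimal Python sketch of how a relative source such as train is resolved against the current working directory. This is not Prodigy's actual implementation: the helper name describe_source is hypothetical, and the error message only mimics the one in this thread. It can serve as a quick check before starting the recipe.

```python
from pathlib import Path


def describe_source(source):
    """Resolve a source path and list what a loader could read from it.

    Hypothetical helper for illustration only, not part of Prodigy.
    """
    # A relative path like "train" is resolved against the current
    # working directory, i.e. wherever the command is run from.
    path = Path(source).expanduser().resolve()
    if not path.exists():
        raise OSError(f"Can't find file path: {source} (resolved to {path})")
    if path.is_dir():
        # A directory: collect the .bz2 archives inside it, sorted by name.
        return sorted(p.name for p in path.glob("*.bz2"))
    # A single file: return just its name.
    return [path.name]
```

Calling describe_source("train") from a directory that has no train folder raises the OSError, while passing the full path to the downloaded Reddit data returns the archive names that would be read.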

If you're following the video example, note that we've pre-processed the Reddit data and divided it into a training set, an evaluation set and a test set (to make sure we can actually evaluate the model properly). We've also extracted only comments from /r/opiates. However, if you want to try out a similar approach with a broader category (like, slang terms or technology companies or whatever else you come up with), you can also easily stream in texts from all subreddits.

Hope this helps!

Benoit · May 11, 2018, 4:32pm

Great!!! will try that - Thank you Ines

omrison · July 16, 2019, 1:11pm

I am running the following command (taken from the TRAINING A NEW ENTITY TYPE video session):
prodigy ner.teach drugs_ner en_core_web_lg train --loader reddit --label drug --patterns drugs_patterns.jsonl

and encounter the following error:
OSError: Can’t find file path: train

please advise

the video session is at:

ines · July 16, 2019, 1:18pm

@omrison I’ve merged the two threads, since they’re asking the exact same question. See above for more details.

You might also want to check out the documentation or the recipes overview here. There you’ll see an explanation of the built-in recipe commands and what the arguments mean.

omrison · July 16, 2019, 6:01pm

Thank you. It looks like this is the same issue.
I will try it out.

I suggest adding this to both the manual of the command itself (currently it is not there) and to the getting started documentation.

These were the first 2 places I was searching the answer.

ines · July 16, 2019, 6:10pm

Thanks! So where exactly were you looking? If you check out the documentation for the ner.teach recipe in the README or on the website, it should show that the third argument is the source, i.e. the "Path to a text source". There's also a simple example of the command in the first steps guide.

omrison · July 17, 2019, 5:02am

thank you @ines

I wanted to explore the loader function and came across this:
“A function that loads data and returns a stream of annotation tasks. Prodigy comes with built-in loaders for the most common file types and a selection of live APIs, but you can also create your own functions.”
taken from https://prodi.gy/docs/workflow-first-steps

When I click the “built-in loaders” link, it does not show the loader options. Where can I see them?

Moreover, in your comments above, you noted that you are using only opiates data.
To reproduce your demo (using opiates), what steps should I take? Uncompress the files? Add a filter for opiates somewhere?

thanks in advance

ines · July 17, 2019, 9:11am

Thanks, I'll fix that link! And check out your PRODIGY_README.html. It has the full documentation of all the command-line commands and the Python library API.

Yes, you'd have to download the data you want to use and filter it by subreddit. You might also want to just try a different dataset for something relevant to your use case.
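
As a rough sketch of that filtering step, the Python below reads one downloaded .bz2 archive and keeps only the comments from a single subreddit. It assumes each line of the archive is a JSON object with "subreddit" and "body" keys, and it writes one {"text": ...} object per line, on the assumption that the result is then loaded as a plain JSONL file rather than via --loader reddit. The function name filter_subreddit and the file names are made up for illustration.

```python
import bz2
import json


def filter_subreddit(in_path, out_path, subreddit="opiates"):
    """Copy the comments of one subreddit from a .bz2 archive to a JSONL file.

    Hypothetical helper; assumes one JSON comment per line with
    "subreddit" and "body" keys.
    """
    kept = 0
    wanted = subreddit.lower()
    with bz2.open(in_path, "rt", encoding="utf8") as f_in, \
            open(out_path, "w", encoding="utf8") as f_out:
        for line in f_in:
            line = line.strip()
            if not line:
                continue
            comment = json.loads(line)
            # Skip comments from any other subreddit.
            if (comment.get("subreddit") or "").lower() != wanted:
                continue
            body = comment.get("body") or ""
            # Skip empty, deleted or removed comments.
            if body in ("", "[deleted]", "[removed]"):
                continue
            f_out.write(json.dumps({"text": body}) + "\n")
            kept += 1
    return kept
```

For example, filter_subreddit("RC_2015-01.bz2", "opiates.jsonl") would write the matching comments to opiates.jsonl and return how many were kept. There is no need to uncompress the archive first, since bz2.open reads it directly.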
