OSError: Can't find file path: train


I am reproducing “TRAINING A NEW ENTITY TYPE with Prodigy – annotation powered by active learning” and everything works perfectly up to https://youtu.be/l4scwf8KeIA?t=800

I have an issue at this stage: prodigy ner.teach drugs_ner en_core_web_lg train --loader reddit --label DRUG --patterns drugs_patterns.jsonl
It returns “OSError: Can’t find file path: train” when I refresh localhost:8080.

I use Ubuntu 16.04 LTS with prodigy-1.4.2-cp35.cp36-cp35m.cp36m-linux_x86_64.whl

Thanks in advance for any advice.


Sorry if this was confusing in the video – I’m copying over my reply from this thread, which asked the same question:

Great!!! will try that - Thank you Ines

I am running the following command (taken from the TRAINING A NEW ENTITY TYPE video session):
prodigy ner.teach drugs_ner en_core_web_lg train --loader reddit --label drug --patterns drugs_patterns.jsonl

and encounter the following error:
OSError: Can’t find file path: train

Please advise.

the video session is at:

@omrison I’ve merged the two threads, since they’re asking the exact same question. See above for more details.

You might also want to check out the documentation or the recipes overview here. There you’ll see an explanation of the built-in recipe commands and what the arguments mean.

Thank you. It looks like this is the same issue.
I will try it out.

I suggest adding this to both the manual of the command itself (currently it is not there) and to the getting started documentation.

These were the first two places I searched for the answer.

Thanks! So where exactly were you looking? If you check out the documentation for the ner.teach recipe in the README or on the website, it should show that the third argument is the source, i.e. the “Path to a text source”. There’s also a simple example of the command in the first steps guide.
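
To make the argument order concrete, here is a rough sketch of how the command from the question breaks down. This is not Prodigy's own code: `split_ner_teach` is a made-up helper for illustration, and it assumes the three positional arguments (dataset, spaCy model, source) come before any `--option`.

```python
import os
import shlex


def split_ner_teach(command):
    """Split a `prodigy ner.teach ...` command line into its positional arguments.

    Hypothetical helper for illustration only; it is not part of Prodigy.
    """
    tokens = shlex.split(command)
    # Drop the leading "prodigy" and the recipe name ("ner.teach").
    args = tokens[2:]
    # Collect everything before the first "--option" as positional arguments.
    positional = []
    for token in args:
        if token.startswith("--"):
            break
        positional.append(token)
    dataset, spacy_model, source = positional[:3]
    return {"dataset": dataset, "spacy_model": spacy_model, "source": source}


command = (
    "prodigy ner.teach drugs_ner en_core_web_lg train "
    "--loader reddit --label DRUG --patterns drugs_patterns.jsonl"
)
parts = split_ner_teach(command)
print(parts)
# "train" lands in the source slot, so it has to be a path that exists
# on your machine for the loader to find any data.
print("source exists:", os.path.exists(parts["source"]))
```

If the last line prints `False` on your machine, that matches the situation behind the “Can’t find file path: train” error: the third argument needs to point to a file or directory you actually have.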

thank you @ines

I wanted to explore the loader function and came across this:
“A function that loads data and returns a stream of annotation tasks. Prodigy comes with built-in loaders for the most common file types and a selection of live APIs, but you can also create your own functions.”
taken from https://prodi.gy/docs/workflow-first-steps

When I click the “built-in loaders” link, it does not show the loader options. Where can I see them?

In your comments above, you noted that you are using only the opiates data.
To complete your demo (using opiates), what steps should I take? Uncompress the files? Add a filter for opiates somewhere?

thanks in advance

Thanks, I’ll fix that link! And check out your PRODIGY_README.html. It really has the full documentation of all the command-line commands and the Python library API.

Yes, you’d have to download the data you want to use and filter it by subreddit. You might also want to just try a different dataset for something relevant to your use case.
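
As a rough idea of what the filtering step could look like, here is a minimal sketch. It assumes the Reddit comments data is newline-delimited JSON with a `subreddit` field and a `body` field per comment; the function name and the sample lines are made up for illustration.

```python
import json


def filter_by_subreddit(lines, subreddit):
    """Yield parsed comments whose "subreddit" field matches (case-insensitive)."""
    wanted = subreddit.lower()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        comment = json.loads(line)
        if (comment.get("subreddit") or "").lower() == wanted:
            yield comment


# Made-up sample lines standing in for a real dump file.
sample_lines = [
    '{"subreddit": "opiates", "body": "first example comment"}',
    '{"subreddit": "news", "body": "unrelated comment"}',
    '{"subreddit": "Opiates", "body": "second example comment"}',
]

filtered = list(filter_by_subreddit(sample_lines, "opiates"))
for comment in filtered:
    print(comment["body"])

# For a real compressed dump you could pass an open file handle instead, e.g.
#   with bz2.open("path/to/comments.bz2", "rt", encoding="utf8") as f:
#       for comment in filter_by_subreddit(f, "opiates"): ...
# (needs `import bz2`; the path is a placeholder, not a real file name)
```

You could then write the filtered comments back out, one JSON object per line, and pass that file as the source argument.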