I am reproducing the "TRAINING A NEW ENTITY TYPE with Prodigy – annotation powered by active learning" video, and everything works until https://youtu.be/l4scwf8KeIA?t=800.
I hit an issue at this stage, running:
prodigy ner.teach drugs_ner en_core_web_lg train --loader reddit --label DRUG --patterns drugs_patterns.jsonl
It fails with "OSError: Can't find file path: train" when I refresh localhost:8080.
I use Ubuntu 16.04 LTS with prodigy-1.4.2-cp35.cp36-cp35m.cp36m-linux_x86_64.whl
I am running the following command (taken from the "TRAINING A NEW ENTITY TYPE" video session):
prodigy ner.teach drugs_ner en_core_web_lg train --loader reddit --label drug --patterns drugs_patterns.jsonl
and encounter the following error:
OSError: Can’t find file path: train
@omrison I’ve merged the two threads, since they’re asking the exact same question. See above for more details.
You might also want to check out the documentation or the recipes overview here. There you’ll see an explanation of the built-in recipe commands and what the arguments mean.
Thanks! So where exactly were you looking? If you check out the documentation for the ner.teach recipe in the README or on the website, it should show that the third argument is the source, i.e. the "Path to a text source". There's also a simple example of the command in the first steps guide.
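For example, if you've downloaded one of the Reddit comments archives, the command would look something like this (the path is just a placeholder for wherever you saved the file):

prodigy ner.teach drugs_ner en_core_web_lg /path/to/RC_2017-10.bz2 --loader reddit --label DRUG --patterns drugs_patterns.jsonl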
I wanted to explore the loader function and came across this:
“A function that loads data and returns a stream of annotation tasks. Prodigy comes with built-in loaders for the most common file types and a selection of live APIs, but you can also create your own functions.”
taken from https://prodi.gy/docs/workflow-first-steps
When I click the "built-in loaders" link, it does not show the loader options. Where can I see those?
Moreover, in your comments above, you noted that you are using only opiates data.
In order to finalize your demo (using opiates), what steps should I take? Uncompress the files? Add a filter for opiates somewhere?
Thanks, I'll fix that link! In the meantime, check out your PRODIGY_README.html. It has the full documentation of the command-line commands and the Python API.
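To give you an idea: a loader is just a Python generator that yields dictionaries with a "text" key, so you can always write your own. Here's a minimal sketch, assuming a JSONL file where each record has a "body" field (both the file name and the field name are placeholders for whatever your data looks like):

import json

def custom_loader(file_path):
    # Yield one annotation task per line of a JSONL file.
    with open(file_path, "r", encoding="utf8") as f:
        for line in f:
            record = json.loads(line)
            # A task only needs a "text" key; anything else you want
            # to keep around can go into "meta".
            yield {"text": record["body"], "meta": {"source": file_path}}

You can then pass a stream like this to a recipe, or use one of the built-in loaders (for example jsonl, txt or csv) directly from the command line.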
Yes, you'd have to download the data you want to use and filter it by subreddit. You might also want to just try a different dataset that's relevant to your use case.
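If you do want to replicate the opiates example: the Reddit comments dumps are bz2-compressed files with one JSON object per line, each with a "subreddit" and a "body" field. A quick sketch of a filter script (the file names are placeholders):

import bz2
import json

# Keep only comments from r/opiates and write them out as JSONL
# with a "text" key, so Prodigy can load the file directly.
with bz2.open("RC_2017-10.bz2", "rt", encoding="utf8") as f_in, \
        open("opiates.jsonl", "w", encoding="utf8") as f_out:
    for line in f_in:
        comment = json.loads(line)
        if comment.get("subreddit", "").lower() == "opiates":
            f_out.write(json.dumps({"text": comment["body"]}) + "\n")

You could then use opiates.jsonl as the source argument of ner.teach, without --loader reddit, since it's already plain JSONL with a "text" key.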