I was going through the example of how to make gold standard datasets (here) and ran into a bug.
When I enter the command from the demo,
prodigy db-in news_headlines_raw news_headlines.jsonl
I get the error
✨ ERROR: Can't find 'news_headlines_raw' in database SQLite.
Ah, unlike the recipes, db-in expects the dataset to already exist in the database. Have you already set it up using the dataset command? If not:
prodigy dataset news_headlines_raw "Some description here"
Now that I think about it, it might be a good idea to also let db-in create the datasets automatically. Originally, we decided against this for three reasons: to make creating a new dataset a more “conscious” decision, to make it easy to add meta data like a dataset description and authors (there’s currently no easy way to do this retroactively), and to show a warning in the case of typos (e.g. if you’re planning to add to an existing dataset but use the wrong name). But maybe this actually makes things inconvenient.
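To make the trade-off concrete, here is a minimal, hypothetical sketch of what an import command with an opt-in auto-create switch could look like. `InMemoryDB`, `db_in`, `auto_create` and the "Did you mean" wording are all invented for illustration and are not Prodigy's actual API or error messages. By default a missing dataset stays an error, and `difflib.get_close_matches` is used to hint at likely typos. With `auto_create=True` the dataset is created on the fly, optionally with meta data.

```python
import difflib
import json


class InMemoryDB:
    """Hypothetical stand-in for a dataset store (NOT Prodigy's database class)."""

    def __init__(self):
        # name -> {"meta": dict, "examples": list}
        self.datasets = {}

    def add_dataset(self, name, meta=None):
        self.datasets[name] = {"meta": meta or {}, "examples": []}


def db_in(db, name, lines, auto_create=False, meta=None):
    """Hypothetical db-in: import JSONL lines into a dataset.

    auto_create=False mirrors the current behaviour (missing dataset = error),
    with a hint about similarly named datasets in case of a typo.
    auto_create=True creates the dataset instead of failing.
    """
    if name not in db.datasets:
        if not auto_create:
            msg = f"Can't find '{name}' in database."
            # Suggest existing dataset names that look like a typo of `name`
            close = difflib.get_close_matches(name, list(db.datasets), n=3)
            if close:
                msg += f" Did you mean: {', '.join(close)}?"
            else:
                msg += " Create it first with the dataset command."
            raise ValueError(msg)
        db.add_dataset(name, meta)
    # Parse one JSON object per non-empty line
    examples = [json.loads(line) for line in lines if line.strip()]
    db.datasets[name]["examples"].extend(examples)
    return len(examples)
```

Usage: with an existing dataset `news_headlines`, calling `db_in(db, "news_headlines_raw", lines)` raises an error that suggests `news_headlines`, while passing `auto_create=True` creates `news_headlines_raw` and imports the lines. Keeping auto-creation opt-in preserves the typo warning for the common case of adding to an existing dataset.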
That was the problem. Thanks! I’m guessing that creating a dataset will usually be a conscious decision, and this is just an issue for people copying and pasting recipes willy-nilly.
Glad to hear you got it working! Might be nice to add something to the error message along the lines of “Maybe you misspelled the name or forgot to add it via the