Streaming data from a dataframe directly

@ines @honnibal
Hey Ines and Honnibal, I have a question regarding the loaders.

I know that we can stream data from a csv file. How do we stream data from a dataframe?
For example:
"python -m prodigy ner.correct dataset spacy_model df --loader csv --label yes,no --unsegmented"
I know this will not work. Can you help me read from a dataframe that has a column named "text" and stream it to Prodigy?

Thank you in advance

Hi! You should be able to just write a custom loader for that – either in a custom recipe, or by writing a script that can take any arguments and then pipe its output forward to Prodigy. So your loader loads your dataframe however you want, writes the JSON tasks to standard output, and Prodigy reads them from standard input. This will work with all built-in recipes. You can see examples here: https://prodi.gy/docs/api-loaders#loaders-custom
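As a rough illustration of that approach, here is a minimal sketch of such a loader script. It makes a few assumptions that are not from this thread: the dataframe is loaded with pandas from a file path given on the command line (pd.read_csv is only a placeholder for however you build your dataframe), the text lives in a column called "text", and the names df_loader.py and dataframe_to_tasks are made up for the example.

```python
import json
import sys


def dataframe_to_tasks(df, text_column="text"):
    """Yield one task dict per usable row of `text_column`.

    Only needs `df[text_column]` to be iterable, so it works with a pandas
    DataFrame. Non-string values (None, NaN) and blank strings are skipped.
    """
    for value in df[text_column]:
        if not isinstance(value, str) or not value.strip():
            continue
        # Keep the text unchanged so character offsets stay valid for NER.
        yield {"text": value}


def main(path):
    # pandas is imported here so the helper above has no third-party dependency.
    import pandas as pd

    df = pd.read_csv(path)  # assumption: replace with your own dataframe loading
    for task in dataframe_to_tasks(df):
        # One JSON object per line on standard output.
        print(json.dumps(task))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Assuming the script is saved as df_loader.py, you could then pipe it into the built-in recipe and set the source argument to - so that Prodigy reads from standard input, as described in the loaders docs linked above:
"python df_loader.py your_data.csv | python -m prodigy ner.correct dataset spacy_model - --label yes,no --unsegmented"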

@ines I have written custom recipes before to stream data. How do I write a custom recipe that streams data the way the ner.correct recipe does?
For example, the recipe below streams data the way the ner.manual recipe does:

How do I write it as a ner.correct recipe?

You can find examples and recipe templates to get started in our prodigy-recipes repo. For example, this one uses a pretrained model to highlight entities and lets you edit them manually, just like ner.correct:

Prodigy also includes the source for its built-in recipes, so you can run prodigy stats to find the location of your Prodigy installation, and look at the files if you want to see how they work under the hood.
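To give a feel for the ner.correct-style step mentioned above, here is a hedged sketch of its core idea: take the entities the model predicted for a text and pre-fill them as "spans" on the task, so they show up highlighted and can be edited by hand. This is not Prodigy's actual source. The helper name add_predicted_spans is hypothetical, and doc is assumed to be a spaCy Doc (any object exposing ents with start, end, start_char, end_char and label_ would behave the same).

```python
import copy


def add_predicted_spans(doc, task, labels=None):
    """Return a copy of `task` with the model's entities added as spans.

    `doc` is assumed to be a spaCy Doc produced from task["text"].
    If `labels` is given, entities with other labels are left out.
    """
    task = copy.deepcopy(task)  # do not modify the incoming task in place
    spans = []
    for ent in doc.ents:
        if labels and ent.label_ not in labels:
            continue
        spans.append(
            {
                "start": ent.start_char,    # character offsets into the text
                "end": ent.end_char,
                "token_start": ent.start,   # first token of the entity
                "token_end": ent.end - 1,   # last token, inclusive
                "label": ent.label_,
            }
        )
    task["spans"] = spans
    return task
```

In a custom recipe you would run each text from your dataframe through the loaded spaCy model, apply a helper like this to every task, add tokens to the stream, and return the stream together with the "ner_manual" view_id, which is roughly what the template in the prodigy-recipes repo does.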