Dataset from MySQL instead of JSON in 2022

info2000 · December 30, 2021, 3:24pm

How can I feed a textcat annotation project from MySQL or a REST API instead of from a static JSON file?

Thanks

ines · December 31, 2021, 1:34pm

Hi! You can always write an entirely custom loader that fetches data from your MySQL database or REST API and yields out dictionaries in Prodigy's format (e.g. {"text": "..."}). You can either integrate it via a custom recipe, or make it a separate script that writes out the dictionaries and then pipe the output forward to any recipe on the command line. See here for examples:
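
A minimal sketch of such a loader, assuming a table named documents with a body column (both hypothetical). It uses Python's built-in sqlite3 module as a stand-in so it runs anywhere; a MySQL driver that follows the same DB-API interface should work along the same lines, though connection setup and placeholder syntax will differ:

```python
import sqlite3


def text_stream(conn, query="SELECT body FROM documents ORDER BY rowid"):
    """Yield one task dict per row returned by the query."""
    cursor = conn.cursor()
    cursor.execute(query)
    for (body,) in cursor:
        yield {"text": body}


# Demo with an in-memory database standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (body TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?)",
    [("first text",), ("second text",)],
)
tasks = list(text_stream(conn))
```

In a custom recipe, the generator returned by text_stream(conn) would take the place of the stream loaded from a file.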

info2000 · January 5, 2022, 1:21pm

Awesome, @ines!
Do you have an example of passing a dynamic stream to the Images loader?
The image.manual recipe has this line:
stream = Images(source)
But if source is the result of a query like "select x,z from table limit 0,50", how can I trigger a new load of the source?

Thanks

ines · January 5, 2022, 2:17pm

Are you using a custom recipe? In that case, you can also just modify the recipe itself and add a custom stream generator that loads from your database. Of course, the specific implementation will depend on what your database query returns, but it'll roughly look like this:

def custom_stream():
    data = make_your_database_query_here()  # query your db
    for image_url in data:
        yield {"image": image_url}

# in your recipe
stream = custom_stream()

If you're only loading part of the query results at a time or you want to make requests to a paginated API, you can also do something like this and keep incrementing the page/count/whatever until no data is available anymore:

def custom_stream():
    page = 0
    while True:
        data = make_your_database_query_here(page)
        if not data:  # stop once a page comes back empty
            break
        for image_url in data:
            yield {"image": image_url}
        page += 1
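
The paginated pattern can be sketched end to end as follows. The images table, its id and url columns, and the helper names are all hypothetical, and sqlite3 again stands in for MySQL. The page fetcher is passed in as a function, so the same generator could just as well wrap a call to a paginated REST API:

```python
import sqlite3


def paginated_stream(fetch_page, page_size=50):
    """Yield image tasks page by page until a page comes back empty."""
    page = 0
    while True:
        rows = fetch_page(page, page_size)
        if not rows:
            break
        for image_url in rows:
            yield {"image": image_url}
        page += 1


def make_fetcher(conn):
    """Build a fetch_page function backed by a LIMIT/OFFSET query."""
    def fetch_page(page, page_size):
        cursor = conn.execute(
            "SELECT url FROM images ORDER BY id LIMIT ? OFFSET ?",
            (page_size, page * page_size),
        )
        return [row[0] for row in cursor.fetchall()]
    return fetch_page


# Demo: five rows read two at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, url TEXT)")
conn.executemany(
    "INSERT INTO images (url) VALUES (?)",
    [(f"https://example.com/{i}.jpg",) for i in range(5)],
)
stream = paginated_stream(make_fetcher(conn), page_size=2)
urls = [task["image"] for task in stream]
```

Because the generator only runs a query when the next page is actually needed, rows inserted before the stream is exhausted are still picked up. Once a page comes back empty, the stream ends.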

If you're using the built-in image.manual recipe, it will be able to also read from standard input because it uses Prodigy's get_stream helper instead of Images to load the stream. So you can have a script that loads your data and writes it to standard output:

# image_loader.py
import json

data = make_your_database_query_here()
for image_url in data:
    # one JSON object per line, so the recipe can parse stdin
    print(json.dumps({"image": image_url}))

You can then pipe the output forward to the recipe and set the source to - to read from stdin:

python image_loader.py | prodigy image.manual dataset - --label FOO,BAR
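
When piping, it's probably safest to emit one JSON object per line, on the assumption that the recipe parses stdin as newline-delimited JSON. Printing a Python dict directly produces single-quoted output that a JSON parser rejects. A small sketch with a hypothetical write_jsonl helper:

```python
import io
import json
import sys


def write_jsonl(tasks, out=sys.stdout):
    """Write each task as one JSON object per line."""
    for task in tasks:
        out.write(json.dumps(task) + "\n")


# Demo: write to an in-memory buffer instead of stdout and read it back.
buffer = io.StringIO()
write_jsonl([{"image": "https://example.com/a.jpg"}], out=buffer)
lines = buffer.getvalue().splitlines()
```

In the loader script itself you would call write_jsonl with your query results and leave out at its default, so the output goes to stdout and on to the pipe.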

info2000 · January 16, 2022, 3:43pm

Awesome, you always explain all the details.
