How can I feed a textcat annotation project from MySQL or a REST API instead of from a static JSON file?
Thanks
Hi! You can always write an entirely custom loader that fetches data from your MySQL database or REST API and yields out dictionaries in Prodigy's format (e.g. {"text": "..."}). You can either integrate it via a custom recipe, or make it a separate script that writes out the dictionaries and then pipe the output forward to any recipe on the command line. See here for examples:
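Separately from the linked examples, here is a minimal sketch of what such a standalone loader script could look like. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; with a MySQL driver, only the connection setup should need to change. The table name docs, the column body, and the helper names rows_to_tasks, write_jsonl and demo_connection are all made up for illustration.

```python
import json
import sqlite3
import sys


def rows_to_tasks(connection, query):
    """Yield one task dict per row; assumes the query selects a single text column."""
    cursor = connection.cursor()
    cursor.execute(query)
    for (text,) in cursor:
        yield {"text": text}


def write_jsonl(tasks, out=sys.stdout):
    """Write each task as one JSON object per line."""
    for task in tasks:
        out.write(json.dumps(task) + "\n")


def demo_connection():
    """Build an in-memory SQLite database as a stand-in for a real MySQL connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (body TEXT)")
    conn.executemany(
        "INSERT INTO docs VALUES (?)",
        [("first text",), ("second text",)],
    )
    return conn


if __name__ == "__main__":
    query = "SELECT body FROM docs ORDER BY rowid"
    write_jsonl(rows_to_tasks(demo_connection(), query))
```

The script prints one JSON object per line, which is the shape you would pipe forward to a recipe that reads its source from standard input.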
Awesome, @ines!
Do you have an example of passing a dynamic stream to the image loader?
The image.manual recipe has this line:
stream = Images(source)
But if source is the result of a query like "select x,z from table limit 0,50", how can I trigger a new load of the source?
Thanks
Are you using a custom recipe? In that case, you can also just modify the recipe itself and add a custom stream generator that loads from your database. Of course, the specific implementation will depend on what your database query returns, but it'll roughly look like this:
def custom_stream():
    data = make_your_database_query_here()  # query your db
    for image_url in data:
        yield {"image": image_url}

# in your recipe
stream = custom_stream()
If you're only loading some results at a time or you want to make requests to a paginated API, you can also do something like this and keep incrementing the page/count/whatever until no data is available anymore:
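def paginated_stream():
    page = 0
    while True:
        data = make_your_database_query_here(page)  # fetch one page of results
        if not data:  # stop once no data is available anymore
            break
        for image_url in data:
            yield {"image": image_url}
        page += 1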
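To connect this to a query like "select x,z from table limit 0,50", here is a rough sketch that turns the page counter into an offset and a limit. The fetch_page callable is hypothetical: in practice it would run your own LIMIT/OFFSET query or API request. Here it is faked with an in-memory list (make_fake_fetch) so the example runs on its own.

```python
def paginated_stream(fetch_page, page_size=50):
    """Yield image tasks page by page until fetch_page returns no rows."""
    page = 0
    while True:
        rows = fetch_page(offset=page * page_size, limit=page_size)
        if not rows:
            # An empty page means there is nothing left to load.
            break
        for image_url in rows:
            yield {"image": image_url}
        page += 1


def make_fake_fetch(rows):
    """Hypothetical stand-in for a database query or API call with offset/limit."""
    def fetch_page(offset, limit):
        return rows[offset:offset + limit]
    return fetch_page


if __name__ == "__main__":
    urls = ["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"]
    for task in paginated_stream(make_fake_fetch(urls), page_size=2):
        print(task)
```

Because the generator only asks for the next page once the previous one has been consumed, each new query is triggered lazily as annotation progresses.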
If you're using the built-in image.manual recipe, it will also be able to read from standard input, because it uses Prodigy's get_stream helper instead of Images to load the stream. So you can have a script that loads your data and writes it to standard output:
# image_loader.py
import json

data = make_your_database_query_here()
for image_url in data:
    # print one JSON object per line (a plain dict repr is not valid JSON)
    print(json.dumps({"image": image_url}))
You can then pipe the output forward to the recipe and set the source to - to read from stdin:
python image_loader.py | prodigy image.manual dataset - --label FOO,BAR
Awesome, you always explain all the details.