Examples not loading - ECS

Hey! I have deployed Prodigy to an ECS instance on AWS. Now I'm trying to load data onto this instance from my local machine. How do I go about doing this? I tried loading data the same way I did when Prodigy was running locally, but the new data doesn't appear. Do I have to get into the ECS box and insert it from there? If it helps, I'm storing the data in a Postgres RDS instance.
Thanks so much!

I'm not a container expert myself (I actually had to double-check the ECS acronym, which I guess is a bad sign!), so bear with me if this takes a few steps to figure out.

Which step is going wrong: streaming examples from a file, or reading from (and writing to) the database?

If the former, I would guess it's a question of getting access to the file inside the container. Requiring the source data to be baked into the container probably isn't very elegant. You might instead prefer to have your local Prodigy instance and the container both connect to the same database. You would run prodigy db-in from your local machine, and then in the container use prodigy db-out to fetch the data. You could then make your RUN command write the data to disk, or even just pipe it forward (most recipes can read from stdin). You could also write a custom loader script: https://prodi.gy/docs/cookbook#loaders
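
Here's a minimal sketch of what such a loader could look like, assuming both your local machine and the container talk to the same Postgres database and the raw examples live in a Prodigy dataset (the dataset name "raw_articles" is just a placeholder, not something from your setup):

import json
from prodigy.components.db import connect

def stream_from_dataset(dataset_name):
    # Connect using whatever DB settings Prodigy picks up (prodigy.json, env, etc.)
    DB = connect()
    # get_dataset() returns the examples stored under that dataset name
    for eg in DB.get_dataset(dataset_name):
        yield eg

if __name__ == "__main__":
    # Print one task per line (JSONL), so the output can be piped into a recipe
    for eg in stream_from_dataset("raw_articles"):
        print(json.dumps(eg))

Passing - as the source argument tells most recipes to read that stream from stdin.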

If the problem is connecting to the database, first a little bit of background. The choice of DB is configured either via an environment variable, inside the recipe, or in the prodigy.json file. The default is the SQLite database engine, which reads from a local file. Obviously, if you're running in a container, keeping the database on the container's file system won't be the best approach.
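
For example, if you want the recipe to connect to Postgres explicitly rather than relying on the implicit config, something along these lines should work; all of the connection details below are placeholders, and the exact setting keys may differ depending on your Prodigy version (check the database docs):

from prodigy.components.db import connect

# Placeholder connection settings for the RDS instance; adjust to your setup
psql_settings = {
    "dbname": "prodigy",
    "user": "my_user",
    "password": "my_password",
    "host": "my-instance.abc123.us-east-1.rds.amazonaws.com",
    "port": 5432,
}

# "postgresql" selects the Postgres engine; the same settings could also live
# under "db_settings" in prodigy.json instead of being passed in code
DB = connect("postgresql", psql_settings)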

So my first thought is: have you checked that inside the container, Prodigy is indeed connecting to your Postgres instance? Inside your recipe you can do something like:

from prodigy.components.db import connect

DB = connect()  # connects using whatever DB settings Prodigy picks up
print(DB.db_id, DB.db_name)  # shows which database engine you're actually talking to

This should help you figure out whether it's connecting correctly. If it's not, then there's some setting that's not getting passed through. Perhaps you need to include a prodigy.json in the container? An environment variable might also be an elegant way to pass the configuration through.

If it's connecting to your DB correctly, then I'm much more puzzled. I can't see why it should matter that Prodigy's in a container, as far as talking to the DB is concerned.

Thanks for your response! Whenever I push data to the Postgres RDS, that data isn't reflected in the containerized Prodigy UI (i.e. it doesn't show up as annotatable). Right now, I'm having to bake the data into the container as it's built, but ideally I would be able to push new data into Postgres from my local machine and have it show up in the Prodigy UI in the container. So I guess my question is: how can I add to a Prodigy dataset from my local machine and have it show up in a containerized version of Prodigy, when I know both are talking to the same database?

Also, from your message, it sounds like "db-in" may be the solution, but the documentation makes it look like "db-in" is only used for loading things that have already been annotated ("existing annotations"), not data to be annotated. Am I wrong about this?