Hi! Having a single shared remote database is probably the most straightforward solution. It doesn't have to be in the cloud: if everyone has access to the same shared drive, you could point the `PRODIGY_HOME` environment variable at a directory on that drive, so everyone uses the same config and writes to the same SQLite database file.
Is your main motivation for sharing datasets to make sure that examples aren't annotated more than once, so you can skip examples that are already in the dataset?
That's definitely an option, yes. If passing around files is too messy, a more elegant solution would be to use the database API in Python and write a script that syncs your annotations.
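Here's a rough sketch of what such a sync script could look like. It assumes database objects like the ones returned by `prodigy.components.db.connect()`, with `datasets`, `add_dataset`, `get_dataset`, `get_task_hashes` and `add_examples` available. Method names can differ a bit between Prodigy versions (newer ones may prefer `get_dataset_examples`, for instance), so double-check the docs for yours. `sync_dataset` is just a made-up helper name, and examples are matched by their `_task_hash`.

```python
from typing import Any


def sync_dataset(source: Any, target: Any, dataset: str) -> int:
    """Copy examples from `source` to `target` that `target` doesn't have yet.

    Both arguments are database objects (assumed to behave like the ones
    returned by prodigy.components.db.connect()). Examples are matched by
    their "_task_hash"; examples without one are skipped.
    Returns the number of examples copied.
    """
    # Make sure the dataset exists on the target side.
    if dataset not in target.datasets:
        target.add_dataset(dataset)
    # Hashes the target already has, so nothing gets added twice.
    seen = set(target.get_task_hashes(dataset))
    new_examples = []
    # Treat a missing source dataset as empty.
    for eg in source.get_dataset(dataset) or []:
        task_hash = eg.get("_task_hash")
        if task_hash is not None and task_hash not in seen:
            new_examples.append(eg)
            seen.add(task_hash)
    if new_examples:
        target.add_examples(new_examples, datasets=[dataset])
    return len(new_examples)


# Possible usage with Prodigy (paths are placeholders, untested):
# from prodigy.components.db import connect
# local = connect("sqlite", {"name": "prodigy.db", "path": "/home/me/.prodigy"})
# shared = connect("sqlite", {"name": "prodigy.db", "path": "/mnt/shared/prodigy"})
# sync_dataset(local, shared, "my_dataset")
```

To sync in both directions, you'd call it twice with the arguments swapped, e.g. from a scheduled job. Passing the database objects in, rather than connecting inside the function, also makes it easy to try out with a fake database first.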
By default, Prodigy writes the examples to a single database. However, you could write your own custom `Database` class, or implement your own logic to also save the examples to a second location.
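If you'd rather write to both places while annotating, one option is a thin wrapper that forwards writes to two databases. This is only a sketch: `MirroredDB` is a hypothetical class, not part of Prodigy, and it assumes both objects expose `datasets`, `add_dataset` and `add_examples`. Everything else (reads, listing datasets and so on) falls through to the primary database. Whether Prodigy accepts a wrapper like this, e.g. returned as the `"db"` component of a custom recipe, may depend on your version. If it doesn't, the same two overrides could go into a subclass of Prodigy's `Database` class instead.

```python
from typing import Any, Iterable


class MirroredDB:
    """Hypothetical wrapper that writes to a primary and a secondary database."""

    def __init__(self, primary: Any, secondary: Any) -> None:
        self.primary = primary
        self.secondary = secondary

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not defined here, so all reads
        # (get_dataset, datasets, ...) go to the primary database.
        return getattr(self.primary, name)

    def add_dataset(self, *args: Any, **kwargs: Any) -> Any:
        # Create the dataset in both places; return the primary's result.
        self.secondary.add_dataset(*args, **kwargs)
        return self.primary.add_dataset(*args, **kwargs)

    def add_examples(self, examples: Iterable[dict], datasets: Iterable[str] = ()) -> None:
        # Materialize once, since `examples` may be a generator.
        examples = list(examples)
        datasets = list(datasets)
        # Make sure the secondary has every target dataset before writing.
        for name in datasets:
            if name not in self.secondary.datasets:
                self.secondary.add_dataset(name)
        self.primary.add_examples(examples, datasets=datasets)
        self.secondary.add_examples(examples, datasets=datasets)
```

If the secondary location is remote and occasionally unreachable, you'd probably want to catch and log errors from the second write rather than let them interrupt annotation.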
You could even write a custom recipe that uses the `update` callback to send completed annotations (or just their task hashes!) to a remote database. If you're only storing the hashes, you likely won't have any data privacy issues, but you can still use them to filter examples and detect whether something is a duplicate.
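Here's a sketch of the hash-only variant. `make_hash_update`, `post_hashes` and `filter_seen` are hypothetical helpers, and the URL is a placeholder for whatever remote service stores the hashes. It assumes the incoming examples already carry a `_task_hash`, e.g. because the stream was passed through `prodigy.set_hashes`. The `update` callback receives the batch of answered examples, and only the hashes leave the machine.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Iterator, List, Set


def make_hash_update(send: Callable[[List[int]], None]) -> Callable[[List[Dict]], None]:
    """Build an `update` callback that forwards only the task hashes of answered examples."""
    def update(answers: List[Dict]) -> None:
        hashes = [eg["_task_hash"] for eg in answers if "_task_hash" in eg]
        if hashes:
            send(hashes)
    return update


def post_hashes(url: str) -> Callable[[List[int]], None]:
    """Return a sender that POSTs the hashes as JSON to a (hypothetical) remote endpoint."""
    def send(hashes: List[int]) -> None:
        body = json.dumps({"task_hashes": hashes}).encode("utf-8")
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/json"}, method="POST"
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            response.read()
    return send


def filter_seen(stream: Iterable[Dict], seen: Set[int]) -> Iterator[Dict]:
    """Skip examples whose task hash is already in `seen` (e.g. fetched from the remote store)."""
    for eg in stream:
        if eg.get("_task_hash") not in seen:
            yield eg


# In a custom recipe, the pieces might be wired up like this
# (seen_hashes is a hypothetical set you'd load from the remote store):
# return {
#     "dataset": dataset,
#     "stream": filter_seen(stream, seen_hashes),
#     "update": make_hash_update(post_hashes("https://example.com/hashes")),
#     "view_id": "text",
# }
```

Keeping the sender separate from the callback means you can swap the HTTP call for a direct database write (or a plain list while testing) without touching the recipe logic.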