The id is an automatically generated integer ID that's used internally, and the name is the human-readable string name like
"my_cool_dataset". Basically, the dataset-related methods need to be able to create and retrieve datasets by their string names – how you implement this under the hood is up to you.
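As a minimal sketch of that contract (in-memory and purely illustrative – the class and method names here are placeholders, not Prodigy's actual API), the name is the lookup key and the integer ID is just an internal counter:

```python
import itertools


class DatasetStore:
    """Hypothetical in-memory store: datasets are created and retrieved
    by their string name; the integer ID is internal bookkeeping."""

    def __init__(self):
        self._counter = itertools.count(1)  # auto-generated integer IDs
        self._datasets = {}  # name -> {"id": int, "examples": list}

    def add_dataset(self, name):
        # Create the dataset if it doesn't exist yet, keyed by string name
        if name not in self._datasets:
            self._datasets[name] = {"id": next(self._counter), "examples": []}
        return self._datasets[name]

    def get_dataset(self, name):
        # Retrieval always goes via the human-readable name
        ds = self._datasets.get(name)
        return ds["examples"] if ds else None


store = DatasetStore()
store.add_dataset("my_cool_dataset")
```

In a Firestore backend, the same idea would map to a collection keyed by the dataset name, with the integer ID (if you keep one at all) stored as a field.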
Yes, exactly. In the built-in implementation, the same example can be part of more than one dataset, and it'll only be stored once. So when deleting a dataset, we only want to delete the examples that are present only in that dataset and not in any other set. I'm not sure how well this logic translates to Firestore – if you implement a unique record for each example, the
drop_dataset method could also just dump everything and be done with it.
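The "delete only the orphaned examples" logic from the built-in implementation could be sketched like this (hypothetical helper, assuming examples are linked to datasets by hash and stored once):

```python
def drop_dataset(datasets, name):
    """Sketch of the deletion logic described above. `datasets` maps
    dataset name -> set of example hashes. Returns the hashes that are
    safe to delete, i.e. no longer referenced by any other dataset."""
    to_drop = datasets.pop(name, set())
    # Hashes still referenced by any remaining dataset must survive
    still_used = set().union(*datasets.values()) if datasets else set()
    return to_drop - still_used


datasets = {"train": {1, 2, 3}, "eval": {3, 4}}
orphaned = drop_dataset(datasets, "train")
# Hash 3 is still in "eval", so only 1 and 2 are orphaned
```

With one unique record per example per dataset, none of this bookkeeping is needed and dropping the dataset can just delete all of its records.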
This is fired, if it exists, at the end of an annotation session when the user exits the server. A better name for the method would probably have been
save_and_exit or something like that.
If you need to trigger any final actions, like closing connections or confirming stashed changes, that's where you would do it.
If you wrap your custom database in a Python package and expose an entry point
prodigy_db, Prodigy should recognise it by its string name and the logs should say something like “DB: Added X connector(s) via entry points”. You can then also edit your
prodigy.json and add
"db": "firestore" there.
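The entry point registration could look roughly like this in the package's setup.py (a sketch – the package name, module path firestore_db and factory get_db are placeholders for whatever your package exposes):

```python
# setup.py for the package wrapping the custom database
from setuptools import setup

setup(
    name="prodigy-firestore-db",
    py_modules=["firestore_db"],
    entry_points={
        # Registers the database under the string name "firestore",
        # which is what "db": "firestore" in prodigy.json refers to
        "prodigy_db": [
            "firestore = firestore_db:get_db",
        ],
    },
)
```

After installing the package, the string name in the entry point is what you put in the "db" setting of your prodigy.json.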