Finished files

Hey there

My name is Roy and I'm a data engineer.
We are starting to use Prodigy in our department.
I was wondering how file handling is managed, i.e. is there any mechanism to move files that have already been checked completely to another location, or to mark them as checked?

Hi! Prodigy doesn't really have a concept of a file being "finished" – whether or not you're done with it depends on a lot of factors that you decide. Maybe you want to re-annotate a file, collect more annotations from a different annotator, or experiment with a different label scheme. Or maybe you're using a workflow with a model in the loop that selects some examples from your file and skips others. So it's not always very useful to think of a file as being "finished".

That said, if you want to check whether all examples in a file have been annotated, you can export your annotations or load them via the Database API and use the _input_hash to identify examples that refer to the same input (e.g. the same text). Comparing those hashes against the hashes of the examples in your file tells you which ones have already been annotated. The same check happens internally when Prodigy decides whether to show an example for annotation. If no examples are left to show, all of the data has been annotated.
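As a rough sketch of that check, the snippet below compares the _input_hash values in exported annotations against those of the source examples. It assumes the annotations were exported to JSONL (for instance with `prodigy db-out`) and that the source examples already carry an _input_hash computed the same way Prodigy computes it (for instance by running them through Prodigy's hashing helper first). The function names `load_jsonl`, `annotated_hashes` and `missing_examples` are made up for illustration and are not part of Prodigy.

```python
import json


def load_jsonl(path):
    """Read a JSONL file (one JSON object per line) into a list of dicts."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def annotated_hashes(annotations):
    """Collect the unique input hashes found in the exported annotations."""
    return {eg["_input_hash"] for eg in annotations if "_input_hash" in eg}


def missing_examples(source, annotations):
    """Return the source examples whose input hash has no annotation yet.

    Assumes every source example already has an "_input_hash" key.
    """
    done = annotated_hashes(annotations)
    return [eg for eg in source if eg["_input_hash"] not in done]
```

If `missing_examples` returns an empty list, every input in the file has at least one annotation, and you could then move or flag the file in your own pipeline.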

You can also stream in JSON data with your own internal IDs added to the examples. This will be passed through and saved with the annotations in the database. Based on the collected annotations, you can then check whether you have annotated all examples in a given file, how many annotations you have for each example, and so on.
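Here is a minimal sketch of the custom ID approach. It assumes each incoming example has a top-level key called `doc_id` (a hypothetical name, use whatever your own data has) and that this key shows up unchanged in the exported annotations. The helper names and the `min_annotations` parameter are also just illustrative.

```python
from collections import Counter


def annotation_counts(annotations, id_key="doc_id"):
    """Count how many annotations were collected for each custom ID."""
    return Counter(eg[id_key] for eg in annotations if id_key in eg)


def coverage_report(source_ids, annotations, id_key="doc_id"):
    """Map every ID in the source file to its number of annotations (0 if none)."""
    counts = annotation_counts(annotations, id_key)
    return {sid: counts.get(sid, 0) for sid in source_ids}


def is_file_finished(source_ids, annotations, id_key="doc_id", min_annotations=1):
    """Treat a file as done once every ID has at least `min_annotations` annotations."""
    report = coverage_report(source_ids, annotations, id_key)
    return all(n >= min_annotations for n in report.values())
```

What counts as "finished" stays your decision here: raising `min_annotations` to 2, for example, would only mark a file as done once two annotations exist for every example.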