I had a use case where I wanted to use the "review" recipe on examples I had already reviewed, i.e. a second round of review. But I found that I couldn't simply feed in the dataset from my first review. (I can run the "review" recipe on an "ner.manual" dataset, but I can't run "review" on a "review" dataset.)
Is there a neat way to solve this problem? Or do I have to reformat the "review" dataset and import it into Prodigy with db-in?
Hi! Since the reviewed dataset is already merged and has the right format, you should be able to stream in the data as-is and render it with the review UI. For example, a prodigy mark command like the following should work. Instead of a file, the dataset:... source loads the examples from an existing dataset:
prodigy mark new_dataset dataset:old_dataset --view-id review
Alternatively, in a custom recipe, all you need to do is load the data from your existing dataset using db.get_dataset, return the result as the stream, and set the view_id to review.
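Here is a minimal sketch of such a recipe. The recipe name "review-again" and the helper review_again_components are hypothetical, and the decorator uses the older tuple-style argument annotations, which may differ from what your Prodigy version prefers. The Prodigy import is guarded so the helper that builds the components dict can be tried without Prodigy installed.

```python
from typing import Iterable, Iterator

def reviewed_stream(examples: Iterable[dict]) -> Iterator[dict]:
    # Pass previously reviewed examples through unchanged: they are
    # already merged, so the review UI can render them directly.
    for eg in examples:
        yield eg

def review_again_components(examples: Iterable[dict], dataset: str) -> dict:
    # Hypothetical helper: build the components dict a recipe returns.
    return {
        "dataset": dataset,                   # new dataset for the second review
        "stream": reviewed_stream(examples),  # examples from the old dataset
        "view_id": "review",                  # render with the review UI
    }

try:
    import prodigy
    from prodigy.components.db import connect
except ImportError:  # Prodigy not installed: only the helper above is usable
    prodigy = None

if prodigy is not None:
    @prodigy.recipe(
        "review-again",  # hypothetical recipe name
        dataset=("Dataset to save the second review to", "positional", None, str),
        source=("Previously reviewed dataset to load", "positional", None, str),
    )
    def review_again(dataset, source):
        db = connect()                      # connect to the configured database
        examples = db.get_dataset(source)   # load the already reviewed examples
        return review_again_components(examples, dataset)
```

Saved as e.g. recipe.py, it would be started with something like prodigy review-again new_dataset old_dataset -F recipe.py.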
This needs to be explained a bit more. What I'm reading is that once a dataset is reviewed, it goes into a "merged" state and can't be reviewed again unless it's copied to a new, unmerged dataset. Where is this merged vs. unmerged information stored? I can't find it in the database. A newbie like me assumes you can keep reviewing until you're done. Starting a second review leaves the UI on localhost:8080 hanging. IMHO this is a bug: it shouldn't even start, and should instead show an error saying the dataset is merged.
Sorry about the confusion! Prodigy will never modify an existing dataset, so your data will never change if you review it. After reviewing, you typically end up with two datasets: the original data you load in, which may contain duplicates and conflicts, and the reviewed dataset you save your reviews to, which holds one final version of each annotation. The goal of the review workflow is to show you all versions of a given example grouped together and let you make the final decision. The data saved to the new dataset will then contain one final annotation for each input example. You typically want to save this to a new dataset and then use it to train your model.
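To illustrate the idea of grouping versions and keeping one final decision per input, here is a small plain-Python sketch. It is not Prodigy's internal implementation: it assumes annotations carry an _input_hash and NER-style spans with start, end and label, and the choose callback is a hypothetical stand-in for the human reviewer.

```python
from collections import defaultdict
from typing import Callable, Dict, List

def group_versions(annotations: List[dict]) -> Dict[int, List[dict]]:
    # Collect every annotated version of the same input under its input hash.
    groups: Dict[int, List[dict]] = defaultdict(list)
    for eg in annotations:
        groups[eg["_input_hash"]].append(eg)
    return dict(groups)

def has_conflict(versions: List[dict]) -> bool:
    # Versions conflict if annotators produced different sets of spans.
    span_sets = {
        tuple(sorted((s["start"], s["end"], s["label"]) for s in v.get("spans", [])))
        for v in versions
    }
    return len(span_sets) > 1

def finalize(groups: Dict[int, List[dict]], choose: Callable[[List[dict]], dict]) -> List[dict]:
    # One final example per input; `choose` stands in for the reviewer's decision.
    return [choose(versions) for versions in groups.values()]
```

The original annotations list is never modified; the final examples form a separate collection, mirroring how the reviewed data goes to a new dataset.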
You should always be able to start and stop the review workflow, or keep adding final decisions to your data as new annotations come in. Just keep in mind that by default, Prodigy will skip examples that are already in the dataset you're saving to, so if you've already reviewed an example before, you won't be asked about it again.
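The skipping behaviour can be pictured as a filter over the incoming stream. This is a hedged sketch, not Prodigy's code: it assumes examples are identified by a hash (here _task_hash by default; which hash Prodigy actually compares can depend on the version and recipe) and that seen_hashes has already been collected from the dataset you're saving to.

```python
from typing import Iterable, Iterator, Set

def skip_seen(stream: Iterable[dict], seen_hashes: Set[int],
              key: str = "_task_hash") -> Iterator[dict]:
    # Yield only examples whose hash is not already in the output dataset,
    # so restarting the same review session resumes where you left off.
    for eg in stream:
        if eg.get(key) in seen_hashes:
            continue
        yield eg
```

Because the filter is based on the output dataset, running review again with the same input and output datasets only shows the examples that still need a decision.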
(Btw, this thread is mostly about looking at your previously reviewed examples, which seems slightly different from the use case you describe? It sounds like you probably just want to keep running review with the same input and output dataset until you're done and have reviewed all annotations.)