I'm working on a span categorization task that will be carried out by a number of annotators and will include at least two review stages. It would be very useful if free-text review reasoning could be attached to the spans as they are processed, in a way that is visible to subsequent reviewers. I've looked at the videos and tutorials on custom recipes and can see how a text input box can be added to a review screen (though I haven't managed a working implementation yet), but not whether or how that text can be displayed later.
Hi! This is a cool idea and you should be able to make this work with a small modification to the review recipe:
Use "view_id": "blocks" instead of "view_id": "review" so you can use multiple blocks.
Set up multiple blocks in the config: one review block and as many text_input blocks as you need.
Use the field_id of the text_input to specify the key the text input is saved to in the JSON. So using "field_id": "tim" will save what you type in as the key "tim" in the JSON.
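For example, if a reviewer types a note into a text_input with "field_id": "tim", the saved task will contain that note under the "tim" key, roughly like this (the other keys shown are just illustrative):

{
  "text": "Some example text.",
  "spans": [{"start": 5, "end": 12, "label": "LABEL"}],
  "tim": "Not sure about the second span here.",
  "answer": "accept"
}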
As a quick and dirty solution, you could just edit the recipes/review.py in your Prodigy installation. You can run prodigy stats to find the location. If you want it to be more elegant, you can also make a copy of the recipe.
After the first review session, the examples to review are all created and saved in the database, with your notes. You then wouldn't have to use the review recipe anymore and you could just stream in the examples straight from the dataset, including the existing notes. If notes for a given field ID are already available in the data, the field will be pre-populated with it, so subsequent reviewers can see what you wrote.
Here's a simple example of what a re-review recipe could look like that streams in reviewed annotations from a dataset and adds text inputs for different reviewers:
import prodigy
from prodigy.components.db import connect

@prodigy.recipe("re-review")
def re_review(dataset: str, source_dataset: str):
    # Stream in the already merged review examples from the existing dataset
    db = connect()
    stream = db.get_dataset(source_dataset)
    return {
        "dataset": dataset,
        "stream": stream,
        "view_id": "blocks",
        "config": {
            "blocks": [
                {"view_id": "review"},
                # You can also make this more elegant and pass in the IDs
                # as arguments on the CLI
                {"view_id": "text_input", "field_id": "tim"},
                {"view_id": "text_input", "field_id": "ines"},
            ]
        },
    }
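Assuming you save this in a file like re_review.py (the file and dataset names here are just examples), you'd run it with something like:

prodigy re-review second_review_dataset first_review_dataset -F re_review.py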
Hi, Ines. Thank you very much for getting back to me. I got the interface looking the way I wanted (by copying the original recipe, then modifying the recipe name and the returned blocks as appropriate). I was also able to enter text into the text box and save on the first iteration, but ran into a browser hang and an endless "Loading..." message when I tried to use the modified dataset from the first iteration during the second.
My method was as follows:
I created the dummy test dataset using spans.manual, which is the format the final annotations will be in. I then loaded a modified review recipe with a modified name and an added text_input block in the return section (roughly as sketched below). I used a new dataset for the output (what I will call the "first iteration") and the dummy spans.manual dataset as the input. I was able to see the text input field, enter text, and save the newly created dataset without issue.
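For reference, the blocks in the return section looked something like this (the field ID here is just a placeholder):

"blocks": [
    {"view_id": "review"},
    {"view_id": "text_input", "field_id": "reviewer1_notes"},
]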
However, I then attempted a second iteration with a new name in the modified review recipe and an additional text_input, using the first iteration dataset as the input and a second iteration dataset for the output. I encountered the "Loading..." issue at this point, and it seemed to have significantly slowed the browser.
The same thing happened when I attempted to repeat the first iteration with additional data. Is there perhaps some issue with how I'm saving and/or accessing the text_inputs?
When you ran the second iteration, did you use the modified review recipe or the re-review code I posted above? If you just re-ran the review recipe, the problem could be that it's trying to re-generate the merged examples for review based on the already merged dataset. But the dataset is already merged, so you should be able to just stream it in as-is (e.g. like in the re-review recipe).
I was having trouble with both methods, but found my issue with the implementation of the recipe you posted above. That is now working as expected with regard to the text input. However, the presentation of the annotations using that recipe is difficult to read for our purposes, which is why I was also attempting the other method.
I put together a dummy clause and marked it up with some simple labels in ways likely to interact while annotating real data. There are many instances where markings should overlap, and that relationship is easily seen in the modified review interface below.
I apologize as I was likely unclear before as to what I was attempting to do. Is it possible to retain the stock review-style markups while also adding the notes?
So I have come up with a solution that works for the purposes of the multiple stages of review, and am posting it here in case it's useful for others. I wasn't able to get the notes section to work, but seeing all of the annotations from previous rounds is just as good, if not better.
After using the review recipe, the trick is to pull each user's answers out into their own dataset, and then load those in along with the previously reviewed sets for the next round.
A simple method of extraction is below. Hope it helps, and I appreciate the help given. Thank you.
import json

# Read the exported review dataset (one JSON task per line)
with open("./jsonl-file-name.jsonl", "r") as json_file:
    json_list = list(json_file)

userlist = ["user1", "user2"]

for user in userlist:
    # Write each user's version of every example to their own JSONL file
    with open(f"{user}_review.jsonl", "w") as f:
        for json_str in json_list:
            result = json.loads(json_str)
            for version in result["versions"]:
                if f"test_dataset-{user}" in version["sessions"]:
                    f.write(json.dumps(version) + "\n")
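To close the loop, the per-user files can then be loaded back into their own datasets and reviewed together in the next round. Assuming the dataset names above (all names are placeholders), something like the following should work:

prodigy db-out test_dataset > jsonl-file-name.jsonl
prodigy db-in user1_review user1_review.jsonl
prodigy db-in user2_review user2_review.jsonl
prodigy review second_round user1_review,user2_review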