Bug when using the choice and review interfaces

Hello,

I am using Prodigy 1.9.7, and I'm trying to manually annotate 50 texts with 8 different labels. Somehow I keep running into a "Something went wrong" error message.

Here are all the tests I made:

First I went with the textcat.manual recipe:
prodigy textcat.manual new_50 data/new_50.json --label labels.txt
There wasn't any error message, but the interface showed "No tasks available". Then I tried with one label to see if there was something wrong with the input file:
prodigy textcat.manual mail_new_50 data/mail_new_50.json --label NO_NEW
This time it worked.

Then I tried using the mark recipe as an alternative:
prodigy mark mail_new_50 data/mail_new_50.json --label label/labels.txt --view-id choice
That's where I first got the "Something went wrong" message. Again, I tried using a single label:
prodigy mark mail_new_50 data/mail_new_50.json --label NO_NEW --view-id classification
And again it worked. I noticed that every time I used the choice interface there was a problem, though I'm not sure if that's actually the cause.

I went on to create a custom recipe, inspired by this post on multi-class textcat with patterns (thanks a lot to all contributors, btw!). I made a few slight changes to adapt it to my project, but the general operation was the same. The recipe worked well and was successfully used in a multi-annotator campaign.

So I obtained several annotation sets, and I wanted to review those annotations. First I called the review recipe this way:
prodigy review mail_new_gold mail_new_50
This led to a "No 'None' found in the example" error, followed by the suggestion to add the --view-id option. I thought that was odd, because I used Prodigy 1.9.7 for the whole project, so as I understand it there should be no need to specify an interface. But I added the option anyway:
prodigy review mail_new_gold mail_new_50 --view-id choice
That's where I ran into the "Something went wrong" message for the second time. Finally, I decided to specify session names instead of the dataset name:
prodigy review mail_new_gold mail_new_50-danrun,mail_new_50-danielle --view-id choice
This time it worked for a while (13 examples reviewed) before the "Something went wrong" message came up.

The same .json and .txt files have been used with other recipes and worked pretty well, so I'm guessing maybe something's wrong with my local installation? By the way, it was installed in a Docker container; I don't know if this could be a contributing factor. Do you have an idea how to fix this?

Thanks in advance! :slight_smile:

Hi! I think there are several things going on here. First, do you have validation enabled ("validate": true)? Because some of these problems should have been caught by the data validation, which should give you clearer feedback about what the underlying problem is, before starting the server :thinking:

The problem here is that the mark recipe will render exactly what comes in with the data, using a given interface. The choice interface expects an incoming task to have one or more "options". If those are not present, Prodigy can't render the data.

Where does the mail_new_50 data come from and how was it created? And which interface was used? Somehow the dataset ended up with an example without a "_view_id" value. Prodigy will add this automatically, so it's definitely strange that this happened. Did you ever import data to this dataset from a file etc.?

This sounds like you might have ended up with a mix of data in different formats, created with different interfaces, in one or more of the datasets. The choice interface expects the examples to have "options", plus an "accept" list containing the IDs of the selected options. If you're trying to review datasets where some examples have options and others don't, that's not going to work.
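
For reference, here's roughly what an annotated task in the choice format looks like – shown as a Python dict here, and the text and the second option ID are just made-up placeholders:

task = {
    "text": "Some example text ...",
    "options": [                                    # one entry per label
        {"id": "NO_NEW", "text": "NO_NEW"},
        {"id": "OTHER_LABEL", "text": "OTHER_LABEL"},
    ],
    "accept": ["NO_NEW"],                           # IDs of the selected options
    "_view_id": "choice",                           # interface the example was created with
}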

An easy way to test this would be to write a script that loads each dataset, loops over the tasks, and prints or logs the examples that don't have "options" or a "_view_id" set. Then you can figure out where they come from.
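
For example, something along these lines should do it – a minimal sketch, assuming the connect and get_dataset helpers from prodigy.components.db, with the dataset names taken from your commands (swap in whichever datasets you want to check):

from prodigy.components.db import connect

db = connect()  # uses the database settings from your prodigy.json

for name in ("mail_new_50", "mail_new_gold"):
    for eg in db.get_dataset(name):  # all task dicts stored in the dataset
        if "options" not in eg or "_view_id" not in eg:
            # print enough of each problematic example to trace where it came from
            print(name, eg.get("_input_hash"), eg.get("text", "")[:80])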

What's in your new_50 dataset? The most likely explanation here is that the dataset already contains annotations for the texts in new_50.json. By default, Prodigy will skip examples that are already annotated, so you're not asked the same question twice.

If the dataset already contains annotations with multiple-choice options for the given texts, but none yet for the texts plus a single "label", that'd also explain why running textcat.manual with only one label still showed you questions to annotate.
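
If you want to double-check that, you could compare the input hashes of the incoming file against what's already stored – a rough sketch, assuming the set_hashes helper and the database's get_input_hashes method, with the file and dataset names from your first command:

import srsly
from prodigy import set_hashes
from prodigy.components.db import connect

db = connect()
existing = db.get_input_hashes("new_50")  # input hashes already in the dataset

for eg in srsly.read_json("data/new_50.json"):
    eg = set_hashes(eg)  # add the same _input_hash / _task_hash Prodigy would use
    status = "already annotated" if eg["_input_hash"] in existing else "new"
    print(status, eg.get("text", "")[:50])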

I checked prodigy.json and validation is indeed enabled. Strange :thinking:

I tried adding options to the original data file and it worked! Thanks!

The dataset is initialized with mail_new_50.json, in which I have 50 texts without any other properties. It is then annotated by several people using the choice interface. I checked everyone's sessions, and every entry has a _view_id value. The only entries without a _view_id are the originals. That explains why it worked when I specified which sessions to review, since the original texts are no longer reviewed. But I still don't understand why it worked for 13 examples and then stopped? They all have "options" and "accept".

It's actually mail_new_50 and mail_new_50.json – I copied the wrong command, sorry about that. And adding "options" to the originals worked too! Just to be sure though: I basically added all the options to the texts, but on the command line I still need to specify the labels. Is this expected?

Thanks for checking – this definitely makes sense then :slightly_smiling_face: I'll check if we can add more data validation here, because everything related to tasks having incorrect data formats is stuff we can easily catch early.

You don't have to upload any data to your dataset before annotating – the datasets are only for the collected annotations. To provide the input data, just pass the source file as an argument on the command line. If you pre-import the raw data, it's going to be treated as annotations and used during training, the review process etc., which is typically not what you want.

I think what might have happened is that examples 1 to 13 were annotations created with Prodigy and the choice interface. Then came example 14, which was one of the raw, unannotated examples without "options" that you had imported before, so the interface couldn't render it.

textcat.manual will take care of adding "options" to each incoming example, based on the --label values you set on the command line. So you don't have to do that in your original data.

I think the problem in your case was that the dataset also contained raw unannotated examples, which were the ones that ended up causing all the problems.

OK it all makes sense now. Thank you so much !
