Api vs view_id key name yield different results

My dataset is a stream of:

{
     'text': d['text'],
     'tokens': tokenizer(d['text']),
     'meta': {
         'id': d['id'],
         'time': d['time']
     }
 }

My recipe returns:

return {
    'dataset': dataset,
    'view_id': "ner_manual",
    'stream': stream,
    'config': {'label': "BOOP"}
}

There are a few issues:

  1. this yields the error “Oops. Something went wrong” in the UI
  2. changing the key name ‘view_id’ to ‘api’ removes the view_id printout in the upper left of the UI, but now everything renders the same way it does with the ner view_id. I presume this is a default kicking in, but I can’t find any doc on it.
  3. passing a list as the value to the ‘label’ param in the config (['BOOP', 'BLIP']) gets concatenated in the UI (“BOOPBLIP”)
  4. Some places in the doc the key name is ‘label’ and other times ‘labels’; when I use ‘label’, it prints in the upper left of the UI, and when I use the plural it does not

Piping in works fine but using a recipe does not; I followed the topic in here for that, saw the label requirement, and it unfortunately didn’t fix my problem.

Is there a precise spec for what this output json can be? Also it’d be cool as you have time to have some error handling around ppl using unexpected keys or values.

Thanks so much for your help!

Sorry about the suboptimal error handling here – we’re definitely working on that! The plan is for Prodigy to use a schema to validate the first batch of tasks before it goes out, which should catch the majority of problems upfront.

'tokens': tokenizer(d['text']),

What exactly does this function produce? The task format of tokenized text should look like this:

{
    "text": "Hello Apple",
    "tokens": [
        {"text": "Hello", "start": 0, "end": 5, "id": 0},
        {"text": "Apple", "start": 6, "end": 11, "id": 1}
    ]
}

Each token is a dictionary with a start and end index, as well as a token index (you can find more details on this in the “Annotation task formats” section of the docs). It’s possible that the error is caused by a small difference in the formatting and that Prodigy fails to render the tokens correctly.
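
As a rough sketch of a tokenizer that produces this shape: the `tokenize` helper and its whitespace-splitting regex are assumptions for illustration only, not part of Prodigy, and your real tokenizer will likely split differently.

```python
import re


def tokenize(text):
    """Split on whitespace and record character offsets for each token."""
    tokens = []
    for i, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "text": match.group(),
            "start": match.start(),  # offset of the token's first character
            "end": match.end(),      # offset one past the token's last character
            "id": i,                 # position of the token in the sequence
        })
    return tokens


task = {"text": "Hello Apple", "tokens": tokenize("Hello Apple")}
```

The point is only that every token carries its own `start`, `end` and `id`, rather than being a bare string.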

The view_id needs to be one of the built-in interface IDs. The Prodigy app should show you an additional error along the lines of “invalid view ID”, though, if it doesn’t recognise the ID (at least in the latest version – if not, let me know). If no view ID is specified, Prodigy will try to guess the best-matching view from the content of the task and the available properties. So text with spans will be rendered in the NER view, images and spans in the image view etc.

Sorry if the naming here is a bit confusing – the manual NER recipe needs a set of labels, which can be specified as the 'labels' key in the config. (We should have probably named this 'label_set' or something more explicit.) The label set should be a list of strings, and those will be the labels shown as options above the text.
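
For example, the components returned by the recipe could be put together roughly like this. The keys are the ones from your post; `make_components` is just a hypothetical helper name for the sketch, and the dataset name is a placeholder.

```python
def make_components(dataset, stream, labels):
    """Assemble the dictionary a custom recipe returns."""
    return {
        "dataset": dataset,
        "view_id": "ner_manual",
        "stream": stream,
        # Plural key, and the value is a list of strings (the label set)
        "config": {"labels": list(labels)},
    }


components = make_components("my_dataset", iter([]), ["BOOP", "BLIP"])
```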

Ahh, ok, we figured out what was going on here. Thanks for the help!

Our tokenizer wasn’t producing indices in our custom recipe (I’m not sure why I ever thought that was a brilliant idea…). For what it’s worth, specifying an incorrect key for view_id (“api”, in my case) activated some default that set the view_id to ner, reading from the “text” field, which allowed it to render in the UI (as opposed to the “tokens” field, which was of the wrong type). Omg, someday I’ll be able to go back to scala and haskell and I’ll never have to deal with weakly typed languages evar again. :smiley:

I’m super excited to hear about the schema moving forward - that’ll help me catch loooots of my errors, I imagine.

Thanks again!

Thanks for updating with your solution, glad it worked! :+1:

Yes, I really hope we can get this into the next release!

(Props to @justindujardin btw who initially suggested the JSON schemas, likely after a similar frustration – I don’t want to take all the credit for that :stuck_out_tongue:)


@hannahlindsley Just released Prodigy v1.5.0, which now includes JSON schemas :tada: The first batch of your stream will be validated before the server starts, and the error message will include details on what’s wrong (missing required values, wrong type etc).

(There’s also a prodigy.get_schema util function that takes a view ID, e.g. 'ner' and returns the JSON schema, so you can see what your tasks are validated against.)
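
If you're stuck on an older version, a hand-rolled check along these lines can catch the same class of mistake before the server starts. This is only a sketch: `check_tokens` and `check_first_batch` are hypothetical helpers, and the required keys are taken from the token format earlier in this thread, not from Prodigy's actual schema.

```python
from itertools import islice

# Assumed token fields and types, based on the task format shown above
REQUIRED_TOKEN_KEYS = {"text": str, "start": int, "end": int, "id": int}


def check_tokens(task):
    """Return a list of problems found in a task's 'tokens' field."""
    tokens = task.get("tokens")
    if not isinstance(tokens, list):
        return ["'tokens' should be a list, got %s" % type(tokens).__name__]
    problems = []
    for i, token in enumerate(tokens):
        if not isinstance(token, dict):
            problems.append("token %d is not a dict" % i)
            continue
        for key, expected in REQUIRED_TOKEN_KEYS.items():
            if not isinstance(token.get(key), expected):
                problems.append("token %d: bad or missing %r" % (i, key))
    return problems


def check_first_batch(stream, batch_size=10):
    """Check only the first few tasks and collect every problem found."""
    errors = []
    for n, task in enumerate(islice(stream, batch_size)):
        errors.extend("task %d: %s" % (n, p) for p in check_tokens(task))
    return errors
```

Note that this consumes the tasks it inspects, so if the stream is a generator, run the check on a list or a separate copy of the first batch.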
