Prodigy fails to start with empty stream

alejandro.mesa · August 28, 2018, 11:46pm

Hello,

We have a nightly process that loads new tasks to be processed in Prodigy. After this is done, we restart Prodigy so that the new tasks get loaded and displayed to the annotators. Sometimes this nightly process does not load any new tasks, and when Prodigy restarts, it fails with the error "Error while validating stream: no first batch. This likely means that your stream is empty." The problem is that we’re running Prodigy in Docker, so the container keeps restarting and the same exception is shown every time. I was wondering if it’s possible to run Prodigy even if there are no tasks, and have the web interface show "No tasks available".

Thanks

ines · August 29, 2018, 8:32am

Ah, that’s an interesting edge case. We specifically added the validation in one of the recent versions to make Prodigy fail more gracefully, because in 99% of the cases, an empty stream indicates a problem with the input data and is not what the user wants.

However, if you know that your formats are correct, you can set "validate": false in your config, which will disable all of those checks (stream validation, task validation etc.)
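
If disabling validation globally feels too broad, the empty-stream case can also be detected before Prodigy is started. The following is a minimal sketch in plain Python, not part of Prodigy's API: the helper name `peek_stream` is hypothetical, and it assumes the stream is an ordinary iterable of task dicts.

```python
from itertools import chain


def peek_stream(stream):
    """Return (is_empty, stream) without losing the first task.

    Hypothetical helper, not a Prodigy function: it pulls one item from
    the iterator and, if there is one, chains it back in front.
    """
    iterator = iter(stream)
    try:
        first = next(iterator)
    except StopIteration:
        # Nothing to annotate: hand back an empty iterator.
        return True, iter(())
    return False, chain([first], iterator)


# Example: a generator that yields no tasks, as after an empty nightly load.
is_empty, stream = peek_stream(task for task in [])
print(is_empty)  # True

is_empty, stream = peek_stream(iter([{"text": "hello"}, {"text": "world"}]))
print(is_empty, list(stream))
```

A nightly job could use a check like this to skip the restart when nothing new was loaded, so the running instance is never replaced by one that fails on startup.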

alejandro.mesa · August 29, 2018, 3:22pm

Hi Ines,

Thanks for your response. That seems to work, but it seems a bit dangerous to disable validation. I was wondering if this is something that could be implemented in a future release? Also, is there a function that we can call to manually perform the validation?

Thanks

ines · August 29, 2018, 3:40pm

Yes! Internally, the validation is implemented via a JSON schema, so the most flexible way to perform your own is to use the prodigy.get_schema method, which takes a view_id (for example, 'ner') and returns the schema. You can find more details and documentation in your PRODIGY_README.html. There are libraries for all kinds of languages that implement JSON schema validation.

Alternatively, you can also use the validate component – this is currently internal, so it's not officially documented. But it’s pretty straightforward:

from prodigy.components.validate import Validator

# create the validator
validator = Validator('ner')

def validate_stream(stream):
    for task in stream:
        # validate a task object
        validator.check(task)

alejandro.mesa · August 29, 2018, 4:33pm

Great, thanks. One last question: what’s the return type of that validator.check function? True/False? Or does it raise an exception if the validation fails?

Thanks

ines · August 29, 2018, 4:37pm

Ah, sorry, forgot to include that. It only raises an exception, with a formatted error message showing the exact path and fields that didn’t pass the check. If you only need a boolean value, you can use validator.is_valid(task) instead.
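
The boolean variant lends itself to filtering a stream rather than stopping at the first bad task. A rough sketch, assuming only that the validator object has an `is_valid(task)` method as described above: `StubValidator` and `filter_valid` are hypothetical names, and the stub stands in for Prodigy's own validator so the example runs on its own.

```python
class StubValidator:
    """Stand-in for Prodigy's Validator so the sketch runs on its own.

    Assumption: the real object exposes is_valid(task) -> bool.
    This stub only checks for a string "text" field.
    """

    def is_valid(self, task):
        return isinstance(task, dict) and isinstance(task.get("text"), str)


def filter_valid(stream, validator):
    """Yield only the tasks that pass validator.is_valid()."""
    for task in stream:
        if validator.is_valid(task):
            yield task
        else:
            # Skip bad tasks instead of stopping the whole stream.
            print("Skipping invalid task:", task)


tasks = [{"text": "ok"}, {"label": "missing text"}, {"text": "also ok"}]
valid = list(filter_valid(tasks, StubValidator()))
print(valid)
```

With the real component, `Validator('ner')` from the earlier snippet would presumably be passed in place of `StubValidator()`.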

(Under the hood, we use the jsonschema.Draft4Validator btw and get the errors using the iter_errors method.)
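
For anyone validating outside of Prodigy, the same approach can be sketched directly with the `jsonschema` package. The schema below is a made-up example for illustration only, not Prodigy's real one (that would come from prodigy.get_schema), and `collect_errors` is a hypothetical helper name.

```python
from jsonschema import Draft4Validator

# Hypothetical schema for illustration only; Prodigy's real schemas
# come from prodigy.get_schema(view_id) and are more detailed.
SCHEMA = {
    "type": "object",
    "properties": {
        "text": {"type": "string"},
        "spans": {"type": "array"},
    },
    "required": ["text"],
}


def collect_errors(task, schema=SCHEMA):
    """Return a list of (path, message) tuples, empty if the task is valid."""
    validator = Draft4Validator(schema)
    errors = []
    for error in validator.iter_errors(task):
        # error.path is a deque of keys/indices leading to the bad value.
        path = "/".join(str(part) for part in error.path) or "<root>"
        errors.append((path, error.message))
    return errors


print(collect_errors({"text": "hello", "spans": []}))  # []
print(collect_errors({"text": 42}))
```

Collecting every error, rather than raising on the first one, makes it easier to log all problems with a task in a single pass.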

alejandro.mesa · August 29, 2018, 4:48pm

Thanks
