prodigy train tagger not working

I am using Prodigy to train a POS tagger. However, when I try to run the train command, I get the following output:

Created and merged data for 0 total examples
Using 0 train / 0 eval (split 50%)
ValueError: not enough values to unpack (expected 2, got 0)

The dataset was created using the prodigy pos.correct recipe, and when I run the db-out command the data looks correctly formatted:

{"text":"cfos","_input_hash":-1934297327,"_task_hash":1222700193,"tokens":[{"text":"cfos","start":0,"end":4,"id":0,"ws":false}],"spans":[{"start":0,"end":4,"token_start":0,"token_end":0,"label":"NNS"}],"_session_id":null,"_view_id":"pos_manual","answer":"ignore"}

Any clue what's going on, or is this a bug?

Hi! Which version of Prodigy are you using and what's the exact command you're running? I just tried reproducing it by annotating a few examples with pos.correct and then training a model, but it all ran as expected :thinking:

Hi @ines - thanks for the quick reply! I'm using the latest version of Prodigy (1.10.4), and here is the exact command I ran:

prodigy train tagger pos_oct_21_fine_grained en_core_web_md -o ./test

If I run prodigy db-out pos_oct_21_fine_grained, I see the examples correctly printed in the terminal. I'm only using 13 examples, if that helps with debugging.

If I train with coarse-grained tags using batch_train, it works correctly, but if I use fine-grained tags with the all-purpose train command, I get the above error. If I try the coarse-grained tags (e.g. VERB) with train, it complains that new labels can't be added to the model.
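For context (as far as I understand it), en_core_web_md's tagger predicts fine-grained Penn Treebank tags (token.tag_, e.g. NNS), while VERB is a coarse-grained tag (token.pos_), which might be why it's treated as a new label. A quick way to compare the two tag sets:

    import spacy

    nlp = spacy.load("en_core_web_md")
    doc = nlp("The cats chase mice")
    for token in doc:
        # token.pos_ is the coarse-grained tag (e.g. VERB),
        # token.tag_ is the fine-grained Penn Treebank tag (e.g. VBP)
        print(token.text, token.pos_, token.tag_)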

Thanks for the details, and sorry about the delay! I've been trying to reproduce this, but it always trains as expected for me :thinking: I used the example you provided above (changing "answer" to "accept"), and also used it for evaluation.

Are you sure you're passing in the right dataset and that it actually contains accepted annotations (i.e. with "answer": "accept")? The example you posted above has "answer": "ignore", so it would be skipped. I just double-checked, and the "0 total examples" reported on the CLI is the number of examples in the dataset that were accepted and contain "tokens".
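If you want to double-check, something like this should show what's actually in the dataset (a quick sketch using the database API, assuming the default database config):

    from collections import Counter
    from prodigy.components.db import connect

    # Connect to the default Prodigy database and load the dataset
    db = connect()
    examples = db.get_dataset("pos_oct_21_fine_grained")

    # Tally the answers; only accepted examples with "tokens" are used for training
    print(Counter(eg.get("answer") for eg in examples))
    print("usable:", sum(1 for eg in examples
                         if eg.get("answer") == "accept" and eg.get("tokens")))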

Btw, one small change I made for the upcoming release: train now fails more gracefully if no training or evaluation examples are available, so you won't get a cryptic traceback anymore.