Problem with training

I get an error when I run Prodigy's `train ner` recipe, and I am not sure why.

The relevant information is below. Could you help?

My data (training and evaluation datasets) has been saved in JSONL files.

The format of the examples is:

`[(text, {'entities' : [(start, end, label), ..., (start, end, label)], 'answer' : 'accept'}), ..., (text, {'entities' : [(start, end, label), ..., (start, end, label)], 'answer' : 'accept'})]`

The steps I followed can be seen below:

python -m prodigy dataset train_30-9-20
Successfully added 'train_30-9-20' to database SQLite
python -m prodigy dataset eval_30-9-20
Successfully added 'eval_30-9-20' to database SQLite
python -m prodigy db-in train_30-9-20 train_30-9-20.jsonl --rehash --dry
Imported 2700 annotations to 'train_30-9-20' (session 2020-09-30_21-05-24) in database SQLite
Found and keeping existing "answer" in 2700 examples
python -m prodigy db-in eval_30-9-20 eval_30-9-20.jsonl --rehash --dry
Imported 520 annotations to 'eval_30-9-20' (session 2020-09-30_21-05-41) in database SQLite
Found and keeping existing "answer" in 520 examples
python -m prodigy train ner train_30-9-20 en_core_web_lg --n-iter 30 --dropout 0.5 --eval-id eval_30-9-20

After loading the model, I got the following output:

Created and merged data for 0 total examples
Created and merged data for 0 total examples
Using 0 train / 0 eval (from 'eval_30-9-20')
Component: ner | Batch size: compounding | Dropout: 0.5 | Iterations: 30
[...]
ValueError: not enough values to unpack (expected 2, got 0)

Am I doing something wrong in the way I format the training examples? Thank you very much in advance!

Hi! Not sure where you found that format, but the tuple style is definitely not what's expected, and the keys Prodigy creates are also different. This is why all of your examples are skipped during training. You can find an example of the JSON format here: https://prodi.gy/docs/api-interfaces#ner_manual
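
As a rough sketch (not an official Prodigy converter), here is one way the tuple-style examples could be turned into one JSON object per line with "text", "spans" and "answer" keys. The helper names `to_prodigy_task` and `to_jsonl_line` and the sample sentence are made up for illustration, and the sketch assumes the character offsets in the tuples are already correct.

```python
import json


def to_prodigy_task(text, annotations):
    """Turn one (text, {"entities": [...], "answer": ...}) pair into a dict
    with "text", "spans" and "answer" keys (hypothetical helper)."""
    spans = [
        {"start": start, "end": end, "label": label}
        for start, end, label in annotations.get("entities", [])
    ]
    return {
        "text": text,
        "spans": spans,
        # Assumption: examples without an explicit answer count as accepted.
        "answer": annotations.get("answer", "accept"),
    }


def to_jsonl_line(task):
    """Serialize one task as a single line of valid JSON (double-quoted keys)."""
    return json.dumps(task, ensure_ascii=False)


# Illustrative example in the tuple style from the first post.
examples = [
    (
        "Apple is based in Cupertino",
        {"entities": [(0, 5, "ORG"), (18, 27, "GPE")], "answer": "accept"},
    )
]

tasks = [to_prodigy_task(text, ann) for text, ann in examples]
jsonl = "\n".join(to_jsonl_line(task) for task in tasks)
print(jsonl)
```

Note that `json.dumps` writes double-quoted keys and strings. A Python dict printed with single quotes is not valid JSON, so writing `str(task)` to the file would not produce a usable JSONL line.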

Thank you. So I followed/adapted this template:

The examples now have the following format:

{'text' : some_text, 'spans' : [{ 'start' : some_integer, 'end' : some_integer,  'label' : label_string}, ..., { 'start' : some_integer, 'end' : some_integer,  'label' : label_string}], 'answer' : 'accept'}

Nonetheless, I am getting the exact same error message.

In formatting the examples, I followed/adapted this template:

My examples now have the following format:

{'text' : some_text, 'spans' : [{ 'start' : some_integer, 'end' : some_integer,  'label' : label_string}, ..., { 'start' : some_integer, 'end' : some_integer,  'label' : label_string}], 'answer' : 'accept'}

Nonetheless, I am getting the following error message:

Created and merged data for 0 total examples
Created and merged data for 0 total examples
Using 0 train / 0 eval (from 'eval_30-9-20')
Component: ner | Batch size: compounding | Dropout: 0.5 | Iterations: 30
[...]
ValueError: not enough values to unpack (expected 2, got 0)

Am I doing something wrong in the way I format the training examples? I think I am following the template.
Thank you!

It's usually not very helpful to post the same comment multiple times and open new threads – this just makes it harder for us to keep track of the questions and it'll take us longer to answer.

The problem here seems to be that there are no examples in the dataset that you're trying to train from. When you run the training with PRODIGY_LOGGING=basic, is there anything suspicious in the logs? Anything about examples being skipped etc.?
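
Independently of the logs, a quick sanity check on the file itself can help. The sketch below is not Prodigy's own validation. It only checks that every line parses as JSON and has the "text" and "spans" keys discussed in this thread. The function name `check_jsonl_lines` and the sample lines are made up for illustration.

```python
import json

REQUIRED_SPAN_KEYS = {"start", "end", "label"}


def check_jsonl_lines(lines):
    """Return (line_number, problem) pairs for lines that don't look like
    {"text": ..., "spans": [{"start": ..., "end": ..., "label": ...}]}."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            task = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"invalid JSON: {err.msg}"))
            continue
        if not isinstance(task, dict):
            problems.append((number, "top-level value is not a JSON object"))
            continue
        if not isinstance(task.get("text"), str):
            problems.append((number, 'missing or non-string "text"'))
        spans = task.get("spans") or []
        if not isinstance(spans, list):
            problems.append((number, '"spans" is not a list'))
            continue
        for span in spans:
            if not isinstance(span, dict) or not REQUIRED_SPAN_KEYS <= span.keys():
                problems.append((number, "span without start/end/label"))
    return problems


# Illustrative lines: one valid, one with Python-style single quotes,
# one in the tuple/list style from the first post.
sample_lines = [
    '{"text": "a", "spans": [{"start": 0, "end": 1, "label": "X"}], "answer": "accept"}',
    "{'text': 'a', 'spans': [], 'answer': 'accept'}",
    '["a", {"entities": [[0, 1, "X"]]}]',
]

problems = check_jsonl_lines(sample_lines)
for number, problem in problems:
    print(f"line {number}: {problem}")
```

To run it on a real file, pass the open file object, e.g. `check_jsonl_lines(open("train_30-9-20.jsonl", encoding="utf-8"))`.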

(Btw, if you have your data outside of Prodigy and you just want to train, are you sure you don't want to use spaCy directly? This gives you much more control. Prodigy's train command is really just a wrapper around spaCy's training command that lets you load datasets directly.)
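
If you go the spaCy route, the tuple style from the first post is close to the (text, {"entities": [...]}) pairs commonly used in spaCy v2 training scripts. A minimal sketch of mapping span-style dicts back to that shape, assuming only accepted examples should be kept (the function name `to_spacy_tuples` and the sample data are made up for illustration):

```python
def to_spacy_tuples(tasks):
    """Map dicts with "text"/"spans"/"answer" to (text, {"entities": [...]})
    pairs, keeping only accepted examples (hypothetical helper)."""
    train_data = []
    for task in tasks:
        # Assumption: a missing "answer" is treated as accepted.
        if task.get("answer", "accept") != "accept":
            continue
        entities = [
            (span["start"], span["end"], span["label"])
            for span in task.get("spans", [])
        ]
        train_data.append((task["text"], {"entities": entities}))
    return train_data


# Illustrative input in the span style shown earlier in the thread.
sample_tasks = [
    {
        "text": "Apple is based in Cupertino",
        "spans": [{"start": 0, "end": 5, "label": "ORG"}],
        "answer": "accept",
    },
    {"text": "Nothing here", "spans": [], "answer": "reject"},
]

train_data = to_spacy_tuples(sample_tasks)
print(train_data)
```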