Problem with training

Roberto · September 30, 2020, 6:27pm

I get an error when I run Prodigy's train ner recipe, and I am not sure why.

The relevant information is below; perhaps you could help?

My data (training and evaluation datasets) are saved in JSONL files.

The format of the examples is:

`[(text, {'entities' : [(start, end, label), ..., (start, end, label)], 'answer' : 'accept'}), ..., (text, {'entities' : [(start, end, label), ..., (start, end, label)], 'answer' : 'accept'})]`

The steps I followed can be seen below:

python -m prodigy dataset train_30-9-20
Successfully added 'train_30-9-20' to database SQLite
python -m prodigy dataset eval_30-9-20
Successfully added 'eval_30-9-20' to database SQLite
python -m prodigy db-in train_30-9-20 train_30-9-20.jsonl --rehash --dry
Imported 2700 annotations to 'train_30-9-20' (session 2020-09-30_21-05-24) in database SQLite
Found and keeping existing "answer" in 2700 examples
python -m prodigy db-in eval_30-9-20 eval_30-9-20.jsonl --rehash --dry
Imported 520 annotations to 'eval_30-9-20' (session 2020-09-30_21-05-41) in database SQLite
Found and keeping existing "answer" in 520 examples
python -m prodigy train ner train_30-9-20 en_core_web_lg --n-iter 30 --dropout 0.5 --eval-id eval_30-9-20

After loading the model, I got the following output:

Created and merged data for 0 total examples
Created and merged data for 0 total examples
Using 0 train / 0 eval (from 'eval_30-9-20')
Component: ner | Batch size: compounding | Dropout: 0.5 | Iterations: 30
[...]
ValueError: not enough values to unpack (expected 2, got 0)

Am I doing something wrong in the way I format the training examples? Thank you very much in advance!

ines · September 30, 2020, 9:33pm

Hi! Not sure where you found that format, but the tuple style is definitely not what's expected, and the keys Prodigy creates are also different. This is why all of your examples are skipped during training. You can find an example of the JSON format in the Prodigy documentation: Annotation interfaces · Prodigy
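For readers with data in the tuple style from the first post, here is a minimal sketch of one way to turn it into one JSON object per line. The field names ("text", "spans", "start", "end", "label", "answer") are the ones that appear in this thread; the function names `to_prodigy_task` and `to_jsonl` and the sample sentence are hypothetical, and the result should be checked against the documented format rather than taken as Prodigy's own converter.

```python
import json


def to_prodigy_task(text, annotations):
    """Convert one (text, {"entities": [(start, end, label), ...]}) pair
    into a dict with "text", "spans" and "answer" keys (hypothetical helper)."""
    spans = [
        {"start": start, "end": end, "label": label}
        for start, end, label in annotations.get("entities", [])
    ]
    # Keep an existing answer if there is one; defaulting to "accept" is an assumption.
    return {"text": text, "spans": spans, "answer": annotations.get("answer", "accept")}


def to_jsonl(examples):
    """Serialize the examples as JSONL: one JSON object per line."""
    return "\n".join(json.dumps(to_prodigy_task(text, ann)) for text, ann in examples)


# Made-up sample data in the tuple style.
examples = [
    (
        "Apple is based in Cupertino",
        {"entities": [(0, 5, "ORG"), (18, 27, "GPE")], "answer": "accept"},
    )
]
jsonl = to_jsonl(examples)
print(jsonl)
```

Using `json.dumps` rather than printing Python dicts matters here: JSON requires double-quoted keys and strings, so a dict written out with single quotes is not valid JSON.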

Roberto · September 30, 2020, 10:15pm

Thank you. So I followed/adapted this template:

The examples now have the following format:

`{'text' : some_text, 'spans' : [{'start' : some_integer, 'end' : some_integer, 'label' : label_string}, ..., {'start' : some_integer, 'end' : some_integer, 'label' : label_string}], 'answer' : 'accept'}`

Nonetheless, I am getting the exact same error message.

Roberto · October 1, 2020, 12:30pm

In formatting the examples I followed/adapted this template:

My examples now have the following format:

`{'text' : some_text, 'spans' : [{'start' : some_integer, 'end' : some_integer, 'label' : label_string}, ..., {'start' : some_integer, 'end' : some_integer, 'label' : label_string}], 'answer' : 'accept'}`

Nonetheless, I am getting the following error message:

Created and merged data for 0 total examples
Created and merged data for 0 total examples
Using 0 train / 0 eval (from 'eval_30-9-20')
Component: ner | Batch size: compounding | Dropout: 0.5 | Iterations: 30
[...]
ValueError: not enough values to unpack (expected 2, got 0)

Am I doing something wrong in the way I format the training examples? I think I follow the template.
Thank you!

ines · October 1, 2020, 4:08pm

It's usually not very helpful to post the same comment multiple times and open new threads – this just makes it harder for us to keep track of the questions and it'll take us longer to answer.

The problem here seems to be that there are no examples in the dataset that you're trying to train from. When you run the training with PRODIGY_LOGGING=basic, is there anything suspicious in the logs? Anything about examples being skipped etc.?
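Independently of the logs, the input file itself can be checked before importing. The sketch below is only an illustration, not part of Prodigy: `check_jsonl_lines` is a hypothetical helper that reports lines that do not parse as JSON or that lack the "text" and span keys shown in this thread. It does not establish that this is the cause of the empty dataset.

```python
import json


def check_jsonl_lines(lines):
    """Return (line_number, message) pairs for lines that look malformed.

    Hypothetical helper: it only checks that each non-empty line is a JSON
    object with a string "text" and spans carrying "start", "end", "label".
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            task = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append((number, f"not valid JSON: {err.msg}"))
            continue
        if not isinstance(task, dict) or not isinstance(task.get("text"), str):
            problems.append((number, 'missing a string "text" field'))
            continue
        for span in task.get("spans", []):
            if not isinstance(span, dict) or not all(
                key in span for key in ("start", "end", "label")
            ):
                problems.append((number, 'span without "start", "end" and "label"'))
                break
    return problems


# Made-up sample lines: the second uses single quotes, which JSON does not allow.
sample = [
    '{"text": "hi there", "spans": [{"start": 0, "end": 2, "label": "X"}], "answer": "accept"}',
    "{'text': 'hi there', 'spans': [], 'answer': 'accept'}",
]
problems = check_jsonl_lines(sample)
for number, message in problems:
    print(f"line {number}: {message}")
```

To run it on a real file, pass an open file handle, for example `check_jsonl_lines(open("train_30-9-20.jsonl", encoding="utf8"))`.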

(Btw, if you have your data outside of Prodigy and you just want to train, are you sure you don't want to use spaCy directly? This gives you much more control. Prodigy's train command is really just a wrapper around spaCy's training command that lets you load datasets directly.)
