When Example objects are not created - E930

Hi, I am planning to do some text classification. I have created two datasets and run the following command to train a model:

python3 -m prodigy train baseline_model --textcat games_train_shuffled,eval:games_dev --base-model "nb_core_news_sm" 

The first strange thing to happen is that I get the following message in the terminal:

Components: textcat
Merging training and evaluation data for 1 components
 - [textcat] Training: 1671 | Evaluation: 238 (from datasets)
Training: 0 | Evaluation: 0
Labels: textcat (0)

The datasets appear to be loaded correctly with the right number of examples, but then it says there are 0 training examples and 0 evaluation examples? Also, why no labels?
I then get an E930 error: Received invalid get_examples callback in TextCategorizer.initialize. Expected function that returns an iterable of Example objects but got:

I believe the files I load into the datasets are correct, since there are no error messages when I use "db-in".
The examples come from a JSONL file where each line has the form {"text": "...", "accept": ["..."]}
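
For example, a line might look like this (the text and label values here are just placeholders):

{"text": "Fun game with great mechanics", "accept": ["POSITIVE"]}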

Any suggestions as to why no Example objects are created?

hi @sofiejb!

Thanks for your question.

This error typically comes up when you aren't passing data in the right format (even though you may think you are).

Can you provide your data (or even a sample of it)? Since it sounds like the problem is somewhere in your data, it's hard for me to diagnose it without seeing it.

Where did the data for games_train_shuffled and games_dev come from?

Were these annotations ever created in Prodigy? Were all of them imported (i.e., from some different source)? Or were some created in Prodigy and then merged with annotations from a different source?

Just because you didn't receive an error message from db-in doesn't mean there isn't a problem. While we have tried to add validation there, there are always certain manipulations we never thought to check for.

Also, are you trying to run binary classification or multi-class (and if multi-class, are the categories mutually exclusive or not)?

By running --textcat, you're assuming binary classification. Typically, Prodigy creates annotations like this (e.g., with textcat.manual) in a slightly different format:

{"text":"some text", ... ,"answer":"reject","label":"LABEL1",...} # Negative example
{"text":"some different text", ... ,"answer":"accept","label":"LABEL1",...} # Positive example

If you're certain you're handling your data correctly for binary vs. multi-class, then run db-out and look at the data. If you don't spot anything obvious, can you try to create a sample of your data -- say the first 10 records -- reload it with db-in, and see if it'll run? (A sketch of these steps follows below.)
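
Here's a rough sketch of those steps, using your dataset names from above (games_sample is a hypothetical new dataset name):

python3 -m prodigy db-out games_train_shuffled > games_train.jsonl
head -n 10 games_train.jsonl > games_sample.jsonl
python3 -m prodigy db-in games_sample games_sample.jsonl
python3 -m prodigy train sample_model --textcat games_sample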

If it does run on some of your data, that's important information: it tells you that not all of your data has issues. The question then immediately becomes: how can you find which records are causing problems? And once you find those records, what's different about them versus your other records?
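
If you want to scan your export for odd records programmatically, something like this quick script can help (a sketch only; games_train.jsonl is the file from db-out above, and you may need to adjust the keys to match your data):

import json

with open("games_train.jsonl", encoding="utf8") as f:
    for i, line in enumerate(f, 1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as err:
            print(f"line {i}: invalid JSON ({err})")
            continue
        # Records need text plus some label information for textcat training
        if "text" not in record:
            print(f"line {i}: missing 'text' key")
        if "label" not in record and not record.get("accept"):
            print(f"line {i}: no 'label' and no accepted options")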

If it doesn't run on the first 10 records, you can try again with 2-3 random records, but I suspect it likely won't.

A related alternative is to run data-to-spacy to export your data as .spacy binary files along with a default spaCy config. You can then try to train with spacy train (the command's output should include instructions on how to run it). Alternatively, you may want to run spacy debug data on your config file.
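
For example (a sketch, assuming ./corpus as the output directory):

python3 -m prodigy data-to-spacy ./corpus --textcat games_train_shuffled,eval:games_dev
python3 -m spacy train ./corpus/config.cfg --output ./output
python3 -m spacy debug data ./corpus/config.cfg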

I tried something similar yesterday, and if you get a ValueError: [E913] Corpus path can't be None. error, then add the explicit file paths to your train.spacy and dev.spacy files in your config file under:

[paths]
train = path/to/train.spacy
dev = path/to/dev.spacy
...
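
Alternatively, spaCy lets you override these values on the command line instead of editing the config, e.g.:

python3 -m spacy train config.cfg --paths.train path/to/train.spacy --paths.dev path/to/dev.spacy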