train --spancat questions

Hello, this is my first post :slight_smile: in the forum. Apologies in advance for any mistakes, so here goes.

I'm using prodigy 1.11.6, spacy 3.1.4, spacy-transformers 1.0.6 and python 3.7.3 for this project.

I'll begin with some questions about training with spancat.

python -m prodigy train dest_path --spancat dataset -m en_core_web_lg --gpu-id 0
  1. Why are the LOSS TOK2VEC column values always zero? Am I missing something?

  2. When I load a model trained using transformers, I get an error. Here is how the model was trained and loaded, and the error:
python -m prodigy train dest_path --spancat dataset -m en_core_web_lg --gpu-id 0
nlp_span = spacy.load(path)
ValueError: Cannot deserialize model: mismatched structure

Any help would be most appreciated!

Thanks so much, best!

Hi @mhlucero , welcome to Prodigy!

For the first question, I'm curious whether you're combining models in some way. Is there anything in your pipeline that you're customizing in particular?

For the second question, can you try upgrading your spacy-transformers version? Perhaps the error comes from incompatible deserialization of an older model. Upgrading to v1.1.x should work!
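As a quick sanity check before reloading, you could verify the installed spacy-transformers version programmatically. This is just a minimal sketch, assuming the `packaging` library is available (it ships with most Python environments); the helper name `transformers_at_least` is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version

from packaging.version import Version


def transformers_at_least(minimum: str = "1.1.0") -> bool:
    """Check whether the installed spacy-transformers meets a minimum version."""
    try:
        installed = version("spacy-transformers")
    except PackageNotFoundError:
        # Not installed at all in this environment.
        return False
    return Version(installed) >= Version(minimum)


# A pipeline saved under spacy-transformers 1.0.x may fail to deserialize
# with a mismatched install, so comparing versions first can save a retrain.
print(Version("1.0.6") < Version("1.1.0"))  # True
```

If the check fails, `pip install -U spacy-transformers` followed by retraining (as you did) is the usual path.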

Hello Miranda! Thanks so much for your answer!

  • First question: Nope, I'm using the standard spancat training, with no modifications to the model or config.
  • Second, about transformers: I upgraded the spacy-transformers library and retrained, with the same result.
    Here is the output. As you can see, the values in the first loss column are always zero (using either en_core_web_trf or en_core_web_lg):
>python -m prodigy train ./spans --spancat spans_adq -m en_core_web_trf --gpu-id 0
ℹ Using GPU: 0

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
Using 'spacy.ngram_range_suggester.v1' for 'spancat' with sizes 1 to 14 (inferred from data)
ℹ Using config from base model
✔ Generated training config

=========================== Initializing pipeline ===========================
[2022-01-26 01:54:36,369] [INFO] Set up nlp object from config
Components: spancat
Merging training and evaluation data for 1 components
  - [spancat] Training: 5604 | Evaluation: 1400 (20% split)
Training: 5247 | Evaluation: 1376
Labels: spancat (4)
[2022-01-26 01:54:37,147] [INFO] Pipeline: ['transformer', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'spancat']
[2022-01-26 01:54:37,148] [INFO] Resuming training for: ['transformer']
[2022-01-26 01:54:37,154] [INFO] Created vocabulary
[2022-01-26 01:54:37,155] [INFO] Finished initializing nlp object
[2022-01-26 01:54:38,254] [INFO] Initialized pipeline components: ['spancat']
✔ Initialized pipeline

============================= Training pipeline =============================
Components: spancat
Merging training and evaluation data for 1 components
  - [spancat] Training: 5604 | Evaluation: 1400 (20% split)
Training: 5247 | Evaluation: 1376
Labels: spancat (4)
ℹ Pipeline: ['transformer', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer',
'ner', 'spancat']
ℹ Frozen components: ['tagger', 'parser', 'attribute_ruler', 'lemmatizer',
'ner']
ℹ Initial learn rate: 0.0
E    #       LOSS TRANS...  LOSS SPANCAT  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE
---  ------  -------------  ------------  ----------  ----------  ----------  ------
  0       0           0.00       7624.78        1.00        0.50       52.42    0.01
 16    1000           0.00    1308630.16       50.70       80.95       36.91    0.51
 32    2000           0.00      92991.15       66.33       77.73       57.84    0.66
 49    3000           0.00      74833.84       71.21       79.33       64.59    0.71
 65    4000           0.00      64684.00       73.33       80.59       67.27    0.73
 81    5000           0.00      57694.04       74.62       81.72       68.66    0.75
 98    6000           0.00      52803.86       75.42       82.19       69.68    0.75
114    7000           0.00      48827.18       76.11       82.89       70.34    0.76
130    8000           0.00      45509.48       76.48       83.13       70.82    0.76
147    9000           0.00      42590.01       76.94       83.36       71.44    0.77
163   10000           0.00      39692.86       77.30       83.40       72.03    0.77
180   11000           0.00      36962.59       77.37       82.78       72.62    0.77
196   12000           0.00      34527.88       77.95       82.69       73.72    0.78
212   13000           0.00      32684.88       78.36       82.35       74.74    0.78
229   14000           0.00      30843.14       78.41       81.80       75.29    0.78
245   15000           0.00      29531.76       78.61       81.81       75.66    0.79
261   16000           0.00      28393.27       78.80       81.86       75.95    0.79