rel.manual training: Invalid data for component 'ner' & ValueError: Could not find gold transition

apatel415 · July 2, 2022, 12:57am

I created a dataset with joint entity and relation annotations using the rel.manual recipe. When I tried to train a model on that dataset, I received the error shown below. I assumed the span labels would produce valid data for the 'ner' component, so I'm unsure what is causing this issue. I'd appreciate any support, thanks!

=========================== Initializing pipeline ===========================

[2022-07-01 20:28:54,391] [INFO] Set up nlp object from config

Components: ner, parser

Merging training and evaluation data for 2 components

[ner] Training: 686 | Evaluation: 171 (20% split)

✘ Invalid data for component 'ner'

spans -> 0 -> start field required

spans -> 0 -> end field required

Here is the sample data:
{'text': 'The statement credit benefit applies to the Global Entry, TSA Pre Check or NEXUS programs.', 'meta': {'source': 'Aeroplan Card | Chase.com'}, '_input_hash': 1972453691, '_task_hash': -354522162, '_is_binary': False, 'spans': [{'label': 'OtherPrograms'}, {'start': 44, 'end': 56, 'token_start': 7, 'token_end': 8, 'label': 'OtherPrograms'}, {'start': 58, 'end': 71, 'token_start': 10, 'token_end': 12, 'label': 'OtherPrograms'}, {'start': 75, 'end': 80, 'token_start': 14, 'token_end': 14, 'label': 'OtherPrograms'}], 'tokens': [{'text': 'The', 'start': 0, 'end': 3, 'id': 0, 'ws': True, 'disabled': False}, {'text': 'statement', 'start': 4, 'end': 13, 'id': 1, 'ws': True, 'disabled': False}, {'text': 'credit', 'start': 14, 'end': 20, 'id': 2, 'ws': True, 'disabled': False}, {'text': 'benefit', 'start': 21, 'end': 28, 'id': 3, 'ws': True, 'disabled': False}, {'text': 'applies', 'start': 29, 'end': 36, 'id': 4, 'ws': True, 'disabled': False}, {'text': 'to', 'start': 37, 'end': 39, 'id': 5, 'ws': True, 'disabled': False}, {'text': 'the', 'start': 40, 'end': 43, 'id': 6, 'ws': True, 'disabled': False}, {'text': 'Global', 'start': 44, 'end': 50, 'id': 7, 'ws': True, 'disabled': False}, {'text': 'Entry', 'start': 51, 'end': 56, 'id': 8, 'ws': False, 'disabled': False}, {'text': ',', 'start': 56, 'end': 57, 'id': 9, 'ws': True, 'disabled': False}, {'text': 'TSA', 'start': 58, 'end': 61, 'id': 10, 'ws': True, 'disabled': False}, {'text': 'Pre', 'start': 62, 'end': 65, 'id': 11, 'ws': True, 'disabled': False}, {'text': 'Check', 'start': 66, 'end': 71, 'id': 12, 'ws': True, 'disabled': False}, {'text': 'or', 'start': 72, 'end': 74, 'id': 13, 'ws': True, 'disabled': False}, {'text': 'NEXUS', 'start': 75, 'end': 80, 'id': 14, 'ws': True, 'disabled': False}, {'text': 'programs', 'start': 81, 'end': 89, 'id': 15, 'ws': False, 'disabled': False}, {'text': '.', 'start': 89, 'end': 90, 'id': 16, 'ws': False, 'disabled': False}], '_view_id': 'relations', 'relations': , 'answer': 'accept', '_timestamp': 1656535942}

After manually fixing the annotations, I tried training the model again and received this error on the relation side:

ValueError: Could not find gold transition - see logs above.
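The first error, `spans -> 0 -> start field required`, points at the first entry of `spans`: in the sample record that entry is `{'label': 'OtherPrograms'}`, with no character offsets. Before training, a quick pass over the exported annotations can flag such records. This is a minimal sketch, assuming the annotations were exported to a JSONL file (for example with `prodigy db-out`); the helper name `find_incomplete_spans` and the set of keys it treats as required are assumptions, not part of Prodigy or spaCy.

```python
import json
from typing import Iterable, List, Tuple

# Keys this (hypothetical) check treats as required on every span.
REQUIRED_SPAN_KEYS = ("start", "end", "label")


def find_incomplete_spans(lines: Iterable[str]) -> List[Tuple[int, int, List[str]]]:
    """Return (line index, span index, missing keys) for each span in a
    JSONL export that lacks character offsets or a label."""
    problems = []
    for line_idx, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        record = json.loads(line)
        for span_idx, span in enumerate(record.get("spans", [])):
            missing = [key for key in REQUIRED_SPAN_KEYS if key not in span]
            if missing:
                problems.append((line_idx, span_idx, missing))
    return problems


if __name__ == "__main__":
    # Trimmed-down version of the record posted above.
    sample = {
        "text": "The statement credit benefit applies to the Global Entry, TSA Pre Check or NEXUS programs.",
        "spans": [
            {"label": "OtherPrograms"},
            {"start": 44, "end": 56, "token_start": 7, "token_end": 8, "label": "OtherPrograms"},
        ],
        "answer": "accept",
    }
    print(find_incomplete_spans([json.dumps(sample)]))  # [(0, 0, ['start', 'end'])]
```

With a real export, an open file handle can be passed directly, e.g. `find_incomplete_spans(open("annotations.jsonl", encoding="utf8"))`, where the filename is a placeholder. Flagged examples can then be fixed or re-annotated before training.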

koaning · July 4, 2022, 11:41am

Could you expand on what model you tried to train? Did you try to train a NER model or do you have a custom model for the relationships?

apatel415 · July 4, 2022, 2:21pm

Yes, I tried training NER and the parser. After a lot of digging, I realized that to train entity relations I need to create a custom model.

I am relatively new to NLP, so I wanted to ask whether you have any good resources for training NER and relation extraction at the same time. My initial plan is to:

1. Use the Prodigy rel.manual recipe to create the training/testing data.
2. Use the rel_component model provided in the spaCy entity relation extraction component tutorial.
3. Modify the config files to include the NER components as well.
4. Train the model.

Are there any suggestions or steps I may be missing?

Thanks!
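For steps 1 and 2, the rel.manual data has to be turned into targets the relation component can learn from. In the spaCy relation extraction tutorial, each example becomes a `Doc` whose entities come from `spans` and whose relation targets are a dict keyed by the start tokens of the head and child entities (stored on a custom `doc._.rel` attribute). The sketch below shows only that mapping in plain Python, without spaCy. It assumes each rel.manual `relations` entry has `head`, `child` and `label` keys, with `head`/`child` pointing at the last token of each entity, as the tutorial's conversion script treats them; `relation_targets` is a hypothetical name.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def relation_targets(example: dict, labels: List[str]) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Build relation targets for one rel.manual example.

    Keys are (head entity start token, child entity start token); values map
    each relation label to 1.0 if annotated and 0.0 otherwise.
    """
    # Assumption: relation endpoints point at the *last* token of a span,
    # so look up each span's first token from its last token.
    end_to_start = {span["token_end"]: span["token_start"] for span in example.get("spans", [])}
    starts = sorted(set(end_to_start.values()))

    # Every ordered pair of distinct entities is a candidate; default to negative.
    targets = {pair: {label: 0.0 for label in labels} for pair in permutations(starts, 2)}

    for rel in example.get("relations", []):
        pair = (end_to_start[rel["head"]], end_to_start[rel["child"]])
        targets.setdefault(pair, {label: 0.0 for label in labels})
        targets[pair][rel["label"]] = 1.0
    return targets


if __name__ == "__main__":
    # Spans from the sample record; the relation and its label are made up.
    example = {
        "spans": [
            {"token_start": 7, "token_end": 8, "label": "OtherPrograms"},
            {"token_start": 10, "token_end": 12, "label": "OtherPrograms"},
            {"token_start": 14, "token_end": 14, "label": "OtherPrograms"},
        ],
        "relations": [{"head": 8, "child": 12, "label": "SAME_BENEFIT"}],
    }
    for pair, scores in relation_targets(example, ["SAME_BENEFIT"]).items():
        print(pair, scores)
```

The explicit 0.0 entries give the relation classifier negative examples for entity pairs that were left unlinked; this sketch skips self-pairs, which the tutorial's own script may handle differently.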

apatel415 · July 4, 2022, 3:54pm

@koaning after attempting the steps above, I got the following error. I can't seem to find any help on this online.

=========================== Initializing pipeline ===========================
✘ Config validation error
Bad value substitution: option 'width' in section 'components.ner.model.tok2vec' contains an interpolation key 'components.tok2vec.model.encode.width' which is not a valid option name. Raw value: '${components.tok2vec.model.encode.width}

This is my config file:

[paths]
train = null
dev = null
raw = null
init_tok2vec = null

[system]
seed = 342
gpu_allocator = null

[nlp]
lang = "en"
pipeline = ["tok2vec", "ner", "relation_extractor"]
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
batch_size = 1000

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.HashEmbedCNN.v1"
pretrained_vectors = null
width = 96
depth = 2
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true

[components.relation_extractor]
factory = "relation_extractor"
threshold = 0.5

[components.relation_extractor.model]
@architectures = "rel_model.v1"

[components.relation_extractor.model.create_instance_tensor]
@architectures = "rel_instance_tensor.v1"

[components.relation_extractor.model.create_instance_tensor.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.width}

[components.relation_extractor.model.create_instance_tensor.pooling]
@layers = "reduce_mean.v1"

[components.relation_extractor.model.create_instance_tensor.get_instances]
@misc = "rel_instance_generator.v1"
max_length = 20

[components.relation_extractor.model.classification_layer]
@architectures = "rel_classification_layer.v1"
nI = null
nO = null

[initialize]

[initialize.components]

[corpora]

[corpora.dev]
@readers = "Gold_ents_Corpus.v1"
file = ${paths.dev}

[corpora.train]
@readers = "Gold_ents_Corpus.v1"
file = ${paths.train}

[training]
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600000
max_epochs = 0
max_steps = 10000
eval_frequency = 500
frozen_components = []
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
before_to_disk = null
logger = {"@loggers":"spacy.ConsoleLogger.v1"}

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001

[training.score_weights]
rel_micro_p = 0.0
rel_micro_r = 0.0
rel_micro_f = 1.0

koaning · July 5, 2022, 6:35am

You seem to refer to "the spacy entity relation extraction component tutorial". Could you share a link to this tutorial?

Also, are you using spaCy to train your model or are you using Prodigy? Could you share the command that you ran to train the model?

As general advice, how did you decide which components to add for NER? In general, I recommend generating a config with the quickstart widget.

apatel415 · July 6, 2022, 12:58am

Hi, I was able to resolve the issue. It was a simple error in the config file: I didn't need to include 'encode'. Thanks.
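The fix comes down to the reference path. `spacy.HashEmbedCNN.v1` takes `width` directly under `[components.tok2vec.model]`, so there is no `[components.tok2vec.model.encode]` section to point at (that section appears in quickstart configs built on `spacy.Tok2Vec.v2`, which is likely where the `encode` reference came from). The NER listener line therefore needs to read `width = ${components.tok2vec.model.width}`, the same reference the relation extractor's listener already uses. The error wording matches Python's `configparser` interpolation errors, and a reference like `${a.b.c}` resolves to option `c` in section `a.b`. A rough pre-flight check along those lines, as a sketch: `dangling_references` is a hypothetical helper that only approximates spaCy's own config validation.

```python
import configparser
import re
from typing import List, Tuple

# Matches ${section.path.option} references as written in spaCy config files.
REF_PATTERN = re.compile(r"\$\{([^}]+)\}")


def dangling_references(config_text: str) -> List[Tuple[str, str, str]]:
    """Return (section, option, reference) for every ${...} reference whose
    target section/option does not exist in the config."""
    # interpolation=None: read values raw instead of letting configparser resolve them.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names case-sensitive (e.g. L2, nO)
    parser.read_string(config_text)

    problems = []
    for section in parser.sections():
        for option, value in parser.items(section, raw=True):
            for ref in REF_PATTERN.findall(value):
                # The last dotted part is the option; the rest is the section name.
                target_section, _, target_option = ref.rpartition(".")
                if not parser.has_option(target_section, target_option):
                    problems.append((section, option, ref))
    return problems


if __name__ == "__main__":
    broken = """
[components.tok2vec.model]
width = 96

[components.ner.model.tok2vec]
width = ${components.tok2vec.model.encode.width}
"""
    print(dangling_references(broken))
    # [('components.ner.model.tok2vec', 'width', 'components.tok2vec.model.encode.width')]
    print(dangling_references(broken.replace(".encode.width", ".width")))  # []
```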

Related topics

- Rel training (usage, relations, training) · 7 replies · 1273 views · May 22, 2023
- Train recipe for parser: ValueError: Could not find gold transition (enhancement, spacy, dep) · 5 replies · 1311 views · February 23, 2023
- Training a relation extraction component (solved, relations, training) · 84 replies · 5709 views · June 27, 2023
- Training NER and relations extraction (RE) together (usage, spacy, relations) · 9 replies · 4603 views · June 10, 2022
- rel.manual to train ner and dependency (ner, done, solved, dep, relations) · 15 replies · 2049 views · September 7, 2020