rel.manual training: Invalid data for component 'ner' & ValueError: Could not find gold transition

I created a dataset with joint entity and relation annotations using the rel.manual recipe. When I tried to train a model on that dataset, I received the error shown below. I assumed the span labels would produce valid data for the 'ner' component, so I'm unsure what could be causing this issue. I would appreciate any support, thanks!

=========================== Initializing pipeline ===========================

[2022-07-01 20:28:54,391] [INFO] Set up nlp object from config

Components: ner, parser

Merging training and evaluation data for 2 components

  • [ner] Training: 686 | Evaluation: 171 (20% split)

✘ Invalid data for component 'ner'

spans -> 0 -> start field required

spans -> 0 -> end field required

Here is the sample data:
{'text': 'The statement credit benefit applies to the Global Entry, TSA Pre Check or NEXUS programs.', 'meta': {'source': 'Aeroplan Card | Chase.com'}, '_input_hash': 1972453691, '_task_hash': -354522162, '_is_binary': False, 'spans': [{'label': 'OtherPrograms'}, {'start': 44, 'end': 56, 'token_start': 7, 'token_end': 8, 'label': 'OtherPrograms'}, {'start': 58, 'end': 71, 'token_start': 10, 'token_end': 12, 'label': 'OtherPrograms'}, {'start': 75, 'end': 80, 'token_start': 14, 'token_end': 14, 'label': 'OtherPrograms'}], 'tokens': [{'text': 'The', 'start': 0, 'end': 3, 'id': 0, 'ws': True, 'disabled': False}, {'text': 'statement', 'start': 4, 'end': 13, 'id': 1, 'ws': True, 'disabled': False}, {'text': 'credit', 'start': 14, 'end': 20, 'id': 2, 'ws': True, 'disabled': False}, {'text': 'benefit', 'start': 21, 'end': 28, 'id': 3, 'ws': True, 'disabled': False}, {'text': 'applies', 'start': 29, 'end': 36, 'id': 4, 'ws': True, 'disabled': False}, {'text': 'to', 'start': 37, 'end': 39, 'id': 5, 'ws': True, 'disabled': False}, {'text': 'the', 'start': 40, 'end': 43, 'id': 6, 'ws': True, 'disabled': False}, {'text': 'Global', 'start': 44, 'end': 50, 'id': 7, 'ws': True, 'disabled': False}, {'text': 'Entry', 'start': 51, 'end': 56, 'id': 8, 'ws': False, 'disabled': False}, {'text': ',', 'start': 56, 'end': 57, 'id': 9, 'ws': True, 'disabled': False}, {'text': 'TSA', 'start': 58, 'end': 61, 'id': 10, 'ws': True, 'disabled': False}, {'text': 'Pre', 'start': 62, 'end': 65, 'id': 11, 'ws': True, 'disabled': False}, {'text': 'Check', 'start': 66, 'end': 71, 'id': 12, 'ws': True, 'disabled': False}, {'text': 'or', 'start': 72, 'end': 74, 'id': 13, 'ws': True, 'disabled': False}, {'text': 'NEXUS', 'start': 75, 'end': 80, 'id': 14, 'ws': True, 'disabled': False}, {'text': 'programs', 'start': 81, 'end': 89, 'id': 15, 'ws': False, 'disabled': False}, {'text': '.', 'start': 89, 'end': 90, 'id': 16, 'ws': False, 'disabled': False}], '_view_id': 'relations', 'relations': , 'answer': 'accept', '_timestamp': 1656535942}
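
The validation error points at `spans -> 0`, and the first span in the sample record has a `label` but no `start` or `end`. Here is a minimal sketch for locating such spans before training, assuming the annotations have been loaded into a list of dicts shaped like the record above. The helper name `find_incomplete_spans` is made up for illustration and is not part of Prodigy or spaCy.

```python
def find_incomplete_spans(examples, required=("start", "end")):
    """Return (example_index, span_index, missing_keys) for each span lacking a required key."""
    problems = []
    for i, eg in enumerate(examples):
        for j, span in enumerate(eg.get("spans", [])):
            missing = [key for key in required if key not in span]
            if missing:
                problems.append((i, j, missing))
    return problems


# Trimmed copy of the sample record: span 0 has a label but no character offsets.
sample = {
    "text": "The statement credit benefit applies to the Global Entry, TSA Pre Check or NEXUS programs.",
    "spans": [
        {"label": "OtherPrograms"},
        {"start": 44, "end": 56, "token_start": 7, "token_end": 8, "label": "OtherPrograms"},
    ],
}

print(find_incomplete_spans([sample]))  # [(0, 0, ['start', 'end'])]
```

Each reported tuple identifies one span to fix or drop by hand, which matches the manual clean-up described next.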


After manually fixing the annotations, I tried training the model again and received this error on the relation side:

ValueError: Could not find gold transition - see logs above.

Could you expand on what model you tried to train? Did you try to train a NER model or do you have a custom model for the relationships?

Yeah, I tried training NER and parser. After a lot of digging, I realized that to train entity relations I need to create a custom model.

I am relatively new to NLP, so I wanted to ask whether you have any good resources for training NER and relation extraction at the same time. My initial thought is to:

  1. use the Prodigy rel.manual recipe to create the training/testing data
  2. use the rel_component model provided in the spaCy entity relation extraction component tutorial
  3. modify the config files to include the NER component as well
  4. train the model
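
Going from step 1 to step 2 needs a conversion from the annotated records into whatever the relation component expects. Below is a rough sketch of that conversion in plain Python. It assumes each relation entry carries `head_span`, `child_span` and `label` keys, and that the target is a mapping from (head start token, child start token) to label scores. Both are assumptions to check against your own export and the tutorial code. The helper name, the relation label and the example record are hypothetical.

```python
def relations_to_pairs(example):
    """Map (head_token_start, child_token_start) -> {label: 1.0} for one annotated record."""
    pairs = {}
    for rel in example.get("relations", []):
        # Assumed field names; verify them against a real exported record.
        head_start = rel["head_span"]["token_start"]
        child_start = rel["child_span"]["token_start"]
        pairs.setdefault((head_start, child_start), {})[rel["label"]] = 1.0
    return pairs


# Hypothetical record: the relation and its label are invented for illustration.
record = {
    "answer": "accept",
    "relations": [
        {
            "label": "SAME_GROUP",
            "head_span": {"token_start": 7, "token_end": 8, "label": "OtherPrograms"},
            "child_span": {"token_start": 14, "token_end": 14, "label": "OtherPrograms"},
        }
    ],
}

print(relations_to_pairs(record))
```

Keying on token positions rather than character offsets keeps the pairs aligned with the tokenized text, but it also means the training data must use the same tokenization as the annotations.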

Are there any suggestions or steps I may be missing?

Thanks!

@koaning after attempting the steps above, I got the following error and can't seem to find any help for it online.

=========================== Initializing pipeline ===========================
✘ Config validation error
Bad value substitution: option 'width' in section 'components.ner.model.tok2vec' contains an interpolation key 'components.tok2vec.model.encode.width' which is not a valid option name. Raw value: '${components.tok2vec.model.encode.width}

This is my config file:

[paths]
train = null
dev = null
raw = null
init_tok2vec = null

[system]
seed = 342
gpu_allocator = null

[nlp]
lang = "en"
pipeline = ["tok2vec", "ner", "relation_extractor"]
disabled = []
before_creation = null
after_creation = null
after_pipeline_creation = null
tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
batch_size = 1000

[components]

[components.ner]
factory = "ner"
incorrect_spans_key = null
moves = null
update_with_oracle_cut_size = 100

[components.ner.model]
@architectures = "spacy.TransitionBasedParser.v2"
state_type = "ner"
extra_state_tokens = false
hidden_width = 64
maxout_pieces = 2
use_upper = true
nO = null

[components.ner.model.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.encode.width}
upstream = "*"

[components.tok2vec]
factory = "tok2vec"

[components.tok2vec.model]
@architectures = "spacy.HashEmbedCNN.v1"
pretrained_vectors = null
width = 96
depth = 2
embed_size = 2000
window_size = 1
maxout_pieces = 3
subword_features = true

[components.relation_extractor]
factory = "relation_extractor"
threshold = 0.5

[components.relation_extractor.model]
@architectures = "rel_model.v1"

[components.relation_extractor.model.create_instance_tensor]
@architectures = "rel_instance_tensor.v1"

[components.relation_extractor.model.create_instance_tensor.tok2vec]
@architectures = "spacy.Tok2VecListener.v1"
width = ${components.tok2vec.model.width}

[components.relation_extractor.model.create_instance_tensor.pooling]
@layers = "reduce_mean.v1"

[components.relation_extractor.model.create_instance_tensor.get_instances]
@misc = "rel_instance_generator.v1"
max_length = 20

[components.relation_extractor.model.classification_layer]
@architectures = "rel_classification_layer.v1"
nI = null
nO = null

[initialize]

[initialize.components]

[corpora]

[corpora.dev]
@readers = "Gold_ents_Corpus.v1"
file = ${paths.dev}

[corpora.train]
@readers = "Gold_ents_Corpus.v1"
file = ${paths.train}

[training]
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600000
max_epochs = 0
max_steps = 10000
eval_frequency = 500
frozen_components = []
dev_corpus = "corpora.dev"
train_corpus = "corpora.train"
before_to_disk = null
logger = {"@loggers":"spacy.ConsoleLogger.v1"}

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
@schedules = "compounding.v1"
start = 100
stop = 1000
compound = 1.001

[training.optimizer]
@optimizers = "Adam.v1"
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 0.00000001
learn_rate = 0.001

[training.score_weights]
rel_micro_p = 0.0
rel_micro_r = 0.0
rel_micro_f = 1.0

You seem to refer to "the spacy entity relation extraction component tutorial". Could you share a link to this tutorial?

Also, are you using spaCy to train your model or are you using Prodigy? Could you share the command that you ran to train the model?

More generally, how did you decide which components to add for NER? I usually recommend generating a config via the quickstart widget.

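Hi, I was able to resolve the issue. It was a simple error in the config file: the width reference in [components.ner.model.tok2vec] didn't need to include 'encode', so it should be ${components.tok2vec.model.width}. Thanks.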
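
For anyone hitting the same "Bad value substitution" message: the resolution above comes down to a `${...}` reference that points at a section which does not exist in the config. Here is a small standard-library sketch that flags such references. It is not how spaCy validates configs, and it only handles `${section.key}` value references between flat `[section]` blocks. The function names are made up for illustration.

```python
import re


def parse_sections(config_text):
    """Parse '[section]' headers and 'key = value' lines into {section: {key: value}}."""
    sections = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        header = re.fullmatch(r"\[([^\]]+)\]", line)
        if header:
            current = header.group(1)
            sections.setdefault(current, {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            sections[current][key.strip()] = value.strip()
    return sections


def unresolved_references(config_text):
    """Return (section, key, reference) for every ${a.b.key} that has no matching option."""
    sections = parse_sections(config_text)
    problems = []
    for section, options in sections.items():
        for key, value in options.items():
            for ref in re.findall(r"\$\{([^}]+)\}", value):
                ref_section, _, ref_key = ref.rpartition(".")
                if ref_key not in sections.get(ref_section, {}):
                    problems.append((section, key, ref))
    return problems


# The two relevant blocks from the config in this thread, before and after the fix.
BROKEN = """
[components.ner.model.tok2vec]
width = ${components.tok2vec.model.encode.width}

[components.tok2vec.model]
width = 96
"""
FIXED = BROKEN.replace(".encode.width", ".width")

print(unresolved_references(BROKEN))
print(unresolved_references(FIXED))
```

The broken version reports the `width` option of `components.ner.model.tok2vec`, because there is no `[components.tok2vec.model.encode]` section in this config. The fixed version reports nothing.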
