rel.manual to train NER and dependency parser

Hi Prodigy team,

I have upgraded Prodigy to 1.10 and created around 500 annotations for both spans and relations using the rel.manual recipe. Now I don't know how to train an NER model from the same dataset. Whenever I try to train for NER using:

CMD: prodigy train ner rel_compdict_v1 ./ner_v1 -es 0.2 -o ./ner_teach_v3

where 'rel_compdict_v1' is the dataset annotated using the rel.manual recipe,

it shows me the following error:

✘ Invalid data for component 'ner'

spans -> 16 -> start field required
spans -> 16 -> end field required

and when I train the parser using:
prodigy train parser rel_compdict_v1 ./ner_v1 -es 0.2 -o ./ner_teach_v3

I get:

Created and merged data for 522 total examples
Using 418 train / 104 eval (split 20%)
Component: parser | Batch size: compounding | Dropout: 0.2 | Iterations: 10
ℹ Baseline accuracy: 0.000

=========================== ✨ Training the model ===========================

✔ Saved model: /home/sahil/py/matterhorn/sahil/ner_teach_v3

but the model doesn't actually seem to be trained.

Please help.

Regards,
Sahil

Hi! That's strange – this would indicate that somewhere in your data there's a span that doesn't specify a start and end 🤔 Are you able to find this example in your data, and if so, can you share what it looks like?
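One quick way to locate such an example (a hypothetical sketch, assuming you've exported the dataset as JSONL with `prodigy db-out`; the file name below is just illustrative):

```python
import json

def find_broken_spans(examples):
    """Return (example_index, span_index) pairs for spans missing start/end."""
    broken = []
    for i, eg in enumerate(examples):
        for j, span in enumerate(eg.get("spans", [])):
            if "start" not in span or "end" not in span:
                broken.append((i, j))
    return broken

# Example usage against a db-out export (file name is illustrative):
# with open("rel_compdict_v1.jsonl") as f:
#     examples = [json.loads(line) for line in f]
# print(find_broken_spans(examples))
```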

Hi Ines,
Thanks for the quick reply.
Yes, I could figure it out. I don't know how, but a span with no 'start' and 'end' was present in the data. So now I can train NER.

But I'm still stuck on training the dependency parser with entity relations. Any help there?

Regards,
Sahil

How did you create the data? Did you ever import anything manually, or did the source data maybe include any pre-defined spans? If not, and you only used rel.manual on raw data, that's something we should look into, because it could indicate a bug with how the spans are set.

I still need to look into that. In the meantime, can you export your dataset with data-to-spacy? What does the result look like? Does it contain dependency labels and heads?

No, I never added anything manually; the dataset was annotated through rel.manual only. To solve it, I exported the dataset using db-out, cleaned the annotations and re-imported it.
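The clean-up step was essentially this (a minimal sketch; it just drops the broken spans from the JSONL lines that db-out produces, and the file handling is omitted):

```python
import json

def clean_lines(lines):
    """Drop spans that lack start/end; keep everything else unchanged.

    Input and output are JSONL lines, one Prodigy example per line.
    """
    cleaned = []
    for line in lines:
        eg = json.loads(line)
        if "spans" in eg:
            eg["spans"] = [s for s in eg["spans"]
                           if "start" in s and "end" in s]
        cleaned.append(json.dumps(eg))
    return cleaned
```

The cleaned file can then be re-imported with db-in.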

I tried data-to-spacy and I got annotations like:

"tokens":[
              {
                "id":31,
                "orth":"a",
                "head":0,
                "dep":""
              },
              {
                "id":32,
                "orth":"Maryland",
                "head":-3,
                "dep":"juri"
              },
              {
                "id":33,
                "orth":"corporation",
                "head":0,
                "dep":""
              },
              {
                "id":34,
                "orth":"(",
                "head":0,
                "dep":""
              },
              {
                "id":35,
                "orth":"\u201c",
                "head":0,
                "dep":""
              },
              {
                "id":36,
                "orth":"Ashford",
                "head":0,
                "dep":""
              },
              {
                "id":37,
                "orth":"Select",
                "head":-8,
                "dep":"nick"
              },
              {
                "id":38,
                "orth":"\u201d",
                "head":0,
                "dep":""
              },
              {
                "id":39,
                "orth":")",
                "head":0,
                "dep":""
              },
              {
                "id":40,
                "orth":",",
                "head":0,
                "dep":""
              },
              {
                "id":41,
                "orth":"ASHFORD",
                "head":0,
                "dep":""
              },
              {
                "id":42,
                "orth":"HOSPITALITY",
                "head":0,
                "dep":""
              },
              {
                "id":43,
                "orth":"SELECT",
                "head":0,
                "dep":""
              },
              {
                "id":44,
                "orth":"LIMITED",
                "head":0,
                "dep":""
              },
              {
                "id":45,
                "orth":"PARTNERSHIP",
                "head":0,
                "dep":""
              },
              {
                "id":46,
                "orth":",",
                "head":0,
                "dep":""
              },
              {
                "id":47,
                "orth":"a",
                "head":0,
                "dep":""
              },
              {
                "id":48,
                "orth":"Delaware",
                "head":-3,
                "dep":"juri"
              },.......

Okay, that definitely indicates that there's potentially a bug that causes some spans to be added incorrectly without the start/end, which is strange 🤔 I'll look into this.

Do you have an example of the dependencies you've annotated? Are they all between single tokens?

Yes, I'm attaching a screenshot herewith.

Thanks for sharing! This at least partly explains things. If this is your annotation scheme, training a regular dependency parser is not going to work well, as it expects to predict dependencies between single tokens, not entity spans. So you probably want to export your annotations and use a different model implementation for general-purpose relation prediction, not a syntactic dependency parser.

At the moment, Prodigy will filter out all relations that are not between two single tokens, because the parser can't be updated with those. We should probably show a warning like "Excluding X relation annotations" when training a parser, so you know that there's a problem.
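To illustrate what that filtering amounts to, here's a sketch of a check along those lines. It assumes rel.manual's output format, where each relation carries "head_span" and "child_span" dicts with "token_start"/"token_end" indices; only relations whose spans each cover a single token survive:

```python
def parser_compatible_relations(example):
    """Keep only relations whose head and child spans each cover exactly
    one token, since the dependency parser predicts token-to-token arcs."""
    kept = []
    for rel in example.get("relations", []):
        head, child = rel["head_span"], rel["child_span"]
        if (head["token_start"] == head["token_end"]
                and child["token_start"] == child["token_end"]):
            kept.append(rel)
    return kept
```

Relations between multi-token entity spans fall through this check, which is why they silently disappear from the parser training data.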

Thank you, Ines, for the insights.

Can you help me with your suggestions on two things, as I am new to NLP (but have worked on neural networks for image and video processing):

  1. Can I re-annotate the above example so the relations hold between single tokens? If yes, can you give an example?

  2. Do you have any model suggestions that would be good for my use case?