Training the spaCy dependency parser using a custom relations recipe

I want to use the new relations interface to train the spaCy dependency parser, and I have a few questions:

  1. If I annotate a given sentence sparsely (annotating only a few randomly selected relations rather than every relation in the sentence), would that affect the training of the spaCy model, compared to annotating each sentence fully (annotating every possible relation in every sentence in the dataset)?
  2. If I create custom relations between multi-token entities, will the spaCy model learn to identify these multi-token entities and the relations between them? For example: "There is a cyst in the liver measuring 4 x 4 cm." Here the relation is between 'cyst' and the measurement '4 x 4 cm'.
  3. When the spaCy model is being trained, is each relation in a given sentence treated as an independent event? That is, is the model's prediction of a relation between two tokens independent of any other token or relation in the sentence?

Hoping to hear from you soon!

Sure, you'll typically get better results with more complete annotations. spaCy can learn from sparse annotations, and if you already have an existing model you want to improve (e.g. a pretrained dependency parser), you can still get good results for the labels you care about by updating it with sparse annotations only. But more is usually better.
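To make "sparse" a bit more concrete, here is a minimal sketch of one way to think about partially annotated data: unannotated tokens are marked as missing rather than wrong, and only the annotated tokens count when comparing against predictions. This is an illustration only, not spaCy's implementation; the function name `accuracy_on_annotated` and the use of `None` for "no annotation" are assumptions made up for this example.

```python
from typing import Optional, Sequence


def accuracy_on_annotated(
    gold_heads: Sequence[Optional[int]], pred_heads: Sequence[int]
) -> Optional[float]:
    """Compare predicted heads to gold heads, skipping unannotated tokens.

    A gold head of None means "nobody annotated this token", which is
    different from "this token has no head".
    """
    pairs = [(g, p) for g, p in zip(gold_heads, pred_heads) if g is not None]
    if not pairs:
        return None  # nothing was annotated, so there is nothing to score
    correct = sum(1 for g, p in pairs if g == p)
    return correct / len(pairs)


# Tokens: There(0) is(1) a(2) cyst(3) in(4) the(5) liver(6)
# Only two attachments were annotated: a -> cyst and the -> liver.
gold = [None, None, 3, None, None, 6, None]
pred = [1, 1, 3, 1, 3, 4, 4]  # hypothetical model output

score = accuracy_on_annotated(gold, pred)
print(score)
```

With fully annotated data, every token would contribute to the comparison, which is one intuitive reason why more complete annotations usually help.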

No, a dependency parser is typically designed to predict syntactic relationships between individual tokens. It's probably not a good choice if you want to predict relations between entities. You can use the entity annotations to train a named entity recognizer with spaCy, and then use the named entity labels and the relations as features in the next step to predict relationships. But you'd have to bring your own implementation for that, depending on what you need.
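As a rough sketch of what that "next step" could start from: once you have entity spans, you can enumerate candidate entity pairs and compute simple features for your own relation classifier. Everything here (the `Entity` class, `candidate_pairs`, the labels `FINDING`, `ANATOMY` and `MEASUREMENT`, and the `gap` feature) is hypothetical and only meant to show the shape of the data. The spans are written by hand; in practice they would come from your named entity recognizer.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Entity:
    label: str
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span


def candidate_pairs(entities):
    """Build one candidate per ordered pair of distinct entities."""
    pairs = []
    for head, child in permutations(entities, 2):
        # Number of tokens between the two spans (they do not overlap here).
        gap = max(head.start, child.start) - min(head.end, child.end)
        pairs.append(
            {
                "head": head,
                "child": child,
                "labels": (head.label, child.label),
                "gap": gap,
            }
        )
    return pairs


tokens = "There is a cyst in the liver measuring 4 x 4 cm .".split()
entities = [
    Entity("FINDING", 3, 4),       # cyst
    Entity("ANATOMY", 6, 7),       # liver
    Entity("MEASUREMENT", 8, 12),  # 4 x 4 cm
]

pairs = candidate_pairs(entities)
for pair in pairs:
    head_text = " ".join(tokens[pair["head"].start : pair["head"].end])
    child_text = " ".join(tokens[pair["child"].start : pair["child"].end])
    print(head_text, "->", child_text, pair["labels"], "gap:", pair["gap"])
```

Each candidate would then be labelled using your relation annotations (related or not, and with which relation type) and passed to whatever classifier you choose.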

If you are training a dependency parser, the parser will be constrained to only predict valid parses. For instance, each sentence can only have one root, and dependencies cannot cross sentence boundaries. This makes sense, because those analyses would always be invalid as a dependency parse. That's another reason why syntactic dependency parsing probably isn't a good way to frame your specific problem.
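To illustrate the two constraints mentioned above, here is a small standalone check, assuming heads are stored as token indices and the root is the token whose head is itself (the same convention spaCy uses for `token.head`). The function name `check_parse` and the `sent_ids` list are made up for this sketch, and it only checks these two constraints, not everything that makes a parse valid.

```python
def check_parse(heads, sent_ids):
    """Return a list of problems found in a proposed set of attachments.

    heads[i] is the index of token i's head; a root points at itself.
    sent_ids[i] is the index of the sentence that token i belongs to.
    """
    problems = []
    roots_per_sent = {}
    for i, head in enumerate(heads):
        if head == i:
            roots_per_sent[sent_ids[i]] = roots_per_sent.get(sent_ids[i], 0) + 1
        elif sent_ids[head] != sent_ids[i]:
            problems.append(f"token {i} attaches across a sentence boundary")
    for sent in sorted(set(sent_ids)):
        n_roots = roots_per_sent.get(sent, 0)
        if n_roots != 1:
            problems.append(f"sentence {sent} has {n_roots} roots")
    return problems


# One sentence, one root: fine.
valid = check_parse([1, 1, 1], [0, 0, 0])

# Two tokens point at themselves, so the sentence has two roots.
two_roots = check_parse([0, 1, 1], [0, 0, 0])

# Token 3 (sentence 1) attaches to token 1 (sentence 0).
crossing = check_parse([0, 0, 3, 1], [0, 0, 1, 1])

print(valid, two_roots, crossing)
```

Arbitrary relation annotations can easily break rules like these, for example when one entity is related to several others or when a relation spans two sentences.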