Training a dependency parser with sparse annotations

I am trying to build a dependency parser for short phrases that are not necessarily grammatical in the way natural-language sentences are. I have dependency annotations for these phrases: some I transferred from similar longer sentences, and some come from prodi.gy annotations. They are stored as pairs of dependencies, which does not guarantee a ROOT for every phrase. I have a couple of questions:

  1. With just pairs of dependencies, what is the best way to convert them to the spaCy training JSON format?
  2. Would these incomplete, sparse training pairs (not necessarily including a ROOT) work when retraining the spaCy parser? (I am planning to fine-tune an existing spaCy parser.)

Hey,

In theory this could work, but I think it'll be pretty difficult, and I can't guarantee good results. spaCy's parser does let you specify that some of the dependencies are missing, by setting their values to None.
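To make that a bit more concrete, here's a minimal sketch of turning head/child pairs into per-token lists, with None wherever nothing was annotated. The function name pairs_to_annotations and the (head, child, label) tuple layout are my own assumptions for illustration, not part of spaCy, and I'm assuming the heads are absolute token indices. How you hand the resulting lists to spaCy depends on the version you're on, so check the training docs for that part.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical layout for one annotated pair: (head index, child index, label).
Arc = Tuple[int, int, str]


def pairs_to_annotations(
    n_tokens: int, arcs: Sequence[Arc]
) -> Tuple[List[Optional[int]], List[Optional[str]]]:
    """Build per-token head and label lists, leaving None for unannotated tokens."""
    heads: List[Optional[int]] = [None] * n_tokens
    deps: List[Optional[str]] = [None] * n_tokens
    for head, child, label in arcs:
        if not (0 <= head < n_tokens and 0 <= child < n_tokens):
            raise ValueError(f"arc ({head}, {child}) is outside the phrase")
        if heads[child] is not None and heads[child] != head:
            raise ValueError(f"token {child} has two different heads")
        heads[child] = head
        deps[child] = label
    return heads, deps


tokens = ["red", "running", "shoes"]
# Only one pair is annotated: "shoes" (index 2) heads "red" (index 0).
heads, deps = pairs_to_annotations(len(tokens), [(2, 0, "amod")])
print(heads)  # [2, None, None]
print(deps)   # ['amod', None, None]
```

Note that nothing here invents a ROOT: tokens without an annotated head simply stay None.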

During training the parser compares the predicted action against the reference parse, and calculates how many gold-standard arcs would be newly unreachable if the predicted action is taken. If the predicted action results in a worse parse than could be reached following some other action, the prediction is regarded as incorrect. The weights are then updated such that all incorrect predictions would receive a lower score next time, while the correct predictions would receive a higher score.
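Here's a toy illustration of that cost calculation. It is not spaCy's implementation: the action names and the sets of reachable arcs are written by hand for a three-token phrase rather than computed from a real transition system, and action_cost and incorrect_actions are hypothetical helpers. The point is just that only annotated arcs can ever add cost.

```python
from typing import Dict, Set, Tuple

Arc = Tuple[int, int]  # (head index, child index)


def action_cost(gold: Set[Arc], before: Set[Arc], after: Set[Arc]) -> int:
    """Count gold arcs that were reachable before the action but not after it."""
    return len((gold & before) - after)


def incorrect_actions(costs: Dict[str, int]) -> Set[str]:
    """Actions that lead to a worse parse than the best available action."""
    best = min(costs.values())
    return {name for name, cost in costs.items() if cost > best}


# Sparse reference parse: only the arc 2 -> 0 is annotated.
gold_arcs = {(2, 0)}
reachable_before = {(2, 0), (1, 0), (2, 1), (1, 2)}
# Hand-written assumption: LEFT-ARC attaches token 0 to token 1,
# so token 0 can no longer get token 2 as its head.
reachable_after = {
    "SHIFT": {(2, 0), (1, 0), (2, 1), (1, 2)},
    "LEFT-ARC": {(1, 0), (2, 1), (1, 2)},
}

costs = {
    name: action_cost(gold_arcs, reachable_before, after)
    for name, after in reachable_after.items()
}
print(costs)                     # {'SHIFT': 0, 'LEFT-ARC': 1}
print(incorrect_actions(costs))  # {'LEFT-ARC'}

# With no annotated arcs at all, every action costs 0 and nothing is penalised.
empty_costs = {
    name: action_cost(set(), reachable_before, after)
    for name, after in reachable_after.items()
}
print(incorrect_actions(empty_costs))  # set()
```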

So in theory you might not have to do anything other than pass in your data with the missing values specified correctly. That said, this style of training will only work if you're fine-tuning an existing model. If you start from scratch with such sparse updates, the model will probably not navigate the massive search space of possible syntactic parses effectively. I would also suggest mixing some fully annotated examples in with your sparse updates, so that the parser model doesn't drift too far from the original weights you're fine-tuning.
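For the mixing, something along these lines might be a starting point. It's a plain-Python sketch with no spaCy calls: mixed_batches is a hypothetical helper, and the batch size and the fraction of fully annotated examples are arbitrary values I picked for illustration, so you'd want to tune them on your own data.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mixed_batches(
    sparse: Sequence[T],
    full: Sequence[T],
    batch_size: int = 4,
    full_fraction: float = 0.25,
    seed: int = 0,
) -> List[List[T]]:
    """Batch the sparse examples, adding fully annotated ones to every batch."""
    if batch_size < 2 or not 0 < full_fraction < 1:
        raise ValueError("need batch_size >= 2 and 0 < full_fraction < 1")
    rng = random.Random(seed)
    # Keep at least one full and at least one sparse example per batch.
    n_full = min(batch_size - 1, max(1, round(batch_size * full_fraction)))
    n_sparse = batch_size - n_full
    shuffled = list(sparse)
    rng.shuffle(shuffled)
    full_pool = list(full)
    batches = []
    for start in range(0, len(shuffled), n_sparse):
        batch = shuffled[start:start + n_sparse]
        # Fully annotated examples are resampled for each batch.
        batch += rng.sample(full_pool, min(n_full, len(full_pool)))
        rng.shuffle(batch)
        batches.append(batch)
    return batches


sparse_examples = ["s0", "s1", "s2", "s3", "s4", "s5"]
full_examples = ["f0", "f1", "f2"]
batches = mixed_batches(sparse_examples, full_examples)
for batch in batches:
    print(batch)
```

Each batch would then go through whatever update call your spaCy version uses; every sparse example is seen once per pass, while the fully annotated ones are reused as an anchor.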

Again, I'm not sure how well this will really work. Let me know how you go, I guess?