tl;dr: Can you give a sample of the text with the heads and dependencies you’ve assigned as the gold standard?
From a theoretical perspective, your approach does make sense. However, the parser’s implementation is designed around syntactic relationships, which normally hold between words that are fairly close together. So the features in the parser might not work well for your task, and it might be unexpectedly slow.
The other thing about the parsing algorithm is that it’s fairly complicated. If you haven’t seen it yet, I would suggest you have a look at this blog post: https://explosion.ai/blog/parsing-english-in-python. The post is old, but the parser still uses the same transition-based approach supervised by a “dynamic oracle”. The difference is that we now use a neural network to predict the actions, instead of the simpler ML algorithm described in the post.
These two sections are especially relevant:
The gist of this is that we’re setting up an initial state that has a stack, a queue of words, and a set of dependency arcs. We then define a fixed set of actions that we’ll use to transition from one state to another. This lets us map the parsing task to the task of predicting a sequence of actions that ends with a desirable parse tree.
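To make the state and actions concrete, here is a minimal sketch in plain Python. It is not spaCy's implementation: it assumes a simplified three-action system (SHIFT, LEFT, RIGHT) in the spirit of the blog post, with unlabelled arcs and no root token, and all the names (`State`, `apply_action`, and so on) are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical action names for this sketch (unlabelled arcs only).
SHIFT, LEFT, RIGHT = "SHIFT", "LEFT", "RIGHT"

@dataclass
class State:
    stack: list                              # word indices being processed
    queue: list                              # word indices not yet read
    arcs: set = field(default_factory=set)   # (head, child) pairs built so far

def initial_state(n_words):
    """Empty stack, every word on the queue, no arcs."""
    return State(stack=[], queue=list(range(n_words)))

def valid_actions(state):
    """Actions whose preconditions hold in this state."""
    actions = []
    if state.queue:
        actions.append(SHIFT)
    if state.stack and state.queue:
        actions.append(LEFT)
    if len(state.stack) >= 2:
        actions.append(RIGHT)
    return actions

def apply_action(state, action):
    """Return the new state reached by applying one action."""
    stack, queue, arcs = list(state.stack), list(state.queue), set(state.arcs)
    if action == SHIFT:
        # Move the first word of the queue onto the stack.
        stack.append(queue.pop(0))
    elif action == LEFT:
        # First word of the queue becomes head of the top of the stack.
        arcs.add((queue[0], stack.pop()))
    elif action == RIGHT:
        # Second word on the stack becomes head of the top of the stack.
        child = stack.pop()
        arcs.add((stack[-1], child))
    else:
        raise ValueError(f"Unknown action: {action}")
    return State(stack, queue, arcs)

# "She eats fish": "eats" (1) heads both "She" (0) and "fish" (2).
state = initial_state(3)
for action in [SHIFT, LEFT, SHIFT, SHIFT, RIGHT]:
    state = apply_action(state, action)
print(sorted(state.arcs))  # [(1, 0), (1, 2)]
```

The parse is fully determined by the action sequence, which is why the model only ever has to answer one question: which action next?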
The important thing to understand for your error is that the training algorithm requires us to assign a “cost” to each potential action, where the cost is the number of additional errors that action would introduce. In other words: What’s the score of the best parse we can make from this state? Okay, what’s the score of the best parse we can make if we apply this action to this state? The cost of the action is the difference between the two.
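The "difference between the two" definition can be written down directly for a toy system. The sketch below is a brute-force illustration only: it assumes the same simplified SHIFT/LEFT/RIGHT actions with unlabelled arcs, searches every reachable parse, and uses hypothetical function names. The real dynamic oracle computes costs without exhaustive search, and its details differ.

```python
# Hypothetical action names for this sketch (unlabelled arcs only).
SHIFT, LEFT, RIGHT = "SHIFT", "LEFT", "RIGHT"

def valid_actions(stack, queue):
    """Actions whose preconditions hold for this stack and queue."""
    actions = []
    if queue:
        actions.append(SHIFT)
    if stack and queue:
        actions.append(LEFT)
    if len(stack) >= 2:
        actions.append(RIGHT)
    return actions

def apply_action(action, stack, queue, arcs):
    """Return the (stack, queue, arcs) reached by applying one action."""
    if action == SHIFT:
        return stack + (queue[0],), queue[1:], arcs
    if action == LEFT:
        # First word of the queue heads the top of the stack.
        return stack[:-1], queue, arcs | {(queue[0], stack[-1])}
    if action == RIGHT:
        # Second word on the stack heads the top of the stack.
        return stack[:-1], queue, arcs | {(stack[-2], stack[-1])}
    raise ValueError(f"Unknown action: {action}")

def best_loss(stack, queue, arcs, gold):
    """Fewest gold arcs missed by any parse reachable from this state."""
    actions = valid_actions(stack, queue)
    if not actions:  # terminal state: count the gold arcs we failed to build
        return len(gold - arcs)
    return min(
        best_loss(*apply_action(a, stack, queue, arcs), gold) for a in actions
    )

def action_costs(stack, queue, arcs, gold):
    """Cost of each action: best loss after it minus best loss before it."""
    here = best_loss(stack, queue, arcs, gold)
    return {
        a: best_loss(*apply_action(a, stack, queue, arcs), gold) - here
        for a in valid_actions(stack, queue)
    }

# "She eats fish" with "She" (0) on the stack and "eats fish" on the queue.
gold = frozenset({(1, 0), (1, 2)})
print(action_costs((0,), (1, 2), frozenset(), gold))
# {'SHIFT': 1, 'LEFT': 0}
```

Here LEFT attaches "She" to "eats" and costs nothing, while SHIFT buries "She" under "eats" so that its gold arc can no longer be built, which costs one error. The search is exponential in sentence length, so this is only useful for building intuition on tiny examples.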
So, your error is saying that given the state you’re in and the gold parse you’ve assigned to the sentence, none of the actions has zero cost. This might occur if there’s no way to derive the gold standard you assigned, given the actions the parser has (for instance, if a label in your gold standard is missing from the parser’s actions).
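One quick sanity check for the missing-label case is to compare the labels in your annotations against the labels the parser knows about. The sketch below is plain Python with made-up label names; it assumes you have already collected both sets of labels into lists (in spaCy the parser pipe has a `labels` attribute, but check how to get at it in your version).

```python
def unsupported_labels(gold_labels, parser_labels):
    """Return the gold labels that the parser has no action for."""
    return sorted(set(gold_labels) - set(parser_labels))

# Hypothetical labels, for illustration only.
gold_labels = ["nsubj", "ROOT", "my_relation"]
parser_labels = ["nsubj", "dobj", "ROOT"]
print(unsupported_labels(gold_labels, parser_labels))  # ['my_relation']
```

An empty result doesn't prove the gold standard is derivable, but a non-empty one is a likely culprit.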
Another thing to keep in mind is that the parser has to build a tree that covers every token. It’s possible to underspecify the tree, in which case the parser will have no guidance for some of the arcs. In your case, if you’re trying to only extract relations between some entities in a whole document, you might end up with the vast majority of tokens underspecified. This will probably be very hard for the parser to learn from.
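To get a feel for how underspecified your annotations are, you could measure what fraction of tokens actually have a gold head. This sketch assumes unspecified heads are represented as `None`, which is a convention chosen for the example and not necessarily the format your training data uses.

```python
def head_coverage(heads):
    """Fraction of tokens that have a gold head assigned."""
    if not heads:
        return 0.0
    return sum(h is not None for h in heads) / len(heads)

# Hypothetical 10-token document where only two tokens are annotated.
heads = [None, None, 5, None, None, 2, None, None, None, None]
print(head_coverage(heads))  # 0.2
```

If this number is very low across your documents, most of the arcs the parser builds will have no supervision at all.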