Hi everyone,
I’m trying to use spaCy and Prodigy to read resumes by identifying their entities and the relationships between those entities as a dependency tree, which of course will use custom semantics. I’m using spaCy’s named entity recognizer, bootstrapped with around 400 annotated resumes that also have the relationships annotated. Due to the nature of my domain, I should only have one tree per resume, which makes the entire resume a single sentence. To do this, my model’s pipeline includes the named entity recognizer, a custom component that merges the entities into single tokens, a custom component that makes the whole document a single sentence, and finally the parser.
I’m having trouble setting up this environment so that I can then keep training it with Prodigy.
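For reference, the pipeline described above could be sketched roughly as follows. This is a minimal sketch that assumes spaCy v3's API (in v2, components are passed to nlp.add_pipe as functions rather than by name). The component name "single_sentence", the sample text, and the entity labels are made up for illustration, and an entity_ruler stands in for the trained NER so the snippet runs without a trained model.

```python
import spacy
from spacy.language import Language


@Language.component("single_sentence")
def single_sentence(doc):
    # Hypothetical component: mark only the first token as a sentence start,
    # so the whole resume is treated as one sentence by later components.
    for i, token in enumerate(doc):
        token.is_sent_start = (i == 0)
    return doc


nlp = spacy.blank("en")

# Stand-in for the trained named entity recognizer (assumption for this sketch).
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "PERSON", "pattern": "Jane Doe"},
    {"label": "ORG", "pattern": "Acme Corp"},
])

# Built-in component that merges each entity span into a single token.
nlp.add_pipe("merge_entities")

# Must run before the parser so the boundaries are set in advance.
nlp.add_pipe("single_sentence")

# A trained dependency parser would be added last, e.g. nlp.add_pipe("parser").

doc = nlp("Jane Doe worked at Acme Corp as a data scientist.")
print([token.text for token in doc])
print(len(list(doc.sents)))
```

The order matters here: the entities are merged before the sentence boundaries are set, so the boundary flags are written on the final tokenization that the parser will see.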
I’ve tried two approaches.
In the first, I tried having it all as part of a single Language instance and using that model for both the NER and the dependency recipes, but the Python kernel keeps dying and restarting when it’s supposed to parse. The details of how I attempted this can be found here.
In my second approach, I tried having two different Language instances, one for the NER and one for the parser, and training each model with the appropriate recipe. As the make_doc of the parser’s Language instance, I use a function that processes the text with the NER model, merges the entities, and leaves the doc as a single sentence. The error I am now getting with this approach is “Could not find a gold-standard action to supervise the dependency parser. The GoldParse was projective. The transition system has 207 actions.”. The details of this implementation can be found here.
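One thing that could be checked before training is whether every annotated resume really forms a single tree. The following is only a sketch, not spaCy or Prodigy code: the function name check_single_tree is hypothetical, and it assumes the annotations can be exported as a list of head indices per resume, where the root token points to itself. Whether malformed trees are actually the cause of the error above is an assumption that would need to be confirmed.

```python
from typing import List


def check_single_tree(heads: List[int]) -> List[str]:
    """Return a list of problems found in one head-index annotation.

    heads[i] is the index of token i's head; the root points to itself.
    An empty list means the annotation is a single-rooted tree with no
    crossing arcs.
    """
    problems = []
    n = len(heads)

    # Every head must refer to a token inside the document.
    for i, h in enumerate(heads):
        if not 0 <= h < n:
            problems.append(f"token {i}: head {h} is out of range")
    if problems:
        return problems

    # One sentence per resume means exactly one root.
    roots = [i for i, h in enumerate(heads) if h == i]
    if len(roots) != 1:
        problems.append(f"expected exactly 1 root, found {len(roots)}: {roots}")

    # Following heads upward from any token must end at a root, not loop.
    for i in range(n):
        seen = set()
        j = i
        while heads[j] != j:
            if j in seen:
                problems.append(f"token {i}: cycle in head chain")
                break
            seen.add(j)
            j = heads[j]

    # Report pairs of arcs that cross each other.
    arcs = [(min(i, h), max(i, h)) for i, h in enumerate(heads) if i != h]
    for a, (l1, r1) in enumerate(arcs):
        for l2, r2 in arcs[a + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                problems.append(f"arcs ({l1}, {r1}) and ({l2}, {r2}) cross")

    return problems


# Two tokens point to themselves here, so two roots are reported.
print(check_single_tree([0, 0, 2]))
```

Running this over all the annotated resumes would at least show whether any of them contain more than one root, which would conflict with forcing the whole document into a single sentence.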
My question is: am I approaching the problem wrong, or are spaCy’s models simply not appropriate for the problem I’m working on? And if they’re not, would it be possible for me to create a custom dependency parsing model and train it with Prodigy?
Thank you