Hi! So I guess you’re referring to the spaCy example of training an intent parser using spaCy’s dependency parser, right?
The good news is, we’re currently working on getting Prodigy v1.4.0 ready (coming this week!), which will include an experimental built-in interface for dependency annotation, as well as `dep.batch-train` recipes. Those will work with any spaCy model – so you can use them to improve the default syntactic dependency parser, but also any customised version of it with different labels, like the intent parser shown in the example above. The interface will look like this and will focus on one dependency at a time:
In the meantime, you might also find this thread on annotating dependencies and relations useful. I’m outlining a few solutions for how to render dependency annotations using custom HTML interfaces.
In order to get over the “cold start problem”, you’ll still need some initial annotations to pre-train the model, so it can start making meaningful suggestions. You could bootstrap those by repurposing the manual annotation interface – similar to the manual POS tag annotation:
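For example, a manual annotation session could be started with something like this – here, `intent_spans` is a hypothetical dataset name and `data.jsonl` a placeholder for your input file:

```shell
# Start the manual span annotation interface with your intent labels.
# "intent_spans" (dataset name) and data.jsonl (input file) are placeholders.
prodigy ner.manual intent_spans en_core_web_sm data.jsonl --label PLACE,QUALITY
```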
Instead of POS tags, you’d simply use your intents as the label set defined via `--label`. If your label set is quite small, you could also create two labels per dependency – for example, `PLACE_CHILD`. In your custom theme settings, you can also define your own colours for those labels.
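A minimal sketch of what that could look like in your `prodigy.json` – the hex colours here are just made-up examples:

```json
{
  "custom_theme": {
    "labels": {
      "PLACE": "#73d8ff",
      "QUALITY": "#fcba03"
    }
  }
}
```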
When annotating in manual mode, I’d recommend focusing on only one label at a time – even if this means making several passes over your data. So you’d start off by doing all `PLACE` dependencies, then re-start the server and annotate all `QUALITY` dependencies. This way, your brain only has to focus on one concept at a time, which makes annotation faster and more efficient (and less prone to human error). The manual annotation interface also “snaps” your selection to the token boundaries – this means you won’t have to worry about highlighting exact characters, and you can even double-click on single-token spans to highlight them.
The annotations you collect are stored in a simple JSON format, with a list of `"spans"` containing the highlighted spans of text, their indices and the label. So once you’re done, you can export the dataset and convert it to the format you need for pre-training your model. You’ll find more details on the formats and other specifics in the `PRODIGY_README.html`, available for download with Prodigy.
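As a rough sketch of what that conversion step could look like – the `"text"` and `"spans"` field names follow the annotation format described above, but the helper names and the exact shape of the output are just assumptions about what your own training code might expect:

```python
import json


def load_annotations(path):
    """Load exported annotations, one JSON object per line (JSONL)."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f]


def extract_spans(example):
    """Pull (start, end, label) tuples out of one annotated example."""
    return [
        (span["start"], span["end"], span["label"])
        for span in example.get("spans", [])
    ]


# What a single annotated task might look like (made-up example):
example = {
    "text": "find a cafe with great wifi",
    "spans": [
        {"start": 7, "end": 11, "label": "PLACE"},
        {"start": 17, "end": 27, "label": "QUALITY"},
    ],
}
print(extract_spans(example))  # [(7, 11, 'PLACE'), (17, 27, 'QUALITY')]
```

From tuples like these, you can then build whatever head/label structures your parser-training setup needs.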