Recipe for custom intent parser to train spaCy

enhancement
done
spacy

(Gurudev) #1

I need to train my own intent parser for spaCy, and being able to do so using Prodigy would be great and would save me a lot of time.

Please let me know if there is a recipe for this. If not, is there a quick workaround?

Thanks for this great tool.


(Ines Montani) #2

Hi! So I guess you’re referring to the spaCy example of training an intent parser using spaCy’s dependency parser, right?

The good news is, we’re currently working on getting Prodigy v1.4.0 ready (coming this week!), which will include an experimental built-in interface for dependency annotation, as well as dep.teach and dep.batch-train recipes. Those will work with any spaCy model – so you can use them to improve the default syntactic dependency parser, but also any customised version of it with different labels, like the intent parser shown in the example above. The interface will look like this and will focus on one dependency at a time:

dep_parser

In the meantime, you might also find this thread on annotating dependencies and relations useful. I’m outlining a few solutions for how to render dependency annotations using custom HTML interfaces.

In order to get over the “cold start problem”, you’ll still need some initial annotations to pre-train the model, so it can start making meaningful suggestions. You could bootstrap those by repurposing the manual annotation interface – similar to the manual POS tag annotation:

pos

Instead of POS tags, you’d simply use your intents as the label set defined via --label. If your label set is quite small, you could also create two labels per dependency – for example, PLACE_HEAD and PLACE_CHILD. In your custom theme settings, you can also define your own colours for those labels.

When annotating in manual mode, I’d recommend only focusing on one label at a time – even if this means making several passes over your data. So you’d start off by doing all PLACE dependencies and then re-start the server and annotate all QUALITY dependencies. This way, your brain only has to focus on one concept at a time, which makes annotation faster and more efficient (and also less prone to human error). The manual annotation interface also “snaps” your selection to the token boundaries – this means you won’t have to worry about highlighting exact characters, and you can even double-click on single-token spans to highlight them.

The annotations you collect are stored in a simple JSON format, with a list of "spans" containing the highlighted spans of text, their indices and the label. So once you’re done, you can export the dataset and convert it to the format you need for pre-training your model. You’ll find more details on the formats and other specifics in the PRODIGY_README.html, available for download with Prodigy.
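As a rough illustration of that conversion step, here is a minimal Python sketch that turns exported span annotations into parallel lists of heads and dependency labels. It rests on several assumptions that are not confirmed above: that each span carries a "token_start" index alongside its "label", that labels follow the PLACE_HEAD / PLACE_CHILD naming idea, that spans are single tokens, and that each example has at most one head/child pair per label. The function name and the example record are hypothetical, so check the actual field names in your export and in PRODIGY_README.html before relying on it.

```python
def spans_to_dependencies(task):
    """Convert X_HEAD / X_CHILD span pairs into parallel heads/deps lists.

    Assumes (hypothetically) that every span has a "token_start" index and
    that spans cover a single token each.
    """
    n_tokens = len(task["tokens"])
    # Default: every token points at itself with a placeholder label.
    # How to attach unannotated tokens is a separate modelling decision.
    heads = list(range(n_tokens))
    deps = ["-"] * n_tokens

    # Group spans by relation name, e.g. "PLACE" -> {"HEAD": 0, "CHILD": 2}
    pairs = {}
    for span in task.get("spans", []):
        relation, _, role = span["label"].rpartition("_")
        if role not in ("HEAD", "CHILD"):
            continue  # skip labels that don't follow the naming scheme
        pairs.setdefault(relation, {})[role] = span["token_start"]

    # Only complete pairs produce a dependency arc from child to head.
    for relation, roles in pairs.items():
        if "HEAD" in roles and "CHILD" in roles:
            child = roles["CHILD"]
            heads[child] = roles["HEAD"]
            deps[child] = relation
    return heads, deps


# Hypothetical exported record, made up for illustration only.
text = "find a cafe with great wifi"
example = {
    "text": text,
    "tokens": [{"text": t, "id": i} for i, t in enumerate(text.split())],
    "spans": [
        {"start": 0, "end": 4, "token_start": 0, "token_end": 0, "label": "PLACE_HEAD"},
        {"start": 7, "end": 11, "token_start": 2, "token_end": 2, "label": "PLACE_CHILD"},
    ],
}
heads, deps = spans_to_dependencies(example)
print(heads)
print(deps)
```

If you annotate one label per pass, as suggested above, you would merge the spans from each pass for the same text before running a conversion like this.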


(Gurudev) #3

Thank you so much. v1.4 will be a godsend for me.

Thanks for the detailed explanation and tips as well. spaCy made NLP easy, Prodigy is making it quick.


(Ines Montani) #4

Just released v1.4.0, which comes with a dependency annotation interface and (still experimental) dep.teach, dep.batch-train and dep.train-curve recipes! :tada: See here for a demo of the new interface.


(Gurudev) #5

Thanks so much.