I followed Ines's tutorial (Training a NAMED ENTITY RECOGNITION MODEL with Prodigy and Transfer Learning - YouTube) to train Prodigy models to assign labels to expressions in text. I'm wondering about two things - as I understand it, Prodigy uses artificial neural networks (ANNs) to assign labels. Is this accurate? Second, when the Prodigy models assign labels to expressions, do they use the surrounding context (other parts of the text), or do they use only the shape of expressions themselves?
as I understand it, Prodigy uses artificial neural networks (ANNs) to assign labels
Prodigy natively supports spaCy models to pre-highlight text, but it's always the user who assigns the annotation to the dataset. Pre-highlighting can be incredibly useful because it speeds up annotation, and you can also use active-learning techniques to update a spaCy model as you label.
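To make the pre-highlighting idea concrete, here's a simplified sketch of the kind of task data involved: a text plus suggested spans with character offsets and labels, which the annotator then accepts or rejects in the UI. The `make_task` helper is hypothetical; it just mirrors the general shape of a span-annotation task.

```python
import json

def make_task(text, spans):
    """Build an annotation task with pre-highlighted spans.

    Hypothetical helper: each suggested span records character
    offsets and a label. The human annotator still decides whether
    the suggestion actually goes into the dataset.
    """
    return {
        "text": text,
        "spans": [
            {"start": start, "end": end, "label": label}
            for (start, end, label) in spans
        ],
    }

task = make_task("Berlin is in Germany.", [(0, 6, "GPE"), (13, 20, "GPE")])
print(json.dumps(task))
```

The point of the sketch is that the model only *suggests*; the stored annotation is whatever the user confirms.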
That said, Prodigy is scriptable, so you can use any technique you like. You can also use rule-based techniques (as opposed to deep-learning ones), such as the PatternMatcher, or third-party web APIs if you prefer.
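As a rough illustration of the rule-based route, here's a minimal sketch that pre-highlights spans with plain regular expressions. The patterns and labels are made up for the example; Prodigy's actual PatternMatcher works on token-level patterns rather than raw regexes, but the principle (no model, just rules) is the same.

```python
import re

# Hypothetical label-to-pattern rules for a rule-based pre-highlighter.
PATTERNS = {
    "ORG": re.compile(r"\b(?:Acme Corp|Globex)\b"),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}

def highlight(text):
    """Return suggested spans found by the rules, sorted by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append(
                {"start": match.start(), "end": match.end(), "label": label}
            )
    return sorted(spans, key=lambda span: span["start"])

print(highlight("Acme Corp filed the report on 2021-03-15."))
```

Rules like these don't use any surrounding context at all, which is exactly the contrast with the model-based approaches discussed below.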
Second, when the Prodigy models assign labels to expressions, do they use the surrounding context (other parts of the text), or do they use only the shape of expressions themselves?
This depends a bit on the model that you're using, as well as the task that you're training towards. If you're using spancat, it first uses a suggester function to propose potentially interesting substrings; more information on this can be found in our blogpost here:
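To give a feel for what a suggester does, here's a toy version of an n-gram suggester in plain Python: it enumerates all candidate spans of a few fixed lengths, and a classifier would then score each candidate. This is a conceptual sketch only; spaCy's real default suggester (`spacy.ngram_suggester.v1`) operates on `Doc` objects and returns its candidates in a batched array format.

```python
def ngram_suggester(tokens, sizes=(1, 2, 3)):
    """Enumerate all (start, end) token spans of the given lengths.

    Toy stand-in for a spancat suggester: it proposes candidates,
    and a separate classifier decides which ones are real spans.
    """
    candidates = []
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            candidates.append((i, i + n))
    return candidates

tokens = "heart attack risk".split()
print(ngram_suggester(tokens))
# Three unigrams, two bigrams, and one trigram for a 3-token text.
```

Note that the suggester itself is purely positional; the context only comes into play when the classifier scores each candidate span.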
On the other hand, if you're doing named entity recognition, there are also a variety of spaCy models you could consider. These use transformer models or convolutional layers to find entities in text. I think it's fair to say that both approaches take the "surrounding context" as well as the tokens themselves into consideration. However, "surrounding context" can mean a lot of things, so if you're curious to understand it better, I might suggest watching this tutorial video:
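As a very rough intuition for why context matters, here's a toy feature extractor that describes a token by its neighbours within a fixed window. The same word ("Washington") gets different features, and could therefore get different labels, depending on the words around it. Real NER models build contextual representations in a learned way (via convolutions or transformer attention), not via raw windows like this; the sketch only illustrates the idea.

```python
def context_features(tokens, i, window=2):
    """Toy features for token i: its own form plus nearby words.

    Illustrative only; shows how the same token can look different
    to a model depending on its surrounding context.
    """
    feats = {
        "word": tokens[i].lower(),
        "shape": "Xx" if tokens[i][0].isupper() else "x",
    }
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"ctx[{offset}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats

# Same surface form, different contexts (place vs. person):
print(context_features("She visited Washington yesterday".split(), 2))
print(context_features("President Washington signed it".split(), 1))
```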
The video is a bit dated, but it does go into a fair amount of depth and might help you understand the underlying approach better.