Spanish NER by context

Prodigy is an annotation tool that lets you create training data for machine learning models. Whether the models will learn what you want them to learn, and whether your problem can be solved at all, depends on how you decide to break down your problem, the data you're labelling and how you're training the model.

Prodigy can help you label data more efficiently, and if you want to use spaCy (which Prodigy integrates with out-of-the-box), it can also help you run training experiments faster. But legal NLP isn't trivial, and you'll likely have to run a lot of experiments and try out different approaches until you end up with a system that works for you. It's also totally possible that after your experiments, you'll find out that a machine learning system currently isn't able to beat your regular expressions :wink:

If you have a set of regular expressions that's working well, you can use those to bootstrap training data and create suggestions. So instead of labelling everything by hand, you'll only need to correct what your rules got wrong.
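For example, here's a minimal sketch of what that bootstrapping step could look like. The regexes and labels are just placeholders for whatever your existing rules match, and the output is written in Prodigy's JSONL task format (one object per example with a `"text"` and pre-filled `"spans"`), so the suggested entities show up in the annotation interface and you only have to fix the mistakes:

```python
import json
import re

# Placeholder regexes for Spanish legal references – swap in your own rules.
PATTERNS = {
    "LEY": re.compile(r"Ley\s+\d+/\d{4}"),
    "ARTICULO": re.compile(r"[Aa]rt[íi]culo\s+\d+"),
}

def bootstrap_tasks(texts):
    """Turn raw texts into Prodigy-style tasks with pre-filled spans."""
    for text in texts:
        spans = []
        for label, pattern in PATTERNS.items():
            for match in pattern.finditer(text):
                spans.append({
                    "start": match.start(),
                    "end": match.end(),
                    "label": label,
                })
        yield {"text": text, "spans": spans}

if __name__ == "__main__":
    examples = ["El contrato se rige por la Ley 34/2002 y el artículo 10."]
    with open("bootstrap.jsonl", "w", encoding="utf8") as f:
        for task in bootstrap_tasks(examples):
            f.write(json.dumps(task, ensure_ascii=False) + "\n")
```

You can then load a file like this into a manual NER interface and correct the pre-highlighted spans, instead of labelling every entity from scratch.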

You also want to keep an eye on the local context around the entities you're interested in, especially the surrounding tokens on each side, since those are what the model mostly relies on to make its decision. If the local context doesn't have enough clues, the model may struggle to learn the distinction. For cases like this, a mix of rules and a statistical model might be a better fit. This thread has more details and examples:
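If you do end up combining rules with a statistical model, one way to do it in spaCy is to add a rule-based `EntityRuler` before the statistical NER component, so the unambiguous patterns are always assigned by your rules and the model only has to handle the cases that genuinely need context. A rough sketch, assuming spaCy v3, the `es_core_news_sm` pipeline and a placeholder pattern and label:

```python
import spacy

# Assumes the Spanish pipeline is installed:
#   python -m spacy download es_core_news_sm
nlp = spacy.load("es_core_news_sm")

# Add the rule-based component *before* the statistical NER so that
# entities assigned by the rules aren't overwritten by the model.
ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler.add_patterns([
    # Placeholder pattern and label: "artículo" followed by a number.
    {"label": "ARTICULO", "pattern": [{"LOWER": "artículo"}, {"IS_DIGIT": True}]},
])

doc = nlp("El acuerdo se firmó en Madrid conforme al artículo 10.")
print([(ent.text, ent.label_) for ent in doc.ents])
# The ruler catches "artículo 10", while the statistical model still
# handles context-dependent entities like "Madrid".
```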

You might also find @honnibal's talk on structuring NLP projects helpful, which also shows some examples of spaCy and Prodigy: