Spanish NER by context

joaquin · January 23, 2019, 2:45pm

Hi, I need help classifying entities in contracts in Spanish. I’ve classified them and also obtained the types of clauses and entities. But I need to extract the context, and it’s been hard. Can Prodigy help me with this? How many examples do I need to tag manually if I already have some regexes that are working, at least for the entities? Is this tool good enough to get the context in Spanish contracts?

Please let me know or contact me by email

joaquin · January 23, 2019, 5:49pm

Regarding my question, is Prodigy good for extracting the context of an entity? Who is the owner? What is this date, and what is that other one? Etc.

ines · January 23, 2019, 7:00pm

Prodigy is an annotation tool that lets you create training data for machine learning models. Whether the models will learn what you want them to learn, and whether your problem can be solved depends on how you decide to break down your problem, the data you're labelling and how you're training the model.

Prodigy can help you label data more efficiently, and if you want to use spaCy (which Prodigy integrates with out-of-the-box), it can also help you run training experiments faster. But legal NLP isn't trivial and you'll likely have to run a lot of experiments and try out different things until you end up with a system that works for you. It's also totally possible that after your experiments, you'll find out that a machine learning system currently isn't able to beat your regular expressions.

If you have a set of regular expressions that's working well, you can use those to bootstrap training data and create suggestions. So instead of labelling everything by hand, you'll only need to correct what your rules got wrong.
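As a rough sketch of that bootstrapping idea: the snippet below runs a couple of regular expressions over raw text and writes out one JSON object per line, each with a "text" and a list of "spans" (character start, end and label), which is the general shape Prodigy uses for pre-highlighted entities. The patterns, the labels `FECHA` and `MONTO`, the helper names and the example sentence are all made up for illustration; your own working rules would go in their place.

```python
import json
import re

# Hypothetical patterns for illustration; replace with your own working rules.
PATTERNS = {
    "FECHA": re.compile(r"\b\d{1,2} de [a-záéíóú]+ de \d{4}\b", re.IGNORECASE),
    "MONTO": re.compile(r"\$\s?\d+(?:[.,]\d+)*"),
}


def regex_to_task(text):
    """Turn one text into a task dict with character-offset spans."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append(
                {"start": match.start(), "end": match.end(), "label": label}
            )
    # Keep spans in reading order so they are easier to review.
    spans.sort(key=lambda span: span["start"])
    return {"text": text, "spans": spans}


def to_jsonl(texts):
    """Serialize one task per line, keeping accented characters readable."""
    return "\n".join(
        json.dumps(regex_to_task(text), ensure_ascii=False) for text in texts
    )


example = "El contrato se firma el 5 de marzo de 2019 por un monto de $ 1.500.000."
task = regex_to_task(example)
print(to_jsonl([example]))
```

The resulting file can then be loaded as suggestions, so you accept or correct what the rules produced instead of highlighting every entity from scratch.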

You also want to keep an eye on the local context around the entities you're interested in – especially the surrounding tokens on each side. Those will be most relevant to make the decision. If the local context doesn't have enough clues, the model may struggle to learn the distinctions. For cases like this, a mix of rules and a statistical model might be a better fit. This thread has more details and examples:
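To make the "local context" point a bit more concrete, here is a minimal rule-based sketch (not what spaCy or Prodigy do internally): given the character offsets of an entity, it looks at a few word tokens on each side and picks the role whose cue word is closest. The role names, cue words, window size and example sentence are assumptions for illustration only.

```python
import re

# Hypothetical cue words for illustration; real contracts need their own lists.
ROLE_CUES = {
    "FECHA_FIRMA": {"firma", "firmado", "suscrito", "celebrado"},
    "FECHA_VENCIMIENTO": {"vence", "vencimiento", "expira"},
}

DATE = re.compile(r"\d{1,2} de \w+ de \d{4}")


def context_window(text, start, end, size=4):
    """Return up to `size` lowercase word tokens on each side of a span."""
    left = re.findall(r"\w+", text[:start].lower())[-size:]
    right = re.findall(r"\w+", text[end:].lower())[:size]
    return left, right


def guess_role(text, start, end, size=4):
    """Pick the role of the nearest cue word, checking the left side first."""
    left, right = context_window(text, start, end, size)
    for distance in range(1, size + 1):
        candidates = []
        if distance <= len(left):
            candidates.append(left[-distance])
        if distance <= len(right):
            candidates.append(right[distance - 1])
        for token in candidates:
            for role, cues in ROLE_CUES.items():
                if token in cues:
                    return role
    return None  # no clue in the window: a case for manual review


text = "El contrato fue firmado el 5 de marzo de 2019 y vence el 5 de marzo de 2021."
roles = [
    (m.group(), guess_role(text, m.start(), m.end())) for m in DATE.finditer(text)
]
print(roles)
```

Entities for which this returns `None` are the ones where the nearby tokens carry no obvious clue, and those are the cases where a statistical model is also likely to need more examples or a wider context.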

You might also find @honnibal's talk on structuring NLP projects helpful, which also shows some examples of spaCy and Prodigy:

joaquin · January 24, 2019, 1:18pm

Thanks, Ines!

So can Prodigy and spaCy help me with getting context from the nearby tokens, or just with the annotation part?

Regards,
Joaquin

ines · January 24, 2019, 1:24pm

spaCy is an open-source library for Natural Language Processing that can help you train your own models for named entity recognition. Named entity recognition models can predict "real-world objects" like persons or organisations, based on the surrounding context. See here for details:

Prodigy is an annotation tool that can help you create training data for those models. See here for details:

joaquin · January 24, 2019, 1:58pm

Ines, is there a way I can test a demo with Spanish? I’d like to try that first. I didn’t find a way to try it in the live demo.

ines · January 24, 2019, 2:01pm

Yes, the live demo only really shows the UI. The back-end is a bit harder to demo online, as it’s a command-line app and Python library.

We normally do trials by hosting a VM for you that you can log into. This lets you get the full experience of the tool, including the scriptable back-end. You can email us at contact@explosion.ai

joaquin · January 24, 2019, 2:02pm

Thanks! I really need this to boost my classification of the data.
