Spanish NER by context

Hi, I need help classifying entities in contracts written in Spanish. I've already classified them and obtained the clause types and the entities, but extracting the context has been hard. Can Prodigy help me with this? How many examples would I need to tag manually, given that I have some regexes that currently work, at least for the entities? Is this tool good enough to get the context from Spanish contracts?

Please let me know, or contact me by email.

Regarding my question: is Prodigy good at extracting the context of an entity? For example: who is the owner? What does this date refer to, and that other one? And so on.

Prodigy is an annotation tool that lets you create training data for machine learning models. Whether the models will learn what you want them to learn, and whether your problem can be solved depends on how you decide to break down your problem, the data you're labelling and how you're training the model.

Prodigy can help you label data more efficiently, and if you want to use spaCy (which Prodigy integrates with out-of-the-box), it can also help you run training experiments faster. But legal NLP isn't trivial and you'll likely have to run a lot of experiments and try out different things until you end up with a system that works for you. It's also totally possible that after your experiments, you'll find out that a machine learning system currently isn't able to beat your regular expressions :wink:

If you have a set of regular expressions that's working well, you can use those to bootstrap training data and create suggestions. So instead of labelling everything by hand, you'll only need to correct what your rules got wrong.
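One way to do that, assuming your rules are plain Python regexes: run them over the raw text and write out tasks in Prodigy's JSONL format with pre-set `spans` (character offsets plus a label). The labels `DATE` and `MONEY` and both patterns are made-up placeholders for your own rules. As far as I know, `ner.manual` pre-highlights incoming `spans` so you only fix the mistakes, but double-check against the docs for your Prodigy version. Its `--patterns` option is an alternative, though it takes spaCy token/phrase patterns rather than raw regexes.

```python
import json
import re

# Hypothetical example rules: replace these with the regexes you already have.
PATTERNS = {
    "DATE": re.compile(
        r"\b\d{1,2} de (?:enero|febrero|marzo|abril|mayo|junio|julio|agosto"
        r"|septiembre|octubre|noviembre|diciembre) de \d{4}\b"
    ),
    "MONEY": re.compile(r"\$\s?\d+(?:\.\d{3})*(?:,\d+)?"),
}


def regex_suggestions(text, patterns=PATTERNS):
    """Build one Prodigy-style task: the text plus character-offset spans."""
    matches = []
    for label, pattern in patterns.items():
        for m in pattern.finditer(text):
            matches.append({"start": m.start(), "end": m.end(), "label": label})
    # Prefer earlier, then longer matches, and drop anything that overlaps,
    # because NER spans are not allowed to overlap.
    matches.sort(key=lambda s: (s["start"], -(s["end"] - s["start"])))
    spans, last_end = [], -1
    for span in matches:
        if span["start"] >= last_end:
            spans.append(span)
            last_end = span["end"]
    return {"text": text, "spans": spans}


if __name__ == "__main__":
    texts = [
        "El contrato se firma el 3 de marzo de 2020 por un monto de $ 1.500.000.",
    ]
    # One JSON object per line (JSONL), ready to redirect into a file.
    for text in texts:
        print(json.dumps(regex_suggestions(text), ensure_ascii=False))
```

Redirecting the output (e.g. `python make_suggestions.py > suggestions.jsonl`, file names hypothetical) gives you a source file where the annotator only has to correct or complete what the rules found.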

You also want to keep an eye on the local context around the entities you're interested in – especially the surrounding tokens on each side. Those will be most relevant to make the decision. If the local context doesn't have enough clues, the model may struggle to learn the distinctions. For cases like this, a mix of rules and a statistical model might be a better fit. This thread has more details and examples:
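To make "local context" concrete, here's a small sketch that pulls the tokens on each side of an entity span and then applies a keyword rule to them, as one illustration of mixing entity predictions (from a model or from regexes) with rules, e.g. for "who is the owner?". The regex tokenizer, the window of 3 tokens and the `ROLE_KEYWORDS` table are all hypothetical placeholders; spaCy's tokenizer and its models' internal features work differently.

```python
import re

# Rough tokenizer for illustration: words (Unicode-aware) or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# Hypothetical keyword rules mapping context words to a role.
ROLE_KEYWORDS = {
    "arrendador": "LANDLORD",
    "arrendatario": "TENANT",
    "propietario": "OWNER",
}


def tokenize(text):
    """Return (token, start_char, end_char) triples."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def context_window(text, start, end, window=3):
    """Split the tokens around text[start:end] into (left, entity, right) lists."""
    tokens = tokenize(text)
    inside = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
    if not inside:
        raise ValueError("span does not cover any whole token")
    first, last = inside[0], inside[-1]
    left = [tok for tok, _, _ in tokens[max(0, first - window):first]]
    entity = [tok for tok, _, _ in tokens[first:last + 1]]
    right = [tok for tok, _, _ in tokens[last + 1:last + 1 + window]]
    return left, entity, right


def guess_role(left):
    """Rule layer: look back through the left context for a role keyword."""
    for tok in reversed(left):
        role = ROLE_KEYWORDS.get(tok.lower())
        if role:
            return role
    return None


if __name__ == "__main__":
    text = "El arrendador, Juan Pérez, entrega el inmueble el 1 de abril de 2021."
    start = text.index("Juan Pérez")
    left, entity, right = context_window(text, start, start + len("Juan Pérez"))
    print(left, entity, right, guess_role(left))
```

Here the word "arrendador" right before the name is the clue. If clues like that usually sit within a few tokens of the entity, a model has a decent chance of learning them; if they're far away (another clause, another page), rules on top of the entities are often the more practical option.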

You might also find @honnibal's talk on structuring NLP projects helpful, which also shows some examples of spaCy and Prodigy:

Thanks, Ines!

So can Prodigy and spaCy help me get the context from the nearby tokens, or only with the annotation part?

Regards,
Joaquin

spaCy is an open-source library for Natural Language Processing that can help you train your own models for named entity recognition. Named entity recognition models can predict "real-world objects" like persons or organisations, based on the surrounding context. See here for details:

Prodigy is an annotation tool that can help you create training data for those models. See here for details:
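As a rough illustration of what that training data looks like: Prodigy stores NER annotations as JSON objects with the `text`, character-offset `spans` and an `answer`. The sketch below converts accepted examples into the `(text, {"entities": [...]})` tuples used by spaCy v2's simple training format. It assumes the annotations were exported one per line (e.g. with `prodigy db-out`); for spaCy v3, Prodigy's `data-to-spacy` command, if your version has it, is the more direct route.

```python
import json


def prodigy_to_training_data(lines):
    """Turn Prodigy-style NER annotations (JSONL lines) into spaCy v2-style tuples."""
    examples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        task = json.loads(line)
        if task.get("answer") != "accept":
            continue  # rejected/ignored examples are not treated as gold entities
        entities = [(s["start"], s["end"], s["label"]) for s in task.get("spans", [])]
        examples.append((task["text"], {"entities": entities}))
    return examples


if __name__ == "__main__":
    sample = [
        '{"text": "Firmado en Madrid.", "spans": [{"start": 11, "end": 17, "label": "LOC"}], "answer": "accept"}',
        '{"text": "Sin entidades.", "spans": [], "answer": "reject"}',
    ]
    print(prodigy_to_training_data(sample))
```

Keeping only accepted answers is a simplification that fits manually corrected data; binary feedback from a recipe like `ner.teach` carries useful information in its rejects too and needs different handling.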

Ines, is there a way I can test a demo in Spanish? I'd like to try it first. I didn't find a way to do that in the live demo.

Yes, the live demo only really shows the UI. The back-end is a bit harder to demo online, as it’s a command-line app and Python library.

We normally do trials by hosting a VM for you that you can log into. This lets you get the full experience of the tool, including the scriptable back-end. You can email us at contact@explosion.ai

Thanks! I really need this to boost the classification of my data.