Hi, I need help classifying entities in contracts written in Spanish. I've classified the contracts and also obtained the clause types and the entities, but I need to extract the context and it's been hard. Can Prodigy help me with this? How many examples do I need to tag manually, given that I have some regexes that currently work, at least for the entities? Is this tool good enough to get the context in Spanish contracts?
Prodigy is an annotation tool that lets you create training data for machine learning models. Whether the models learn what you want them to learn, and whether your problem can be solved, depends on how you break down the problem, the data you're labelling and how you're training the model.
Prodigy can help you label data more efficiently, and if you want to use spaCy (which Prodigy integrates with out of the box), it can also help you run training experiments faster. But legal NLP isn't trivial and you'll likely have to run a lot of experiments and try out different things until you end up with a system that works for you. It's also entirely possible that after your experiments, you'll find that a machine learning system currently isn't able to beat your regular expressions.
If you have a set of regular expressions that's working well, you can use those to bootstrap training data and create suggestions. So instead of labelling everything by hand, you'll only need to correct what your rules got wrong.
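As a rough illustration of that bootstrapping idea, here's a minimal sketch in plain Python that turns regex matches into pre-annotated examples with character-offset spans. The patterns, the label names (`FECHA`, `MONTO`) and the `suggest_spans` helper are made up for this example and aren't part of Prodigy or spaCy. The output uses the `"text"` plus `"spans"` layout (with `"start"`, `"end"` and `"label"`) that Prodigy reads for pre-annotated NER data, but please double-check the details against the docs for the recipe you end up using.

```python
import json
import re

# Hypothetical patterns and labels, for illustration only.
# Replace them with the regular expressions that already work for you.
PATTERNS = {
    "FECHA": re.compile(r"\b\d{1,2} de [a-záéíóú]+ de \d{4}\b", re.IGNORECASE),
    "MONTO": re.compile(r"\$\s?\d{1,3}(?:\.\d{3})*(?:,\d{2})?"),
}


def suggest_spans(text):
    """Return one example dict with the regex matches as suggested spans."""
    spans = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Character offsets into the original text
            spans.append({"start": match.start(), "end": match.end(), "label": label})
    # Sort by position so the suggestions appear in reading order
    spans.sort(key=lambda span: span["start"])
    return {"text": text, "spans": spans}


example = "El contrato se firma el 3 de marzo de 2021 por un monto de $ 1.500.000,00."
task = suggest_spans(example)

# One JSON object per line (JSONL), keeping accented characters readable
jsonl_line = json.dumps(task, ensure_ascii=False)
```

You'd then load the resulting file into a manual annotation workflow and only fix the spans your rules got wrong. One thing to watch out for: spans that don't line up with the tokenizer's token boundaries can cause problems later on, so it's worth checking the matches against the tokenization you plan to train with.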
You also want to keep an eye on the local context around the entities you're interested in – especially the surrounding tokens on each side. Those will be most relevant to make the decision. If the local context doesn't have enough clues, the model may struggle to learn the distinctions. For cases like this, a mix of rules and a statistical model might be a better fit. This thread has more details and examples:
You might also find @honnibal's talk on structuring NLP projects helpful, which also shows some examples of spaCy and Prodigy:
spaCy is an open-source library for Natural Language Processing that can help you train your own models for named entity recognition. Named entity recognition models can predict "real-world objects" like persons or organisations, based on the surrounding context. See here for details:
Prodigy is an annotation tool that can help you create training data for those models. See here for details:
Yes, the live demo only really shows the UI. The back-end is a bit harder to demo online, as it’s a command-line app and Python library.
We normally do trials by hosting a VM for you that you can log into. This lets you get the full experience of the tool, including the scriptable back-end. You can email us at contact@explosion.ai.