I would like to evaluate Prodigy before buying. I had a bad experience with wit.ai.
My texts to analyse are customers asking for quotes, for example
"I want the price of 100 square meters of red shingles for my roof."
I need to extract entities like the amount of square meters, the product and the color.
Would you suggest I start with spaCy to see if it works for my use case before hopping into Prodigy? Or would Prodigy offer a more beginner-friendly approach?
Hi! This was also my background when I started getting into Python and NLP
Yes, this sounds like the best approach in my opinion! The thing about applied NLP that can be difficult, especially if you're just starting out, is getting a good feeling for how to structure and break down a goal (e.g. solve a specific business problem) into solvable machine learning problems that a model can learn effectively (e.g. predict label X with NER component, augment with rules, predict label Y with a text classifier over a whole text, then put everything together).
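To make the "augment with rules" part a bit more concrete, here's a minimal sketch for your example sentence. It assumes spaCy v3 and uses a blank English pipeline with the built-in `entity_ruler`, so nothing is trained. The label names (`QUANTITY`, `COLOR`, `PRODUCT`), the word lists and the `extract_quote_entities` helper are all made up for illustration. In practice you'd probably combine rules like these with a trained NER component.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")

# Rule-based entity matching. All labels and word lists are illustrative.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    # A number followed by "square meters" / "square metres"
    {"label": "QUANTITY", "pattern": [
        {"LIKE_NUM": True},
        {"LOWER": "square"},
        {"LOWER": {"IN": ["meters", "metres"]}},
    ]},
    {"label": "COLOR", "pattern": [{"LOWER": {"IN": ["red", "blue", "grey", "black"]}}]},
    {"label": "PRODUCT", "pattern": [{"LOWER": {"IN": ["shingles", "tiles"]}}]},
])


def extract_quote_entities(text):
    """Return (text, label) pairs for every entity the rules find."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


message = "I want the price of 100 square meters of red shingles for my roof."
entities = extract_quote_entities(message)
print(entities)
```

Rules like this only cover the exact phrasings you list, which is usually where a statistical model trained on your own annotations starts to pay off.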
Prodigy is really designed as a developer tool to help you try out different approaches and create different types of datasets to find out what works best. While it's definitely easy to use and get started with, I think you'll be able to make much better use of it and feel much more in control once you've had a bit of experience with NLP, e.g. using a library like spaCy, and know what components are available, what the different trade-offs are, what data you want to experiment with, and so on.
Btw, in case you haven't seen it yet, we actually have an interactive online course that walks you through all the basics, all the way to creating a simple dataset and training a first model: https://course.spacy.io
Thank you for your guidance.
I'm already taking the spaCy course. It's super well done
Will spaCy also detect intents in a sentence? Is that what text classification is for?
I would like to know if a customer's message is asking for pricing, or if they are just asking for general info.
Thanks, that's nice to hear!
Yes, framing this problem as a text classification task would be a good approach for this IMO. So your model would predict one label, PRICING, and in your training data, you'd then annotate whether the message is about pricing or not.
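Here's a rough sketch of what that could look like in code, assuming spaCy v3 and its `textcat_multilabel` component (which allows a single yes/no label). The example messages are invented, and six texts are far too few to train anything useful, so treat this purely as an illustration of the data format and the moving parts.

```python
import random

import spacy
from spacy.training import Example

# Hypothetical annotations: 1.0 = message is about pricing, 0.0 = it isn't.
TRAIN_DATA = [
    ("I want the price of 100 square meters of red shingles for my roof.", {"cats": {"PRICING": 1.0}}),
    ("How much would 50 square meters of grey tiles cost?", {"cats": {"PRICING": 1.0}}),
    ("Can you send me a quote for blue shingles?", {"cats": {"PRICING": 1.0}}),
    ("What are your opening hours?", {"cats": {"PRICING": 0.0}}),
    ("Do you deliver outside the city?", {"cats": {"PRICING": 0.0}}),
    ("Which colors do your shingles come in?", {"cats": {"PRICING": 0.0}}),
]

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat_multilabel")
textcat.add_label("PRICING")

# Pair each unannotated doc with its gold-standard category.
examples = [Example.from_dict(nlp.make_doc(text), annots) for text, annots in TRAIN_DATA]

# Initialize the model weights and run a few toy training passes.
optimizer = nlp.initialize(lambda: examples)
for _ in range(20):
    random.shuffle(examples)
    losses = {}
    nlp.update(examples, sgd=optimizer, losses=losses)

doc = nlp("What is the price of 30 square meters of red shingles?")
print(doc.cats)  # a dict with one score between 0 and 1 for PRICING
```

The score in `doc.cats` applies to the whole message, which is what you want for intent. The entities can still be extracted separately and combined with it afterwards.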
(This is actually a great example of the decomposition problem I mentioned in my previous post: there are different ways to approach this problem from a machine learning perspective and some of them are likely going to work better than others. For instance, someone else could come up with the idea to approach this as a span/entity prediction problem and annotate words and phrases that express the "pricing intent". There's a high chance that this would be less effective than framing it as text classification: the phrase boundaries here are often vague, so it's easy to end up with inconsistent annotations. Plus, the model implementations typically have a narrower context window, so they wouldn't be able to take the whole message into account. So these are the types of decisions that will be most important, and will also become more intuitive as you work on more NLP problems.)