Can Prodigy be used for quotation detection?

Hi, I haven’t purchased Prodigy yet and I have a question about whether I can use it for my use case.

I have a corpus of news articles and I want to train a model to detect quotations in them. Additionally, I want to train a model to detect who each detected quotation is attributed to.

As a problem it seems similar to NER, but it’s not clear from the documentation whether I’ll be able to use Prodigy for this. I’m relatively new to the NLP field but I’m an experienced programmer. Could someone knowledgeable please advise? Thanks.

I'm not a seasoned NLP practitioner, but I think Prodigy can definitely help with your project. :+1:

When I started working with Prodigy, I was a seasoned developer who didn't know much about building ML models, and my idea was about as developed as yours. I was able to use Prodigy to learn about the data gathering and annotation processes involved in most machine learning projects, without having to start from scratch and build functional models that perform adequately.

Prodigy not only provides a set of state-of-the-art models to start from (via spaCy), it also has training recipes that can update your model "in-the-loop" as you annotate, so its suggestions get better over the course of your current session. No matter what kind of NLP problem you're working on, you're probably going to need to annotate data, and Prodigy is great for that. Plus, if your problem outgrows the Prodigy recipes, you can plug in a custom model to suit your needs, so it grows with you.
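To make the annotation side a bit more concrete, here is a rough sketch of preparing input for a manual span-annotation recipe. As far as I understand the docs, Prodigy reads JSONL tasks with a `"text"` field and optional pre-highlighted `"spans"` given as character offsets. The `QUOTE` label, the `to_task` helper, and the regex heuristic are my own placeholders, not anything Prodigy ships.

```python
import json
import re

# Rough heuristic: text wrapped in straight or curly double quotes.
QUOTE_RE = re.compile(r'["“]([^"“”]+)["”]')


def to_task(text: str, label: str = "QUOTE") -> dict:
    """Build one annotation task with a candidate span per quoted passage.

    `label` is a hypothetical label name chosen for illustration.
    """
    spans = [
        {"start": m.start(1), "end": m.end(1), "label": label}
        for m in QUOTE_RE.finditer(text)
    ]
    return {"text": text, "spans": spans}


article = 'The mayor said "we will rebuild" on Monday.'
task = to_task(article)

# One JSON object per line is the JSONL layout.
print(json.dumps(task))
```

The pre-filled spans are only suggestions to correct while annotating; the heuristic will miss unquoted or nested speech.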

This sounds like a great use for spaCy/Prodigy. You could start by using the part-of-speech tags that come with the built-in models to find the opening and closing quote tokens and extract the text between them. You can see a list of the POS tags that come built-in here: Data formats · spaCy API Documentation
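A minimal sketch of that idea, assuming the tagger uses the Penn Treebank tags for opening and closing quotation marks (two backticks and two single quotes). The `extract_quotes` helper is hypothetical and works on plain `(text, tag)` pairs so it runs without a model. With spaCy you could build those pairs with something like `[(t.text, t.tag_) for t in nlp(article)]` after loading a pipeline that includes a tagger. The sample below is hand-tagged, so how well a real tagger handles straight quotes is worth checking on your own data.

```python
from typing import Iterable, List, Optional, Tuple

OPEN_TAG = "``"   # Penn Treebank tag for an opening quotation mark
CLOSE_TAG = "''"  # Penn Treebank tag for a closing quotation mark


def extract_quotes(tagged: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the text between each opening/closing quote tag pair."""
    quotes: List[str] = []
    current: Optional[List[str]] = None  # None means "outside a quotation"
    for text, tag in tagged:
        if tag == OPEN_TAG:
            current = []  # start (or restart) collecting tokens
        elif tag == CLOSE_TAG:
            if current:  # skip empty quotes and unmatched closers
                quotes.append(" ".join(current))
            current = None
        elif current is not None:
            current.append(text)
    return quotes


# Hand-tagged sample standing in for real tagger output (an assumption).
sample = [
    ('"', "``"), ("We", "PRP"), ("are", "VBP"), ("pleased", "JJ"), ('"', "''"),
    (",", ","), ("said", "VBD"), ("Smith", "NNP"), (".", "."),
]
print(extract_quotes(sample))  # ['We are pleased']
```

Joining tokens with spaces loses the original spacing; if you need exact text, slicing the document by character offsets would be a better fit.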

I'm sure there are other ways to approach the problem, so you'll want to experiment. Speaking of workflows and experimentation, @honnibal gave a talk about how to build successful NLP projects in which he talks about iterative development practices with NLP and Prodigy: https://youtu.be/jpWqz85F_4Y?t=137

If you're rushed and can't spare ~30 minutes, he gives an example workflow and problem with Prodigy here: https://youtu.be/jpWqz85F_4Y?t=698

Good luck with your project!

Hi Robert! I'm looking to do exactly the same: extract and attribute quotes from news articles. I first wanted to check whether this was possible with Prodigy and saw your post. I'm wondering if it worked out for you. Could you share any tips, please?