Can Prodigy be used for quotation detection?

RobFC · February 28, 2019, 5:57pm

Hi, I haven’t purchased Prodigy yet and I have a question about whether I can use it for my use case.

I have a corpus of news articles and I want to train a model to detect quotations in those articles. Additionally, I want to train a model to detect who each detected quotation is attributed to.

It seems similar to NER as a problem, but it’s not clear from the documentation whether I’ll be able to use Prodigy for this. I’m relatively new to the NLP field, but I’m an experienced programmer. Could someone knowledgeable please advise? Thanks.

justindujardin · February 28, 2019, 11:50pm

I'm not a seasoned NLP practitioner, but I think Prodigy can definitely help with your project.

When I started working with Prodigy, I was a seasoned developer who didn't know much about building ML models, and my idea was about as developed as yours. I was able to use Prodigy to learn about the data gathering and annotation processes involved in most machine learning projects, without having to start from scratch and build functional models that perform adequately.

Prodigy not only provides a set of state-of-the-art models to start from (via spaCy), but it also has training recipes that update your model "in the loop" as you annotate, so its suggestions improve over the course of your current session. No matter what kind of NLP problem you're working on, you're probably going to need to annotate data, and Prodigy is great for that. Plus, if your problem outgrows the built-in Prodigy recipes, you can plug in a custom model to suit your needs, so it grows with you.

This sounds like a great use for spaCy and Prodigy. You could start by using the part-of-speech tags that the built-in models assign to find the opening and closing quotation marks, then extract the text between them. You can see a list of the POS tags that come built-in here: Data formats · spaCy API Documentation
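To make the tag-based idea a bit more concrete, here is a minimal sketch of the extraction step. It assumes Penn Treebank-style fine-grained tags, where an opening quotation mark is tagged "``" and a closing one "''"; check what your model actually emits before relying on that. The function name `extract_quotes` and the hand-tagged example are made up for illustration, and the input is plain (text, tag) pairs so the logic doesn't depend on any particular library.

```python
from typing import Iterable, List, Optional, Tuple

# Assumption: Penn Treebank-style fine-grained tags, where "``" marks an
# opening quotation mark and "''" marks a closing one.
OPEN_QUOTE_TAG = "``"
CLOSE_QUOTE_TAG = "''"


def extract_quotes(tagged_tokens: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the text between each open/close quote tag pair.

    Each item is (token text including trailing whitespace, fine-grained tag).
    Quotes that are opened but never closed are dropped.
    """
    quotes: List[str] = []
    current: Optional[List[str]] = None  # None means "outside a quotation"
    for text, tag in tagged_tokens:
        if tag == OPEN_QUOTE_TAG:
            current = []  # start (or restart) collecting tokens
        elif tag == CLOSE_QUOTE_TAG:
            if current is not None:
                quotes.append("".join(current).strip())
            current = None
        elif current is not None:
            current.append(text)
    return quotes


# Hand-tagged example standing in for real tagger output (hypothetical data).
example = [
    ("The ", "DT"), ("mayor ", "NN"), ("said", "VBD"), (", ", ","),
    ("\u201c", "``"), ("We ", "PRP"), ("will ", "MD"), ("rebuild", "VB"),
    (".", "."), ("\u201d", "''"),
]
print(extract_quotes(example))
```

With spaCy, the pairs could be built from a processed document as `[(tok.text_with_ws, tok.tag_) for tok in doc]`. Rules like this will miss nested quotes, unbalanced quotation marks, and indirect speech, so they are probably most useful for pre-selecting candidates to annotate rather than as a finished detector.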

I'm sure there are other ways to approach the problem, so you'll want to experiment. Speaking of workflows and experimentation, @honnibal gave a talk about how to build successful NLP projects in which he talks about iterative development practices with NLP and Prodigy: https://youtu.be/jpWqz85F_4Y?t=137

If you're rushed and can't spare ~30 minutes, he walks through an example problem and workflow with Prodigy here: https://youtu.be/jpWqz85F_4Y?t=698

Good luck with your project!

aviss · December 30, 2020, 3:34pm

Hi Robert! I'm looking to do exactly the same thing: extract and attribute quotes from news articles. I first wanted to check whether this was possible with Prodigy, and I saw your post. I'm wondering if it worked out for you. Could you share any tips, please?
