I’m not a seasoned NLP practitioner, but I think Prodigy can definitely help with your project.
When I started working with Prodigy, I was a seasoned developer who didn’t know much about building ML models, and my idea was about as developed as yours. Prodigy let me learn the data gathering and annotation processes involved in most machine learning projects without first having to build adequately performing models from scratch.
Prodigy not only provides a set of state-of-the-art models to start from (via spaCy), it also has training recipes that update your model “in-the-loop” as you annotate, so the suggestions get better as the model learns from your current session. No matter what kind of NLP problem you’re working on, you’re probably going to need to annotate data, and Prodigy is great for that. Plus, if your problem outgrows the built-in Prodigy recipes, you can plug in a custom model to suit your needs, so it grows with you.
This sounds like a great use for spaCy/Prodigy. You could start by using the part-of-speech tags that come with the built-in models to find the opening and closing quote tokens and extract the text between them. You can see a list of the built-in POS tags here: https://spacy.io/api/annotation#pos-tagging
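Here’s a rough sketch of that idea, not a tested solution for your data. It assumes the fine-grained tags are the Penn Treebank-style `` (opening quote) and '' (closing quote), which is what I’d expect from the English models, but check what your model actually predicts, especially for straight quotes. The function names and the `en_core_web_sm` default are just my choices for illustration.

```python
from typing import Iterable, List, Optional, Tuple

# Assumption: Penn Treebank-style fine-grained tags for quotation marks.
OPEN_QUOTE_TAG = "``"
CLOSE_QUOTE_TAG = "''"


def extract_quoted_spans(tagged_tokens: Iterable[Tuple[str, str]]) -> List[str]:
    """Collect the token text found between each open/close quote tag pair."""
    spans: List[str] = []
    current: Optional[List[str]] = None  # None means we are outside a quotation
    for text, tag in tagged_tokens:
        if tag == OPEN_QUOTE_TAG:
            current = []  # start (or restart) collecting a quotation
        elif tag == CLOSE_QUOTE_TAG:
            if current:  # skip unopened or empty quotations
                spans.append(" ".join(current))
            current = None
        elif current is not None:
            current.append(text)
    return spans


def tag_with_spacy(text: str, model: str = "en_core_web_sm") -> List[Tuple[str, str]]:
    """Hypothetical helper: return (text, fine-grained tag) pairs from spaCy."""
    import spacy  # imported here so the function above works without spaCy

    nlp = spacy.load(model)
    return [(token.text, token.tag_) for token in nlp(text)]
```

With spaCy and a model installed, you’d call `extract_quoted_spans(tag_with_spacy(your_text))`. Joining tokens with spaces loses the original whitespace, so if you need exact text you’d want to keep token offsets instead.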
I’m sure there are other ways to approach the problem, so you’ll want to experiment. Speaking of workflows and experimentation, @honnibal gave a talk on building successful NLP projects that covers iterative development practices with NLP and Prodigy: https://youtu.be/jpWqz85F_4Y?t=137
If you’re rushed and can’t spare ~30 minutes, you can skip to the part where he walks through an example problem and workflow with Prodigy: https://youtu.be/jpWqz85F_4Y?t=698
Good luck with your project!