How The Guardian is using spaCy and Prodigy to identify quotes in text

ines · November 26, 2021, 10:59am

A great article on how The Guardian is using spaCy and Prodigy to build a model to identify quotes in news articles. Highlights include: why it's so important to iterate on your data and carefully develop custom annotation schemes to deal with ambiguity in language, plus a very pretty custom UI theme for Prodigy!

The main challenge in building the training dataset was navigating the ambiguity of different journalistic styles. For several days, we discussed dozens of cases where it was difficult to make the right choice.

How should we treat song lyrics or poems? What about messages on placards? What if someone quotes their thoughts, something that has not been said aloud?

The first batch of our annotations turned out to be quite noisy and inconsistent but we were getting better and better with each iteration.

Collectively we experienced the same teaching process we were putting our model through. The more examples we looked at, the better we became at recognising different cases. Yet the question remained – if it is difficult for a human to make these decisions, can we teach a machine to cope with this task?

lievcin · November 26, 2021, 1:42pm

Love that they also customised the UI to the brand

dandrade · December 2, 2021, 2:12pm

Excellent. The article has given me a better idea of what to do with the project I am working on. Our quotations will be numerous and sometimes long, since they appear inside court decisions:

Long excerpts from other court decisions (precedents): decisions considered authoritative for deciding subsequent cases involving identical or similar facts.
Short excerpts of doctrine from authors.
Short excerpts from articles of a particular law.

We are building an AI Legal Assistant on Constitutional Law and decided to combine OpenAI GPT-3 and Prodigy.

pkras · December 22, 2021, 2:04pm

Awesome!

Congratulations to the team on your excellent tools (both spaCy and Prodigy). Glad to see them get the attention they deserve!

If you have any other similar articles or mentions of Prodigy in the industry, please share them as well.
Keep up the great work!
Cheers
