Annotating sentences in text?

I am looking to see if Prodigy is the appropriate annotation tool.

I have documents ranging from 25 to 2,500 sentences each. I am interested in developing a model that will be used to classify the text, and to improve the signal-to-noise ratio, I want to first train a model to pick out 1-2 sentences from each document. There are patterns in the placement of the sentences of interest and in some of the verbiage. Think of it as a variant of sentiment analysis that looks at only a fraction of the document.

Is Prodigy the appropriate tool for this two-stage model building? First determine the sentences to look at, and then analyze those sentences. I might use other techniques for the second part of the flow. So the question is: can I use Prodigy to develop a model to pick the sentences?

Hi! This definitely sounds like a pretty classic use case for Prodigy :slightly_smiling_face:

One way you could solve this would be to model it as a sentence classification task: you stream in each sentence and annotate whether it's "interesting" or not. This should be pretty quick to do, and you'll get lots of datapoints for training a text classifier that you can then run over individual sentences at runtime. Later on, you could also consider a workflow that uses a model you've already trained to select the sentences to annotate, e.g. the ones with the most uncertain scores, or the ones with the highest scores so you can check that the model is making the correct predictions.
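
To make that a bit more concrete, here's a rough sketch of how you could turn documents into one task per sentence, using the JSONL format with a `"text"` and an optional `"meta"` that Prodigy can load. The regex sentence splitter, the function names and the `meta` keys (`doc_id`, `sent_index`, `n_sents`) are all just assumptions for illustration. In practice you'd probably want a proper sentence segmenter. The second helper shows the "most uncertain first" idea on plain `(score, task)` pairs, where the scores would come from whatever model you've trained.

```python
import json
import re

# Naive sentence boundary: whitespace that follows ., ! or ?
# This is an assumption for illustration, not a robust segmenter.
_SENT_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def document_to_tasks(doc_id, text):
    """Yield one annotation task per sentence in the document.

    The sentence position is stored in "meta", since the placement of a
    sentence in the document may be a useful signal later on.
    """
    sentences = [s.strip() for s in _SENT_BOUNDARY.split(text) if s.strip()]
    for index, sentence in enumerate(sentences):
        yield {
            "text": sentence,
            "meta": {
                "doc_id": doc_id,
                "sent_index": index,
                "n_sents": len(sentences),
            },
        }


def write_jsonl(tasks, path):
    """Write tasks to a newline-delimited JSON file, one task per line."""
    with open(path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


def most_uncertain(scored_tasks, n=10):
    """Return the n (score, task) pairs whose score is closest to 0.5."""
    return sorted(scored_tasks, key=lambda pair: abs(pair[0] - 0.5))[:n]


if __name__ == "__main__":
    doc = "The weather was fine. Revenue fell sharply in Q3! Was that expected?"
    for task in document_to_tasks("doc-001", doc):
        print(json.dumps(task))
```

You could then write the tasks to a file with `write_jsonl` and annotate them with a binary "interesting or not" label. Keeping the sentence index in the `meta` also means you can always trace an annotated sentence back to where it appeared in the original document.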