Can we train abstractive summarization in Prodigy?

This repository contains abstractive summary datasets for different languages.

Can we train an abstractive summarization model on these datasets using Prodigy?

hi @mirfan923!

Thanks for your question. This looks like an interesting repo.

It's important to remember that Prodigy is an annotation tool to acquire more annotated data, not necessarily a tool for model training.

Prodigy does have the train recipe, but it's just a wrapper for spacy train. Since spaCy doesn't have a built-in summarization component, it's not possible to train abstractive summarization out of the box with prodigy train.

Since you mentioned the "datasets" in the repo - are you only interested in training on or using the datasets and model in the repo, or do you also want to create a "model-in-the-loop" workflow to acquire more annotated data?

If you're only interested in training on that data and not in getting any additional annotated data, then I'm not sure Prodigy would help.

However, if you want a model-in-the-loop workflow, then yes, Prodigy could help if you write a custom recipe. Custom recipes are essentially Python functions (written in a Python script) that can be run from the command line. So it may be possible to write a custom recipe that does abstractive summarization with another model framework (e.g., the seq2seq training module used in the repo you posted).
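As a rough sketch of the model-in-the-loop idea, the stream a custom recipe returns can be a generator that attaches a model's draft summary to each example before the annotator sees it. Everything here is illustrative: `summarize` stands in for whatever seq2seq model you load, `first_sentence` is a trivial placeholder so the snippet runs, and the `draft_summary` key is a made-up name, not a Prodigy convention.

```python
def first_sentence(text):
    """Placeholder 'model': returns the first sentence. Swap in a real summarizer."""
    return text.split(". ")[0].rstrip(".") + "."


def add_draft_summaries(stream, summarize=first_sentence):
    """Yield each task with a model-generated draft summary attached."""
    for task in stream:
        task = dict(task)  # copy so the incoming example is not mutated
        task["draft_summary"] = summarize(task["text"])  # hypothetical key name
        yield task


examples = [{"text": "Prodigy is an annotation tool. It is scriptable in Python."}]
tasks = list(add_draft_summaries(examples))
```

A recipe would return this generator as its stream, so the annotator can accept, reject, or correct each draft instead of writing summaries from scratch.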

Alternatively, if you only wanted additional annotated data for summarization (no model in the loop), you could create a custom recipe like this:
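Here is a minimal sketch, assuming each line of your input file is a JSON object with a "text" key. It uses Prodigy's blocks interface to show the text together with a text_input field for the summary. The recipe name `summarization`, the `summary` field id, and the small `load_jsonl` helper are placeholders chosen for illustration. The guarded import is only there so the file can be imported without Prodigy installed.

```python
import json

try:
    import prodigy  # licensed package; only needed to actually serve the recipe
except ImportError:
    prodigy = None


def load_jsonl(path):
    """Yield one task dict per non-empty line of a JSONL file."""
    with open(path, encoding="utf8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarization(dataset, source):
    """Show each text with a free-text box where the annotator writes a summary."""
    blocks = [
        {"view_id": "text"},  # renders the task's "text"
        {
            "view_id": "text_input",
            "field_id": "summary",  # key the typed summary is stored under
            "field_label": "Summary",
            "field_rows": 5,
        },
    ]
    return {
        "dataset": dataset,  # dataset the annotations are saved to
        "stream": load_jsonl(source),
        "view_id": "blocks",
        "config": {"blocks": blocks},
    }


if prodigy is not None:
    # Register the function as a recipe so it can be run from the command line.
    summarization = prodigy.recipe(
        "summarization",
        dataset=("Dataset to save annotations to", "positional", None, str),
        source=("Path to a JSONL file with a 'text' key", "positional", None, str),
    )(summarization)
```

If you save this as recipe.py, you should be able to start it with something like `prodigy summarization my_summaries ./texts.jsonl -F recipe.py` (the dataset and file names are just examples).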

You could also create a custom interface based on the annotation task you have in mind:
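For instance, here is a hedged sketch of an interface config for reviewing draft summaries. It combines an html block, whose html_template fills in task fields through Mustache-style variables, with a text_input block for corrections. The `draft_summary` and `corrected_summary` keys are hypothetical names, and each task in the stream would need to carry both "text" and "draft_summary" for the template to render.

```python
def summary_review_config():
    """Build a blocks config: source text and draft summary on top, edit box below."""
    template = (
        "<p><strong>Text:</strong> {{text}}</p>"
        "<p><strong>Draft summary:</strong> {{draft_summary}}</p>"
    )
    return {
        "blocks": [
            {"view_id": "html", "html_template": template},
            {
                "view_id": "text_input",
                "field_id": "corrected_summary",  # hypothetical key for the fix
                "field_label": "Corrected summary (leave empty if the draft is fine)",
                "field_rows": 4,
            },
        ]
    }


config = summary_review_config()
```

A recipe would return this dict as its "config" together with "view_id": "blocks".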

Hope this helps!