Prodigy 1.12 alpha release: LLM-assisted workflows, prompt engineering & fully custom task routing for multi-annotator scenarios.

Hey everyone!
We are really excited to share that we have just released the alpha version of Prodigy v1.12 (v1.12a1)!
This release is available for download for all v1.11.x license holders and includes:

  • New recipes for LLM-assisted annotation and prompt engineering: the LLM-assisted workflows we announced a while ago are now fully integrated with Prodigy and available out of the box. In v1.12a1 you are still restricted to the OpenAI API to use them, but by v1.12 we definitely want to leverage spacy-llm to enable more flexibility, notably the use of open source large language models. Please check out the alpha docs for the details and examples.

  • A new, exciting workflow for prompt engineering which allows you to compare more than 2 prompts in a tournament-like fashion. You can find details on the algorithm and the workflow itself here.

  • Extended, fully customizable support for multi-annotator workflows. You can now customize what should happen when a new annotator joins an ongoing annotation project, how tasks should be allocated between existing annotators, and what should happen when one annotator finishes their assigned tasks before others. For common use-cases, you can use the options feed_overlap, annotations_per_task and allow_work_stealing (see the updated configuration documentation for details).
    Custom recipes can specify session_factory and task_router callbacks for full control. For example, you can now route tasks based on model confidence.

The full guide on task routing can be found here.
These changes required a significant reimplementation of the Controller class. We think we've tested it quite thoroughly, but if something seems wrong, please don't hesitate to report it.
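
To make the idea of routing on model confidence more concrete, here is a minimal sketch. It assumes a task router is a callable that receives the controller, the current session ID and the task, and returns the list of session IDs that should see the task; please check the task routing guide for the exact signature. `FakeController`, its `session_ids` attribute, the `score` stored under `meta` and the 0.3 threshold are all stand-ins made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FakeController:
    """Hypothetical stand-in for Prodigy's Controller, only for this sketch."""
    session_ids: List[str] = field(default_factory=list)


def route_by_confidence(ctrl, session_id: str, item: Dict) -> List[str]:
    """Send low-confidence tasks to everyone, the rest to a single annotator."""
    annotators = sorted(ctrl.session_ids)
    if not annotators:
        # No sessions are known yet, so keep the task with the current session.
        return [session_id]
    # Assumption: an upstream model stored its confidence under meta.score.
    confidence = item.get("meta", {}).get("score", 0.0)
    if confidence < 0.3:
        return annotators
    # Deterministic pick so the same task always goes to the same annotator.
    idx = item["_task_hash"] % len(annotators)
    return [annotators[idx]]


ctrl = FakeController(session_ids=["ds-alex", "ds-sam", "ds-kim"])
hard = {"text": "ambiguous example", "_task_hash": 7, "meta": {"score": 0.12}}
easy = {"text": "clear example", "_task_hash": 7, "meta": {"score": 0.95}}
print(route_by_confidence(ctrl, "ds-alex", hard))  # all three annotators
print(route_by_confidence(ctrl, "ds-alex", easy))  # a single annotator
```

In a custom recipe, a function like this would be the value passed as the task_router callback. Hashing on the task keeps the assignment stable across restarts, which is one possible design choice rather than a requirement.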

  • New Stream and Source classes. Previously, Prodigy recipes would return an opaque generator of tasks (the stream) in their components. This was convenient in some ways, but limiting in others. One notable limitation was that if you're reading source data from a file and then have some function that produces tasks from it, the information about the underlying data source is lost, making it difficult to provide progress feedback. The new Stream class works as a generator, so it's fully compatible with existing recipes. But it's also a class, and it can be initialised with a Source object that tracks progress through the data being read, so that progress callbacks can easily ask how far along you are. With this refactor we have prepared the ground for better progress feedback, which will be available in our next alpha soon. We also took the opportunity to improve some of our data readers, and to add a new Source class for Parquet files. Please let us know if there are more Source classes you'd like to request.
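
The Stream and Source idea can be illustrated with a small, self-contained sketch. This is not Prodigy's implementation: `ListSource` and `TaskStream` are hypothetical classes written only to show how a stream can still behave like a generator while keeping a handle on the source it reads from.

```python
from typing import Dict, Iterator, List


class ListSource:
    """Hypothetical source that remembers how many records have been read."""

    def __init__(self, records: List[str]) -> None:
        self._records = records
        self._position = 0

    def __iter__(self) -> Iterator[str]:
        for record in self._records:
            self._position += 1
            yield record

    def progress(self) -> float:
        """Fraction of the underlying data consumed so far (0.0 to 1.0)."""
        if not self._records:
            return 1.0
        return self._position / len(self._records)


class TaskStream:
    """Hypothetical stream: iterates like a generator, but keeps its source."""

    def __init__(self, source: ListSource) -> None:
        self.source = source
        # Blank records are dropped, so tasks and source records can diverge.
        self._tasks = ({"text": line.strip()} for line in source if line.strip())

    def __iter__(self) -> "TaskStream":
        return self

    def __next__(self) -> Dict[str, str]:
        return next(self._tasks)


stream = TaskStream(ListSource(["first\n", "\n", "second\n", "third\n"]))
print(next(stream), stream.source.progress())
print(next(stream), stream.source.progress())
```

After the second task the source reports 0.75 rather than 0.5, because the skipped blank record still counts as progress through the data. That is the kind of information a plain generator cannot expose.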

To install the latest alpha version run:

pip install --pre prodigy -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy

Please note that this release supports macOS, Linux and Windows and can be installed on Python 3.8 and above.

All the links to the documentation come from our alpha docs, which are available here.

We would love for you to take the task router and the OpenAI recipes for a test drive! Looking forward to any feedback you might have!


Hi everyone,

We realized earlier today that Pydantic 1.10.7, which our package depends on, breaks with the latest release of typing-extensions (4.6.0); see "Cannot use Literal when I use typing-extension==4.6.0", Issue #5821 in pydantic/pydantic on GitHub. This causes a runtime error for Prodigy v1.12a1. We'll wait for half a day to see if the Pydantic issue is fixed and, if not, we'll release v1.12a2 with a < 4.6.0 pin on typing-extensions.
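
If you want to check whether your environment falls outside that < 4.6.0 bound, a small script along these lines may help. It is only a sketch: the helper names are made up, the version parsing is deliberately simple (numeric release segments only), and it compares against the pin mentioned above rather than testing for the error itself.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Turn '4.6.0' into (4, 6, 0); stops at the first non-numeric part."""
    parts = []
    for chunk in version.split("."):
        if not chunk.isdigit():
            break
        parts.append(int(chunk))
    # Pad so that '4.6' compares the same way as '4.6.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_affected(version: str) -> bool:
    """True if the version is not below the 4.6.0 bound of the pin."""
    return parse_release(version) >= (4, 6, 0)


def installed_version(package: str = "typing-extensions") -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("typing-extensions is not installed")
elif is_affected(found):
    print(f"typing-extensions {found} is outside the < 4.6.0 pin")
else:
    print(f"typing-extensions {found} is below 4.6.0")
```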

UPDATE: We have just released Prodigy v1.12a2, which mitigates the Pydantic issue. We also fixed an error with missing jinja2 templates.

To install run:

pip install prodigy==1.12a2 -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy

UPDATE: We have just released v1.12a3, which fixes an import issue with the ner.openai.correct and textcat.openai.correct recipes.
To install:

pip install prodigy==1.12a3 -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy

Hi, I'm not able to download this version. I have version 1.11.8. Can you please tell me how to download it?

Hi @Dmg02,

Welcome to the forum! Could you share the command that you're running and the error that you're getting? Thanks!

I figured it out, thanks!

UPDATE: v1.12a4 has been released. It fixes issues with the task router and session factory.
It also adds a new progress estimator which is based on the relative position in the source object. The motivation for this new way of estimating progress was to provide a more reliable estimate when the actual total target is unknown while working with a stream of data.
Since it is very different from the progress based on the number of annotated examples, the UI now distinguishes between target progress (based on annotated examples, if total_examples_target is set), source progress (the new source-based progress) and progress (for custom progress functions in active learning).
Please check the docs for the details on how to interpret the new progress bars, especially in multi-user scenarios.
Looking forward to hearing what you think 🙂
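
To illustrate why target progress and source progress can differ, here is a rough sketch of the two calculations. The function names and the numbers are made up for this example and are not taken from Prodigy's code; only total_examples_target is a setting mentioned above.

```python
from typing import Optional


def target_progress(annotated: int, total_examples_target: Optional[int]) -> Optional[float]:
    """Share of the annotation target reached; None when no target is set."""
    if not total_examples_target:
        return None
    return min(annotated / total_examples_target, 1.0)


def source_progress(position: int, source_size: int) -> float:
    """Relative position in the source, however many answers were saved."""
    if source_size <= 0:
        return 0.0
    return min(position / source_size, 1.0)


# 40 answers saved, while the reader is already 600 records into a
# 1000-record file (for example because many records were filtered out).
print(target_progress(40, 200))
print(target_progress(40, None))
print(source_progress(600, 1000))
```

With a target of 200 examples the first estimate is 20%, with no target there is nothing to report, and the source-based estimate is 60% in both cases.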

To install:

pip install --pre prodigy -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy

Documentation UPDATE!

The documentation for the upcoming release has just received an important update: we now have documentation on Prodigy deployment!

You can find these new docs hosted here:

The guide has a big section on Docker deployments, and while these certainly differ per cloud provider and per cloud service, the guide should be general enough to highlight the things that you want to pay attention to.

Once we release the full version of v1.12, these docs will also become available on the main site.
