Does Prodigy support pre-labeling by keyword matching?

I would be able to massively speed up the annotation process (and thus reduce costs) by pre-labeling text using regex/keyword matching. Is this something Prodigy supports?

Hi! What you describe sounds pretty much exactly like the workflow using patterns to semi-automate annotation: https://prodi.gy/docs/named-entity-recognition#manual-patterns

The patterns also let you take the keyword matching one step further: you can provide keywords, but also more abstract descriptions of the spans you're looking for using token attributes like part-of-speech tags or dependency labels.
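To make that a bit more concrete, here's a rough sketch of what a patterns file could contain, built in Python and serialized as JSONL (one pattern per line). The `FRUIT` label, the example keywords and the `patterns_to_jsonl` helper are made up for illustration, and the token attribute names follow spaCy's matcher conventions, so double-check them against the docs linked above.

```python
import json

# Hypothetical patterns: the label and keywords are placeholders.
patterns = [
    # Exact string match on a keyword.
    {"label": "FRUIT", "pattern": "apple"},
    # Token-based match: one token whose lowercase form is "apple".
    {"label": "FRUIT", "pattern": [{"lower": "apple"}]},
    # More abstract: any adjective followed by the token "apple".
    {"label": "FRUIT", "pattern": [{"pos": "ADJ"}, {"lower": "apple"}]},
]


def patterns_to_jsonl(patterns):
    """Serialize a list of pattern dicts to JSONL, one JSON object per line."""
    return "\n".join(json.dumps(p) for p in patterns) + "\n"


jsonl = patterns_to_jsonl(patterns)
```

You'd then save that string to a file and pass the file to the recipe via its patterns argument, as described in the docs.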

Prodigy also lets you implement any custom logic in Python, so you can have a function that streams in your examples, adds your pre-annotations as "spans" to the data and sends out the examples. So you can use more complex regular expressions, or even logic that incorporates a model, a remote API or whatever else you need. Here's a simple example: https://prodi.gy/docs/named-entity-recognition#custom-model
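As a minimal sketch of that idea, assuming your examples are dicts with a "text" key: the function below runs a few regular expressions over each text and attaches the matches as "spans" with character offsets and a label. The `PATTERNS` mapping and the `add_regex_spans` name are hypothetical, and this only covers the pre-annotation step, not a full recipe. Depending on the interface, you may also need to add tokens to the examples so the spans can be aligned to token boundaries.

```python
import re

# Hypothetical label -> regex mapping; swap in your own keywords and labels.
PATTERNS = {
    "FRUIT": re.compile(r"\b(?:apple|banana)s?\b", re.IGNORECASE),
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def add_regex_spans(stream):
    """Yield a copy of each example with regex matches added as "spans"."""
    for eg in stream:
        spans = []
        for label, regex in PATTERNS.items():
            for match in regex.finditer(eg["text"]):
                spans.append(
                    {"start": match.start(), "end": match.end(), "label": label}
                )
        task = dict(eg)  # shallow copy so the input example isn't modified
        task["spans"] = sorted(spans, key=lambda s: s["start"])
        yield task


examples = [{"text": "I bought apples on 2020-01-05."}]
tasks = list(add_regex_spans(examples))
```

Because it's a generator, you can wrap it around whatever stream your recipe loads, and the pre-labeled spans show up for you to accept or correct.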
