Does Prodigy support pre-labeling by keyword matching?

jacknnn · August 5, 2021, 9:48pm

I would be able to massively speed up the annotation process (and thus reduce costs) by pre-labeling text using regex/keyword matching. Is this something Prodigy supports?

ines · August 6, 2021, 12:43am

Hi! What you describe sounds pretty much exactly like the workflow using patterns to semi-automate annotation: https://prodi.gy/docs/named-entity-recognition#manual-patterns

The patterns also let you take the keyword matching one step further: you can provide keywords, but also more abstract descriptions of the spans you're looking for using token attributes like part-of-speech tags or dependency labels.
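To make that a bit more concrete, here's a rough sketch of what a patterns file could look like, written out from Python. The labels, the keywords and the `patterns.jsonl` file name are all made up for illustration. Each line is a JSON object with a `"label"` and a `"pattern"`, where the pattern is either an exact string or a list of token descriptions in spaCy's matcher style. Note that attributes like `"pos"` only match if the pipeline you load actually predicts part-of-speech tags, so they won't work with a blank pipeline.

```python
import json

# Hypothetical patterns: labels and keywords are placeholders.
PATTERNS = [
    # Exact string match
    {"label": "FRUIT", "pattern": "apple"},
    # Token-based match on the lowercase form of two consecutive tokens
    {"label": "FRUIT", "pattern": [{"lower": "granny"}, {"lower": "smith"}]},
    # More abstract: any proper noun followed by the token "phone"
    # (assumes a pipeline with a tagger, so "pos" is available)
    {"label": "PRODUCT", "pattern": [{"pos": "PROPN"}, {"lower": "phone"}]},
]


def to_jsonl(patterns):
    """Serialize the patterns as newline-delimited JSON, one pattern per line."""
    return "".join(json.dumps(pattern) + "\n" for pattern in patterns)


def write_patterns(path, patterns=PATTERNS):
    """Write the patterns to a JSONL file that can be passed to --patterns."""
    with open(path, "w", encoding="utf8") as f:
        f.write(to_jsonl(patterns))
```

After calling `write_patterns("patterns.jsonl")`, you could then pass the file to a recipe like `ner.manual` via its `--patterns` argument, together with your dataset name, pipeline, input file and `--label` values.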

Prodigy also lets you implement any custom logic in Python, so you can have a function that streams in your examples, adds your pre-annotations as "spans" to the data and sends out the examples. So you can use more complex regular expressions, or even logic that incorporates a model, a remote API or whatever else you need. Here's a simple example: https://prodi.gy/docs/named-entity-recognition#custom-model
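If you'd rather go the regex route, a minimal sketch of such a function might look like the one below. It only uses the standard library, and the label names and regular expressions are placeholders I made up, so you'd swap in your own. Each match is added to the example's `"spans"` as a dict with the character offsets `"start"` and `"end"` and a `"label"`.

```python
import re

# Hypothetical label -> regex mapping; replace with your own expressions.
REGEXES = {
    "ORDER_ID": re.compile(r"\b[A-Z]{2}-\d{4}\b"),
    "FRUIT": re.compile(r"\b(?:apple|banana)s?\b", re.IGNORECASE),
}


def add_regex_spans(stream):
    """Yield each example with regex matches added as pre-annotated spans."""
    for eg in stream:
        # Keep any spans that are already on the example
        spans = list(eg.get("spans", []))
        for label, regex in REGEXES.items():
            for match in regex.finditer(eg["text"]):
                spans.append(
                    {"start": match.start(), "end": match.end(), "label": label}
                )
        # Sort by position so the spans appear in reading order
        spans.sort(key=lambda span: span["start"])
        yield {**eg, "spans": spans}
```

For example, `list(add_regex_spans([{"text": "Order AB-1234 had two apples."}]))` gives you one example with an `ORDER_ID` span over "AB-1234" and a `FRUIT` span over "apples". This sketch doesn't handle overlapping matches or matches that don't line up with token boundaries, so depending on your expressions you may need to filter those out before sending the examples to the manual interface.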
