Prodigy - text extraction from forms

Hi @Innes, is it possible to extract text from forms with Prodigy? We have a large number of printed order forms with text in the boxes. Each box is in exactly the same place on each form, but the text can vary.

I think my question is: can I fix a bounding box on a specific part of an image and give it a field label, so Prodigy can extract the text it detects in that box?

Hi! You can definitely label data for this type of task, but you'd have to decide which OCR solution or tool you want to use to go from image + bounding boxes to text.

For example, you can use a workflow like image.manual and draw the boxes. If the boxes are mostly in the same place and have similar pixel coordinates, you could even pre-populate them in the data so you only have to adjust a box if its position changed. The data you export gives you the image and the pixel coordinates of each box (and the label you assigned). You can then feed that forward into an OCR tool and extract the text. The image and the x/y/width/height of the region should typically be all you need for this.
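To make the pre-populating idea a bit more concrete, here's a rough sketch in plain Python. The field names, pixel values and file path are made up for illustration, and the span layout (a "rect" with corner "points" plus x/y/width/height) is my assumption of what the image interface expects, so compare it against an example you've exported yourself before relying on it. The second helper turns a span back into a crop rectangle for whichever OCR tool you pick.

```python
import json

# Hypothetical field layout: label -> (x, y, width, height) in pixels.
# Replace these with the coordinates measured on your own forms.
FIELD_BOXES = {
    "CUSTOMER_NAME": (40, 120, 300, 40),
    "ORDER_NUMBER": (400, 120, 160, 40),
}


def rect_span(label, x, y, width, height):
    """Build one rectangular span, including its four corner points."""
    return {
        "label": label,
        "type": "rect",
        "x": x,
        "y": y,
        "width": width,
        "height": height,
        "points": [
            [x, y],
            [x, y + height],
            [x + width, y + height],
            [x + width, y],
        ],
    }


def make_task(image_path):
    """Create one annotation task with every field box already drawn."""
    spans = [rect_span(label, *box) for label, box in FIELD_BOXES.items()]
    return {"image": image_path, "spans": spans}


def crop_box(span):
    """Return (left, upper, right, lower) computed from the corner points."""
    xs = [point[0] for point in span["points"]]
    ys = [point[1] for point in span["points"]]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    task = make_task("forms/order_0001.jpg")  # hypothetical path
    print(json.dumps(task))  # one line of a JSONL input file
    for span in task["spans"]:
        print(span["label"], crop_box(span))
```

The (left, upper, right, lower) tuple is the order that Pillow's Image.crop takes, so each labelled region can be cut out and passed to the OCR step on its own. Computing it from the points rather than from the stored width/height means it still works after a box has been adjusted by hand.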

(If you're using your own OCR model, you can also use Prodigy to improve it, e.g. by streaming in images + the extracted text and correcting the text if the model makes mistakes. You can then update it with more examples to improve it.)
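The correction loop could be prepared along these lines. This is only a sketch of the data side: run_ocr stands in for whatever OCR model you're using, and the key names "ocr_text" and "transcription" are hypothetical, so how they're displayed and edited depends on the recipe or interface you set up.

```python
def build_review_tasks(image_paths, run_ocr):
    """Yield one task per image with the model's text as the starting point.

    run_ocr is a placeholder for your own OCR call: it takes an image path
    and returns the extracted text.
    """
    for path in image_paths:
        text = run_ocr(path)
        # Keep the raw model output next to the editable copy so the two
        # can be compared after annotation.
        yield {"image": path, "ocr_text": text, "transcription": text}


def corrections(reviewed_tasks):
    """Keep only the tasks where the annotator changed the model's text."""
    return [
        task
        for task in reviewed_tasks
        if task["transcription"] != task["ocr_text"]
    ]


if __name__ == "__main__":
    def fake_ocr(path):
        # Stand-in for a real model, used only to show the data flow.
        return "ACME Ltd"

    tasks = list(build_review_tasks(["forms/order_0001.jpg"], fake_ocr))
    tasks[0]["transcription"] = "ACME Ltd."  # simulated manual correction
    print(corrections(tasks))
```

The corrected examples are the ones worth feeding back into the model first, since they show exactly where it currently goes wrong.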