Annotating PDFs by drawing bounding box around fields

stackoverflowed · February 27, 2019, 2:35am

Hi,
I am brand new to Prodigy and was wondering if there is a way to annotate PDFs by drawing bounding boxes around certain fields. The training data I need contains the text, which I already have from OCR, and the location of the field I am interested in, relative to the top left corner of the page. I know this can easily be done by converting the PDF to an image.
I was wondering if there is a way to do it without the conversion? Also, it would be awesome if I could get a demo to install on my local machine before I commit to buying.
Thanks

ines · February 27, 2019, 10:38am

Hi! With the image.manual recipe, this should pretty much work out-of-the-box. The annotated data will include the image and a list of "spans", defining the label and [x, y] pixel coordinates of the bounding box, so basically, relative to the top left corner.
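As a rough sketch of how such annotations could be turned into boxes measured from the top left corner: the snippet below assumes each span carries a "label" and a "points" list of [x, y] pixel coordinates. Those field names, the example record and the `span_to_bbox` helper are illustrative assumptions, so check them against your own exported data.

```python
from typing import Dict, List, Tuple

BBox = Tuple[str, float, float, float, float]


def span_to_bbox(span: Dict) -> BBox:
    """Return (label, x, y, width, height), with the origin at the top left.

    Assumes span["points"] is a list of [x, y] pixel coordinates
    (an assumed field name, verify against your exported annotations).
    """
    xs = [point[0] for point in span["points"]]
    ys = [point[1] for point in span["points"]]
    x_min, y_min = min(xs), min(ys)
    return span["label"], x_min, y_min, max(xs) - x_min, max(ys) - y_min


# Hypothetical annotated record, made up for illustration only.
example = {
    "image": "page_001.png",
    "spans": [
        {
            "label": "INVOICE_NO",
            "points": [[120, 40], [120, 70], [310, 70], [310, 40]],
        },
    ],
}

boxes: List[BBox] = [span_to_bbox(span) for span in example["spans"]]
print(boxes)  # [('INVOICE_NO', 120, 40, 190, 30)]
```

Taking the min and max over all points means the helper does not depend on the order in which the corners are listed.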

Is there a specific reason you don't want to convert the PDFs to images? Images are much easier to render natively in any browser, and you can scale and compress them to make them faster to load and easier to work with. In terms of pixels and pixel positions, they're also consistent, which makes them a better fit as the "single source of truth". You could do the conversion programmatically in Python (or any other language) and even feed the converted stream of images straight into Prodigy if you want to.
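A minimal sketch of what that conversion could look like, assuming the third-party pdf2image package (which needs Poppler installed) for rendering, and assuming the image is passed along as a base64 data URI under an "image" key. The `pdf_to_tasks` and `png_bytes_to_task` helpers and the "meta" fields are hypothetical names for illustration, not part of Prodigy.

```python
import base64
import io
from typing import Dict, Iterator


def png_bytes_to_task(png_bytes: bytes, meta: Dict) -> Dict:
    """Wrap raw PNG bytes in a task dict, with the image as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {"image": f"data:image/png;base64,{encoded}", "meta": meta}


def pdf_to_tasks(pdf_path: str, dpi: int = 150) -> Iterator[Dict]:
    """Render each page of a PDF to PNG and yield one task dict per page."""
    # Imported lazily so the rest of the module works without pdf2image.
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi=dpi)  # list of PIL images
    for page_number, page in enumerate(pages, start=1):
        buffer = io.BytesIO()
        page.save(buffer, format="PNG")
        yield png_bytes_to_task(
            buffer.getvalue(), {"source": pdf_path, "page": page_number}
        )


# Stand-in bytes so the wrapping step can be tried without a real PDF.
demo_task = png_bytes_to_task(b"not-a-real-png", {"source": "demo.pdf", "page": 1})
print(demo_task["image"][:22])  # data:image/png;base64,
```

Rendering every page at a fixed DPI keeps the pixel coordinates of the boxes comparable across pages, which matters if you later map them back to positions in the OCR output.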

And sure, a trial is no problem! Could you send us an email at contact@explosion.ai? We normally do trials by hosting a VM for you that you can log into. This lets you test all capabilities of the tool without limitations, and also makes it easy for us to help if you get stuck.
