Adding a helper image

ryanwesslen · November 9, 2022, 10:34pm

Have you seen my colleague @ljvmiranda921's recent blog post, in which he built a PDF processing workflow with Prodigy?

There is also an accompanying GitHub repo with a spaCy project and custom Prodigy recipes.

I have only dabbled with the project, so I don't know all of the details, but it has been very popular with Prodigy users. What's cool about the project is that it also fine-tunes Hugging Face's LayoutLMv3, a model that is pre-trained with both text and image masking. The project uses the FUNSD dataset, so to adapt it you'll likely need to learn how that dataset is structured and mimic that structure for your own data.
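To give a rough idea of what "mimicking the structure" could involve, here is a minimal sketch that flattens a FUNSD-style annotation into parallel lists of words, boxes and labels. The field names (`form`, `words`, `box`, `label`, `linking`) reflect my understanding of the FUNSD JSON files, and scaling boxes to a 0-1000 range is the usual convention for LayoutLM-style models; the example record and the `flatten_funsd` helper are made up for illustration and are not taken from the project, so check them against the repo before relying on them.

```python
from typing import Dict, List, Tuple


def normalize_box(box: List[int], width: int, height: int) -> List[int]:
    """Scale a pixel box [x0, y0, x1, y1] to the 0-1000 range."""
    x0, y0, x1, y1 = box
    return [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]


def flatten_funsd(
    record: Dict, width: int, height: int
) -> Tuple[List[str], List[List[int]], List[str]]:
    """Turn one FUNSD-style record into word, box and label lists."""
    words, boxes, labels = [], [], []
    for entity in record["form"]:
        for word in entity["words"]:
            if not word["text"].strip():
                continue  # skip empty OCR tokens
            words.append(word["text"])
            boxes.append(normalize_box(word["box"], width, height))
            labels.append(entity["label"])  # each word inherits its entity's label
    return words, boxes, labels


# Hypothetical record in the assumed FUNSD layout (not real data).
example = {
    "form": [
        {
            "id": 0,
            "label": "question",
            "text": "Invoice No:",
            "box": [50, 40, 150, 60],
            "linking": [[0, 1]],
            "words": [
                {"text": "Invoice", "box": [50, 40, 110, 60]},
                {"text": "No:", "box": [115, 40, 150, 60]},
            ],
        },
        {
            "id": 1,
            "label": "answer",
            "text": "12345",
            "box": [200, 40, 260, 60],
            "linking": [[0, 1]],
            "words": [{"text": "12345", "box": [200, 40, 260, 60]}],
        },
    ]
}

words, boxes, labels = flatten_funsd(example, width=500, height=1000)
```

If your own annotations can be exported into a similar shape, most of the adaptation work is probably in mapping your label names onto the ones the project expects.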

While this may not be a perfect solution, hopefully it provides a concrete idea of an approach. As you've probably seen, for PDFs we typically recommend (see below) either running OCR and annotating the extracted text in Prodigy, or treating the pages as images.
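As a rough sketch of those two options, the snippet below builds Prodigy-style JSONL tasks: one with a `"text"` key for OCR output and one with an `"image"` key holding a base64 data URI for a page image. It assumes the OCR and PDF-to-image steps have already happened elsewhere; the helper names, the `meta` contents and the sample inputs are hypothetical.

```python
import base64
import json
from typing import Dict, Iterable


def text_task(text: str, source: str) -> Dict:
    """Wrap OCR'd page text as a text task."""
    return {"text": text, "meta": {"source": source}}


def image_task(image_bytes: bytes, source: str, mime: str = "image/png") -> Dict:
    """Wrap a rendered page image as an image task (base64 data URI)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"image": f"data:{mime};base64,{encoded}", "meta": {"source": source}}


def to_jsonl(tasks: Iterable[Dict]) -> str:
    """Serialize tasks as newline-delimited JSON, one task per line."""
    return "\n".join(json.dumps(task) for task in tasks)


# Placeholder inputs: in practice the text would come from your OCR tool
# and the bytes from a page rendered out of the PDF.
tasks = [
    text_task("Total: 42", source="invoice.pdf#page=1"),
    image_task(b"abc", source="invoice.pdf#page=1"),
]
jsonl = to_jsonl(tasks)
```

A file of text tasks could then be loaded into a recipe such as `ner.manual`, and a file of image tasks into `image.manual`.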

Hope this helps and definitely keep us informed on whatever direction you go!

Related topics

Topic	Tags	Replies	Views	Activity
Usecase of Prodigy-PDF	ner	1	337	February 8, 2024
Annotating PDFs by drawing bounding box around fields	usage, front-end	1	2628	February 27, 2019
PDF OCR Image annotation metadata - feature suggestion?	usage, best-practices	3	210	May 13, 2024
Taking a Computer Vision Approach (leveraging image.manual) to build a custom NER model on PDFs	usage, ner, image	3	579	July 28, 2022
Document Images - Textual Images Labeling	-	1	319	April 20, 2022