Prodigy UI Customization

ines · January 31, 2022, 2:52pm

Hi! If you're working with PDFs and want to train text-based models, the first step is to extract the raw text from the PDF files. There are different libraries and solutions for this, and here's an example using PyPDF to produce a JSON file in Prodigy's format: Using prodigy with PDF documents - #2 by ines
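For reference, here's a rough sketch of what that extraction step could look like. It's not the code from the linked thread: it assumes the third-party `pypdf` package for reading the file, and the helper names (`extract_pages`, `pages_to_tasks`, `write_jsonl`) as well as the one-task-per-page split are just choices made for illustration. The output is newline-delimited JSON with a `"text"` key and an optional `"meta"` dict, which is the format Prodigy reads.

```python
import json


def extract_pages(pdf_path):
    """Return the raw text of every page in the PDF (assumes pypdf is installed)."""
    # Imported lazily so the helpers below can be used without pypdf
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    return [page.extract_text() or "" for page in reader.pages]


def pages_to_tasks(pages, source):
    """Turn page texts into Prodigy-style tasks, skipping empty pages."""
    tasks = []
    for number, text in enumerate(pages, start=1):
        # Collapse the whitespace and line breaks left over from the PDF layout
        text = " ".join(text.split())
        if text:
            tasks.append({"text": text, "meta": {"source": source, "page": number}})
    return tasks


def write_jsonl(tasks, out_path):
    """Write one JSON object per line."""
    with open(out_path, "w", encoding="utf8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")
```

You'd then call something like `write_jsonl(pages_to_tasks(extract_pages("doc.pdf"), "doc.pdf"), "doc.jsonl")`, where both file names are placeholders. Depending on your documents, splitting into paragraphs or sentences instead of pages may give you more useful annotation units.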

If you want a more UI-based workflow, you could put a mini service in front of Prodigy that includes an upload field and then starts Prodigy in a subprocess on the server using the path to the uploaded / converted document. One thing to keep in mind here is that you typically want some more constraints on the annotation, the data being annotated and the model being used, since this is all extremely important to the results of the updated model. It's not something you'd typically want to leave up to the annotator – otherwise, you can easily end up with inconsistent data and a model that doesn't necessarily perform better than before.
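To make the subprocess idea a bit more concrete, here's a minimal sketch of the server-side part, leaving out the upload form itself. Everything in it is an assumption for illustration: the dataset name, the `ner.manual` recipe, the `blank:en` pipeline, the label set and the `uploads` directory are all placeholders. The point is that these are fixed in code, and the only thing coming from the user is the path to the converted file.

```python
import subprocess
import sys
from pathlib import Path

# Fixed on the server so annotators can't change them (all hypothetical values)
DATASET = "pdf_ner"
RECIPE = "ner.manual"
MODEL = "blank:en"
LABELS = ("PERSON", "ORG")
UPLOAD_DIR = Path("uploads")


def build_prodigy_command(jsonl_path):
    """Build the command line for one converted upload."""
    path = Path(jsonl_path).resolve()
    # Only accept converted files that live inside the upload directory
    if UPLOAD_DIR.resolve() not in path.parents:
        raise ValueError(f"{path} is outside the upload directory")
    if path.suffix != ".jsonl":
        raise ValueError(f"{path} is not a .jsonl file")
    return [
        sys.executable, "-m", "prodigy",
        RECIPE, DATASET, MODEL, str(path),
        "--label", ",".join(LABELS),
    ]


def start_prodigy(jsonl_path):
    """Start the Prodigy server in a subprocess and return the process handle."""
    return subprocess.Popen(build_prodigy_command(jsonl_path))
```

Your upload handler would save and convert the file, then call `start_prodigy` with the resulting path and keep the returned handle around so it can terminate the process later. If you need several sessions at once, you'd also have to manage ports per process, which this sketch doesn't cover.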
