Prodigy UI Customization

Hi! If you're working with PDFs and want to train text-based models, the first step is to extract the raw text from the PDF files. There are different libraries and solutions for this, and here's an example that uses PyPDF to extract a JSON file in Prodigy's format: Using prodigy with PDF documents - #2 by ines
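
To give a rough idea of what that extraction step can look like, here's a minimal sketch. It isn't the code from the linked thread. It assumes the `pypdf` package is installed, treats each page as one example, and writes JSONL (one of the formats Prodigy can load) with a `"text"` key plus some `"meta"`. The function names and the `"source"` / `"page"` meta fields are just my own choices for illustration.

```python
import json
from pathlib import Path


def extract_page_texts(pdf_path):
    """Return the extracted text of each page (needs `pip install pypdf`)."""
    # Imported lazily so the helpers below can be used without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    # Scanned or image-only pages may yield no text, so fall back to "".
    return [page.extract_text() or "" for page in reader.pages]


def pages_to_tasks(page_texts, source_name):
    """Turn page texts into Prodigy-style task dicts, skipping empty pages."""
    tasks = []
    for page_number, text in enumerate(page_texts, start=1):
        text = text.strip()
        if text:
            # The meta fields are illustrative, not required by Prodigy.
            tasks.append(
                {"text": text, "meta": {"source": source_name, "page": page_number}}
            )
    return tasks


def write_jsonl(tasks, out_path):
    """Write one JSON object per line."""
    with Path(out_path).open("w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task, ensure_ascii=False) + "\n")
```

You'd then call something like `write_jsonl(pages_to_tasks(extract_page_texts("report.pdf"), "report.pdf"), "report.jsonl")`, where `report.pdf` is a placeholder. One example per page is often too coarse for training, so you may want to split further into paragraphs or sentences, depending on your task.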

If you want a more UI-based workflow, you could put a mini service in front of Prodigy that includes an upload field and then starts Prodigy in a subprocess on the server, using the path to the uploaded / converted document. One thing to keep in mind here is that you typically want tighter constraints on the annotation scheme, the data being annotated and the model being used, since these are all extremely important to the results of the updated model. It's not something you'd typically want to leave up to the annotator. Otherwise, you can easily end up with inconsistent data and a model that doesn't necessarily perform better than before.
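
Here's a rough sketch of the subprocess part of such a service, leaving out the web framework and upload form. The idea is that the recipe, dataset, model and labels are fixed on the server and only the converted file varies. The upload directory, dataset name and labels below are placeholders, and I'm assuming the `ner.manual` recipe with a blank English pipeline, so adjust this to whatever your project actually uses.

```python
import subprocess
import sys
from pathlib import Path

# Fixed on the server so annotators can't change them (all placeholder values).
UPLOAD_DIR = Path("uploads")
RECIPE = "ner.manual"
DATASET = "pdf_annotations"
MODEL = "blank:en"
LABELS = ["PERSON", "ORG"]


def build_prodigy_command(source_path):
    """Build the Prodigy command line; only the source file varies per upload."""
    source = Path(source_path).resolve()
    # Refuse files outside the upload directory or with an unexpected extension.
    if UPLOAD_DIR.resolve() not in source.parents:
        raise ValueError(f"{source} is not inside {UPLOAD_DIR}")
    if source.suffix != ".jsonl":
        raise ValueError("expected a converted .jsonl file")
    return [
        sys.executable, "-m", "prodigy",
        RECIPE, DATASET, MODEL, str(source),
        "--label", ",".join(LABELS),
    ]


def start_prodigy(source_path):
    """Start Prodigy in a subprocess without blocking the calling handler."""
    return subprocess.Popen(build_prodigy_command(source_path))
```

Your upload handler would save and convert the file, then call `start_prodigy()` with the resulting path. Passing the command as a list rather than a shell string means the uploaded filename is never interpreted by a shell.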
