I have some PDFs and I can run NER on them using a spaCy model. Now I want to make the task user-friendly: a user should be able to upload a PDF in the Prodigy UI and select a trained model. The highlighted entities would then be shown to the user in the Prodigy UI so they can accept, reject or correct them. How can I set up this process?
Hi! If you're working with PDFs and are looking to train text-based models, the first step would be to extract the raw text from the PDF files. There are different libraries and solutions for this, and here's an example using PyPDF to extract a JSON file in Prodigy's format: Using prodigy with PDF documents - #2 by ines
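To give a rough idea of what that extraction step could look like, here is a minimal sketch, not the code from the linked thread. It assumes the `pypdf` package is installed, writes one task per page with a `"text"` key plus some `"meta"`, and saves newline-delimited JSON. The function names, the one-task-per-page split and the meta fields are my own choices for illustration.

```python
import json


def pages_to_tasks(page_texts, source):
    """Turn a list of page texts into task dicts with "text" and "meta" keys."""
    tasks = []
    for page_number, text in enumerate(page_texts, start=1):
        # extract_text() can return empty strings for scanned or blank pages
        text = " ".join((text or "").split())  # collapse whitespace and line breaks
        if not text:
            continue
        tasks.append({"text": text, "meta": {"source": source, "page": page_number}})
    return tasks


def extract_page_texts(pdf_path):
    """Read the raw text of every page with pypdf (third-party dependency)."""
    from pypdf import PdfReader  # imported here so the helpers above work without it

    reader = PdfReader(pdf_path)
    return [page.extract_text() for page in reader.pages]


def write_jsonl(tasks, out_path):
    """Write one JSON object per line."""
    with open(out_path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task, ensure_ascii=False) + "\n")


def convert_pdf(pdf_path, out_path):
    """Hypothetical convenience wrapper: PDF in, JSONL out."""
    tasks = pages_to_tasks(extract_page_texts(pdf_path), source=str(pdf_path))
    write_jsonl(tasks, out_path)
    return len(tasks)
```

Calling something like `convert_pdf("report.pdf", "report.jsonl")` would then give you a file you can pass to a recipe as the source. Depending on how long your pages are, you may want to split the text into paragraphs or sentences instead of whole pages.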
If you want a more UI-based workflow, you could put a mini service in front of Prodigy that includes an upload field and then starts Prodigy in a subprocess on the server using the path to the uploaded / converted document. One thing to keep in mind here is that you typically want some more constraints on the annotation scheme, the data being annotated and the model being used, since all of these are extremely important to the results of the updated model. It's not something you'd typically want to leave up to the annotator. Otherwise, you can easily end up with inconsistent data and a model that doesn't necessarily perform better than before.
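As a sketch of the server-side part of such a mini service, the snippet below builds the command for the built-in `ner.correct` recipe and starts it in a subprocess. To keep the constraints mentioned above, the model comes from a fixed allow-list and the labels are fixed on the server instead of being free choices for the annotator. The model paths, label names, dataset name and function names are all hypothetical, and the upload form itself (for example a small Flask or FastAPI route that saves the file and converts it to JSONL) is left out. Setting the port via the `PRODIGY_PORT` environment variable is based on my reading of the Prodigy docs, so double-check it for your version.

```python
import os
import subprocess
import sys

# Hypothetical allow-list: annotators pick a key, never an arbitrary path
ALLOWED_MODELS = {
    "invoices": "./models/invoices-ner",
    "contracts": "./models/contracts-ner",
}
# Hypothetical label scheme, fixed on the server for consistency
FIXED_LABELS = ["PERSON", "ORG"]


def build_prodigy_command(dataset, model_key, jsonl_path):
    """Build the argument list for `prodigy ner.correct`."""
    if model_key not in ALLOWED_MODELS:
        raise ValueError(f"Unknown model: {model_key!r}")
    return [
        sys.executable, "-m", "prodigy",
        "ner.correct",
        dataset,                    # dataset to save the annotations to
        ALLOWED_MODELS[model_key],  # spaCy pipeline used to pre-highlight entities
        str(jsonl_path),            # converted document from the upload step
        "--label", ",".join(FIXED_LABELS),
    ]


def start_prodigy(dataset, model_key, jsonl_path, port=8080):
    """Start Prodigy in the background and return the process handle."""
    env = dict(os.environ, PRODIGY_PORT=str(port))  # assumed env var, see note above
    return subprocess.Popen(build_prodigy_command(dataset, model_key, jsonl_path), env=env)
```

Your upload handler would call `start_prodigy(...)` once the file is converted and then redirect the user to the Prodigy app on the chosen port. You'd also need to keep track of the returned process so you can stop it when the session is done.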