Hello Prodigy Community,
I’m considering purchasing Prodigy for a client who would like to classify pages of PDF documents into about 30 classes. The model would operate on the string content of each page, with some other derived features to help it (e.g., pixel saturation by page section). They have plenty of training data and a reliable number of labels for half of those classes; the remaining classes will have to be manually annotated by some interns. I’m not sure how good a fit Prodigy is for this and am trying to evaluate it.
I can use MuPDF and PyMuPDF to turn pages into images and do the classification through Prodigy’s image classification, but I’m not sure how much customizability that offers beyond what the Multiple Choice (Image) live demo shows. It also seems like a lot of work to implement, as I’d have to create not only a pipeline for forming the images, but also one for tracking which strings correspond to those annotations when I feed them back to my TensorFlow model.
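To make the bookkeeping concrete, here is a minimal sketch of what I think the two pipelines would involve. It assumes Prodigy's choice interface accepts tasks with "image", "options" and "meta" keys and returns the selected option ids under "accept" with an "answer" field; please correct me if that is wrong. The label names, the function names and the (doc, page) join key are all my own placeholders, and the "image" value may need to be a URL or base64 data URI rather than a file path.

```python
import json
from pathlib import Path

# Hypothetical label set; the real project has about 30 classes.
LABELS = ["invoice", "cover_letter", "other"]


def render_pages(pdf_path, out_dir, dpi=100):
    """Yield (page_number, text, image_path) for each page of a PDF.

    Needs PyMuPDF (pip install pymupdf); imported lazily so the rest of
    this sketch runs without it.
    """
    import fitz  # PyMuPDF

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with fitz.open(pdf_path) as doc:
        for page in doc:
            image_path = out_dir / f"{Path(pdf_path).stem}_p{page.number}.png"
            page.get_pixmap(dpi=dpi).save(str(image_path))
            yield page.number, page.get_text(), str(image_path)


def make_task(doc_id, page_number, image_path):
    """Build one multiple-choice task (assumed Prodigy 'choice' format).

    The (doc, page) pair in "meta" is the key used later to find the
    page text that belongs to an annotation.
    """
    return {
        "image": image_path,  # assumption: may need a URL or data URI
        "options": [{"id": label, "text": label} for label in LABELS],
        "meta": {"doc": doc_id, "page": page_number},
    }


def write_jsonl(tasks, path):
    """Write tasks as newline-delimited JSON, one task per line."""
    with open(path, "w", encoding="utf-8") as f:
        for task in tasks:
            f.write(json.dumps(task) + "\n")


def to_training_rows(annotations, text_by_key):
    """Join accepted annotations back to page text as (text, label) rows.

    `text_by_key` maps (doc, page) to the extracted page string.
    """
    rows = []
    for ann in annotations:
        if ann.get("answer") != "accept":
            continue  # skip rejected or ignored tasks
        key = (ann["meta"]["doc"], ann["meta"]["page"])
        for label in ann.get("accept", []):
            rows.append((text_by_key[key], label))
    return rows
```

The idea is that the page strings never need to go through the annotation tool at all: they stay in a lookup keyed by document and page, and the exported annotations are joined back to them before training.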
So I guess my question is: is my project the right use case for Prodigy, or would I be better off just making my own ad hoc annotation client in something like PySimpleGUI?
Thank you for your input, and I apologize if this question has been asked before. The closest I could find were the two threads linked below, and neither answered my question.