Thanks for the background.
If it would help, we've had similar requests for annotating invoices with Prodigy and/or spaCy. I wrote a recent post with examples, including LJ's nice project/blog showing how to annotate with Prodigy with a Huggingface model-in-the-loop:
You can also clone his project repo. If you try this, I would recommend starting with Prodigy v1.11.14 (i.e., run pip install prodigy==1.11.14 -f https://XXXX-XXXX-XXXX-XXXX@download.prodi.gy, where XXXX-XXXX-XXXX-XXXX is your license key). The reason is that v1.12 or v1.13 could include breaking changes that affect the project.
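If it's useful, here is a minimal sketch for checking that the pinned version is the one actually active in your environment after installing. It only uses Python's standard library; the helper names installed_version and matches_pin are hypothetical ones I made up for illustration, not part of Prodigy.

```python
from importlib.metadata import PackageNotFoundError, version

# Version recommended above; adjust if you pin something else.
PINNED = "1.11.14"


def installed_version(package="prodigy"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(found, pinned=PINNED):
    """True only if the installed version is exactly the pinned one."""
    return found == pinned


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("Prodigy is not installed in this environment.")
    elif matches_pin(found):
        print(f"Prodigy {found} matches the pinned version.")
    else:
        print(f"Prodigy {found} is installed, but {PINNED} was expected.")
```

Run it inside the same virtual environment you installed Prodigy into; a mismatch usually means a different environment is active.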
That said, this is an intermediate-to-advanced Prodigy project, as you'll need to read up on some basics of spaCy projects and set up tesseract for OCR and huggingface for the model-in-the-loop. Still, it may be helpful to see what can be done with Prodigy, especially if this is one of your first NLP projects. Let me know if you want to try this out, and I can coach you through reproducing LJ's project first before adding in your data.