hi @ankitladva11 !
If your invoices are PDFs, my colleague @ljvmiranda921 wrote an excellent, detailed blog post on creating a document processing workflow with Prodigy and fine-tuning Hugging Face's LayoutLMv3:
FYI there's an incredibly helpful new blog post by @ljvmiranda921 on extracting PDFs using Prodigy as well.
There's also a helpful accompanying GitHub repo.
Also, many of Matt's suggestions require a good knowledge of spaCy. If your questions are spaCy-specific, you'll be better off posting them on the spaCy GitHub Discussions forum. That's where the spaCy core team answers questions, and they have answered several related posts on invoice parsing:
Hi, I recently came across Spacy when searching for a ML model to extract data from invoices. We have many different formats (300+) of invoices, so I was looking to see if there is a way to train a...
Hi Team, I'm trying to create a custom NER model to parse informations like Invoice number ,date, time and amount from invoices. I have used around 50K+ records to train the model, but after traini...
Hi, I have a complex situation with text from invoices, where the statistical model (NER) not always finds all entities (mostly when the given text changes massive its internal structure). I found ...
Hope this helps!