Annotation for Excel

I have an interesting case that I have not been able to solve. I have multiple Excel sheets, each of which contains multiple tables. ETL doesn't work on these types of sheets because of the huge variation between them, so I am looking for a better way to get this done.

Take a look at the sample here.

Some points to keep in mind:

  1. I always get this data in Excel sheets.
  2. I want to extract the "Dishes" for each restaurant, which can be "Take away" or "Dine-in".
  3. At times the dish is the same, but its name is written slightly differently.

As mentioned earlier, ETL doesn't work on this because of the complex sheets I get. The headers are messy, and the sheets often contain a lot of extra information, which makes it hard to form a structure that any ETL tool can ingest.
What are your thoughts on this? How can I annotate it?

So, I was wondering whether I could train a model on an Excel sheet, the way we can for text or PDFs.

Hi @Wassupkenny, welcome to Prodigy!

One solution you can try here is to run an OCR tool like Tesseract (or a cloud-provided OCR service like AWS Textract / Google Cloud Vision) on the sheets and correct its output using Prodigy. You can then use the corrected samples as your final output, or as training inputs if you're modelling. You can refer to this blog post for some ideas.

However, it is worth weighing how much effort that approach would take. It might be better to write business rules instead (they don't have to be fully accurate, just work about 80% of the time) and then correct their output using Prodigy.
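
To make the business-rules idea a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration under several assumptions: the sheet has already been read into a list of rows (for example with a spreadsheet library), tables are separated by fully empty rows, and the header row of a relevant table has a cell containing "dish". The sample rows, the `KNOWN_DISHES` reference list, and all function names are hypothetical and not taken from your data. It prints one JSON object per line with a "text" field, so the rough output can be loaded as annotation tasks and corrected.

```python
import difflib
import json

# Hypothetical reference list of canonical dish names (an assumption).
KNOWN_DISHES = ["chicken biryani", "paneer tikka"]


def is_blank(row):
    """True if every cell in the row is empty."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def split_blocks(rows):
    """Rule 1: treat fully empty rows as separators between tables."""
    blocks, current = [], []
    for row in rows:
        if is_blank(row):
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(row)
    if current:
        blocks.append(current)
    return blocks


def find_header(block):
    """Rule 2: the header is the first row with a cell containing 'dish'."""
    for i, row in enumerate(block):
        cells = [str(c).strip().lower() if c is not None else "" for c in row]
        if any("dish" in c for c in cells):
            return i, cells
    return None, []


def canonical_dish(name, known=KNOWN_DISHES, cutoff=0.8):
    """Rule 3: map slightly different spellings to a known dish name."""
    norm = " ".join(str(name).lower().split())
    match = difflib.get_close_matches(norm, known, n=1, cutoff=cutoff)
    return match[0] if match else norm


def extract_tasks(rows):
    """Apply the rules and build one task per data row."""
    tasks = []
    for block in split_blocks(rows):
        idx, header = find_header(block)
        if idx is None:
            continue  # not a dish table (title, notes, ...)
        for row in block[idx + 1:]:
            record = {h: v for h, v in zip(header, row) if h and v is not None}
            text = " | ".join(f"{k}: {v}" for k, v in record.items())
            for key in list(record):
                if "dish" in key:
                    record["dish_normalized"] = canonical_dish(record[key])
            tasks.append({"text": text, "meta": record})
    return tasks


# Invented sample rows, standing in for one messy sheet.
rows = [
    ["Restaurant report", None, None],
    [None, None, None],
    ["Restaurant", "Dish", "Type"],
    ["Spice Hub", "Chicken Biriyani", "Take away"],
    ["Spice Hub", "Paneer Tikka", "Dine-in"],
    [None, None, None],
    ["Notes", "internal only", None],
]

tasks = extract_tasks(rows)
for task in tasks:
    print(json.dumps(task))  # one JSON object per line (JSONL)
```

The rules are deliberately crude. For instance, a title row containing the word "dish" would be mistaken for a header, which is exactly the kind of error you would then fix during annotation rather than in the rules.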

Good luck with your project!