I'm trying to use Prodigy/spaCy to train a NER model that extracts data from invoices, which contain tables and similar structures. To my surprise it worked pretty well (with few training examples and badly formatted data).
In the ner_manual UI the document is rendered as plain text, which is a pain to annotate, as you can imagine.
My idea: since I have position information for every character in the document, I could combine it with Prodigy's token information and generate coordinates for where the tokens should be placed in the annotation layout (i.e. add position information to the span elements). I could provide the character position information in the task's meta data or in another property of the task itself.
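To make that concrete, a task could look something like the sketch below. The `"text"`, `"tokens"` and `"spans"` keys follow Prodigy's standard task format; the `"x"`, `"y"` and `"page"` keys are my own hypothetical extras carrying the layout info (the default ner_manual UI would simply ignore unknown keys, so rendering them would still require a custom interface):

```python
# Sketch: a Prodigy-style task with layout coordinates attached to each
# token. "text"/"start"/"end"/"id" follow Prodigy's token format; the
# "x", "y" and "page" keys are hypothetical extras for my use case.

def make_task(text, token_boxes):
    """token_boxes: list of (token_text, start, end, x, y, page) tuples."""
    tokens = []
    for i, (tok, start, end, x, y, page) in enumerate(token_boxes):
        tokens.append({
            "text": tok,
            "start": start,  # character offset into `text`
            "end": end,
            "id": i,         # token index, as Prodigy expects
            # Hypothetical layout info a custom UI could use:
            "x": x,
            "y": y,
            "page": page,
        })
    return {"text": text, "tokens": tokens, "meta": {"source": "invoice"}}

task = make_task(
    "Invoice 42 Total 99.00",
    [("Invoice", 0, 7, 50, 100, 1),
     ("42", 8, 10, 120, 100, 1),
     ("Total", 11, 16, 50, 400, 1),
     ("99.00", 17, 22, 120, 400, 1)],
)
```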
A second solution would be to use Prodigy's API and build my own annotation interface, but that's pretty time-consuming, and Prodigy already offers nearly everything I need.
A third solution: write my own script as in solution 2, but fetch the position information from an external source and modify the spans in the same way.
Edit: regarding #2 and #3, I've just noticed that the spans lose their id after annotation, which could possibly be a problem with my approach.
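One workaround I've been considering for the lost span ids: since the character offsets (`"start"`/`"end"`) of an accepted span do survive annotation, the position info could be stored separately, keyed by offset, and joined back onto the spans afterwards. A minimal sketch (the `positions` mapping and the `"position"` field are my own invention, not part of Prodigy):

```python
# Sketch: re-attach layout info to annotated spans by character offset,
# since the span id itself does not survive annotation. The `positions`
# dict is built before annotation from the invoice's layout data.

def attach_positions(annotated_spans, positions):
    """positions: {(start, end): {"x": ..., "y": ...}} keyed by char offsets."""
    for span in annotated_spans:
        key = (span["start"], span["end"])
        if key in positions:
            # Hypothetical field carrying the recovered layout info:
            span["position"] = positions[key]
    return annotated_spans

positions = {(0, 7): {"x": 50, "y": 100}}
spans = [{"start": 0, "end": 7, "label": "INVOICE_NO"}]
result = attach_positions(spans, positions)
```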
In my opinion all three solutions could work, but they feel like hacky workarounds rather than a clean solution.
Do you have any other suggestions for how I can tackle this problem?