Hi,
I'm trying to use Prodigy/spaCy to train an NER model to extract data from invoices, which contain tables and similar structures. To my surprise, it worked pretty well (despite few training examples and badly formatted data).
The ner_manual UI just renders the document's plain text, which is a real pain to annotate, as you can imagine.
I have the position information for every character in the document, so my idea is to combine it with the token information from Prodigy and generate coordinates for where each token should be placed inside the annotation layout (i.e. add position information to the span elements). I could provide the character position information in the meta data or in another property of the task itself.
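To make the idea concrete, here is a rough sketch of what I mean, not a finished implementation. It assumes each token is a dict with "start" and "end" character offsets (end exclusive), and that the per-character positions are a list of (x0, y0, x1, y1) boxes indexed by character offset. The names `token_boxes` and `add_layout` and the "token_boxes" meta key are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# (x0, y0, x1, y1) bounding box of one character on the page (assumed format)
Box = Tuple[float, float, float, float]


def token_boxes(tokens: List[Dict], char_boxes: List[Box]) -> List[Optional[Box]]:
    """Merge the boxes of all characters covered by each token into one box."""
    boxes: List[Optional[Box]] = []
    for tok in tokens:
        covered = char_boxes[tok["start"]:tok["end"]]
        if not covered:
            # No position data for this token, e.g. offsets outside the list
            boxes.append(None)
            continue
        boxes.append((
            min(b[0] for b in covered),
            min(b[1] for b in covered),
            max(b[2] for b in covered),
            max(b[3] for b in covered),
        ))
    return boxes


def add_layout(task: Dict, char_boxes: List[Box]) -> Dict:
    """Return a copy of the task with token boxes stored in its meta data."""
    task = dict(task)
    meta = dict(task.get("meta", {}))
    meta["token_boxes"] = token_boxes(task["tokens"], char_boxes)  # hypothetical key
    task["meta"] = meta
    return task
```

A script on the front end could then read the box for each token and position the matching span element, provided the task data is reachable from that script.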
My problem now: in ner_manual it's currently not possible to use JavaScript, whereas in the html UI I can get all the task information I need (tokens, meta info …), but there it's not possible to annotate the text/tokens.
I read in the thread "Is it possible to customize annotation UI?" that you are thinking about allowing JavaScript/HTML in UIs other than html. Is that still planned?
One solution would be to just use Prodigy's API and build my own annotation interface, but that's pretty time-consuming, and Prodigy offers nearly everything I need.
Another solution would be to add my own JavaScript to the index.html, access the spans through their span ids, and add the position information. But how could I access the task meta data in my own script, since it's not exposed the way it is in window.prodigy? (Or is there a way to access it in the ner_manual UI?)
Third solution: use my own script as in solution 2, but fetch the position information from an external source and then modify the spans in the same way.
Edit: regarding #2 and #3, I have just noticed that the spans lose their id after annotation, which could be a problem for my approach.
In my opinion, all three solutions could work, but they feel like hacky workarounds rather than a clean solution.
Do you have any other suggestions for how I can tackle this problem?
Best regards,
pat