javascript in views other then html - add position information to span elements

pat · November 5, 2018, 6:07pm

Sure, I’m happy to. I have attached a random invoice example from google picture search (it doesn’t reflect or domain, but its good enough for illustration purpose).

Sample input: could be a picture or real pdf

prodigy: some output text (only about 1/4 of the whole) how its rendered in via ner.manual in prodigy
As you can imagine someone who has to annotate this will hate me

some poc how I would like it to render for the people who have to annotate.
the tokens does currently not exactly match those tokens generated from prodigy but that’s ok so far as well does it not show newline chars.
some tokens are already selected with selection handler and that stuff (marked in yellow)
Its by far not perfect as you can clearly see in the first line but someone who has to annotate should be way faster then with a wall of text.

What i want to archive: picture -> ocr -> annotate or pdf -> convert -> annotate. it nearly doesn’t matter if its image or pdf, the intermediate after ocr/converting is nearly in the same for both.

Thought about that as well. It would be cool to analyse the content of the document and better detect tables, letterheads ... for example I have read this post [tables and text(Tables and Text) in the forum where honnibal said it could be possible to use the image annotation tools which prodigy provides.

I think this is a really cool idea but 1.) my lack of knowledge is an issue 2.) it currently works well with the wall of text. So i think its better to focus now to ship the product asap with maybe a not perfect, but reasonable extraction accuracy and refine it later witch is definitely on the to-do list

Ah no sorry, i meant the html span tags in the annotation interface. I have attached 2 images, one with the mentioned ids and one without them after the tokens are selected.

Spans with id:
span_with_id

Spans without id:
span_without_id

That would be great! Maybe it would be even possible to create on annotation interfaces like diff/html and so on for specific tasks, e.g. annotate PDFs and expose even more internal functions or allow some sort of plugin system? (I recently stumbled over gatsy.js and short experience with it the plugin system is great - https://www.gatsbyjs.org/plugins/ maybe something like this would be also possible? - in the far future of course )

Topic		Replies	Views
NER manual on view id HTML usage , ner , custom	1	869	May 16, 2019
ner.train on data not annotated by Spacy? ner	3	1148	June 11, 2018
Does Prodigy support HTML annotation for NER usage , ner	3	1212	December 1, 2022
Re-use UI elements usage , front-end	8	965	February 18, 2019
Text spans and Image spans simultaneously enhancement , ner , done , image , front-end	10	758	December 20, 2024

javascript in views other then html - add position information to span elements

Related topics