JavaScript in views other than HTML - add position information to span elements




I'm trying to use Prodigy/spaCy to train a NER model to extract data from invoices, which contain tables and the like. To my surprise it worked pretty well (not many training examples and badly formatted data). :slight_smile:

In the ner_manual UI it just renders the document's plain text, which is a pain in the ass to annotate, as you can imagine.

My idea: since I have the position information for every character in the document, I could combine it with the token information from Prodigy and generate coordinates for where the tokens should be placed inside the annotation layout (i.e. add position information to the span elements). I could provide the character position information in the meta data or in another property of the task itself.
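To sketch what I mean (names are mine, not Prodigy's): assuming each token carries `start`/`end` character offsets as in Prodigy's token format, and a hypothetical `charPositions` array holds the page coordinate of each character, the mapping could look roughly like this:

```javascript
// Sketch: derive a CSS position for each token from per-character
// coordinates. `tokens` follow Prodigy's shape ({text, start, end, id});
// `charPositions[i]` is a hypothetical {x, y} page coordinate of
// character i in the source document.
function tokenPositions(tokens, charPositions) {
  return tokens.map(token => {
    const first = charPositions[token.start];
    return {
      id: token.id,
      style: `position: absolute; left: ${first.x}px; top: ${first.y}px;`,
    };
  });
}

// Example with two tokens on one line, 8px per character:
const tokens = [
  { text: "Invoice", start: 0, end: 7, id: 0 },
  { text: "No.", start: 8, end: 11, id: 1 },
];
const charPositions = Array.from({ length: 12 }, (_, i) => ({ x: i * 8, y: 20 }));
console.log(tokenPositions(tokens, charPositions)[1].style);
// → "position: absolute; left: 64px; top: 20px;"
```

The resulting `style` strings would then need to be applied to the rendered span elements somehow, which is exactly the part I'm unsure about.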

My problem is: in ner_manual it's currently not possible to use JavaScript, whereas in the html UI I can get all the task information I need (tokens, meta info, …) but it's not possible to annotate the text/tokens.

I have read in this thread: Is it possible to customize annotation UI? that you're thinking about adding support for JavaScript/HTML to UIs other than html?

One solution would be to just use Prodigy's API and build my own annotation interface, but that's pretty time consuming, and Prodigy already offers nearly everything I need.

Another solution would be to add my own JavaScript to the index.html and then access the spans through their span IDs and add the position information. But how could I access the task meta data in my own script, as it's not exposed like in window.prodigy? (Or is there a way to access it in the ner_manual UI?)

Third solution: my own script as in solution 2, but fetch the position information from an external source and modify the spans as in #2.

Edit: regarding #2 and #3, I have just noticed that the spans lose their IDs after annotation :confused: that could possibly be a problem with my approach.

In my opinion, all three solutions could work, but they feel like hacky workarounds rather than a clean solution to me.

Do you have any other suggestions for how I can tackle this problem?

best regards,

(Ines Montani) #2

Thanks for sharing your use case! Just to make sure I understand what you’re trying to do, could you share an example of the data you want to feed in?

And is there any way you can do more preprocessing on the server side to extract only the text you're looking for, annotate it in smaller chunks in the manual interface, and then put it back together based on the character positions and meta? This way you won't have to rely on both the front-end and back-end to do data transformation, which introduces a much higher error potential and risks them getting out of sync.
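The offset bookkeeping this relies on is pretty simple. A rough sketch (not Prodigy API, just the idea): record each chunk's offset into the original text, and shift the annotated spans back afterwards:

```javascript
// Sketch: split extracted text into fixed-size chunks for annotation,
// keeping each chunk's offset into the original text.
function makeChunks(text, size) {
  const chunks = [];
  for (let offset = 0; offset < text.length; offset += size) {
    chunks.push({ text: text.slice(offset, offset + size), offset });
  }
  return chunks;
}

// Shift a span annotated relative to a chunk back into the original text.
function toOriginal(span, chunk) {
  return { ...span, start: span.start + chunk.offset, end: span.end + chunk.offset };
}

const original = "Invoice No. 1234 Total 99.00";
const chunks = makeChunks(original, 16);
// "Total" sits at offsets 1-6 inside the second chunk (" Total 99.00"):
const mapped = toOriginal({ start: 1, end: 6, label: "TOTAL" }, chunks[1]);
console.log(original.slice(mapped.start, mapped.end)); // → "Total"
```

In practice you'd want to split on token or line boundaries rather than a fixed size, but the mapping back stays the same.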

Btw, I also discuss some of the difficulties with using manual interfaces and formatted content in this thread, which might be relevant. Although, it sounds like you already have a good plan for how to solve all of this.

Do you mean pre-defined spans in the input data? If so, it’s possible that the manual interface currently doesn’t retain custom metadata on spans, as it assumes they’ll be added/removed/edited anyways. But I’ll need to double-check this!

This is a good point – I definitely want to make all interfaces expose the task data and callbacks if custom JavaScript is enabled, and also allow executing custom scripts across interfaces.


Sure, I’m happy to. I have attached a random invoice example from a Google image search (it doesn’t reflect our domain, but it’s good enough for illustration purposes).

Sample input: could be a picture or a real PDF

Prodigy: some output text (only about 1/4 of the whole) as it’s rendered via ner.manual in Prodigy
As you can imagine someone who has to annotate this will hate me :sweat_smile:

A proof of concept of how I would like it to render for the people who have to annotate:
The tokens don’t currently exactly match those generated by Prodigy, but that’s OK for now; it also doesn’t show newline characters.
Some tokens are already selected, with the selection handler and all that (marked in yellow).
It’s by far not perfect, as you can clearly see in the first line, but someone who has to annotate should be way faster than with a wall of text.

What I want to achieve: picture -> OCR -> annotate, or PDF -> convert -> annotate. It hardly matters whether it’s an image or a PDF; the intermediate result after OCR/conversion is nearly the same for both.

Thought about that as well. It would be cool to analyse the content of the document and better detect tables, letterheads etc. For example, I have read the "tables and text" post in the forum, where honnibal said it could be possible to use the image annotation tools which Prodigy provides.

I think this is a really cool idea, but 1.) my lack of knowledge is an issue :wink: and 2.) it currently works well with the wall of text. So I think it’s better to focus now on shipping the product ASAP with a maybe not perfect, but reasonable extraction accuracy, and to refine it later, which is definitely on the to-do list :slight_smile:

Ah no, sorry, I meant the HTML span tags in the annotation interface. I have attached two images, one with the mentioned IDs and one without them after the tokens are selected.

Spans with id:

Spans without id:

That would be great! Maybe it would even be possible to build your own annotation interfaces out of existing ones like diff/html for specific tasks, e.g. annotating PDFs :wink: and to expose even more internal functions, or allow some sort of plugin system? (I recently stumbled over Gatsby.js, and from my short experience with it, the plugin system is great. Maybe something like this would also be possible? In the far future, of course :wink: )

(Ines Montani) #4

Thanks for the detailed info – this is definitely an interesting problem!

I think one of the trickiest parts of providing a generalisable solution for this is that we’ll need some way of telling Prodigy what to tokenize / make editable, and how to relate that info back to the original text. It’s actually quite similar to the problem I describe in the thread I linked above. We can’t just render the HTML, because that won’t let us easily recover the original positions in the text.

One idea I’ve had for a while now is to come up with some kind of building block system, ideally described in a way that serializes to JSON (or something similar). You could then build your layout and add a container that you position wherever you like, and set it to "ner_manual" to make it editable. Each “block” is a separate unit (almost like a “task within a task”), so the data you get back will reflect that, and relating the annotations back to the original input will be simple.

Here’s a rough example of what I’m thinking of:

    {
        "blocks": [
            {
                "type": "spans_manual",
                "text": "Some text to label",
                "tokens": [...],
                "style": {
                    "position": "absolute",
                    "top": 10,
                    "right": 15
                }
            },
            {
                "type": "html",
                "html": "<strong>hello world</strong>"
            },
            {
                "type": "choice",
                "options": [
                    {"id": 0, "text": "good"},
                    {"id": 1, "text": "bad"}
                ]
            }
        ]
    }
This would also solve the problem we’d otherwise have with a plugin system for the web application: Prodigy ships with the precompiled bundle, and even if we included the source, making changes to it would always require recompiling the web application. That’s fine if you’re used to working with React and JavaScript build tools – but it can easily be discouraging for users who aren’t. The current HTML + Vanilla JS solution is nice that way, because it just lets you write code and add it to your recipe – but it’s also limited in terms of composing existing interfaces etc.

Another advantage of the JSON-serializable (or similar) building block system would be that it could seamlessly integrate with the upcoming Prodigy Scale. While the user will be running the cluster providing the data, we will be serving the web app and managing the user accounts, so annotators can log in from wherever they are, and you won’t have to worry about managing their work queues etc. But this also means that we can’t just execute arbitrary JavaScript on the front-end. However, we can render and validate building blocks expressed as JSON – so users would be able to port over their fully custom interfaces when they scale up their annotation projects.

If you do have the image -> text bit covered, the image-based approach could be to use the image_manual interface and let the annotator draw boxes over the respective information. This will give you a list of [x, y] pixel coordinates, relative to the original image. You can then cut out those sections and extract the text from them afterwards. However, I agree that it makes sense not to start all over and try to reinvent your process if you’ve already found an approach that’s working well.
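The coordinate handling for that cut-out step is straightforward. A sketch (the actual image cropping depends on your tooling; this only covers turning the returned points into a crop region):

```javascript
// Compute the axis-aligned bounding box of a list of [x, y] points,
// e.g. the shape the image_manual interface returns for a drawn box,
// so the region can be cropped out of the original image afterwards.
function boundingBox(points) {
  const xs = points.map(p => p[0]);
  const ys = points.map(p => p[1]);
  const x = Math.min(...xs);
  const y = Math.min(...ys);
  return { x, y, width: Math.max(...xs) - x, height: Math.max(...ys) - y };
}

console.log(boundingBox([[10, 20], [110, 20], [110, 60], [10, 60]]));
// → { x: 10, y: 20, width: 100, height: 40 }
```

Taking the min/max over all points also handles freehand polygons, not just rectangles, since any shape reduces to a crop rectangle this way.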