Thanks for the detailed info – this is definitely an interesting problem!
I think one of the trickiest parts of providing a generalisable solution for this is that we'll need some way of telling Prodigy what to tokenize / make editable, and how to relate that info back to the original text. It's actually quite similar to the problem I describe in the thread I linked above. We can't just render the HTML, because that won't let us easily recover the original positions in the text.
One idea I've had for a while now is to come up with some kind of building block system, ideally described in a way that serializes to JSON (or something similar). You could then build your layout and add a container that you position wherever – and set it to "ner_manual" to make it editable. Each "block" is a separate unit (almost like a "task within a task"), so the data you get back will reflect that, and relating the annotations back to the original input will be simple.
Here's a rough example of what I'm thinking of:
```json
{
  "blocks": [
    {
      "type": "spans_manual",
      "text": "Some text to label",
      "tokens": [...],
      "style": {
        "position": "absolute",
        "top": 10,
        "right": 15
      }
    },
    {
      "type": "html",
      "html": "<strong>hello world</strong>"
    },
    {
      "type": "choice",
      "options": [
        {"id": 0, "text": "good"},
        {"id": 1, "text": "bad"}
      ]
    }
  ]
}
```
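To make the "task within a task" idea a bit more concrete, here's a purely hypothetical sketch in Python (none of these names are a final API) of how a stream could compose tasks out of blocks and keep a reference to the original record, so relating the answers back to the input stays simple:

```python
# Hypothetical sketch only – the "blocks" format above isn't a final API.
# It shows how a recipe's stream could compose tasks from separate blocks,
# keeping a reference to the original record so annotations map back easily.

def make_task(record):
    """Build one annotation task from a source record (assumed structure)."""
    return {
        "blocks": [
            {
                "type": "spans_manual",   # editable text block
                "text": record["text"],
                "tokens": record["tokens"],
            },
            {
                "type": "html",           # static context, not editable
                "html": record["html_preview"],
            },
        ],
        # Keep the original ID so the answers can be related back to the input
        "meta": {"source_id": record["id"]},
    }


def stream(records):
    for record in records:
        yield make_task(record)
```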
This would also solve the problem we'd otherwise have with a plugin system for the web application: Prodigy ships with the precompiled bundle, and even if we included the source, making changes to it would always require recompiling the web application. That's fine if you're used to working with React and JavaScript build tools – but it can easily be discouraging for users who aren't. The current HTML + Vanilla JS solution is nice that way, because it just lets you write code and add it to your recipe – but it's also limited in terms of composing existing interfaces etc.
Another advantage of the JSON-serializable (or similar) building block system would be that it could seamlessly integrate with the upcoming Prodigy Scale. While the user will be running the cluster providing the data, we will be serving the web app and managing the user accounts, so annotators can log in from wherever they are, and you won't have to worry about managing their work queues etc. But this also means that we can't just execute arbitrary JavaScript on the front-end. However, we can render and validate building blocks expressed as JSON – so users would be able to port over their fully custom interfaces when they scale up their annotation projects.
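Because the blocks are just data, they're also easy to check before rendering – something along these lines, where everything (the block types, required fields, function names) is purely illustrative:

```python
# Minimal sketch of the kind of validation a JSON block format makes possible
# (all names here are illustrative, not an existing Prodigy API).

ALLOWED_BLOCKS = {
    "spans_manual": {"text", "tokens"},
    "html": {"html"},
    "choice": {"options"},
}

def validate_blocks(task):
    """Check that every block has a known type and its required fields."""
    for block in task.get("blocks", []):
        block_type = block.get("type")
        if block_type not in ALLOWED_BLOCKS:
            raise ValueError(f"Unknown block type: {block_type}")
        missing = ALLOWED_BLOCKS[block_type] - set(block)
        if missing:
            raise ValueError(f"Block '{block_type}' is missing: {missing}")
    return True
```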
If you do have the image -> text bit covered, the image-based approach could be to use the image_manual interface and let the annotator draw boxes over the respective information. This will give you a list of [x, y] pixel coordinates, relative to the original image. You can then cut out those sections and extract the text from them afterwards (see the sketch below). However, I agree that it makes sense not to start all over and reinvent your process if you've already found an approach that's working well.
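For the post-processing step, this is roughly what cropping the boxes and extracting the text could look like – assuming Pillow and pytesseract here, and assuming the annotations come back as spans with a list of points:

```python
# Rough sketch of the post-processing step: crop the annotated boxes and run
# OCR over them. Assumes Pillow and pytesseract are installed; the exact shape
# of the annotation output ("spans" with "points") is an assumption here.

from PIL import Image
import pytesseract

def extract_text_from_boxes(image_path, spans):
    """Crop each annotated region and OCR the text inside it."""
    image = Image.open(image_path)
    results = []
    for span in spans:
        xs = [x for x, y in span["points"]]
        ys = [y for x, y in span["points"]]
        # Bounding box around the region the annotator drew
        box = (min(xs), min(ys), max(xs), max(ys))
        cropped = image.crop(box)
        text = pytesseract.image_to_string(cropped)
        results.append({"label": span.get("label"), "text": text})
    return results
```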