Ah yeah, it's kinda lumped in with the ner interface at the moment – I was thinking about writing a simple demo script that mimics the highlighting behaviour, though, to showcase it better (similar to the demo posted in the first post of this thread).
The problem with using mark and the ner_manual interface is that Prodigy needs a model or at least a tokenizer to split the text into tokens. So you'll either need to feed in tasks that already have a "tokens" property set (see the example data I posted above), or just use ner.manual instead.
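If you go the pre-tokenized route, here's a rough sketch of how you could add a "tokens" property to your tasks before feeding them in. The add_tokens helper and the regular expression are made up for illustration, and I'm assuming each token is a dict with "text", "start", "end" and "id", so double-check that against the example data above:

```python
import re

# Hypothetical rule: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def add_tokens(task):
    """Return a copy of the task with a "tokens" list added.

    Assumes each token needs its text, character offsets and an index.
    """
    tokens = []
    for i, match in enumerate(TOKEN_RE.finditer(task["text"])):
        tokens.append({
            "text": match.group(),
            "start": match.start(),  # character offset where the token starts
            "end": match.end(),      # character offset just past the token
            "id": i,                 # position of the token in the text
        })
    return {**task, "tokens": tokens}

task = add_tokens({"text": "Hello, world!"})
```

The character offsets matter most here: they're what lets a highlighted span be mapped back onto the original text.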
Yesss, we'd love to make this happen – but it's pretty difficult to get right. And if we do it, we want it to be actually good and useful. The "boundaries" interface sort of went in that direction, but it came with all kinds of other problems. But we'll keep experimenting.
Either that, or you could add your own tokenization rules, for example, if you need to handle certain characters or punctuation differently. It might take you 20 minutes to write a few regular expressions and add them to spaCy's tokenizer – but that's still a lot more efficient overall than adding 5 more seconds to each individual annotation decision.
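To give you an idea of what such a rule could look like, here's a minimal sketch that adds one extra infix pattern to a blank English pipeline in spaCy. The pattern itself (splitting on an "x" between two digits) and the example sentence are made up, so swap in whatever characters you actually need to handle:

```python
import spacy
from spacy.util import compile_infix_regex

nlp = spacy.blank("en")

# Copy the default infix patterns and add one custom rule.
# Hypothetical rule: split on an "x" that sits between two digits.
infixes = list(nlp.Defaults.infixes) + [r"(?<=[0-9])x(?=[0-9])"]

# Recompile the patterns and swap them into the tokenizer.
infix_re = compile_infix_regex(infixes)
nlp.tokenizer.infix_finditer = infix_re.finditer

texts = [token.text for token in nlp("a 10x20 panel")]
print(texts)
```

Prefixes and suffixes work the same way via compile_prefix_regex and compile_suffix_regex, if the characters you care about show up at the start or end of a token instead.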
Yes, that's definitely on the roadmap. Our current idea is to use a simplified, displaCy-style interface and a workflow similar to NER annotation. Edit: Forgot to add – in the meantime, here are some ideas and strategies for how to make dependency / relation annotation work with the current interfaces.