Custom recipe for Annotating Overlapping Spans

Is there a way to write a custom recipe to annotate overlapping spans in a text, possibly with different labels?


The manual interface for labelling spans was primarily designed for sequence tagging tasks, where the spans are represented as a sequence of token-based tags with one tag per token. This is how most NER implementations are designed. Allowing overlapping spans in the annotation interface could easily be confusing and misleading, because you wouldn't be able to use the data collected this way for the most common use cases. It would also make the UI much more complex, and there's not really a satisfying way to visualize multiple nested or overlapping spans while still keeping the interface efficient and intuitive to use.

If you need overlapping spans, you could make multiple passes over the data – this works especially well if you have a hierarchical label scheme, because you can start with the top-level categories and then stream in the examples again with the more fine-grained labels.
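As a rough sketch of the second pass, assuming the first-pass annotations are available as plain dicts with "text", "answer" and "_input_hash" keys (the function name second_pass_stream is made up for illustration), you could strip the existing spans and queue each accepted text up again:

```python
def second_pass_stream(first_pass_examples, keep_answers=("accept",)):
    """Yield clean tasks for a second annotation pass.

    Hypothetical helper: assumes each example is a dict with "text",
    "answer" and (optionally) "_input_hash" keys.
    """
    seen = set()
    for eg in first_pass_examples:
        # Only re-annotate examples that were accepted in the first pass.
        if eg.get("answer") not in keep_answers:
            continue
        # Send each input out only once, even if it was annotated repeatedly.
        key = eg.get("_input_hash", eg["text"])
        if key in seen:
            continue
        seen.add(key)
        # Drop the first-pass spans so the annotator starts from a blank text.
        task = {"text": eg["text"]}
        if "_input_hash" in eg:
            task["_input_hash"] = eg["_input_hash"]
        yield task
```

The resulting generator could then be used as the stream of a second session that uses the more fine-grained labels. How you load the first-pass data is up to you and isn't shown here.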

Alternatively, you could write a stream generator that keeps sending out the same example, so you can label it multiple times until you reject it or send it back empty (which means all spans have been added). If you set "instant_submit": true, the answer will be sent back immediately and your recipe's update function will be called before the new task is sent out. You could use that callback to check whether the example needs more spans or whether you can move on to the next one. Examples created from the same input will all have the same _input_hash, so it should be pretty easy to merge the spans in the data afterwards.
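Here's a minimal sketch of that idea in plain Python. It assumes tasks are dicts with "text", "answer", "spans" and "_input_hash" keys, that spans have "start", "end" and "label", and that tasks are requested one at a time so the update callback runs before the generator is asked for the next task. The names make_repeating_stream and merge_spans are made up for illustration and aren't part of any library:

```python
def make_repeating_stream(examples):
    """Return a (stream, update) pair that repeats each example until done."""
    state = {"done": False}

    def stream():
        for eg in examples:
            state["done"] = False
            # Keep sending the same example until update() marks it as done.
            while not state["done"]:
                yield dict(eg)  # shallow copy, so answers don't mutate the source

    def update(answers):
        for answer in answers:
            # A rejected example or one sent back without spans means
            # "nothing more to add", so the stream moves on.
            if answer.get("answer") != "accept" or not answer.get("spans"):
                state["done"] = True

    return stream(), update


def merge_spans(annotations):
    """Combine the spans of all accepted answers that share an _input_hash."""
    merged = {}
    for eg in annotations:
        if eg.get("answer") != "accept":
            continue
        key = eg["_input_hash"]
        target = merged.setdefault(
            key, {"text": eg["text"], "_input_hash": key, "spans": []}
        )
        for span in eg.get("spans", []):
            item = {"start": span["start"], "end": span["end"], "label": span["label"]}
            # Skip exact duplicates, but keep overlapping spans.
            if item not in target["spans"]:
                target["spans"].append(item)
    return list(merged.values())
```

In a custom recipe, the two values returned by make_repeating_stream would be what you pass along as the stream and the update callback. If tasks are fetched in larger batches, the generator will run ahead of the answers and the "done" check won't work as intended, so treat this as a starting point rather than a finished recipe.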