ner.manual - custom text formatting

Apologies if this is a duplicate of another question-- I've searched the forum and can't seem to find an answer that applies to this situation!

I want to annotate a few articles where the annotator needs to read the title and the paragraphs. Is there a way to show the title and the paragraphs of the article on the same page and make the title text bigger?

Thanks, as always, for your excellent work!

Hi and thanks! :slightly_smiling_face:

In general, the ner.manual interface tries to avoid any formatting and focuses on plain text, because formatting can easily lead to confusion about what's actually being annotated and whether the formatting is part of the underlying plain text that the model is going to see (see here for more details).

However, I do see the point in adding emphasis to headlines, and the fact that part of a text is the headline is also something that you could use as a feature in your model. Or maybe the headline is just there for reference and doesn't need to be annotated?

If you don't want to highlight things within the headline, the solution could be pretty straightforward: you could have a custom interface with two blocks, an html block with the headline using any styling you need, followed by an ner_manual block for the regular text.
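
For example, a minimal sketch of such a blocks config, assuming each incoming task has a "headline" field (that field name is just illustrative, not something Prodigy requires):

blocks = [
    # render the headline via an HTML template, styled however you like
    {"view_id": "html", "html_template": "<h2 style='font-size: 30px'>{{headline}}</h2>"},
    # regular manual NER annotation over the task's plain text
    {"view_id": "ner_manual"},
]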

If you do want to highlight spans within the headline, the only solution I can think of right now is a slightly unintuitive but effective CSS hack :sweat_smile: Basically, if you know the index of the last token of the headline, you can do:

.prodigy-content {
    font-size: 30px; /* the large font size */
}

.prodigy-content span[id="3"] ~ span, .prodigy-content span[id="3"] ~ mark {
    font-size: 20px;  /* the small font size */
}

So you're making all tokens following the span with the ID 3 small. The only edge case where this doesn't work is if the last token of the headline is highlighted as an entity.
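
To apply that CSS, one option is the "global_css" setting, either in your prodigy.json or in the "config" returned by a custom recipe. A sketch of the recipe version, assuming the headline ends at the token with ID 3 as above:

"config": {
    # global_css is injected for the whole app; these rules shrink every
    # token after the span with ID 3, so only the headline stays large
    "global_css": (
        ".prodigy-content { font-size: 30px } "
        ".prodigy-content span[id='3'] ~ span, "
        ".prodigy-content span[id='3'] ~ mark { font-size: 20px }"
    ),
}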


Thank you for the reply!

I was able to figure out the custom block recipe :slight_smile: I'm facing a different problem now. My custom recipe looks like this:

import prodigy
from prodigy.components.preprocess import add_tokens
from prodigy.components.loaders import JSONL
import spacy


@prodigy.recipe(
    "component",
    dataset=("d1", "positional", None, str),
    file_path=("f1.jsonl", "positional", None, str))
def component(dataset, file_path, lang="de"):
    blocks = [
        {"view_id": "html"},
        {"view_id": "ner_manual"},
    ]

    nlp = spacy.blank(lang)           # blank spaCy model for tokenization
    stream = JSONL(file_path)         # set up the stream
    stream = add_tokens(nlp, stream)  # tokenize the stream for ner_manual

    return {
        "dataset": dataset,
        "view_id": "blocks",         # set the view_id to "blocks"
        "stream": stream,            # the stream of incoming examples
        "config": {
            "labels": ["l1","l2"]
            "blocks": blocks         # add the blocks to the config
        }
    }

I'm also trying to use multiple labellers to annotate the data, so I set the environment variable PRODIGY_ALLOWED_SESSIONS=a1,a2. This was working fine before, i.e. it was creating session datasets like d1-a1 and d1-a2.

But now, when I run the custom recipe with prodigy component d1 f1.jsonl -F recipe.py, I'm only able to open one session at a time. For example, if I start annotating at http://localhost:8080/?session=a1, the session http://localhost:8080/?session=a2 doesn't have any items to annotate. Only when I save my a1 session and close it can I start annotating in a2.

My prodigy.json looks like this:

{
  "feed_overlap": false,
  "custom_theme": {"cardMaxWidth": 1500},
  "global_css": ".prodigy-title label { font-size: 18px }",
  "instructions": "instructions.html",
  "ui_lang": "de",
}

Thanks again in advance!

Hi @g.padmanaban,

If you're seeing the correct behavior with ner.manual but not in your custom recipe, you could try setting force_stream_order=True so that questions are sent out in order and repeated until they're answered, like in the ner.manual recipe. Here's a modified recipe that allowed me to annotate in both windows:

import prodigy
from prodigy.components.preprocess import add_tokens
from prodigy.components.loaders import JSONL
import spacy


@prodigy.recipe(
    "component",
    dataset=("d1", "positional", None, str),
    file_path=("recipes/3271.jsonl", "positional", None, str))
def component(dataset, file_path, lang="de"):
    blocks = [
        {"view_id": "html"},
        {"view_id": "ner_manual"},
    ]

    nlp = spacy.blank(lang)           # blank spaCy model for tokenization
    stream = JSONL(file_path)         # set up the stream
    stream = add_tokens(nlp, stream)  # tokenize the stream for ner_manual

    return {
        "dataset": dataset,
        "view_id": "blocks",         # set the view_id to "blocks"
        "stream": stream,            # the stream of incoming examples
        "config": {
            "labels": ["l1","l2"],
            "blocks": blocks,         # add the blocks to the config
            "feed_overlap": False,
            "force_stream_order": True
        }
    }

You should know that with feed_overlap=False, two annotators working at the same time can still be shown the same question. This is because Prodigy doesn't know which questions have been sent to annotators but not yet answered. Since it doesn't know that, it sends both clients the questions following the last one it knows was answered, which can be the same question. In practice this means that two people could annotate the same example, but the server will only accept the first answer that comes back.

Hopefully this helps!
-Justin