block with classifier and ner

the blocks feature in your latest release - awesome!

I sometimes have tasks where I need to classify and label entities, and I'd like to do both at the same time. The problem is in the config setting of my recipe: how do I specify which labels are for the NER block and which ones are for the classification block?

I'm thinking something like the following:

return {
        DATASET: dataset,
        VIEW_ID: BLOCKS,
        STREAM: stream,
        "config": {
            "blocks": [
                {"view_id": "ner_manual"},
                {"view_id": "classification"}
            ],
            "ner_labels": [
                V1, ARG1, 
                V2, ARG2,
                V3, ARG3, 
                ARG12, ARG13, ARG23, 
                V12, V13, V23
            ],
           "cls_labels":  ["TRUE", "FALSE"]
        }
    }

Any ideas?

Thanks, that's nice to hear :smiley:

You can set "labels" on the block (just like html_template – see the table in this section). However, this will be equivalent to returning "config": {"labels": [...]} from your recipe, which is what you'd do for ner_manual or image_manual.

The classification interface only needs a "label" property on each task. Or, if you're using the choice interface, you can add a list of "options". So there shouldn't really be a conflict here, and you could build something similar to the cat facts example here, just without the input field. Or maybe I'm misunderstanding your use case?
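As a minimal sketch of how the two pieces fit together (the helper function names here are mine, not part of the Prodigy API): the NER labels live on the ner_manual block, while the choice block reads "options" off each task.

```python
# Hypothetical helpers illustrating the split described above:
# per-block "labels" for ner_manual, per-task "options" for choice.

def make_blocks_config(ner_labels):
    # "labels" is set on the ner_manual block itself ...
    return {
        "blocks": [
            {"view_id": "ner_manual", "labels": ner_labels},
            {"view_id": "choice"},  # ... while choice reads "options" per task
        ]
    }

def make_task(text, tokens, choice_options):
    # Each incoming example carries its own list of options
    return {
        "text": text,
        "tokens": tokens,
        "options": [{"id": o, "text": o} for o in choice_options],
    }
```

So there's one "labels" setting for the manual block and no label key needed for the choice block at all.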

Missed the cat facts example and that sorts me out; thanks much @ines.

Just to share where I got to and the use case, in case there are better ways or this helps someone else. Here is what I have:

My Recipe:

import prodigy
import spacy
import jsonlines
from spacy import displacy

# Constants like DATASET, TEXT, TOKENS, OPTIONS, etc. are string
# constants defined elsewhere in my project.
@prodigy.recipe(
    'srl',
    dataset=("Dataset target", "positional", None, str),
    model=("Language model", "positional", None, str),
    source=("Path to data", "positional", None, str)
)
def srl(dataset, model, source):

    nlp = spacy.load(model)

    def get_tasks(source):
        with jsonlines.open(source, 'r') as rdr:
            for eg in rdr:
                yield {
                    TEXT: eg[TEXT],
                    TOKENS: eg[TOKENS],
                    OPTIONS: [
                        {ID: BUY, TEXT: BUY},
                        {ID: SELL, TEXT: SELL}
                    ],
                    HTML: displacy.render(nlp(eg[TEXT]), style='dep', page=True)
                }

    stream = get_tasks(source)

    return {
        DATASET: dataset,
        VIEW_ID: BLOCKS,
        STREAM: stream,
        CONFIG: {
            BLOCKS: [
                {VIEW_ID: NER_MANUAL},
                {VIEW_ID: CHOICE},
                {VIEW_ID: HTML}
            ],
            LABELS: [
                V1, ARG1, 
                V2, ARG2,
                V3, ARG3, 
                ARG12, ARG13, ARG23, 
                V12, V13, V23
            ],
        }
    }

A screenshot of the resulting interface:

A small remaining issue: if I am using keyboard shortcuts (the only way to fly!) and I hit 1, both V1 in the NER labels and BUY in the choice labels are selected. This is not really a big deal; if I do my NER tagging before my choice tagging, everything works fine...just wanted to point it out in case there is a better way to implement this.


Also, just to share the full use case: I am using Prodigy to harvest data on semantic dependencies. I will use the dataset with the spaCy parser model (your chat semantics recipe), as well as some other implementations, as I don't think the spaCy dependency parser can represent some of the nuances I am encountering. For example, the V* labels are predicate heads and the ARG* labels are predicate arguments. The reason I have ARG12 is for elliptical cases where a single argument is acting in two structures. For example, here is a typical sentence in my data:

"I paid $50 for the Ken Griffey Jr Card ... I am offered now at $65"

Here "Ken Griffey Jr Card" is an argument for both the predicate "paid" and the predicate "offered". So I would annotate "paid" as V1 and "offered" as V2, while the chunk "Ken Griffey Jr Card" gets ARG12, meaning it's an argument for both V1 and V2. I can translate that into CoNLL format later, which can represent overlapping spans, whereas the spaCy parser cannot (I could be wrong about this?).

This looks cool, thanks for sharing! Love the displaCy integration :heart_eyes:

You could use the "keymap_by_label" config setting (requires v1.9.4) to set custom shortcuts for your labels – or custom shortcuts for your options. For example, {"1": "b", "2": "s"} would change the option keys to b for "buy" and s for "sell".
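To make that concrete, a sketch of how the setting could sit in the recipe's returned config (the "keymap_by_label" mapping is the one described above; the surrounding dict shape mirrors your recipe):

```python
# Hypothetical config fragment: remap the choice option keys so they
# no longer collide with the numeric NER label shortcuts.
config = {
    "blocks": [
        {"view_id": "ner_manual"},
        {"view_id": "choice"},
        {"view_id": "html"},
    ],
    # option 1 ("BUY") now uses key "b", option 2 ("SELL") uses key "s"
    "keymap_by_label": {"1": "b", "2": "s"},
}
```

The numeric keys 1, 2, 3, ... then stay free for the NER labels only.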

How are you representing the spans in the CoNLL format and what are you training with it?

spaCy currently doesn't have a built-in component that just predicts arbitrary non-entity sequences – only a named entity recognition model. If you're training a named entity recognizer, a token can only be part of one entity – and the BIO and BILUO scheme can also only represent one label per token, so the data format expects each token to have one entity label.

Perfect! Thx.

Regarding the CoNLL representation: it's a hack so that I can force my problem into the AllenNLP SRL model. The CoNLL-2012 format allows an arbitrary number of columns between the entity column and the co-reference column, to represent more than one verb clause. So, using my sentence above, I am going to try representing it as follows:

http://conll.cemantix.org/2012/data.html

Notice in the grey boxes that the span "Ken Griffey Jr Card" is an argument in two columns, each representing a span; so overlapping spans, which I suspect should be avoided, but it's all over my datasets of dialogue.
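Roughly, the two predicate columns would look something like this (my own abbreviated sketch in the CoNLL-2012 star-bracket notation, with generic ARG role names as placeholders; the real columns sit between the entity and co-reference columns):

```
token      pred-1 (paid)   pred-2 (offered)
I          (ARG*)          *
paid       (V*)            *
$          (ARG*           *
50         *)              *
for        *               *
the        (ARG*           (ARG*
Ken        *               *
Griffey    *               *
Jr         *               *
Card       *)              *)
...        *               *
offered    *               (V*)
...        *               *
```

The "Ken Griffey Jr Card" span opens and closes in both predicate columns, which is how the ARG12 label gets translated out.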

I am not sure if the AllenNLP architecture will accommodate the implied overlapping span there. Assuming I get something reasonable for sentences like the above, I will then include that model towards the end of a spaCy pipeline. If I am well off course, please let me know; otherwise I will definitely post how it goes, specifically how that process deals with the example sentence above.