Updating the NER training pipeline from spaCy 2 to spaCy 3

robertto · October 30, 2020, 9:45pm

First, congratulations on spaCy 3, it looks very cool!

I've been following a custom pipeline for training a custom NER model in spaCy 2. How can I migrate it to spaCy 3?

Here is my code in spaCy 2:

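import random
import warnings
from pathlib import Path

import plac
import spacy
from spacy.util import minibatch, compounding

# TRAIN_DATA is assumed to be defined elsewhere as a list of
# (text, {"entities": [(start, end, label), ...]}) tuples

@plac.annotations(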
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int),
)
def main(model=None, output_dir=None, n_iter=100):
    """Load the model, set up the pipeline and train the entity recognizer."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # create blank Language class
        print("Created blank 'en' model")

    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if "ner" not in nlp.pipe_names:
        ner = nlp.create_pipe("ner")
        nlp.add_pipe(ner, last=True)
    # otherwise, get it so we can add labels
    else:
        ner = nlp.get_pipe("ner")

    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get("entities"):
            # print(f"ent {ent}")
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    pipe_exceptions = ["ner", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
    # only train NER
    with nlp.disable_pipes(*other_pipes), warnings.catch_warnings():
        # show warnings for misaligned entity spans once
        warnings.filterwarnings("once", category=UserWarning, module='spacy')

        # reset and initialize the weights randomly – but only if we're
        # training a new model
        if model is None:
            nlp.begin_training()
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(
                    texts,  # batch of texts
                    annotations,  # batch of annotations
                    drop=0.5,  # dropout - make it harder to memorise data
                    losses=losses,
                )
            print("Losses", losses)

    # test the trained model
    for text, _ in TRAIN_DATA:
        doc = nlp(text)
        # print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
        # print("Tokens", [(t.text, t.ent_type_, t.ent_iob) for t in doc])

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        # for text, _ in TRAIN_DATA:
        #     doc = nlp2(text)
        #     print("Entities", [(ent.text, ent.label_) for ent in doc.ents])
        #     print("Tokens", [(t.text, t.ent_type_, t.ent_iob) for t in doc])

I understand I should use

ner = nlp.add_pipe("ner", source=source_nlp)

instead of

nlp.create_pipe("ner")

I have updated my code that way, but regarding

nlp.update

I guess there are some changes as well. Can someone help me migrate my code to version 3?

SofieVL · November 3, 2020, 9:50am

spaCy 3 now uses the class Example to represent gold-standard annotations, and nlp.update takes a batch of Example objects as input.

To migrate from spaCy 2, have a look at this section specifically in the nightly docs: https://nightly.spacy.io/usage/v3#migrating-training-python. You'll probably want to rewrite your code to something like

from spacy.training import Example
examples = []
for text, annotations in batch:
    examples.append(Example.from_dict(nlp.make_doc(text), annotations))
nlp.update(examples)
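
To see what an Example actually holds, here is a minimal sketch. The sample sentence, its offsets and the GPE label are made up for illustration and are not from the question; it assumes spaCy 3.x is installed and starts from a blank English pipeline.

```python
import spacy
from spacy.training import Example

# Blank English pipeline: only the tokenizer is needed to build an Example
nlp = spacy.blank("en")

# Illustrative sample (an assumption, not data from this thread),
# written in the spaCy 2 (text, annotations) style
text = "I like London and Berlin."
annotations = {"entities": [(7, 13, "GPE"), (18, 24, "GPE")]}

# The predicted side is an unannotated Doc; the reference side is built
# from the annotations dict
example = Example.from_dict(nlp.make_doc(text), annotations)

ents = [(ent.text, ent.label_) for ent in example.reference.ents]
print(ents)  # [('London', 'GPE'), ('Berlin', 'GPE')]
```

A list of such objects is what nlp.update expects in place of the separate texts and annotations arguments used in spaCy 2.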

robertto · November 4, 2020, 2:20pm

Thank you for your answer. I am working on that, but it does not work. Should I rewrite my whole code?

I do not know where I should put your snippet. Basically, I am looking for an updated version of this code:

#!/usr/bin/env python
# coding: utf8
"""Example of training spaCy dependency parser, starting off with an existing
model or a blank model. For more details, see the documentation:
* Training: https://spacy.io/usage/training
* Dependency Parse: https://spacy.io/usage/linguistic-features#dependency-parse

Compatible with: spaCy v2.0.0+
Last tested with: v2.1.0
"""
from __future__ import unicode_literals, print_function

import plac
import random
from pathlib import Path
import spacy
from spacy.util import minibatch, compounding


# training data
TRAIN_DATA = [
    (
        "They trade mortgage-backed securities.",
        {
            "heads": [1, 1, 4, 4, 5, 1, 1],
            "deps": ["nsubj", "ROOT", "compound", "punct", "nmod", "dobj", "punct"],
        },
    ),
    (
        "I like London and Berlin.",
        {
            "heads": [1, 1, 1, 2, 2, 1],
            "deps": ["nsubj", "ROOT", "dobj", "cc", "conj", "punct"],
        },
    ),
]


@plac.annotations(
    model=("Model name. Defaults to blank 'en' model.", "option", "m", str),
    output_dir=("Optional output directory", "option", "o", Path),
    n_iter=("Number of training iterations", "option", "n", int),
)
def main(model=None, output_dir=None, n_iter=15):
    """Load the model, set up the pipeline and train the parser."""
    if model is not None:
        nlp = spacy.load(model)  # load existing spaCy model
        print("Loaded model '%s'" % model)
    else:
        nlp = spacy.blank("en")  # create blank Language class
        print("Created blank 'en' model")

    # add the parser to the pipeline if it doesn't exist
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if "parser" not in nlp.pipe_names:
        parser = nlp.create_pipe("parser")
        nlp.add_pipe(parser, first=True)
    # otherwise, get it, so we can add labels to it
    else:
        parser = nlp.get_pipe("parser")

    # add labels to the parser
    for _, annotations in TRAIN_DATA:
        for dep in annotations.get("deps", []):
            parser.add_label(dep)

    # get names of other pipes to disable them during training
    pipe_exceptions = ["parser", "trf_wordpiecer", "trf_tok2vec"]
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe not in pipe_exceptions]
    with nlp.disable_pipes(*other_pipes):  # only train parser
        optimizer = nlp.begin_training()
        for itn in range(n_iter):
            random.shuffle(TRAIN_DATA)
            losses = {}
            # batch up the examples using spaCy's minibatch
            batches = minibatch(TRAIN_DATA, size=compounding(4.0, 32.0, 1.001))
            for batch in batches:
                texts, annotations = zip(*batch)
                nlp.update(texts, annotations, sgd=optimizer, losses=losses)
            print("Losses", losses)

    # test the trained model
    test_text = "I like securities."
    doc = nlp(test_text)
    print("Dependencies", [(t.text, t.dep_, t.head.text) for t in doc])

    # save model to output directory
    if output_dir is not None:
        output_dir = Path(output_dir)
        if not output_dir.exists():
            output_dir.mkdir()
        nlp.to_disk(output_dir)
        print("Saved model to", output_dir)

        # test the saved model
        print("Loading from", output_dir)
        nlp2 = spacy.load(output_dir)
        doc = nlp2(test_text)
        print("Dependencies", [(t.text, t.dep_, t.head.text) for t in doc])


if __name__ == "__main__":
    plac.call(main)

    # expected result:
    # [
    #   ('I', 'nsubj', 'like'),
    #   ('like', 'ROOT', 'like'),
    #   ('securities', 'dobj', 'like'),
    #   ('.', 'punct', 'like')
    # ]

This is from the spaCy main page.

Many, many thanks.

robertto · November 5, 2020, 1:47pm

Let me ask my question another way. Imagine I have my annotations in this format:

[('On the distinction between the first motion and the second or proper motions; and in the proper motions, between the first and the second inequality.',
  {'entities': [(131, 148, 'MODELL')]}) ...
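
In this format each entity is a (start, end, label) triple of character offsets into the text. As a quick sanity check, a plain-Python sketch like the following shows which substring an annotation covers. The helper name entity_spans is hypothetical, made up for illustration, and is not part of spaCy.

```python
def entity_spans(text, annotations):
    """Return (substring, label) for each (start, end, label) entity."""
    spans = []
    for start, end, label in annotations.get("entities", []):
        # Offsets are character positions; end is exclusive
        if not 0 <= start < end <= len(text):
            raise ValueError(f"Offsets ({start}, {end}) fall outside the text")
        spans.append((text[start:end], label))
    return spans


text = (
    "On the distinction between the first motion and the second or proper "
    "motions; and in the proper motions, between the first and the second "
    "inequality."
)
annotations = {"entities": [(131, 148, "MODELL")]}

print(entity_spans(text, annotations))  # [('second inequality', 'MODELL')]
```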

Now I want to use this training data with a deep learning approach in spaCy 3 to train a custom named entity recognizer.

1- Would it make sense to do it in spaCy 3 (is there any difference)?
2- How can I get started, similar to what I have done in spaCy 2 above?
I am reading this one now, but still could not figure it out.

chinmaydas96 · June 24, 2021, 1:36pm

            for batch in batches:
                texts, annotations = zip(*batch)
                
                example = []
                # Update the model with iterating each text
                for i in range(len(texts)):
                    doc = nlp.make_doc(texts[i])
                    example.append(Example.from_dict(doc, annotations[i]))
                
                # Update the model
                nlp.update(example, drop=0.5, losses=losses)

This is the code I successfully converted from spaCy 2 to 3; it runs without errors.
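
For context, here is how that loop might fit into a complete spaCy 3 script. This is only a minimal sketch under assumptions that are not from the thread: the two training sentences and the GPE label are invented, the function name train_ner is hypothetical, training always starts from a blank "en" pipeline, and a fixed batch size of 2 replaces the compounding schedule to keep the example short.

```python
import random

import spacy
from spacy.training import Example
from spacy.util import minibatch

# Invented sample data in the spaCy 2 (text, annotations) format
TRAIN_DATA = [
    ("I like London and Berlin.", {"entities": [(7, 13, "GPE"), (18, 24, "GPE")]}),
    ("They moved to Paris last year.", {"entities": [(14, 19, "GPE")]}),
]


def train_ner(train_data, n_iter=10):
    """Train a blank English pipeline with a single NER component."""
    nlp = spacy.blank("en")
    # In spaCy 3, add_pipe takes the component's string name; source= is
    # only needed when copying a trained component from another pipeline
    ner = nlp.add_pipe("ner")
    for _, annotations in train_data:
        for _start, _end, label in annotations.get("entities", []):
            ner.add_label(label)

    # Convert the (text, annotations) tuples to Example objects once
    examples = [
        Example.from_dict(nlp.make_doc(text), annotations)
        for text, annotations in train_data
    ]

    # nlp.initialize replaces nlp.begin_training from spaCy 2
    optimizer = nlp.initialize(lambda: examples)
    losses = {}
    for _ in range(n_iter):
        random.shuffle(examples)
        losses = {}
        for batch in minibatch(examples, size=2):
            nlp.update(batch, drop=0.5, sgd=optimizer, losses=losses)
    return nlp, losses


nlp, losses = train_ner(TRAIN_DATA, n_iter=5)
print("Losses", losses)
```

With so little data the resulting model is not useful; the point is only to show where Example.from_dict, nlp.initialize and nlp.update sit relative to each other.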
