Sentencizer seems to break Language.evaluate method

asdf · June 14, 2019, 10:17pm

I just trained some new models (from en_core_web_lg) and I am having trouble with the Language.evaluate method on them.

import spacy
from spacy.gold import GoldParse
import en_core_web_lg


def calculate_ent_offsets(test_ed):
    char_offsets = []
    for x in test_ed.ents:
        char_offsets.append((x.start_char, x.end_char, x.label_))
    return char_offsets

core_model = en_core_web_lg.load()
new_model = spacy.load("CU_new")

doc = next(core_model.entity.pipe([core_model.tokenizer("Where is Mary?")]))

model = core_model
doc_golds = [(model.tokenizer(str(doc)), GoldParse(doc, entities=calculate_ent_offsets(doc)))]
score = model.evaluate(doc_golds)

assert(new_model("Where is Mary?").to_json() == core_model("Where is Mary?").to_json())

print("OK")

model = new_model

doc_golds = [(model.tokenizer(str(doc)), GoldParse(doc, entities=calculate_ent_offsets(doc)))]
score = model.evaluate(doc_golds)

I get an error in the second case for unexpected keyword argument which doesn’t make sense to me because they should have the same class definition.

TypeError                                 Traceback (most recent call last)
<ipython-input-1-0d3e786850a7> in <module>
     26 
     27 doc_golds = [(model.tokenizer(str(doc)), GoldParse(doc, entities=calculate_ent_offsets(doc)))]
---> 28 score = model.evaluate(doc_golds)
     29 
     30 

/venv/lib64/python3.6/site-packages/spacy/language.py in evaluate(self, docs_golds, verbose, batch_size, scorer, component_cfg)
    608             else:
    609                 docs = pipe.pipe(docs, **kwargs)
--> 610         for doc, gold in zip(docs, golds):
    611             if verbose:
    612                 print(doc)

nn_parser.pyx in pipe()

/venv/lib64/python3.6/site-packages/spacy/util.py in minibatch(items, size)
    455     while True:
    456         batch_size = next(size_)
--> 457         batch = list(itertools.islice(items, int(batch_size)))
    458         if len(batch) == 0:
    459             break

nn_parser.pyx in pipe()

/venv/lib64/python3.6/site-packages/spacy/util.py in minibatch(items, size)
    455     while True:
    456         batch_size = next(size_)
--> 457         batch = list(itertools.islice(items, int(batch_size)))
    458         if len(batch) == 0:
    459             break

pipes.pyx in pipe()

/venv/lib64/python3.6/site-packages/spacy/util.py in minibatch(items, size)
    455     while True:
    456         batch_size = next(size_)
--> 457         batch = list(itertools.islice(items, int(batch_size)))
    458         if len(batch) == 0:
    459             break

/venv/lib64/python3.6/site-packages/spacy/language.py in <genexpr>(.0)
    605             kwargs.setdefault("batch_size", batch_size)
    606             if not hasattr(pipe, "pipe"):
--> 607                 docs = (pipe(doc, **kwargs) for doc in docs)
    608             else:
    609                 docs = pipe.pipe(docs, **kwargs)

nn_parser.pyx in spacy.syntax.nn_parser.Parser.__call__()

TypeError: __call__() got an unexpected keyword argument 'batch_size'

If I remove the sentencizer pipe from the language object, evaluate runs fine.

ines · June 15, 2019, 7:31am

Hi! If I read your report correctly, this is related to spaCy only, right? If so, would you mind reporting this on the spaCy issue tracker instead? We’re trying to keep this forum specific to Prodigy, and having spaCy-related reports on the tracker makes it much easier to keep an overview and action on them.

Also, could you double-check that you’re running the latest version, v2.1.4? I vaguely remember a similar bug that we already fixed, so if you’re on an older version, upgrading might fix the problem.

Topic		Replies	Views
Cryptic error message when training a model after evaluation usage , spacy , solved	4	2188	March 13, 2018
"data-to"spacy" does not sentencize text based on custom sentencizer. enhancement , ner , done , spacy	2	1207	June 17, 2020
Tokens from 'Tokenizer' are different from 'en' model usage , spacy , solved	2	686	April 3, 2019
Training a new parser or NER using a model with no lexeme normalization table. This may degrade the performance of the model to some degree... usage , spacy	4	2117	November 2, 2020
Sentencizer configuration questions spacy	2	666	February 19, 2023

Sentencizer seems to break Language.evaluate method

Related Topics