Error when using data-to-spacy

Hi,

I'm using a custom tokenizer. So far I've had no issues training a model with it and using that model in Prodigy to help me annotate data for a NER task.
However, when I want to export the data with data-to-spacy, I run into trouble.
I'm providing a base model so that it'll use the correct tokenizer, but it seems there's an issue (as I said, I don't get this error when using the model to annotate):

Here's the stacktrace:

Traceback (most recent call last):
  File "C:\Users\szabop\AppData\Local\Programs\Python\Python37\Lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\szabop\AppData\Local\Programs\Python\Python37\Lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\prodigy\__main__.py", line 54, in <module>
    controller = recipe(*args, use_plac=True)
  File "cython_src\prodigy\core.pyx", line 329, in prodigy.core.recipe.recipe_decorator.recipe_proxy
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\plac_core.py", line 367, in __call__
    cmd, result = parser.consume(arglist)
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\plac_core.py", line 232, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\prodigy\recipes\train.py", line 447, in data_to_spacy
    nlp = spacy_init_nlp(config, use_gpu=0 if gpu else -1)  # ID doesn't matter
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\spacy\training\initialize.py", line 76, in init_nlp
    nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\spacy\language.py", line 1224, in initialize
    I = registry.resolve(config["initialize"], schema=ConfigSchemaInit)
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\thinc\config.py", line 728, in resolve
    config, schema=schema, overrides=overrides, validate=validate, resolve=True
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\thinc\config.py", line 777, in _make
    config, schema, validate=validate, overrides=overrides, resolve=resolve
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\thinc\config.py", line 830, in _fill
    promise_schema = cls.make_promise_schema(value, resolve=resolve)
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\thinc\config.py", line 1021, in make_promise_schema
    func = cls.get(reg_name, func_name)
  File "C:\Users\szabop\.virtualenvs\prodigy_nightly_v3-x0wIMKXr\lib\site-packages\spacy\util.py", line 143, in get
    ) from None
catalogue.RegistryError: [E893] Could not find function 'customize_tokenizer' in function registry 'callbacks'. If you're using a custom function, make sure the code is available. If the function is provided by a third-party package, e.g. spacy-transformers, make sure the package is installed in your environment.
Available names: spacy.copy_from_base_model.v1
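
For reference, the tokenizer is wired in via a callback in the config's initialize block, roughly like this (a simplified sketch, assuming the standard spaCy pattern; the callback name is the one from the error):

[initialize.before_init]
@callbacks = "customize_tokenizer"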

I thought that it'd just use the tokenizer that's "in" the model.

How do I resolve this?
It doesn't seem like data-to-spacy has a --code option like spacy train does :confused:

Any help would be much appreciated.

Hi!

In the upcoming v1.11 of Prodigy, the -F flag will work like spaCy's --code flag: you can use it to import any custom functions.
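
For example, if the callback is defined in a file like functions.py (the file name and the whitespace tokenizer below are just placeholders for your real setup), it needs to be registered under exactly the name your config references:

import spacy
from spacy.tokens import Doc

class WhitespaceTokenizer:
    # Stand-in for your real custom tokenizer.
    def __init__(self, vocab):
        self.vocab = vocab

    def __call__(self, text):
        words = text.split(" ")
        return Doc(self.vocab, words=words)

@spacy.registry.callbacks("customize_tokenizer")
def make_customize_tokenizer():
    def customize_tokenizer(nlp):
        # Runs via [initialize.before_init] and swaps in the custom tokenizer.
        nlp.tokenizer = WhitespaceTokenizer(nlp.vocab)
    return customize_tokenizer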

Could you try data-to-spacy again with the -F flag pointing to the code of your custom tokenizer?
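
Something along these lines (the dataset name and paths are placeholders):

prodigy data-to-spacy ./corpus --ner your_ner_dataset --base-model ./your_base_model -F ./functions.py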

I'm not 100% sure that'll fix things for you though, because we've been working on a fix for the base_model functionality that may not have found its way into the latest release yet. So let us know if you still run into trouble!

Thank you! That solved the issue :slight_smile:
It's worth mentioning that I'm on the nightly though.

Prodigy and this forum have been SO helpful! Thanks for all your work!


Great, happy to hear it!