TextCategorizer

The documentation describes the default architecture as: "Stacked ensemble of a bag-of-words model and a neural network model. The neural network uses a CNN with mean pooling and attention. The “ngram_size” and “attr” arguments can be used to configure the feature extraction for the bag-of-words model." How do I add the ngram_size and attr arguments to the config?

Hi @mcharest1135,

In general we can't support spaCy usage questions on this forum, so in future you'll need to ask this type of question on a forum such as StackOverflow. I've answered it this time but I've also unlisted the thread, and in future we won't be able to help here.

spaCy v2 doesn't really make it easy to configure the models, but you should be able to pass ngram_size as a key in the config dictionary you give to nlp.create_pipe, like this:

import spacy
nlp = spacy.blank("en")  # spaCy v2; or spacy.load(...) an existing model
nlp.add_pipe(
    nlp.create_pipe("textcat", config={"ngram_size": 3, "architecture": "bow"}))
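To get a feel for what ngram_size changes, here is a minimal pure-Python sketch of bag-of-words n-gram feature extraction. It illustrates the idea only and is not spaCy's implementation: the helper name bag_of_ngrams, the use of a callable to pick the token attribute (standing in for what "attr" selects), and collecting every n-gram length from 1 up to ngram_size are assumptions made for this example.

```python
from collections import Counter
from typing import Callable, List


def bag_of_ngrams(
    tokens: List[str],
    ngram_size: int = 1,
    attr: Callable[[str], str] = str.lower,
) -> Counter:
    """Count all n-grams of length 1..ngram_size over attr(token).

    Hypothetical helper for illustration only; `attr` mimics choosing which
    token attribute (e.g. lowercase form vs. exact text) the features use.
    """
    values = [attr(tok) for tok in tokens]
    counts: Counter = Counter()
    for n in range(1, ngram_size + 1):
        # Slide a window of length n over the attribute values.
        for i in range(len(values) - n + 1):
            counts[tuple(values[i : i + n])] += 1
    return counts


features = bag_of_ngrams(["Not", "very", "good"], ngram_size=3)
print(sorted(features, key=lambda g: (len(g), g)))
```

This prints the three unigrams, the two bigrams ("not", "very") and ("very", "good"), and the single trigram. With ngram_size=1 only the unigrams are produced, so a phrase like "not very" is invisible to a linear bag-of-words model; a larger size lets it pick up short phrases at the cost of many more features.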

If you're working from Prodigy, you'll want to do this inside your recipe script. You can also modify the cfg dictionary of the component after it's created, like this:

textcat = nlp.get_pipe("textcat")
textcat.cfg["ngram_size"] = 3
textcat.cfg["architecture"] = "bow"

You'll need to make these modifications before the call to nlp.begin_training, as during begin_training the model instance is created from the config.
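The order matters because the component only stores its settings in cfg and turns them into a model when training starts. The toy class below mimics that pattern in plain Python so the effect is easy to see; LazyComponent and build_model are hypothetical stand-ins written for this sketch, not spaCy classes or functions.

```python
from typing import Any, Dict, Optional


def build_model(architecture: str = "ensemble", ngram_size: int = 1) -> Dict[str, Any]:
    # Hypothetical stand-in for a real model factory: it just records its settings.
    return {"architecture": architecture, "ngram_size": ngram_size}


class LazyComponent:
    """Toy component that defers model construction until begin_training."""

    def __init__(self, **cfg: Any) -> None:
        self.cfg: Dict[str, Any] = dict(cfg)  # settings only, no model yet
        self.model: Optional[Dict[str, Any]] = None

    def begin_training(self) -> None:
        # The model is built from whatever cfg holds at this moment.
        if self.model is None:
            self.model = build_model(**self.cfg)


early = LazyComponent()
early.cfg["ngram_size"] = 3  # changed before begin_training: takes effect
early.begin_training()

late = LazyComponent()
late.begin_training()
late.cfg["ngram_size"] = 3  # changed afterwards: the built model ignores it
print(early.model, late.model)
```

The first component ends up with ngram_size 3 in its model, while the second keeps the default of 1 even though its cfg was edited, which is the same trap as changing textcat.cfg after begin_training has run.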

If you need further help with spaCy usage, you can also post on the spaCy issue tracker.