Do you have any recommendations on how to debug a Thinc model?
This is the fault of the BiLSTM layer, which as I said is a bit unloved -- most of the others have nice shape checks :(. The BiLSTM outputs at `2*width`, and the residual layers require input width == output width. The simplest solution will be to use `2*width` in the subsequent layers.
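For illustration, here's a minimal runnable sketch of that shape contract, with a `layerize`d stand-in in place of the real BiLSTM (whose import path depends on your Thinc version):

```python
import numpy
from thinc.api import chain, layerize
from thinc.misc import Residual
from thinc.v2v import Maxout

width = 64

# Stand-in for the BiLSTM: like the real layer, it concatenates a
# "forward" and a "backward" representation, so it outputs 2 * width.
@layerize
def fake_bilstm(inputs, drop=0.0):
    outputs = numpy.concatenate([inputs, inputs], axis=-1)

    def backward(d_outputs, sgd=None):
        # Sum the two halves of the gradient back to the input width.
        return d_outputs[:, :width] + d_outputs[:, width:]

    return outputs, backward

# The residual block is sized at 2 * width, because Residual adds its
# input to its output and therefore needs input width == output width.
model = chain(fake_bilstm, Residual(Maxout(width * 2, width * 2)))

X = numpy.zeros((10, width), dtype="f")
Y = model(X)
print(Y.shape)  # (10, 128)
```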
As a general tip for debugging: you can always wrap any function in `thinc.api.layerize`, like this:
```python
from thinc.api import layerize

@layerize
def printer(inputs, drop=0.0):
    print(inputs)

    def print_gradient(d_inputs, sgd=None):
        print(d_inputs)
        return d_inputs

    return inputs, print_gradient
```
This will give you a Thinc model you can insert anywhere, to spy on what's going on. I often insert these to monitor the mean and variance of the activations and gradients. Doing this every 100 updates or so works well. You want to look for neurons that have zero variance (i.e. they're always the same value). You also want to check whether the means are increasing, or remaining stable.
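For example, here's a sketch of that kind of monitoring layer (the messages and the zero-variance check are illustrative, and it assumes 2D activations of shape `(batch, width)`):

```python
from thinc.api import chain, layerize
from thinc.v2v import Maxout

@layerize
def stats_monitor(inputs, drop=0.0):
    # Dead neurons: columns whose activation never varies across the batch.
    n_dead = int((inputs.var(axis=0) == 0).sum())
    print("activations: mean=%.4f var=%.4f dead=%d"
          % (inputs.mean(), inputs.var(), n_dead))

    def monitor_gradient(d_inputs, sgd=None):
        print("gradients: mean=%.4f var=%.4f"
              % (d_inputs.mean(), d_inputs.var()))
        return d_inputs

    return inputs, monitor_gradient

# Insert it between any two layers to spy on the data flowing through.
model = chain(Maxout(64, 64), stats_monitor, Maxout(64, 64))
```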
Is there supposed to be an easy way to extend the Prodigy `TextClassifier` with a custom `thinc.neural.Model`?
Either create a subclass that overrides the `.Model()` method, which creates the model, or pass in the model when creating the class.
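For example, a minimal sketch of the subclass approach, shown here with spaCy's `TextCategorizer` (which follows the same `.Model()` contract); the `Softmax` is just a placeholder, since a real model has to map `Doc` batches to class scores:

```python
from spacy.pipeline import TextCategorizer
from thinc.v2v import Softmax

class CustomTextCategorizer(TextCategorizer):
    @classmethod
    def Model(cls, nr_class=1, **cfg):
        # Called lazily (see below), so returning your custom
        # thinc.neural.Model here is all the subclass needs to do.
        return Softmax(nr_class, 64)  # placeholder architecture
```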
You can also assign to `textcat.model` if that's easier. After you create the `textcat` object, `textcat.model` should have the value `True` (that's the default in the `__init__()`). It then calls `.Model()` during `.begin_training()`, `.from_bytes()` or `.from_disk()`, but only if `textcat.model` is set to `True`.
Because the model is created late, rather than during `__init__()`, it's easy to assign a different model instead.
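For example, a minimal sketch of that late creation with spaCy's `"textcat"` pipe (again, the `Softmax` is a stand-in for a real architecture):

```python
import spacy
from thinc.v2v import Softmax

nlp = spacy.blank("en")
textcat = nlp.create_pipe("textcat")
textcat.add_label("POSITIVE")
textcat.add_label("NEGATIVE")
nlp.add_pipe(textcat)

assert textcat.model is True  # no model has been built yet

# Assign the custom model before begin_training(): because
# textcat.model is no longer True, .Model() won't be called.
textcat.model = Softmax(2, 64)  # stand-in for your real model
optimizer = nlp.begin_training()
```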
A more general comment:
One of our next priorities is to finish writing wrappers for other libraries, so that you can use a PyTorch or TensorFlow model within spaCy and Prodigy. You'll also be able to plug a model from one of these libraries into a Thinc model, or even wire together networks from two libraries. Personally, I'm very excited to try using XGBoost in some of my models. I think the academic community has been biased against it, especially in NLP.
We're very anxious to avoid a lock-in effect, where people would rather use a different machine learning library but feel they're stuck with ours. The last thing we want to do is spend our time replacing things people are already happy with!