A quick note that I should've mentioned in my last reply: of course it makes sense to exploit the category hierarchy as much as possible when annotating (for example, annotate for the top of your hierarchy first, then annotate within a node of the hierarchy once you have that label). But the way you annotate doesn't have to match the way you run your classifier once you have a batch of annotations.
See how you go; possibly it's a bit slow. I would always recommend getting everything wired up end-to-end before fiddling with things like the label hierarchy. If you flatten the hierarchy out, you'll at least get predictions, and you can work on improving the accuracy once you have everything connected.
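One minimal way to flatten a hierarchy is to join each example's label path into a single label and treat every distinct path as its own class. This sketch assumes each example stores a hypothetical `path` field running from the top of the hierarchy down. The field name and the `/` separator are illustrative, not a Prodigy convention.

```python
# Hypothetical annotations: each records the path from the top of the label
# hierarchy down to the most specific label the annotator assigned.
examples = [
    {"text": "Federer wins in straight sets", "path": ["SPORTS", "TENNIS"]},
    {"text": "Central bank holds rates", "path": ["FINANCE"]},
    {"text": "Nadal withdraws with injury", "path": ["SPORTS", "TENNIS"]},
]


def flatten_label(path, sep="/"):
    """Collapse a hierarchical label path into one flat label string."""
    return sep.join(path)


def top_level(flat_label, sep="/"):
    """Recover the top-of-hierarchy label from a flat label."""
    return flat_label.split(sep, 1)[0]


# (text, flat_label) pairs that any ordinary single-level classifier can train on.
flat = [(eg["text"], flatten_label(eg["path"])) for eg in examples]
labels = sorted({label for _, label in flat})
print(labels)  # ['FINANCE', 'SPORTS/TENNIS']
```

Examples annotated only at the top level, like FINANCE here, just become their own flat class. You can still read off the top-level prediction by splitting on the separator.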
Once you're tuning, you'll probably want to export data from Prodigy and train other text classifiers, e.g. from Scikit-Learn. I think Vowpal Wabbit has support for hierarchical classification schemes, and it's super fast. You can export the annotations with `prodigy db-out <dataset name>`, which gives you the data in JSONL format. Scikit-Learn in particular is really great for sanity-checking: you can train some simple bag-of-words models to get a baseline, and figure out whether something's not right.
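Here is a minimal, dependency-free sketch of that sanity check. It assumes the export holds binary textcat-style records with `"text"`, `"label"` and `"answer"` keys, which is what textcat recipes typically store, so check your own file. It also simply drops rejected and ignored answers, which throws away the negative evidence that rejects carry; that's acceptable for a quick baseline. The classifier is a hand-written multinomial Naive Bayes over word counts, the same idea as a Scikit-Learn bag-of-words model, written out so it runs without extra packages.

```python
import json
import math
from collections import Counter, defaultdict


def load_accepted(lines):
    """Yield (text, label) pairs for accepted answers from db-out JSONL lines.

    Assumes binary textcat-style records such as
    {"text": ..., "label": ..., "answer": "accept"}; rejected and ignored
    answers are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        eg = json.loads(line)
        if eg.get("answer") == "accept" and "label" in eg:
            yield eg["text"], eg["label"]


class NaiveBayesBOW:
    """Multinomial Naive Bayes over lowercased whitespace tokens."""

    def fit(self, pairs):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of documents
        for text, label in pairs:
            self.doc_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            # Add-one (Laplace) smoothing over the training vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Tiny inline stand-in for the exported file.
demo_lines = [
    '{"text": "great match today", "label": "SPORTS", "answer": "accept"}',
    '{"text": "stocks fell sharply", "label": "FINANCE", "answer": "accept"}',
    '{"text": "stocks rally", "label": "SPORTS", "answer": "reject"}',
]
model = NaiveBayesBOW().fit(load_accepted(demo_lines))
print(model.predict("stocks fell"))  # FINANCE
```

With a real export, you'd pass an open file handle instead of `demo_lines`. The file name, e.g. `annotations.jsonl`, is up to you. In Scikit-Learn, the equivalent baseline is `make_pipeline(CountVectorizer(), MultinomialNB())`, which also makes cross-validation easy.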
Yes. The good news is Thinc is designed to make this sort of thing pretty easy. The bad news is there's no real documentation, and the API is unstable. You could also just use TensorFlow or PyTorch if you wanted to write a different model yourself.
Here's a quick example of what things would look like in Thinc. The main thing to understand is that Thinc takes a very "functional programming" view of the problem of wiring neural networks together. A model is just a function that can return a callback to do the backward pass. Then we write other functions to compose these models.
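To make the idea of "a model is a function that returns a backward callback" concrete without depending on Thinc's unstable API, here is a framework-free sketch on plain floats. The names `scale_layer` and `compose` are hypothetical illustrations, not Thinc functions, and the "optimizer" is a bare learning-rate step.

```python
def scale_layer(weight):
    """A tiny 'model' computing y = w * x.

    Calling it runs the forward pass and returns (output, backward), where
    backward is a callback that completes the backward pass.
    """
    params = {"w": weight}

    def begin_update(x):
        y = params["w"] * x

        def backward(d_y, learn_rate=0.0):
            d_x = params["w"] * d_y  # gradient w.r.t. the input
            d_w = x * d_y            # gradient w.r.t. the weight
            params["w"] -= learn_rate * d_w  # plain SGD step
            return d_x

        return y, backward

    begin_update.params = params
    return begin_update


def compose(first, second):
    """Build a new model by feeding first's output into second.

    The backward callback runs the two backward passes in reverse order.
    """
    def begin_update(x):
        hidden, backward_first = first(x)
        y, backward_second = second(hidden)

        def backward(d_y, learn_rate=0.0):
            d_hidden = backward_second(d_y, learn_rate)
            return backward_first(d_hidden, learn_rate)

        return y, backward

    return begin_update


model = compose(scale_layer(2.0), scale_layer(3.0))
y, backward = model(1.5)
print(y)              # 9.0
print(backward(1.0))  # 6.0, i.e. dy/dx = 2 * 3
```

Each callback closes over the values from its own forward pass, so combining models needs nothing more than another function that calls them and chains their callbacks. That has the same shape as the `begin_update` calls in the Thinc example.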
Let's say we want to have some model feed forward into three separate output layers. The function to compose them would look like this:
```python
# Note: Example code that I have not run.
from thinc.api import wrap


def multiplex(lower_layer, output_layers):
    '''Connect a lower layer to multiple outputs. The resulting layer outputs
    a tuple of values, and expects a tuple of gradients.'''
    def multiplex_forward(inputs, drop=0.):
        '''Perform the forward pass, and return a callback to complete the backward pass.'''
        hidden, get_d_inputs = lower_layer.begin_update(inputs, drop=drop)
        outputs = []
        get_d_hiddens = []
        for output_layer in output_layers:
            output, get_d_hidden = output_layer.begin_update(hidden, drop=drop)
            outputs.append(output)
            get_d_hiddens.append(get_d_hidden)

        def multiplex_backward(d_outputs, sgd=None):
            '''Callback to complete the backward pass. Expects the gradient w.r.t. the outputs,
            and a callable, 'sgd', which is the optimizer.'''
            # Each output layer returns a gradient w.r.t. the shared hidden
            # representation; sum them before backpropagating further.
            d_hidden = get_d_hiddens[0](d_outputs[0], sgd=sgd)
            for d_output, get_d_hidden in zip(d_outputs[1:], get_d_hiddens[1:]):
                d_hidden += get_d_hidden(d_output, sgd=sgd)
            # Pass the optimizer down as well, so the lower layer gets updated too.
            d_inputs = get_d_inputs(d_hidden, sgd=sgd)
            return d_inputs

        return tuple(outputs), multiplex_backward

    # Turns our function into a thinc.model.Model instance, and remembers its sublayers (for serialization etc)
    model = wrap(multiplex_forward, lower_layer, *output_layers)
    return model
```
I haven't run that, so it's probably full of bugs, but it should be roughly what you would need to do. Logically, if you connect 3 output layers to a shared lower layer, the gradients from those output layers get summed to compute the gradient to feed back into the lower layer. (You might be tempted to weight that sum, if some output is less important than another. This can work, but you can equivalently just weight the loss function that produces the gradients flowing down. That gives you the same thing, while being a bit cleaner and easier to describe.)
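A framework-free sketch of the same wiring on plain floats can check both claims. Suppose the total loss is a_1·L_1 + a_2·L_2 + a_3·L_3. Its gradient with respect to the shared hidden value h is then a_1·dL_1/dh + a_2·dL_2/dh + a_3·dL_3/dh. So weighting the losses gives exactly the weighted sum of per-head gradients. The scalar layer, the squared-error loss and the weights below are hypothetical stand-ins, not Thinc code.

```python
def scale_layer(w):
    """Scalar layer y = w * x; returns (output, backward callback)."""
    def begin_update(x):
        def backward(d_y):
            return w * d_y  # gradient w.r.t. the input x
        return w * x, backward
    return begin_update


def multiplex(lower, heads):
    """Run `lower` once, feed its output to every head, and sum the heads'
    gradients w.r.t. the shared hidden value on the way back."""
    def begin_update(x):
        hidden, backward_lower = lower(x)
        results = [head(hidden) for head in heads]
        outputs = tuple(y for y, _ in results)

        def backward(d_outputs):
            d_hidden = sum(bp(d_y) for (_, bp), d_y in zip(results, d_outputs))
            return backward_lower(d_hidden)

        return outputs, backward
    return begin_update


def squared_error_grad(y, target, weight=1.0):
    """Gradient of weight * 0.5 * (y - target) ** 2 with respect to y."""
    return weight * (y - target)


model = multiplex(scale_layer(2.0),
                  [scale_layer(1.0), scale_layer(-1.0), scale_layer(0.5)])
outputs, backward = model(1.0)
targets = (1.0, 0.0, 2.0)
weights = (1.0, 0.5, 2.0)  # hypothetical per-head importance

# Option 1: weight each head's loss, then backpropagate once.
d_x_weighted_loss = backward(
    [squared_error_grad(y, t, a) for y, t, a in zip(outputs, targets, weights)])


# Option 2: backpropagate each head's unweighted loss alone, then weight the sum.
def single_head_grads(k):
    """Head k's unweighted loss gradient, with zeros for the other heads."""
    return [squared_error_grad(y, t) if j == k else 0.0
            for j, (y, t) in enumerate(zip(outputs, targets))]


per_head = [backward(single_head_grads(k)) for k in range(len(outputs))]
d_x_weighted_sum = sum(a * g for a, g in zip(weights, per_head))
print(d_x_weighted_loss, d_x_weighted_sum)  # 2.0 2.0
```

Because backpropagation is linear in the incoming gradients, the two options agree. Weighting at the loss keeps the combinator itself free of any head-specific knobs.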