Prodigy Output Visualization, Dependencies Structure Training Help

Hello,
I would like to ask about 2 things. First, I trained a model and would like to test it on plain text. How can I do that, and is there a recipe to visualize the output text?

Secondly, I have a text file with dependencies for the OntoNotes dataset for Arabic. Is there a specific structure to follow in order to create the JSON file for dependencies to be imported into Prodigy?

Thank you so much for the help in advance.

Hi! If you've trained a spaCy model, I think the easiest way would be to just load it with spaCy directly and look at the annotations you're interested in. spaCy also comes with built-in visualizers that you can run from your terminal or in a Jupyter notebook: https://spacy.io/usage/visualizers

import spacy
from spacy import displacy

nlp = spacy.load("/path/to/model/you/just/trained")
doc = nlp("Some plain text goes here")
# Print the output (choose the attributes you're interested in)
print([(token.text, token.pos_, token.dep_) for token in doc])
print([(ent.text, ent.label_) for ent in doc.ents])
# Visualize
displacy.serve(doc, style="dep")  # or displacy.render if you're in a notebook
displacy.serve(doc, style="ent")  # note: serve blocks until stopped, so run one at a time

If you want a more interactive demo you can play with or share with others, check out the spacy-streamlit package:

Yes, see my comment on this thread:

Thank you so much. Can I ask you about another thing as well? When I save each trained model from Prodigy, its size is large, on the order of gigabytes. Is there a way to compress the size of the model?

Thank you in advance.

This is a trade-off you have to make – if you're using word vectors to initialise your model, those will be required at runtime, so they need to be saved out with the model. Vectors can get quite large and they're mostly what makes up the difference in size (without them, your model directory would be more like 10–20 MB).
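To see why the vectors dominate the model directory, a rough back-of-the-envelope calculation helps – the table is roughly keys × dimensions × 4 bytes (float32). The 500k keys / 300 dimensions below are just illustrative numbers, not from any particular model:

```python
# Approximate on-disk size of a vectors table:
# number of keys × vector width × bytes per float32 entry.
n_keys, n_dims = 500_000, 300  # illustrative figures
bytes_per_float = 4  # float32
size_mb = n_keys * n_dims * bytes_per_float / 1e6
print(f"{size_mb:.0f} MB")  # 600 MB
```

So a large vectors table alone can account for hundreds of megabytes, while the rest of the pipeline stays comparatively small.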

Depending on the vectors you use, you could experiment with pruning them: https://v2.spacy.io/usage/vectors-similarity#custom-vectors-coverage This can give you similar coverage with a much smaller vectors table overall. You could also try using smaller vectors, or train your own on a smaller but more representative sample of raw text.
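As a minimal sketch of what pruning looks like via `Vocab.prune_vectors`: the toy table below is hand-made with `set_vector` so the example is self-contained – with a real model you'd load your trained pipeline instead of `spacy.blank()` and then save the pruned result with `nlp.to_disk`:

```python
import numpy as np
import spacy

# Toy example: a blank pipeline with a tiny hand-made vectors table.
# In practice you'd load your trained model here instead.
nlp = spacy.blank("en")
for i, word in enumerate(["apple", "banana", "cherry", "grape"]):
    nlp.vocab.set_vector(word, np.full(8, float(i + 1), dtype="float32"))

# Keep only the 2 most frequent rows; each removed word is remapped to
# its nearest surviving neighbour, so lookups still return a vector.
remapped = nlp.vocab.prune_vectors(2)
print(len(remapped))            # number of words that were remapped
print(nlp.vocab.vectors.shape)  # pruned table: 2 rows remain
```

The keys for the pruned words stay in the table but point at the nearest kept row, which is why coverage stays similar while the stored array shrinks.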