Prodigy Output Visualization, Dependencies Structure Training Help

Hello,
I would like to ask about 2 things. First, I trained a model and would like to test it on plain text. How can I do that, and is there a recipe to visualize the output text?

Secondly, I have a text file with dependencies for the OntoNotes dataset for Arabic. Is there a specific structure to follow in order to create the JSON file for dependencies to be imported into Prodigy?

Thank you so much for the help in advance.

Hi! If you've trained a spaCy model, I think the easiest way would be to just load it with spaCy directly and look at the annotations you're interested in. spaCy also comes with built-in visualizers that you can run from your terminal or in a Jupyter notebook: https://spacy.io/usage/visualizers

import spacy
from spacy import displacy

nlp = spacy.load("/path/to/model/you/just/trained")
doc = nlp("Some plain text goes here")
# Print the output (choose the attributes you're interested in)
print([(token.text, token.pos_, token.dep_) for token in doc])
print([(ent.text, ent.label_) for ent in doc.ents])
# Visualize
displacy.serve(doc, style="dep")  # or displacy.render if you're in a notebook
displacy.serve(doc, style="ent")  # note: serve blocks until stopped, so run one at a time

If you want a more interactive demo you can play with or share with others, check out the spacy-streamlit package:

Yes, see my comment on this thread:

Thank you so much. Can I ask you about another thing as well? When I save each trained model from Prodigy, its size is large, on the order of gigabytes. Is there a way to compress the size of the model?

Thank you in advance.

This is a trade-off you have to make – if you're using word vectors to initialise your model, those will be required at runtime, so they need to be saved out with the model. Vectors can get quite large and they're mostly what makes up the difference in size (without them, your model directory would be more like 10–20 MB).
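To see why the vectors dominate the model directory, a rough back-of-the-envelope calculation helps – the table is roughly keys × dimensions × 4 bytes (float32). The 500k keys / 300 dimensions below are just illustrative numbers, not from any particular model:

```python
# Approximate on-disk size of a vectors table:
# number of keys × vector width × bytes per float32 entry.
n_keys, n_dims = 500_000, 300  # illustrative figures
bytes_per_float = 4  # float32
size_mb = n_keys * n_dims * bytes_per_float / 1e6
print(f"{size_mb:.0f} MB")  # 600 MB
```

So a large vectors table alone can account for hundreds of megabytes, while the rest of the pipeline stays comparatively small.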

Depending on the vectors you use, you could experiment with pruning them: https://v2.spacy.io/usage/vectors-similarity#custom-vectors-coverage This can give you similar coverage with a much smaller vectors table overall. You could also try using smaller vectors, or train your own on a smaller but more representative sample of raw text.
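As a minimal sketch of what pruning looks like via `Vocab.prune_vectors`: the toy table below is hand-made with `set_vector` so the example is self-contained – with a real model you'd load your trained pipeline instead of `spacy.blank()` and then save the pruned result with `nlp.to_disk`:

```python
import numpy as np
import spacy

# Toy example: a blank pipeline with a tiny hand-made vectors table.
# In practice you'd load your trained model here instead.
nlp = spacy.blank("en")
for i, word in enumerate(["apple", "banana", "cherry", "grape"]):
    nlp.vocab.set_vector(word, np.full(8, float(i + 1), dtype="float32"))

# Keep only the 2 most frequent rows; each removed word is remapped to
# its nearest surviving neighbour, so lookups still return a vector.
remapped = nlp.vocab.prune_vectors(2)
print(len(remapped))            # number of words that were remapped
print(nlp.vocab.vectors.shape)  # pruned table: 2 rows remain
```

The keys for the pruned words stay in the table but point at the nearest kept row, which is why coverage stays similar while the stored array shrinks.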