Question about inconsistent labeling between Prodigy and Jupyter notebook

jiebei · May 1, 2023, 4:21pm

Hello. I am using ner to train on citation data. One thing that confuses me is a recurring inconsistency between what I see in Prodigy and in Jupyter Notebook when applying the same model (obtained through ner.correct).
For example, after I start the teach recipe with the label JOURNAL:

python -m prodigy ner.teach citation_2nd_teach-1_binary .\citation_2nd_correct-1_model\model-best .\citation_2nd\To-annotate\4articles.txt --label JOURNAL

The example citation: "Oscar Bernal et al., Assessing the Contribution of Banks, Insurance, and Other Financial Services to Systemic Risk, 47 J. Banking & Fin. 270, 271 (2014)"

It is highlighted correctly in Jupyter Notebook, but in the Prodigy interface it was broken across three pages and was not recognized as JOURNAL. Do you have any ideas on how I can fix this so that I can accept and reject in Prodigy teach? Thank you in advance!

ryanwesslen · May 1, 2023, 4:28pm

hi @jiebei!

Great to hear from you and glad to see you're making great progress with Prodigy!

Try adding --unsegmented to your Prodigy command (i.e., python -m prodigy ner.teach ... --unsegmented) and it may fix it.

When you're using "binary" recipes like ner.teach, Prodigy segments the text into sentences by default. That's why your citation is being split into separate parts: the sentence segmenter is likely a generic one that isn't perfect, so it sometimes gets confused by periods used for other purposes, such as abbreviations. By adding --unsegmented, it'll skip the sentence segmenter and show each input text in full.
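To make the failure mode concrete, here is a minimal sketch in plain Python. The naive_split function is a hypothetical, deliberately simple rule-based splitter written only for illustration. It is not the segmenter Prodigy or spaCy actually uses, but it shows how periods in abbreviations like "J." and "Fin." can look like sentence boundaries:

```python
import re

def naive_split(text):
    """Hypothetical stand-in for a generic sentence splitter.

    Splits after '.', '!' or '?' when the next token starts with a
    capital letter or a digit. Real segmenters are more sophisticated,
    but can make similar mistakes on abbreviation-heavy text.
    """
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", text)
    return [p.strip() for p in parts if p.strip()]

citation = (
    "Oscar Bernal et al., Assessing the Contribution of Banks, Insurance, "
    "and Other Financial Services to Systemic Risk, "
    "47 J. Banking & Fin. 270, 271 (2014)"
)

segments = naive_split(citation)
for i, segment in enumerate(segments, 1):
    print(i, segment)
```

With this toy rule the citation comes out in three pieces, and the journal name "J. Banking & Fin." is cut in the middle, so no single piece contains the full JOURNAL span. Showing the text unsegmented avoids that problem.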

FYI - you can also build a custom segmenter with Prodigy using recipes like sents.manual or sents.correct, which can be very helpful for creating a fine-tuned sentence segmenter for legal texts.

jiebei · May 2, 2023, 1:41pm

Yes, this solved the issue! Thank you, Ryan!
