To put it very simply: the .jsonl file isn't read into memory all at once. It's loaded more like a Python generator, where items are pulled one by one. If the current item matches a pattern, it's added to the batch; once there's a batch of 10 items (or whatever you've configured), the batch is sent to the front end. I'm glossing over some details here, because Prodigy also checks whether the item has been labelled before, but this is the gist.
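To make that concrete, here's a minimal sketch of the generator-plus-batching idea. This is not Prodigy's actual code; the function names and the default batch size of 10 are just illustrative.

```python
import json
from itertools import islice


def stream_examples(path):
    """Lazily yield one example at a time from a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


def batched(stream, size=10):
    """Group a lazy stream into batches of `size` items.

    The stream is only consumed as batches are requested, so the
    whole file never needs to fit in memory at once.
    """
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

Each call to `next()` on `batched(...)` pulls just enough lines from disk to fill one batch, which is why even very large .jsonl files start serving tasks almost immediately.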
One thing that's not totally clear yet: am I able to run the same ner.manual command over and over, and if the parameters are all the same, will it continue to build on the existing database that was previously started?
Yes, unless you're doing something fancy with custom recipes. This is because each data input and labelling task is hashed before it's considered as a candidate, and those hashes are compared against the annotations already in the database. This mechanism prevents duplicates from getting into the database. More details on the hashing can be found in the Prodigy docs here.
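Roughly, the idea is that there are two hashes per example: an input hash (the raw text) and a task hash (the text plus the labelling question being asked). The sketch below illustrates the principle only; Prodigy's real hashing considers more fields than this, and the helper names here are made up for the example.

```python
import hashlib
import json


def input_hash(example):
    # Hash only the raw input text, ignoring any labels.
    return hashlib.md5(example["text"].encode("utf-8")).hexdigest()


def task_hash(example):
    # Hash the input *plus* the labelling task, so the same text
    # can still be queued again under a different question.
    payload = json.dumps(
        {"text": example["text"], "label": example.get("label")},
        sort_keys=True,
    )
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def filter_seen(stream, seen_task_hashes):
    """Skip examples whose task hash has already been annotated."""
    for eg in stream:
        h = task_hash(eg)
        if h not in seen_task_hashes:
            seen_task_hashes.add(h)
            yield eg
```

Because the dedup check is hash-based, re-running the same command with the same parameters simply skips everything you've already answered and picks up where you left off.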
A final question: what kind of patterns are you using? Very complex regexes, or patterns with parts of speech? The reason I'm mentioning it is that you might be able to get a speedup by doing string matching only. I'll take the example from the Prodigy docs here to explain.
{"label": "FRUIT", "pattern": [{"lower": "apple"}]}
{"label": "FRUIT", "pattern": [{"lower": "goji"}, {"lower": "berry"}]}
{"label": "VEGETABLE", "pattern": [{"lower": "squash", "pos": "NOUN"}]}
{"label": "VEGETABLE", "pattern": "Lamb's lettuce"}
You'll notice that this final pattern isn't a list of token attributes but a plain string. These strings are fed internally to spaCy's PhraseMatcher, which is typically faster than the token-based Matcher.
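Outside of Prodigy, you can see the same two mechanisms directly in spaCy. A minimal sketch, assuming a blank English pipeline (the example texts are my own; the `attr="LOWER"` setting makes the PhraseMatcher case-insensitive, mirroring the `"lower"` token patterns above):

```python
import spacy
from spacy.matcher import Matcher, PhraseMatcher

nlp = spacy.blank("en")

# Token-based pattern: a list of per-token attribute dicts,
# like the "goji berry" pattern from the docs.
matcher = Matcher(nlp.vocab)
matcher.add("FRUIT", [[{"LOWER": "goji"}, {"LOWER": "berry"}]])

# Phrase-based pattern: a plain string, matched on lowercased text.
phrase_matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
phrase_matcher.add("VEGETABLE", [nlp.make_doc("lamb's lettuce")])

doc = nlp("I bought a Goji Berry and some Lamb's lettuce.")
token_hits = [doc[start:end].text for _, start, end in matcher(doc)]
phrase_hits = [doc[start:end].text for _, start, end in phrase_matcher(doc)]
```

So if none of your patterns actually need token attributes like `pos`, switching them to plain strings hands the work to the PhraseMatcher, which can make a noticeable difference on large streams.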