BERT recipe when using transformer in pipeline?

ines · May 20, 2021, 2:45am

When you set PRODIGY_LOGGING=basic, is there anything in the logs that looks relevant? If you end up with no examples in the stream, this typically means that all examples were skipped, either because they're already annotated in the dataset, or because they're invalid for some other reason (invalid JSON, no "text").
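
If you want to rule out the "invalid for some other reason" case before starting the server, a quick pre-check over your JSONL lines can help. This is only a minimal sketch and not Prodigy's internal logic: `find_invalid_lines` is a hypothetical helper, and it only covers the two cases mentioned above (invalid JSON, no "text"). It doesn't know anything about examples that are already in your dataset.

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, reason) pairs for lines that couldn't be used as tasks.

    Hypothetical helper for illustration, not part of Prodigy.
    """
    problems = []
    for i, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            eg = json.loads(line)
        except json.JSONDecodeError:
            problems.append((i, "invalid JSON"))
            continue
        # A task needs to be a dict with a non-empty "text" value
        if not isinstance(eg, dict) or not eg.get("text"):
            problems.append((i, 'no "text"'))
    return problems


sample = ['{"text": "hello"}', '{"label": "X"}', "{not json}"]
print(find_invalid_lines(sample))
```

To check a real file, you could pass in the open file handle instead of the `sample` list.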

Also double-check that your stream generator doesn't get stuck in an infinite loop or similar by accident (bugs here can sometimes be pretty subtle), and if you're using PyTorch, check that PyTorch doesn't spawn multiple threads under the hood. (If it does, try moving the stream processing logic into a separate Python script and pipe the JSON output forward, so you can ensure it runs in the main thread.)
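
Here's a rough sketch of what such a separate script could look like. The names `preprocess` and `write_jsonl` are made up for this example, and `preprocess` is just a placeholder for wherever your model-dependent logic would go. The demo at the bottom writes to an in-memory buffer so you can see the output format.

```python
import io
import json


def preprocess(lines):
    """Placeholder for the stream processing logic (hypothetical).

    In a real script, this is where the model would be called.
    """
    for line in lines:
        text = line.strip()
        if text:  # skip empty lines so no task ends up without text
            yield {"text": text}


def write_jsonl(examples, out):
    """Write one JSON object per line to the given file-like object."""
    for eg in examples:
        out.write(json.dumps(eg) + "\n")


# Demo with an in-memory buffer instead of real input/output
buffer = io.StringIO()
write_jsonl(preprocess(["First text\n", "\n", "Second text\n"]), buffer)
print(buffer.getvalue(), end="")
```

In the actual script, you'd call `write_jsonl(preprocess(sys.stdin), sys.stdout)` instead and pipe the result into your recipe. Depending on your Prodigy version, recipes read from standard input when the source argument is left out or set to `-`, so double-check the docs for the version you're running.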

Just a quick note: it's possible that the update callback will end up being tricky to implement with the large transformer models. The updating itself can be a bit slow (especially on CPU), and the models usually expect larger batch sizes and don't always respond well to small individual batch updates.

So it might turn out that a better workflow for transformers in the loop is to annotate ~100 examples, train, load the new model in, annotate another ~100 examples, and so on.
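
To make the idea a bit more concrete, here's a minimal sketch of that loop in plain Python. Everything in it is an assumption for illustration: `rounds` and `run_loop` are hypothetical helpers, and `annotate` and `train` are callables you'd have to provide yourself (for example, an annotation session and a training run). None of this is a Prodigy or spaCy API.

```python
from itertools import islice


def rounds(stream, size=100):
    """Split a stream into consecutive batches of up to `size` examples."""
    it = iter(stream)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def run_loop(stream, annotate, train, model, size=100):
    """Annotate a batch, retrain on everything so far, repeat.

    `annotate(batch, model)` and `train(annotated)` are hypothetical
    callables standing in for the annotation and training steps.
    """
    annotated = []
    for batch in rounds(stream, size):
        annotated.extend(annotate(batch, model))
        model = train(annotated)  # load the new model in for the next round
    return model


# Dummy stand-ins so the sketch runs end to end
demo_annotate = lambda batch, model: [{"text": t, "answer": "accept"} for t in batch]
demo_train = lambda annotated: f"model trained on {len(annotated)} examples"

texts = [f"text {i}" for i in range(250)]
final = run_loop(texts, demo_annotate, demo_train, model=None)
print(final)
```

The point of retraining on all annotations collected so far, rather than only the latest batch, is that the model gets the larger batches it usually expects.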

We already have a nightly pre-release out that you can try: "✨ Prodigy nightly: spaCy v3 support, UI for overlapping spans & more". We're hoping to have the stable release ready within the next few weeks. The main feature holding it up was improved support in spaCy v3 for binary annotations and learning from "negative" examples (see this PR).
