Is there a way to force recipes like `ner.correct` to use the GPU when I load a model trained with a GPU? I looked in the ner.py file and didn't see a way to set this parameter (e.g. `spacy.require_gpu(0)`).
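For context, what I have in mind is roughly this plain-spaCy pattern, just applied inside the recipe (the model name here is only a placeholder):

```python
import spacy

spacy.require_gpu(0)  # errors if no GPU is available
nlp = spacy.load("my_gpu_trained_model")  # placeholder model name
```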
Hi Kyle!
I don't think we currently allow for this, so I'd like to understand the use-case better. Is the inference too slow? If not, what's the reason you'd like to have a GPU here? If it is, could you describe the documents that you're annotating? I haven't heard of people experiencing a serious lag during `ner.correct` before.
I have a JSONL file with 350K+ lines. I meant to use a GPU for `ner.teach`. Sometimes loading the file takes a few minutes. I load this large JSONL in full so I can try to find the outliers that wouldn't be detected if I chunked the file up and only loaded one chunk of the dataset.
I figured I'd give this a spin locally. First, I generate some data.
```python
import srsly

def make_many(n=1_000_000):
    for i in range(n):
        yield {"text": f"I am Vincent and this is example #{i}."}

srsly.write_jsonl("examples.jsonl", make_many())
```
This generates a file with 1M examples on disk that's about 50 MB. The documents themselves aren't huge, but there are a lot of them. Next, I'm able to run the `ner.teach` recipe just fine without any lag.
```
> python -m prodigy ner.teach issue-6423 en_core_web_sm examples.jsonl --label PERSON
```
Within a few seconds I see the server message.
```
Using 1 label(s): PERSON
Added dataset issue-6423 to database SQLite.
✨ Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!
```
And I'm able to annotate just fine. So this behavior isn't something I'm able to reproduce just yet.
It could be that you're experiencing a lag because you're dealing with much bigger documents, but I'm a bit surprised the startup takes so long on your end. The reason this recipe starts quickly is that `ner.teach` doesn't loop over all the examples immediately on startup. It merely checks the current batch under consideration.
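To make that concrete: the stream behaves roughly like a lazy generator, so only the lines needed for the next batch are parsed up front. A minimal sketch of the idea (this isn't Prodigy's actual internals):

```python
import srsly
from itertools import islice

# srsly.read_jsonl is lazy: it parses one line at a time as you iterate
stream = srsly.read_jsonl("examples.jsonl")

# Pulling the first batch only reads the first 10 lines of the file,
# no matter how many million lines follow
first_batch = list(islice(stream, 10))
```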
I'd like to understand the lag a bit better, though. Is there anything else you can share about your setup? How long are your documents? Is there anything specific about the model that you're using in `ner.teach`? Does the issue go away when you use a smaller file? You can quickly create one via:
```
head -100 examples.jsonl > subset.jsonl
```
Thanks @koaning. I had `batch_size` in the `prodigy.json` file set to 1000 and noticed an improvement after lowering it to 50. My documents are job postings that can range from a paragraph or two to several paragraphs. I also have a very long `custom_theme` and `labels` configuration that may be adding some overhead.
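For reference, the relevant part of my `prodigy.json` now looks roughly like this (other keys omitted):

```json
{
  "batch_size": 50
}
```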
Ah, yeah that would explain it.
Out of curiosity, is there a reason why you've set up a larger batch size?