Segmentation Fault

mmars9 · January 24, 2018, 1:23pm

Hi there,

I just want to be sure I’m understanding this error correctly. I’m working on further training the ‘ORG’ label from a spacy model. The model had been trained once in spacy, renamed “ner_train_v3”, and I used ner_train_v3 in prodigy’s ner.teach recipe.

That all goes okay, but when I try using prodigy’s ner.batch-train recipe, I invariably get an error like the below:

line 1: 1914 Segmentation fault: 11 python -m prodigy "$@"
(spacy_pycharm_ner) bash-3.2$

Navigating to that line in the relevant dataset, I can see the sentence has some issues, because the stream data I’m throwing into the dataset hasn’t been meticulously cleaned:

{"text":"Romania posted the European Union\u201a\u00c4\u00f4s highest economic growth rate in the third quarter at 8.8 percent year-on-year, but it also had the largest rate of household deprivation, Eurostat data showed, with one in two Romanians struggling to keep their home warm or pay their bills on time.\"","spans":[{"start":176,"end":184,"text":"Eurostat","rank":0,"label":"ORG","score":0.6449537913,"source":"core_web_lg","input_hash":762417285}],"meta":{"score":0.6449537913},"_input_hash":762417285,"_task_hash":-1449057813,"answer":"accept"}

Does the inclusion of quotes (\u201a, etc.) throw it off, or is it the strange break at the end of the sentence?
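One way to check whether the unusual characters shift anything is to confirm that each span's character offsets still reproduce the span's "text". A minimal sketch, assuming the annotations have been exported to a JSONL file (for example with Prodigy's db-out command); the file path and function names are hypothetical.

```python
import json


def misaligned_spans(example):
    """Return spans whose [start:end] slice of the text differs from the span's "text"."""
    text = example["text"]
    return [
        span
        for span in example.get("spans", [])
        if "text" in span and text[span["start"]:span["end"]] != span["text"]
    ]


def non_ascii_chars(text):
    """Return the distinct non-ASCII characters in a text as sorted U+XXXX code points."""
    return sorted({f"U+{ord(ch):04X}" for ch in text if ord(ch) > 127})


def audit_jsonl(path):
    """Print line number, input hash, odd characters and misaligned spans per example."""
    with open(path, encoding="utf8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            example = json.loads(line)
            bad = misaligned_spans(example)
            odd = non_ascii_chars(example["text"])
            if bad or odd:
                print(line_no, example.get("_input_hash"), odd, bad)
```

For the example above, each \u escape decodes to a single character, and the slice text[176:184] of the decoded text is "Eurostat", so the span offsets themselves appear consistent.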

ines · January 24, 2018, 1:42pm

Thanks for the report and finding the example it likely fails on! Errors like this are often difficult to debug, so having a concrete example is very valuable. “Weird” formatting and unicode characters should never cause a segfault – this is definitely a bug, either in Prodigy’s NER model or somewhere in spaCy.
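Because a segfault kills the interpreter without a Python traceback, Python's standard-library faulthandler module can help show which Python call was active when the crash happened. A minimal sketch, assuming Prodigy is importable in the same environment; the wrapper function names are hypothetical. Running `python -X faulthandler -m prodigy ...` has the same effect without a wrapper.

```python
import faulthandler
import runpy
import sys


def enable_crash_tracebacks():
    """On SIGSEGV and similar fatal signals, dump every thread's Python traceback to stderr."""
    faulthandler.enable(all_threads=True)
    return faulthandler.is_enabled()


def run_prodigy(args):
    """Run the equivalent of `python -m prodigy <args>` in this process with tracebacks on."""
    enable_crash_tracebacks()
    sys.argv = ["prodigy", *args]  # run_module(alter_sys=True) replaces argv[0] with the module path
    runpy.run_module("prodigy", run_name="__main__", alter_sys=True)
```

For example, `run_prodigy(["ner.batch-train", ...])` with the same arguments as the failing command. If the crash happens inside compiled code, the dump still shows the last Python frame that called into it.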

(In the meantime, you could always try removing that example from your set and see if you can run ner.batch-train without any problems?)
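To try that, the suspect example can be dropped by its `_input_hash` before retraining. A minimal sketch, assuming a JSONL export of the dataset; the paths and function names are hypothetical, and importing the cleaned file into a fresh dataset (e.g. with Prodigy's db-in command) is left to the usual workflow.

```python
import json


def filter_examples(examples, input_hashes):
    """Yield only the examples whose _input_hash is not in input_hashes."""
    for example in examples:
        if example.get("_input_hash") not in input_hashes:
            yield example


def filter_jsonl(in_path, out_path, input_hashes):
    """Copy a JSONL file of annotations, skipping the given input hashes."""
    with open(in_path, encoding="utf8") as src:
        examples = [json.loads(line) for line in src if line.strip()]
    kept = list(filter_examples(examples, input_hashes))
    with open(out_path, "w", encoding="utf8") as dst:
        for example in kept:
            dst.write(json.dumps(example, ensure_ascii=False) + "\n")
    return len(examples) - len(kept)  # number of examples dropped
```

For the example above that would be something like `filter_jsonl("export.jsonl", "export_clean.jsonl", {762417285})`.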

mmars9 · January 24, 2018, 2:03pm

Good to know, and I'll be doing that -- thanks!

nikeqiang · February 5, 2018, 5:21am

Hi @mmars9, @ines, I’m experiencing a similar problem training the NER on anything but a very small set of examples. Training on anything over 1000 examples throws the following error. Is this a memory error? Has either of you come up with a temporary solution?

Example error message when running Prodigy:

line 1: 41665 Segmentation fault: 11 python -m prodigy "$@"

Info about spaCy
Python version: 3.6.3
spaCy version: 2.0.5
Models: en, en_core_sm
Platform: MacOS

I note that I got the same error when trying to train using each of (a) the Prodigy ner.batch-train recipe and (b) the regular spacy train_ner.py script.
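When a crash only appears above a certain number of examples, it can still come from a single bad example, so bisecting the data is one way to narrow it down. A minimal sketch: `crashes` is a hypothetical callback you would write yourself (e.g. export the subset, import it into a scratch dataset and run training in a subprocess), and the approach assumes one example triggers the crash on its own; a purely size- or memory-related crash won't be isolated this way.

```python
import signal
import subprocess


def bisect_crash(examples, crashes):
    """Halve the examples until one crashing example remains.

    Returns None if the full list doesn't crash, a one-item list if a single
    culprit was found, or the smallest list whose halves both ran cleanly.
    """
    if not crashes(examples):
        return None
    while len(examples) > 1:
        mid = len(examples) // 2
        first, second = examples[:mid], examples[mid:]
        if crashes(first):
            examples = first
        elif crashes(second):
            examples = second
        else:
            return examples  # crash needs examples from both halves
    return examples


def exits_with_segfault(cmd):
    """Run a command and report whether it was killed by SIGSEGV (POSIX).

    Holds when cmd runs Python directly, e.g. [sys.executable, "-m", "prodigy", ...];
    a shell wrapper would instead exit with status 139 (128 + 11).
    """
    proc = subprocess.run(cmd)
    return proc.returncode == -signal.SIGSEGV
```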

FYI I’ve left a note on the spaCy boards too (https://github.com/explosion/spaCy/issues/1757) since I assume it’s the same issue.

Thanks!

baeumer · April 16, 2018, 7:40pm

Unfortunately, I’m now running into this error as well. With only a few annotations (from ner.teach), ner.batch-train works fine, but once I’ve annotated about 1000 texts, the error appears. Memory and CPU usage don’t look unusual, though.

Segmentation fault: 11

python3.6 -m prodigy ner.batch-train db /Users/frederik/mdl/modell -l label1,label2 -e db -o modell2

sooheon · June 30, 2018, 7:43pm

I’m seeing this every time that I run ner.train-curve or ner.batch-train now, and like @baeumer I have just over 1000 annotations.

I also do not see too much memory or CPU usage.

aniruddha · August 25, 2018, 5:15am

Now that I have just over a thousand NER annotations, ner.batch-train isn’t succeeding at all. Every single run stops with a segfault.

aniruddha · August 25, 2018, 12:28pm

I think I found my problem. I know about spaCy’s limit on text length, and in the database I found a few very long strings of text. I dropped them and training seems to work marvelously! (Except that 1107 sentences took 14 GB of RAM and 5 GB of swap on Ubuntu 18.04.)
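Dropping the very long texts can be done on an exported JSONL file before retraining. A minimal sketch; the `max_chars` default is an arbitrary placeholder chosen for illustration, not spaCy's or Prodigy's own limit, and the function name is hypothetical.

```python
def split_by_length(examples, max_chars=10_000):
    """Separate examples into (kept, too_long) based on the length of their "text".

    max_chars is an illustrative threshold, not a spaCy or Prodigy setting.
    """
    kept, too_long = [], []
    for example in examples:
        (too_long if len(example["text"]) > max_chars else kept).append(example)
    return kept, too_long
```

Keeping the `too_long` list rather than discarding it makes it easy to inspect what was removed, or to split those texts into sentences and annotate them separately.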

honnibal · August 30, 2018, 9:22am

@aniruddha Thanks for the update! Could you try decreasing your batch size, to see if it solves your memory usage issues?
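A smaller batch size means fewer examples go through the model per update, which usually lowers peak memory. The sketch below is a generic standard-library minibatcher, similar in spirit to `spacy.util.minibatch`, included only to illustrate the idea; how the recipe itself exposes its batch size isn't covered here.

```python
from itertools import islice


def minibatch(items, size):
    """Yield successive lists of at most `size` items.

    Only one batch is built at a time, so memory used for batching scales with
    `size` rather than with the whole dataset.
    """
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```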

trevorwelch · April 25, 2019, 9:22pm

Same error, prodigy: line 1: 9693 Segmentation fault: 11 python -m prodigy "$@"

This was during an annotation task, which was launched via prodigy ner.teach menu_brand_tagging en_core_web_sm menu_data_1018.jsonl --label BRAND --patterns brand_patterns.jsonl

Unfortunately it was accompanied by an error message in the annotation front-end as well!

I don’t think I lost too many, but a good reminder to hit save often.

ines · April 26, 2019, 9:34am

What type of texts are in your menu_data_1018.jsonl? Are they long or short? Any particularly long texts, or texts with lots of whitespace?
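Both properties can be checked in the source file before annotating. A minimal sketch, assuming each line of the JSONL source has a "text" field; the thresholds and function names are hypothetical placeholders.

```python
import json
import re

WHITESPACE_RUN = re.compile(r"\s+")


def text_stats(text):
    """Return (length, longest run of consecutive whitespace characters) for a text."""
    longest = max((len(run) for run in WHITESPACE_RUN.findall(text)), default=0)
    return len(text), longest


def suspicious_texts(path, max_chars=5_000, max_ws=50):
    """Yield (line number, length, longest whitespace run) for texts over either limit."""
    with open(path, encoding="utf8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            length, ws = text_stats(json.loads(line)["text"])
            if length > max_chars or ws > max_ws:
                yield line_no, length, ws
```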

The errors in the app are a direct response to the server dying, btw. As soon as the Prodigy app fails to connect to the server, it will show you the error, so you know that something is up. (Otherwise, you’d have to keep checking the terminal, which is pretty inconvenient.) Prodigy auto-saves the annotations in batches and also uses them to update the model in the loop (if you’re using an active learning recipe like ner.teach).

If you’re using the default batch size of 10, the maximum number of annotations you could theoretically ever lose at a time is 19 (10 items in the history and 9 waiting to be sent out as soon as they become 10). If Prodigy is unable to save, the examples are still all in your browser btw – so you can always restart the server in the terminal and then hit save in the web app, and the “stranded” examples should be saved.
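The same bound can be written for any batch size b, assuming (as in the numbers above) that the history holds one full batch while at most b − 1 items wait for the next batch to fill:

```latex
N_{\text{lost}} \le \underbrace{b}_{\text{in history}} + \underbrace{(b - 1)}_{\text{waiting to be sent}} = 2b - 1,
\qquad b = 10 \;\Rightarrow\; N_{\text{lost}} \le 19
```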
