Bus error: 10

mithunpaul08 · November 24, 2022, 5:54am

Hey Guys,

I get a bus error (highlighted below) when training with spancat. I searched the documentation and support pages but couldn't find anything about it. I am on an Apple M1 machine, so I'm not sure whether that is causing it. Below are the command I used for training and the output of prodigy stats. Since the error mentions $@, I also searched my dataset files for $@ but couldn't find it anywhere.

============================== ✨  Prodigy Stats ==============================

Version          1.11.8                        
Location         /Users/mitch/opt/miniconda3/lib/python3.9/site-packages/prodigy
Prodigy Home     /Users/mitch/.prodigy         
Platform         macOS-12.6-arm64-arm-64bit    
Python Version   3.9.12                        
Database Name    SQLite                        
Database Id      sqlite                        
Total Datasets   1                             
Total Sessions   1

prodigy train --spancat annotated_apwg_pass1_sep29th2022 --verbose

Components: spancat
Merging training and evaluation data for 1 components
  - [spancat] Training: 63 | Evaluation: 19 (20% split)
Training: 40 | Evaluation: 9
Labels: spancat (31)
  - [spancat] signature_url, sentence_org_used_by_employer, sentence_url_third_party, signature_signoff, sentence_intent_service, sentence_intent_click, signature_jobtitle, sentence_tone_polite, sentence_tone_urgent, sentence_passwd, signature_address, words_receiver_organization, sentence_intent_intro, signature_handle, sentence_url_no_name, sentence_intent_scheduling, signature, sentence_intent_phonecall, sentence_intent_attachment, signature_fullname, words_sender_organization, message_org, sentence_intent_money, message_contact_person_asking, signature_email, signature_org, words_sender_location, sentence_intent_unsubscribe, signature_phone, sentence_intent_recruiting, message_contact_person_org
ℹ Pipeline: ['tok2vec', 'spancat']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS SPANCAT  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE 
---  ------  ------------  ------------  ----------  ----------  ----------  ------
  0       0     258284.84     100890.67        0.00        0.00       50.38    0.00
**/Users/mitch/opt/miniconda3/bin/prodigy: line 2: 24178 Bus error: 10           python -m prodigy "$@"**

koaning · November 24, 2022, 3:23pm

Interesting. I found similar errors on Github for the spaCy project (here and here) but these are for old spaCy versions. So just to check, what version of spaCy is in your venv? Can you train a base entity model on your machine?
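
The details asked for here can be gathered with a short script such as the sketch below. It uses only the Python standard library; the helper name `describe_environment` and the default package list are made up for illustration. On Apple silicon, `platform.machine()` should report `arm64` for a native interpreter and `x86_64` when Python runs under Rosetta, which may matter when compiled packages misbehave.

```python
import platform
import sys
from importlib import metadata


def describe_environment(packages=("spacy", "thinc", "prodigy")):
    """Collect interpreter and package details for a bug report.

    The package list is only an assumption about what is relevant here.
    """
    info = {
        "python": platform.python_version(),
        "executable": sys.executable,
        "machine": platform.machine(),    # CPU architecture the interpreter runs as
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:12} {value}")
```

Running it inside the same environment that runs `prodigy train` shows which spaCy build is actually being picked up.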

mithunpaul08 · November 24, 2022, 9:31pm

@koaning thank you for working during the holidays.

Anyway, here is the output of the spaCy version commands. I just realized I am on Python 3.9. Do you think that might be the problem?

Also, I didn't explicitly install spaCy, just Prodigy. Should I try reinstalling spaCy separately?

(base) mitch@D21ML-MMITHUN annotated_datasets % python -m spacy info

============================== Info about spaCy ==============================

spaCy version    3.4.3                         
Location         /Users/mitch/opt/miniconda3/lib/python3.9/site-packages/spacy
Platform         macOS-12.6-arm64-arm-64bit    
Python version   3.9.12                        
Pipelines                                      

(base) mitch@D21ML-MMITHUN annotated_datasets % python -m spacy validate
✔ Loaded compatibility table

================= Installed pipeline packages (spaCy v3.4.3) =================
ℹ spaCy installation:
/Users/mitch/opt/miniconda3/lib/python3.9/site-packages/spacy

No pipeline packages found in your current environment

"Can you train a base entity model on your machine?"

Did you mean something simple without spancat from Prodigy, or running from spaCy (after exporting with data-to-spacy)? Can you point me to the documentation, please?
Update: I tried training with spaCy directly on the output of data-to-spacy. There was no explicit error, but the process was killed automatically after epoch 0. Meanwhile, let me go check the fixes mentioned in the links you pasted above.

To use this data for training with spaCy, you can run:
python -m spacy train corpus/config.cfg --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy
(base) mitch@D21ML-MMITHUN annotated_datasets % python -m spacy train corpus/config.cfg --paths.train corpus/train.spacy --paths.dev corpus/dev.spacy
ℹ No output directory provided
ℹ Using CPU

=========================== Initializing pipeline ===========================
[2022-11-24 14:01:39,694] [INFO] Set up nlp object from config
[2022-11-24 14:01:39,699] [INFO] Pipeline: ['tok2vec', 'spancat']
[2022-11-24 14:01:39,701] [INFO] Created vocabulary
[2022-11-24 14:01:39,701] [INFO] Finished initializing nlp object
[2022-11-24 14:01:42,299] [INFO] Initialized pipeline components: ['tok2vec', 'spancat']
✔ Initialized pipeline

============================= Training pipeline =============================
ℹ Pipeline: ['tok2vec', 'spancat']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS SPANCAT  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE 
---  ------  ------------  ------------  ----------  ----------  ----------  ------
  0       0    7242988.50     529976.62        0.00        0.00       50.77    0.00
zsh: killed     python -m spacy train corpus/config.cfg --paths.train corpus/train.spacy

koaning · November 25, 2022, 9:53am

Out of curiosity, how large are your documents? Could it be a memory error because the documents are just huge?
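
One rough way to answer this is to measure the annotations directly. The sketch below assumes the dataset has been exported to JSONL (for example with `prodigy db-out`) and that each record follows Prodigy's usual task format with `text`, optional `tokens`, and `spans` carrying `token_start` and `token_end`. The function names and the file name in the usage note are hypothetical.

```python
import json


def read_jsonl(path):
    """Load one JSON record per non-empty line."""
    with open(path, encoding="utf8") as f:
        return [json.loads(line) for line in f if line.strip()]


def summarize_examples(examples):
    """Report document lengths and the longest annotated span, in tokens."""
    doc_lengths = []
    longest_span = 0
    for eg in examples:
        tokens = eg.get("tokens") or []
        # Fall back to whitespace splitting when the record has no "tokens".
        n_tokens = len(tokens) if tokens else len(eg.get("text", "").split())
        doc_lengths.append(n_tokens)
        for span in eg.get("spans") or []:
            if "token_start" in span and "token_end" in span:
                # Assumption: token_end is inclusive, hence the +1.
                length = span["token_end"] - span["token_start"] + 1
                longest_span = max(longest_span, length)
    return {
        "n_docs": len(doc_lengths),
        "max_doc_tokens": max(doc_lengths, default=0),
        "longest_span_tokens": longest_span,
    }
```

For example, `summarize_examples(read_jsonl("annotations.jsonl"))` gives a quick picture of the data. Since spancat scores candidate spans proposed by a suggester, very long documents or very long annotated spans can plausibly increase memory use, so these numbers are worth including in a report.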

mithunpaul08 · November 25, 2022, 7:40pm

No, they are in the kilobyte range. I tested with files containing anywhere from 10 to 516 data points (just spans annotated in Prodigy, nothing fancy). The process either gets killed or hits the bus error for all of them. On the bright side, I managed to install spaCy separately and am now checking whether I can train models with spaCy alone, without Prodigy, to see if the error persists.

mithunpaul08 · November 26, 2022, 3:32am

Update: Good news. I was able to continue training by building spaCy from source and using pretty much the same command that data-to-spacy generated. This brings me to the conclusion that this was neither a data issue nor a Prodigy/spaCy issue, but an Apple M1 issue, or at least that something was wrong with the compatibility of the spaCy/Prodigy executable on Apple M1. Since it's a C-level bug, I would suggest not going down that rabbit hole, but if you want to, I can send you the data files I used. One big lesson I have taken from this experience: ALWAYS BUILD FROM SOURCE. Thank you @koaning for your timely help. All clear; you can close this bug report if you want to.

(base) mitch@D21ML-MMITHUN spaCy % python -m spacy train config.cfg --output ./output --paths.train ~/research/piranha/annotated_datasets/corpus/train.spacy --paths.dev ~/research/piranha/annotated_datasets/corpus/dev.spacy
============================= Training pipeline =============================
ℹ Pipeline: ['tok2vec', 'spancat']
ℹ Initial learn rate: 0.001
E    #       LOSS TOK2VEC  LOSS SPANCAT  SPANS_SC_F  SPANS_SC_P  SPANS_SC_R  SCORE 
---  ------  ------------  ------------  ----------  ----------  ----------  ------
  0       0        213.07       8297.01        0.05        0.03       14.62    0.00
  3     200         81.81       5781.66        0.00        0.00        0.00    0.00
  7     400          0.00        450.28        0.00        0.00        0.00    0.00
 11     600          0.00        472.88        0.00        0.00        0.00    0.00
 16     800          2.61        854.36        0.00        0.00        0.00    0.00
 20    1000          0.00        558.93        0.00        0.00        0.00    0.00

koaning · November 28, 2022, 3:41pm

If possible, feel free to report the bug here. Our spaCy team would love to understand the details better.
