Hello @ljvmiranda921 ,
I think I forgot to mention a couple of (important) details, which might be of interest:
- Prodigy is currently a "tool under test" in my team, and my manager purchased it and installed it on a Compute Engine VM; therefore, I don't have "owner permissions" over it.
- For the reason explained above, I am not "directly" running the command I mentioned in my original post, but "emulating the super user". For the command you suggested, for instance, what I am running looks as follows:
# 'super_user' was changed, due to privacy issues
sudo runuser -l super_user -c 'python3 -m prodigy train ./test_model --ner station1_job1 --verbose'
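(Side note: as far as I understand, runuser -l starts a login shell, which switches the working directory to super_user's home before the command runs. A quick way to confirm where relative paths resolve; the printed path is just my assumption of a standard home layout:

# sanity check: which directory does the command actually run from?
sudo runuser -l super_user -c 'pwd'
# prints something like /home/super_user on a standard setup

This turns out to matter for what follows.)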
After running the train command, I get the following output:
========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
✔ Generated training config
=========================== Initializing pipeline ===========================
[2022-05-05 17:25:17,007] [INFO] Set up nlp object from config
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 386 | Evaluation: 96 (20% split)
Training: 386 | Evaluation: 96
Labels: ner (4)
- [ner] SOFTSKILL, SOFTWARE, SPECTOOL, HARDSKILL
[2022-05-05 17:25:17,846] [INFO] Pipeline: ['tok2vec', 'ner']
[2022-05-05 17:25:17,850] [INFO] Created vocabulary
[2022-05-05 17:25:17,851] [INFO] Finished initializing nlp object
[2022-05-05 17:25:21,020] [DEBUG] [W033] Training a new parser or NER using a model with no lexeme normalization table. This may degrade the performance of the model to some degree. If this is intentional or the language you're using doesn't have a normalization table, please ignore this warning. If this is surprising, make sure you have the spacy-lookups-data package installed and load the table in your config. The languages with lexeme normalization tables are currently: cs, da, de, el, en, id, lb, mk, pt, ru, sr, ta, th
Load the table in your config with:
[initialize.lookups]
@misc = "spacy.LookupsDataLoader.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]
[2022-05-05 17:25:22,502] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline
============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
- [ner] Training: 386 | Evaluation: 96 (20% split)
Training: 386 | Evaluation: 96
Labels: ner (4)
- [ner] SOFTSKILL, SOFTWARE, SPECTOOL, HARDSKILL
[2022-05-05 17:25:23,261] [DEBUG] Removed existing output directory: test_model/model-best
[2022-05-05 17:25:23,263] [DEBUG] Removed existing output directory: test_model/model-last
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
E # LOSS TOK2VEC LOSS NER ENTS_F ENTS_P ENTS_R SCORE
--- ------ ------------ -------- ------ ------ ------ ------
0 0 0.00 247.44 0.98 0.66 1.93 0.01
0 200 440.69 10213.80 21.18 42.38 14.12 0.21
1 400 312.32 6208.34 35.14 49.91 27.12 0.35
1 600 528.59 5607.65 46.36 50.12 43.12 0.46
2 800 392.14 5033.03 48.00 55.46 42.31 0.48
2 1000 424.79 4397.42 49.85 54.59 45.86 0.50
3 1200 481.08 4604.98 48.25 65.65 38.14 0.48
3 1400 568.77 3875.40 51.24 58.38 45.66 0.51
4 1600 553.32 3732.43 51.73 60.56 45.15 0.52
4 1800 700.81 3609.54 50.12 62.47 41.85 0.50
5 2000 756.99 3692.40 52.24 58.37 47.28 0.52
5 2200 731.88 3032.61 50.73 62.77 42.56 0.51
6 2400 764.19 3166.53 52.29 63.10 44.64 0.52
6 2600 791.88 2730.50 53.47 64.20 45.81 0.53
7 2800 858.60 2708.93 50.59 61.91 42.76 0.51
7 3000 1051.33 2844.19 51.69 55.40 48.45 0.52
8 3200 950.03 2255.55 51.07 54.84 47.79 0.51
8 3400 1075.91 2678.61 51.75 54.20 49.52 0.52
9 3600 1201.84 2239.80 52.82 59.03 47.79 0.53
9 3800 1182.50 2467.74 52.21 56.54 48.50 0.52
10 4000 1168.55 2098.34 53.64 59.54 48.81 0.54
10 4200 1338.16 2207.73 52.91 54.46 51.45 0.53
11 4400 953.04 1652.57 52.86 58.22 48.40 0.53
11 4600 1355.55 2106.60 50.37 52.88 48.10 0.50
12 4800 1311.10 1786.16 51.93 56.44 48.10 0.52
12 5000 1292.80 1807.32 53.13 56.31 50.28 0.53
13 5200 1550.46 1641.75 52.17 55.32 49.37 0.52
14 5400 1445.59 1778.92 53.29 58.75 48.76 0.53
14 5600 1358.91 1425.84 52.04 56.48 48.25 0.52
✔ Saved pipeline to output directory
test_model/model-last
However, I still could not find the test_model directory. I then figured out that the model might be landing in the "super user" home directory... which ended up being true. With this in mind, I slightly changed the command to:
# 'super_user' was changed, due to privacy issues
sudo runuser -l super_user -c 'python3 -m prodigy train ./my_home_directory/test_model --ner station1_job1 --verbose'
...thinking that, this way, I would obtain my "goal result":
|-- my_home_directory
|   |-- test_model
|   |   |-- model-best
|   |   |-- model-last
|-- super_user    # don't care what happens here; don't have permissions to work here
Funnily enough, and after temporarily requesting access to super_user, what I ended up having was:
|-- my_home_directory    # No model here!
|-- super_user
|   |-- my_home_directory    # Prodigy being 'funny' XD
|   |   |-- test_model
|   |   |   |-- model-best
|   |   |   |-- model-last
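If my understanding of runuser -l is correct, this behaviour actually makes sense: the login shell starts in super_user's home, so the relative path I passed gets resolved against it (the absolute paths below are hypothetical):

# relative path passed to Prodigy:
#     ./my_home_directory/test_model
# working directory of the login shell started by 'runuser -l super_user':
#     /home/super_user
# resulting output location:
#     /home/super_user/my_home_directory/test_model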
Needless to say, what I want is to achieve my "goal result": exporting models to a destination folder outside the Prodigy owner's "home folder".
Is that possible?
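For what it's worth, I suspect that passing an absolute path (and giving super_user write access to the target folder) might already do the trick, along these lines (untested; 'super_user' and paths are made up, due to privacy issues):

# make the target writable for super_user first, e.g. (one of several options,
# depending on how home permissions are set up on the VM):
#     sudo chmod o+wx /home/my_user/my_home_directory
sudo runuser -l super_user -c 'python3 -m prodigy train /home/my_user/my_home_directory/test_model --ner station1_job1 --verbose'

...but I would like to confirm whether this is the recommended way of doing it.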