Trained model location path

Hello everyone. I have some experience using spaCy, but we were recently given Prodigy as one of our development tools. As an exploratory exercise, I tried to train an NER model (inside a virtual environment, as good practice) using the corresponding command. More specifically, I ran the following command:

prodigy train ./test_model --ner station1_job1

The model trains successfully. If my understanding is correct, I should find all the model-related files in the ./test_model directory; however (and I'm a bit embarrassed here, TBH), I cannot find that directory anywhere (!)

What am I missing here? How do I access the files related to the model? How do I "export" a model so it becomes a folder (in the same fashion as spaCy)?

Thank you.

Hi @dave-espinosa , that's a bit unexpected :confused: Ideally you'd see the test_model folder with both model-best and model-last directories inside.
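For anyone who wants to check this quickly, here is a minimal Python sketch that reports which of the two expected sub-directories are missing from an output folder. The function name `missing_pipeline_dirs` is hypothetical (it is not part of Prodigy or spaCy); only the folder names `model-best` and `model-last` come from the reply above.

```python
from pathlib import Path
import tempfile

# Sub-directory names as described in the reply above.
EXPECTED_SUBDIRS = ("model-best", "model-last")


def missing_pipeline_dirs(output_dir):
    """Return the expected sub-directories that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


# Demo on a throwaway directory that only contains model-best.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "test_model"
    (out / "model-best").mkdir(parents=True)
    demo_missing = missing_pipeline_dirs(out)
    print(demo_missing)  # ['model-last']
```

An empty list means both folders are present; a full list usually means you are looking in the wrong place.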

Can you try running

prodigy train ./test_model --ner <dataset> --verbose

And copy-paste the whole log here? You can try it out with a small sample first so that it trains fast.

Under the hood, the whole prodigy train process also runs spacy train, so you should be able to achieve similar results if you go that route.

Hello @ljvmiranda921 ,

I think I forgot to mention a couple of (important) details, which might be of interest:

  1. Prodigy is currently a "tool under test" in my team, and my manager purchased it, storing it in a Compute Engine VM; therefore, I don't have "owner permissions" over it.
  2. For the reason explained above, I am not running the command from my original post directly, but "emulating the super user". For the command you suggested, for instance, what I am running looks as follows:
# 'super_user' was changed, due to privacy issues
sudo runuser -l super_user -c 'python3 -m prodigy train ./test_model --ner station1_job1 --verbose'

After running the command mentioned above, the following results are obtained:

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
✔ Generated training config

=========================== Initializing pipeline ===========================
[2022-05-05 17:25:17,007] [INFO] Set up nlp object from config
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 386 | Evaluation: 96 (20% split)
Training: 386 | Evaluation: 96
Labels: ner (4)
[2022-05-05 17:25:17,846] [INFO] Pipeline: ['tok2vec', 'ner']
[2022-05-05 17:25:17,850] [INFO] Created vocabulary
[2022-05-05 17:25:17,851] [INFO] Finished initializing nlp object
[2022-05-05 17:25:21,020] [DEBUG] [W033] Training a new parser or NER using a model with no lexeme normalization table. This may degrade the performance of the model to some degree. If this is intentional or the language you're using doesn't have a normalization table, please ignore this warning. If this is surprising, make sure you have the spacy-lookups-data package installed and load the table in your config. The languages with lexeme normalization tables are currently: cs, da, de, el, en, id, lb, mk, pt, ru, sr, ta, th

Load the table in your config with:

@misc = "spacy.LookupsDataLoader.v1"
lang = ${nlp.lang}
tables = ["lexeme_norm"]

[2022-05-05 17:25:22,502] [INFO] Initialized pipeline components: ['tok2vec', 'ner']
✔ Initialized pipeline

============================= Training pipeline =============================
Components: ner
Merging training and evaluation data for 1 components
  - [ner] Training: 386 | Evaluation: 96 (20% split)
Training: 386 | Evaluation: 96
Labels: ner (4)
[2022-05-05 17:25:23,261] [DEBUG] Removed existing output directory: test_model/model-best
[2022-05-05 17:25:23,263] [DEBUG] Removed existing output directory: test_model/model-last
ℹ Pipeline: ['tok2vec', 'ner']
ℹ Initial learn rate: 0.001
---  ------  ------------  --------  ------  ------  ------  ------
  0       0          0.00    247.44    0.98    0.66    1.93    0.01
  0     200        440.69  10213.80   21.18   42.38   14.12    0.21
  1     400        312.32   6208.34   35.14   49.91   27.12    0.35
  1     600        528.59   5607.65   46.36   50.12   43.12    0.46
  2     800        392.14   5033.03   48.00   55.46   42.31    0.48
  2    1000        424.79   4397.42   49.85   54.59   45.86    0.50
  3    1200        481.08   4604.98   48.25   65.65   38.14    0.48
  3    1400        568.77   3875.40   51.24   58.38   45.66    0.51
  4    1600        553.32   3732.43   51.73   60.56   45.15    0.52
  4    1800        700.81   3609.54   50.12   62.47   41.85    0.50
  5    2000        756.99   3692.40   52.24   58.37   47.28    0.52
  5    2200        731.88   3032.61   50.73   62.77   42.56    0.51
  6    2400        764.19   3166.53   52.29   63.10   44.64    0.52
  6    2600        791.88   2730.50   53.47   64.20   45.81    0.53
  7    2800        858.60   2708.93   50.59   61.91   42.76    0.51
  7    3000       1051.33   2844.19   51.69   55.40   48.45    0.52
  8    3200        950.03   2255.55   51.07   54.84   47.79    0.51
  8    3400       1075.91   2678.61   51.75   54.20   49.52    0.52
  9    3600       1201.84   2239.80   52.82   59.03   47.79    0.53
  9    3800       1182.50   2467.74   52.21   56.54   48.50    0.52
 10    4000       1168.55   2098.34   53.64   59.54   48.81    0.54
 10    4200       1338.16   2207.73   52.91   54.46   51.45    0.53
 11    4400        953.04   1652.57   52.86   58.22   48.40    0.53
 11    4600       1355.55   2106.60   50.37   52.88   48.10    0.50
 12    4800       1311.10   1786.16   51.93   56.44   48.10    0.52
 12    5000       1292.80   1807.32   53.13   56.31   50.28    0.53
 13    5200       1550.46   1641.75   52.17   55.32   49.37    0.52
 14    5400       1445.59   1778.92   53.29   58.75   48.76    0.53
 14    5600       1358.91   1425.84   52.04   56.48   48.25    0.52
✔ Saved pipeline to output directory

However, I still could not find the test_model directory. I then figured that maybe the model was landing in the "super user" home directory... which turned out to be true. With this in mind, I slightly changed the command to:

# 'super_user' was changed, due to privacy issues
sudo runuser -l super_user -c 'python3 -m prodigy train ./my_home_directory/test_model --ner station1_job1 --verbose'

...thinking that this way I was going to obtain my "goal result":

|-- my_home_directory
|   |--test_model
|      |--model-best
|      |--model-last
|--super_user  # don't care what happens here; don't have permissions to work here

Funnily enough, after temporarily requesting access to super_user, what I ended up with was:

|-- my_home_directory  # No model here!
   |--my_home_directory  # Prodigy being 'funny' XD

Needless to say, what I want is to achieve my "goal result": exporting models to a destination folder outside the Prodigy owner's home folder.

Is that possible?

Hi @dave-espinosa , is it because when you're using sudo runuser -l, the working directory actually changes to that user's home directory? You can try inspecting the current directory you're in with pwd (and its contents with ls) just to be sure.
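To illustrate why this matters, here is a small Python sketch showing that the same relative path points to different places depending on the working directory it is resolved against. The helper `resolve_against` and both home directory paths are hypothetical, chosen only to mirror the names used in this thread.

```python
from pathlib import PurePosixPath


def resolve_against(cwd, relative):
    """Join a relative path onto a working directory (pure path arithmetic)."""
    return PurePosixPath(cwd) / relative


# Hypothetical home directories, for illustration only.
for cwd in ("/home/my_home_directory", "/home/super_user"):
    print(resolve_against(cwd, "./test_model"))
# /home/my_home_directory/test_model
# /home/super_user/test_model
```

Because runuser -l starts a login shell in the target user's home directory, a relative output path such as ./test_model follows the second pattern.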

Hello @ljvmiranda921 ,

Thanks for the heads-up, I was aware of that; however, I was wondering whether Prodigy can output a model to a directory other than the Prodigy owner's?

Best regards.

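Hi, if you want to ensure that the model is written to the directory you want, it may be better to specify an absolute path. A relative path like ./test_model is resolved against the current working directory, which is why it ended up under the other user's home. Under the hood, Prodigy resolves the directories using Python's pathlib module, so it shouldn't deviate much from that.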
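As a rough sketch of that suggestion, the snippet below turns the output directory into an absolute path before assembling the train command from this thread. The helper `build_train_command` is hypothetical and only builds the argument list; it does not run Prodigy, and it assumes the `python3 -m prodigy train <output> --ner <dataset>` form used earlier in the thread.

```python
import shlex
from pathlib import Path


def build_train_command(output_dir, dataset):
    """Return the train command as a list, with output_dir made absolute."""
    # expanduser() handles "~"; resolve() anchors relative paths at the
    # current working directory of the process that runs this code.
    absolute = Path(output_dir).expanduser().resolve()
    return ["python3", "-m", "prodigy", "train", str(absolute), "--ner", dataset]


cmd = build_train_command("./test_model", "station1_job1")
print(shlex.join(cmd))
```

Note that the path should be made absolute in your own session, before the command is handed to runuser, and the user who actually runs the training needs write permission on the destination folder.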