Injecting an environment variable in a Weasel run script

Dear spaCy Team,

I am currently working on a project that uses Prodigy with a spaCy project (weasel), and I've run into a problem setting up environment variables.

I am using weasel run <command> to execute a Prodigy recipe that needs the PRODIGY_ALLOWED_SESSIONS environment variable.

My project.yml file is defined as follows:

...
env:
  PRODIGY_ALLOWED_SESSIONS: "user_1,user_2,user_3"
...

commands:
  - name: "man"
    help: "Start the Prodigy manual annotation recipe"
    script:
      - "echo PRODIGY_ALLOWED_SESSIONS=${env.PRODIGY_ALLOWED_SESSIONS}"
      - "python -m prodigy textcat.manual ${vars.database} assets/${vars.document_paths} --label ${vars.labels}"
...

When I execute the weasel run man command, here is my output:

==================================== man ====================================
Running command: echo PRODIGY_ALLOWED_SESSIONS=
PRODIGY_ALLOWED_SESSIONS=
Running command: /opt/homebrew/Caskroom/miniforge/base/envs/annotations/bin/python3.11 -m prodigy textcat.manual ...
Using 11 label(s): ...

✨ Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

As you can see, the PRODIGY_ALLOWED_SESSIONS environment variable is not being injected correctly.

I also tried using a vars variable, but that resulted in an error like:

Running command: PRODIGY_ALLOWED_SESSIONS=user_1,user_2,user_3 python -m prodigy textcat.manual ...'
Traceback (most recent call last):
...

FileNotFoundError: [E501] Can not execute command 'PRODIGY_ALLOWED_SESSIONS=user_1,user_2,user_3 python -m prodigy textcat.manual ... 
Do you have 'PRODIGY_ALLOWED_SESSIONS=user_1,user_2,user_3' installed?

How am I supposed to inject an environment variable like PRODIGY_ALLOWED_SESSIONS in a weasel project?

Thank you for your assistance.

Hi @TimothePearce,

The env section makes it possible to refer to environment variables directly in your command definitions. The values themselves aren't stored in project.yml; the section is just a mapping between the variable names you can use within a project command and the names of the environment variables they are read from.
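To make that concrete, here is a minimal Python sketch of the idea. It is not Weasel's actual code: the `interpolate_env` helper and its regex are hypothetical, and I'm assuming an unset environment variable resolves to an empty string, which matches the empty echo output above. With the mapping from the question, the value "user_1,user_2,user_3" is treated as the name of an environment variable, and no such variable exists.

```python
import os
import re


def interpolate_env(command, env_mapping, environ=None):
    """Replace ${env.NAME} using a {NAME: ENV_VAR_NAME} mapping.

    Hypothetical helper for illustration only, not Weasel's implementation.
    """
    environ = os.environ if environ is None else environ

    def replace(match):
        name = match.group(1)
        if name not in env_mapping:
            return match.group(0)  # leave unknown references untouched
        env_var_name = env_mapping[name]
        # Assumption: an unset environment variable becomes an empty string.
        return environ.get(env_var_name, "")

    return re.sub(r"\$\{env\.([A-Za-z0-9_]+)\}", replace, command)


command = "echo PRODIGY_ALLOWED_SESSIONS=${env.PRODIGY_ALLOWED_SESSIONS}"

# Mapping as written in the question: the right-hand side is looked up as an
# environment variable *name*, which is not set, so nothing is substituted in.
as_posted = {"PRODIGY_ALLOWED_SESSIONS": "user_1,user_2,user_3"}
print(interpolate_env(command, as_posted, environ={}))

# Mapping a name to an environment variable that is actually set.
as_intended = {"PRODIGY_ALLOWED_SESSIONS": "PRODIGY_ALLOWED_SESSIONS"}
shell_env = {"PRODIGY_ALLOWED_SESSIONS": "user_1,user_2,user_3"}
print(interpolate_env(command, as_intended, environ=shell_env))
```

Note that this kind of substitution only changes the text of the command. It does not by itself set the variable for the process that Prodigy runs in.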

What I've typically done is use python-dotenv and dotenv run --.

For example, I'd keep a .env file (here holding LLM keys) in my root folder and could then run:

  - name: "prodigy-ner-fewshot"
    help: "Prodigy ner few shot"
    script:
      - "dotenv run -- python -m prodigy ner.llm.fetch ${vars.config-fewshot} ${vars.input} ${vars.output-fewshot}"
      - "python -m prodigy db-in ${vars.dataset-fewshot} ${vars.output-fewshot}"
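As a rough sketch of why the dotenv run -- prefix works where a VAR=value prefix does not: the E501 error above suggests the command is started directly rather than through a shell, so the first token is taken as the program to execute. That is my assumption about the cause, not something confirmed from the Weasel source. The Python below uses only the standard library; `run_with_env` is a hypothetical helper that passes the variable through the child's environment, which is roughly the effect dotenv run has.

```python
import os
import subprocess
import sys


def run_with_env(args, extra_env):
    """Run a command with extra environment variables and return its stdout."""
    env = {**os.environ, **extra_env}  # inherit the parent env, then add ours
    result = subprocess.run(
        args, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


# Child process that just reports the variable it sees.
child = [
    sys.executable,
    "-c",
    "import os; print(os.environ.get('PRODIGY_ALLOWED_SESSIONS'))",
]

# Without a shell, the "VAR=value" prefix is treated as the executable name.
try:
    subprocess.run(["PRODIGY_ALLOWED_SESSIONS=user_1,user_2,user_3", *child])
except FileNotFoundError as err:
    print("Prefix form failed:", err)

# Passing the variable through the environment works.
sessions = {"PRODIGY_ALLOWED_SESSIONS": "user_1,user_2,user_3"}
print(run_with_env(child, sessions).strip())
```

Since child processes normally inherit the parent's environment, exporting the variable in the shell before calling weasel run may also work, though I haven't verified that against Weasel itself.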

Also, just a heads up: if you have specific weasel questions, you're likely better off posting them directly on the weasel GH issues page. While we're still a small team, we have multiple libraries, and each teammate typically answers questions for the library they maintain (e.g., the spaCy core team typically focuses on spaCy GH issues, not this forum, which is for Prodigy-specific problems).

Hi @ryanwesslen,

Thank you very much for your response.

I followed your suggestion to use python-dotenv, and it works perfectly.

I appreciate your help and the heads-up about where to post specific questions. Next time, I'll make sure to direct my queries to the appropriate GitHub issues page for more targeted assistance.

Thanks again for your support!