Problems - Installation

Hi all.

I hate to do this, but I have some basic questions to ask because I am a complete beginner and I have not found answers to them.

Following the installation guide published for the Prodigy library, I was able to install the general requirements, and I was even able to load the file that needs to be annotated:

# 3. Activate virtual environment (Anaconda Prompt).
python -m venv venv
.\venv\Scripts\activate
# 4. Install Prodigy (Anaconda Prompt).
python -m pip install prodigy -f https://xxxx-xxxx-xxxx-xxxx@download.prodi.gy #Key prov
# 5. Install spaCy (Anaconda Prompt).
python -m spacy download en_core_web_sm
# 6. Install JupyterLab (Anaconda Prompt).
pip install "jupyterlab>=3.0.0"
pip install jupyterlab-prodigy
# 7. Verify installation (Anaconda Prompt).
jupyter labextension list
# 8. Install Node.js (Anaconda Prompt).
conda update -n base -c defaults conda
conda install -c conda-forge nodejs
# 9. Install JupyterLab-Prodigy (Anaconda Prompt).
jupyter labextension install jupyterlab-prodigy
#jupyter lab
# 8. Launch the platform with the data ready for evaluation!
python -m prodigy ner.manual my_set blank:en
"C:/Users/FERNANDO GUDIÑO/Documents/PreDoc/

And all seems to work correctly! I can save annotations just by pressing the "Save" button at the top:

However, I cannot really understand how to save annotations in a .jsonl file.

Replacing "my_set" with a path for saving the dataset to a .jsonl file does not work at all, so I re-read the documentation and found that there is another command I could use to save this dataset:

The problem is: where can I run this command? The first command is still running, and I see no way to interrupt the process (close port 8080) and then save the dataset. The process cannot be stopped for saving.

Trying JupyterLab does not work at all:

The solution is probably very easy, but I am lost, and I apologize for my poor understanding. I am trying to solve problems on my own, but I am stuck on this one.

Cheers, Fernando

hi @JFernando!

Thank you for your questions and welcome to the Prodigy community :wave:

Please don't worry at all about your questions! We're happy to have you join us and this is exactly what this forum is here for!

Your data is automatically saved to a SQLite database, so the question is really how to access or export it.

You're right -- you can't run db-out in the same terminal that is actively running Prodigy and serving annotations. However, you can open a second terminal alongside it. There you can export the data (e.g., run db-out), or you can connect directly to the database through Prodigy's Python database component, like:

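from prodigy.components.db import connect
import srsly

# Connect to the database Prodigy is configured to use (SQLite by default)
db = connect()
# Load all examples from the dataset; use your own dataset name here
examples = db.get_dataset("my_dataset")
print(examples[0]) # to see an example

# Write the examples to a file, one JSON object per line
srsly.write_jsonl("my_annotations.jsonl", examples)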

However, more commonly, after you've run a session and annotated (and clicked the save button), you'd shut down the session by pressing CTRL + C and then run db-out or use the Python DB component.
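If you'd rather drive the export from a script (for example from the second terminal), here is a minimal sketch that wraps `db-out` and redirects its output to a file. It assumes Prodigy is installed in the active environment and that the dataset is called `my_set`, as in the `ner.manual` command above; the helper names `build_db_out_command` and `export_dataset` are made up for illustration and are not part of Prodigy.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_db_out_command(dataset: str) -> List[str]:
    """Build the argument list for `python -m prodigy db-out <dataset>`."""
    # sys.executable keeps the call inside the currently active environment.
    return [sys.executable, "-m", "prodigy", "db-out", dataset]


def export_dataset(dataset: str, out_path: str) -> Path:
    """Run db-out and redirect its stdout to out_path (requires Prodigy)."""
    target = Path(out_path)
    with target.open("w", encoding="utf-8") as handle:
        # check=True raises CalledProcessError if the export fails.
        subprocess.run(build_db_out_command(dataset), stdout=handle, check=True)
    return target


# Show the command that would be run; "my_set" is the dataset name used above.
print(" ".join(build_db_out_command("my_set")[1:]))
```

Calling `export_dataset("my_set", "my_annotations.jsonl")` should then produce the same kind of file as the `srsly.write_jsonl` example, with the output file name being your own choice.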

Two other points for future posts. Thanks for the details (we really appreciate it b/c it's clear you're learning), but please avoid posting images of code. Screenshots are great (thank you), but images of code aren't searchable and are hard to copy/paste. This forum uses Markdown, so you can add code chunks with ``` to show where the code begins/ends, or use the code button.

Also, I needed to remove your image and replace it with code b/c you accidentally posted your Prodigy license key :slight_smile:

Hope this helps and let us know if you make progress or have further questions!

Dear @ryanwesslen.

Thanks a lot for your answer!

I have a question. After starting the server on localhost, I would like to create a URL so that other people can open it and annotate, while the dataset is still saved in my folder. Is that possible? Cheers, Fernando.

hi @JFernando!

Are you running Prodigy on a local machine (e.g., a personal laptop/desktop)?

If so, the easiest approach may be to use a service like ngrok:

My colleague @koaning has a great tutorial on ngrok at calmcode.io (I personally highly recommend it for many Python libraries!).

Alternatively, you can set the host to 0.0.0.0, but this is usually better suited to servers, where you will likely also need to set up reverse proxies/HTTPS and login/authentication, open ports, and manage firewalls:

But all of those require a strong knowledge of networking and may be a lot of work if you're new to it.

Also, in the last post I linked, be sure to see the follow-up comment on using named multi-user sessions with ngrok. This can help you track multiple users' sessions.
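To make the host setting a bit more concrete, here's a rough sketch. As far as I know, Prodigy reads the `PRODIGY_HOST` and `PRODIGY_PORT` environment variables (which overwrite the `host` and `port` settings in `prodigy.json`), so one option is to build an environment with those set and pass it to whatever launches the server. The helper name `prodigy_server_env` is hypothetical, and port 8080 is just the default mentioned earlier in the thread.

```python
import os
from typing import Dict


def prodigy_server_env(host: str = "0.0.0.0", port: int = 8080) -> Dict[str, str]:
    """Copy the current environment and add Prodigy's host/port overrides."""
    env = dict(os.environ)
    env["PRODIGY_HOST"] = host       # 0.0.0.0 listens on all network interfaces
    env["PRODIGY_PORT"] = str(port)  # environment values must be strings
    return env


env = prodigy_server_env()
print(env["PRODIGY_HOST"], env["PRODIGY_PORT"])  # prints: 0.0.0.0 8080
```

You could then pass `env=env` to `subprocess.run(...)` when starting your `ner.manual` command. This only changes where the server listens; it doesn't add any of the authentication or HTTPS setup mentioned above.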
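As a small illustration of named sessions: each annotator opens the app with `?session=<name>` appended to the URL. Here's a minimal sketch that builds those links; the base URL and annotator names are placeholders (ngrok will give you a different address), and `session_url` is a hypothetical helper, not a Prodigy function.

```python
from urllib.parse import urlencode


def session_url(base_url: str, annotator: str) -> str:
    """Append a named session to the app URL, e.g. .../?session=fernando."""
    query = urlencode({"session": annotator})  # escapes spaces and symbols
    return f"{base_url.rstrip('/')}/?{query}"


# Placeholder address; replace it with the URL your tunnel or server exposes.
base = "https://example.ngrok.io"
for name in ["fernando", "ana"]:
    print(session_url(base, name))
```

Each person gets their own link, and the annotations still end up in the database on the machine that runs Prodigy.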

Hope this helps!