first steps - loading the correct dataset - troubleshooting

Hi everybody,
and nice to meet you. I am a new user and I am trying to take my first steps with Prodigy and NER.
I have been having a problem with the annotation interface. I run prodigy in a jupyter lab framework, I think I have installed everything correctly, but I can't figure out how to solve this.
I start with this

import os
import prodigy
!python -m prodigy stats -l

and I can see two databases, which is correct.

However, when I try and start a new annotation task with this

!python -m prodigy ner.manual GP_set blank:en ./test_data.jsonl --label LABEL_A

and I get to the prodigy tab, the annotation interface is set to a different dataset, which I loaded in a previous session (new_set instead of GP_set which I am trying to load, see pic below).

What am I doing wrong?

Sorry if this is very trivial, any help would be greatly appreciated.
Best, g.

Hi! Just to make sure I understand the question correctly: you mean the name of the dataset that the annotations are saved to, right? And that it says new_set in the sidebar, even though you started a command with GP_set?

From the command you posted, it looks like you're in a Jupyter Notebook, right? Maybe what happened is that the previous server you started wasn't terminated properly? So what you're seeing here in the app is still the first command/process running on port 8080, not the second one (although you should have seen an error in that case).
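A quick way to check this from the notebook is to see whether anything is already listening on the port before starting a new server. The snippet below is only a rough sketch using the Python standard library: `port_in_use` is a made-up helper name, not part of Prodigy, and 8080 is the port assumed above, so swap in whatever port your setup uses.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 8080 is an assumption; use the port your Prodigy server runs on.
    if port_in_use(8080):
        print("Port 8080 is busy: an old server may still be running.")
    else:
        print("Port 8080 is free.")
```

If this reports the port as busy before you have started anything in the current session, a leftover process from an earlier session is the likely culprit.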

Try killing the process that runs on that port and restarting Prodigy. How you kill a process running on a port depends on your operating system, but you'll find lots of instructions on StackOverflow as it's a common thing people don't remember how to do (I have to google for the command every time :sweat_smile:).


Hi Ines!

Thanks for chiming in. Yes, you understood the problem perfectly: I am indeed in a Jupyter notebook (on Windows 10), and killing the process that runs on the Prodigy port did the trick.

I am posting the commands below for Windows users who might need them:

!netstat -ano | findstr :<YOURPORT>

In my case, the port number is 8095 and the PID of the associated process is 8392.

You then pass that PID to the following command to kill the corresponding task:

!Taskkill /PID 8392 /F 
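For anyone who prefers to do the lookup in Python, here is a small sketch of the same two steps. The helper names (`listening_pids`, `find_listening_pids`, `kill_pid`) are made up for illustration, and the parsing assumes the usual five-column English `netstat -ano` output on Windows (proto, local address, foreign address, state, PID), so treat it as a starting point rather than something robust.

```python
import subprocess


def listening_pids(netstat_output, port):
    """Extract PIDs of processes in LISTENING state on the given local port."""
    pids = set()
    for line in netstat_output.splitlines():
        parts = line.split()
        # Assumed columns: proto, local address, foreign address, state, PID.
        if len(parts) != 5 or parts[3] != "LISTENING":
            continue
        # Compare the port exactly, so 8095 does not also match 80951.
        local_port = parts[1].rsplit(":", 1)[-1]
        if local_port == str(port):
            pids.add(int(parts[4]))
    return sorted(pids)


def find_listening_pids(port):
    """Run `netstat -ano` (Windows) and return the PIDs listening on port."""
    result = subprocess.run(
        ["netstat", "-ano"], capture_output=True, text=True, check=True
    )
    return listening_pids(result.stdout, port)


def kill_pid(pid):
    """Force-kill a process by PID with taskkill (Windows only)."""
    subprocess.run(["taskkill", "/PID", str(pid), "/F"], check=True)
```

In a notebook cell you would then call something like `for pid in find_listening_pids(8095): kill_pid(pid)`, replacing 8095 with your own port.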

This worked fine, thanks!
I was wondering however, is there a "prodigy" way to do this? (you mentioned the problem might be due to the server not being terminated properly).

Also, I am still facing a minor bug (idk if a new thread is needed). Whenever I try to start a new annotation process, and I type for example:

 !python -m prodigy ner.manual a_new_dataset blank:en ./new_data.jsonl --label LABEL1,LABEL2

the Jupyter interface always freezes, and I am forced to interrupt the kernel. Only when I run the command again do I get the message:

Using 2 label(s): LABEL1, LABEL2
[*] Starting the web server at http://localhost:8095 ...
Open the app in your browser and start annotating!

and I can start annotating.

Any ideas why this happens? Thanks in advance!

Thanks for the update, glad it worked! :slightly_smiling_face:

I think this mostly comes down to the way command-line commands work in Jupyter environments. If you're running the prodigy command in a terminal, you'd typically press ctrl+c to exit the server/process it's running. The equivalent of this in a notebook is interrupting the kernel. See the Stack Overflow question "Is there an equivalent to CTRL+C in IPython Notebook in Firefox to break cells that are running?"
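If you'd rather not block the cell at all, one possible workaround is to start the server as a background process from Python and keep a handle to it, so you can stop it explicitly later. This is plain `subprocess` usage rather than a built-in Prodigy feature, and the helper names below are made up for illustration. The server's output may also end up in the terminal where Jupyter was launched instead of in the notebook.

```python
import subprocess
import sys


def build_prodigy_command(recipe, dataset, *args):
    """Build the same command line as `python -m prodigy <recipe> <dataset> ...`."""
    return [sys.executable, "-m", "prodigy", recipe, dataset, *args]


def start_background(command):
    """Start the command without waiting for it, so the cell returns right away."""
    return subprocess.Popen(command)


def stop_background(proc, timeout=10):
    """Ask the process to stop, and force-kill it if it does not exit in time."""
    proc.terminate()
    try:
        proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
    return proc.returncode
```

With the command from the post above, usage would look roughly like `server = start_background(build_prodigy_command("ner.manual", "a_new_dataset", "blank:en", "./new_data.jsonl", "--label", "LABEL1,LABEL2"))`, followed by `stop_background(server)` once you are done annotating. Stopping the process this way should also free the port for the next session.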