Buggy behavior on Ubuntu

We recently installed Prodigy on an Ubuntu Server 18.04 VM. We have been running on Windows without issue but wanted to be able to leverage some of the command line tools which only have coloring on Linux.

However, we have lost almost all of the work that was done on this instance of Prodigy. One or two documents were saved successfully, but in the other sessions, although Prodigy reported a successful save, the total count was back to the original number (plus one or two) when we relaunched it. I would have thought it might be an issue with the location of our DB file, but the fact that several documents were successfully saved seems to rule that out. Nothing was logged in the command window. I am not sure what to look at next, please advise.

Also, a second question, which might be a pip/Python issue rather than a Prodigy issue: we are seeing some pathological behavior with the dependencies. "sudo pip3 list" shows that srsly is installed for me, but the exact same command run by another developer does not include it, so she is unable to start Prodigy on that server at all. We verified that it's using the same pip with "sudo which pip3". Is there anything we can look at to see why this would be the case? We haven't had any issues like this on Windows.

Hi! This is definitely strange – there's no difference in what Prodigy does on Windows vs. Linux, and considering the weird Python environment problems you're seeing, it's possible that there's something deeper going on with your environment, user accounts or permissions.

Just to make sure I understand your setup correctly: you have several users logging into the same machine under the same account? Or under different user accounts? And then you're annotating in different sessions / datasets?

There's really no logical explanation I can think of why pip would report different installed dependencies, unless it's actually running under a different user account or using different home directories or something like that. (Also, I'm not fully sure what sudo does in combination with pip – but I remember hearing that it was potentially problematic? See here for instance. So you probably want to avoid that, just to prevent any side-effects, problems with permissions etc.)
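
If it helps narrow this down, here is a minimal sketch that each developer could run under their own account, using the same interpreter they start Prodigy with. It only uses the Python standard library and prints which user, home directory and interpreter are in play, and where (if anywhere) srsly is found. The function names are made up for illustration and are not part of Prodigy or pip.

```python
import getpass
import importlib.util
import os
import sys


def current_user():
    """Best-effort user name; falls back to the numeric uid if lookup fails."""
    try:
        return getpass.getuser()
    except Exception:
        return f"uid {os.getuid()}"


def describe_environment(package="srsly"):
    """Collect the facts that decide which packages this interpreter can see."""
    spec = importlib.util.find_spec(package)
    return {
        "user": current_user(),
        "home": os.path.expanduser("~"),
        "python": sys.executable,
        "prefix": sys.prefix,
        # True when the interpreter is running inside a virtual environment.
        "in_venv": sys.prefix != sys.base_prefix,
        "package": package,
        # None means this interpreter cannot import the package at all.
        "package_location": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key:>17}: {value}")
```

If two people get a different "home", "python" or "package_location" from this, the two pip listings are probably describing different environments rather than the same one.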

Did you see the "annotations saved" message in the UI and in the terminal when you stopped the session?

We do use several accounts to log into the same machine in order to start Prodigy and do dev tasks. We are using a relatively basic setup of Prodigy to run ner.manual, and all the annotations were on the same dataset (no named sessions). I agree that there could be something subtle about the system that has gone wrong, however I am at a loss as to what that could be, as we only recently created this VM and it has nothing on it except Prodigy and a very small API server.

I wasn't aware of the issues with sudo pip, thanks for pointing that out. It doesn't seem at first glance like that would result in the bizarre mismatch we are seeing, since it seems to be mostly a security concern to avoid sudo pip, but I will look into setting up a venv instead of using sudo to maintain a consistent environment.

We did see the expected "annotations saved" messages in the UI, but unfortunately the machine with the SSH session running Prodigy crashed, so it's not surprising that we lost some work. But, our process is to save after every document is annotated, so it should have only been 2 or 3 documents lost at maximum, as we only had 2 or 3 people annotating at the time. Instead we lost at least 20 documents. I am assuming that if someone clicks the save button in the UI, that work would be saved immediately and not wait until Prodigy is shut down?

I'm glad to hear it wasn't so much work that was lost. I'm also a bit confused about what might have happened, as normally sqlite is pretty robust about saving data correctly even if the machine crashes.

I wonder whether sudo might be causing you some subtle problems? If you install with sudo, the files and directories will all be owned by the root account. I don't immediately see what problems this would cause, but I guess it could be something? There are also a couple of things to think about if you run Prodigy under sudo: I think that would change where Prodigy looks for its home directory. You could also have problems if you run Prodigy under sudo and it creates a database owned by the root user.
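
To check whether that is what happened, something like the sketch below could be run once normally and once under sudo, and the output compared. It assumes the Prodigy home directory is taken from the PRODIGY_HOME environment variable when that is set and is ~/.prodigy otherwise, and that the default SQLite file is called prodigy.db inside it. The helper names are hypothetical and this does not call into Prodigy itself.

```python
import os
from pathlib import Path


def expected_prodigy_home(environ=None):
    """Return the directory this user would resolve as the Prodigy home.

    Assumption: PRODIGY_HOME overrides the default of ~/.prodigy.
    """
    environ = os.environ if environ is None else environ
    override = environ.get("PRODIGY_HOME")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".prodigy"


def ownership_report(path):
    """Describe who owns `path` and whether the current user can write to it."""
    path = Path(path)
    if not path.exists():
        return {"path": str(path), "exists": False}
    info = path.stat()
    return {
        "path": str(path),
        "exists": True,
        "owner_uid": info.st_uid,
        "current_uid": os.getuid(),
        "writable": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    home = expected_prodigy_home()
    print(ownership_report(home))
    # Assumed default database file name inside the home directory.
    print(ownership_report(home / "prodigy.db"))
```

A database whose owner_uid is 0 (root), or one that is reported as not writable for the account doing the annotation, would point at the sudo usage.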

You might want to consider creating a prodigy user on the machine, and giving the various people access to that account. This would also let people share tmux sessions, which is helpful with server processes like Prodigy's.

Hi Matthew, thanks for the reply. We are now using a venv and running as a prodigy user, which seems to have solved the weirdness. To do that I aliased prodigy to 'sudo -u prodigyuser path/to/venv/python -m prodigy', but that is a bit clumsy. I guess the venv isn't necessary if we run pip install --user.

I will definitely take your suggestion to use a shared login so multiple people can get to the running process. That would also make the environment a lot simpler to manage.

Glad you got it working! I'll mark this as solved, since it really sounds like there was something about the environment and user accounts that caused the problem.

Btw, if you're using the default SQLite database and haven't done it already, search for .prodigy/prodigy.db across the machine and see if you can find more than one database file. If so, this could mean that some user at some point actually created and wrote to a different database (for instance, in the home directory of their user instead of the one you were using etc.).
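
A rough way to do that search from Python, in case it's useful: the sketch below walks a directory tree and lists every prodigy.db that sits inside a .prodigy directory, along with its owner, size and last-modified time. The function name and the list of skipped system directories are my own choices. It only reads file metadata and does not open the databases.

```python
import os
from datetime import datetime

# Virtual filesystems that are slow or pointless to walk on Linux.
SKIP_DIRS = {"/proc", "/sys", "/dev"}


def find_prodigy_databases(root="/", filename="prodigy.db", parent=".prodigy"):
    """Return metadata for every `parent/filename` found below `root`."""
    matches = []
    # onerror swallows permission errors so unreadable directories are skipped.
    for dirpath, dirnames, filenames in os.walk(root, onerror=lambda err: None):
        # Prune in place so os.walk never descends into the skipped trees.
        dirnames[:] = [
            d for d in dirnames if os.path.join(dirpath, d) not in SKIP_DIRS
        ]
        if os.path.basename(dirpath) != parent or filename not in filenames:
            continue
        full_path = os.path.join(dirpath, filename)
        try:
            info = os.stat(full_path)
        except OSError:
            continue
        modified = datetime.fromtimestamp(info.st_mtime)
        matches.append({
            "path": full_path,
            "owner_uid": info.st_uid,
            "size_bytes": info.st_size,
            "modified": modified.isoformat(timespec="seconds"),
        })
    return sorted(matches, key=lambda m: m["path"])
```

Calling find_prodigy_databases("/home") and then find_prodigy_databases("/root") with enough permissions to read both should be sufficient on a small VM. More than one result, especially with different owners or recent modification times, would suggest the missing annotations went into another user's database.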