No Save button on the UI

Hi,

a) I'm not seeing a Save button on the UI. Below are the stats from my Prodigy:

C:\RTFilingsAI\train\ticker>python -m prodigy stats -l

============================== ✨ Prodigy Stats ==============================

Version 1.14.5
Location C:\Program Files\Python\Lib\site-packages\prodigy
Prodigy Home C:\Users\Ronny\.prodigy
Platform Windows-10-10.0.22621-SP0
Python Version 3.11.6
Spacy Version 3.6.1
Database Name SQLite
Database Id sqlite
Total Datasets 2
Total Sessions 5

================================ ✨ Datasets ================================

ner_news_headlines, ner_ticker

C:\RTFilingsAI\train\ticker>

b) When I run the command:
python -m prodigy db-out ner_ticker > out.jsonl

nothing is written to the file

Also, though it says I've made 6% progress, I cannot go back to the previous session's annotations.

Can you please shed some light on this? I do not want to continue annotating if I can't see the output from Prodigy.

Thanks.

Hi there.

Could you share a screenshot of the interface that you see? I'm assuming you're referring to the save icon that should appear in the top left corner?

That button should appear once you've annotated one example. But you seem to suggest that it does not appear after hitting the green accept button?

hi @ronnysh,

Yes, see @koaning's instructions above.

Just curious - what browser are you using? If you're not already, can you try Chrome or Firefox?

If you're still not seeing the save button, can you right click, choose "Inspect" and see if you find any browser errors?

Can you run:

python -m prodigy print-dataset ner_ticker

Do you see any annotations? I'm worried that you may not have actually saved annotations into your dataset.

Per the docs:

As you annotate, batches of examples will be cleared from your outbox and sent back to the Prodigy server to be saved in the database. You can also hit the save button in the top-left corner or press COMMAND+S (on Mac) or CTRL+S to clear your outbox and manually save.

You can also view the logs to confirm whether annotations are actually saved to your database. Please provide logs when debugging issues; it helps us diagnose problems a lot faster.

To learn more about how progress is measured, check out this part of the docs, especially the "What is a source" box. Unfortunately, progress can sometimes be unintuitive.

This offers a tractable way to estimate progress, but it has a few consequences that can be unintuitive in multiuser scenarios.

  • The Source-based progress tracks the progress through the input file rather than completed annotations. This means that even when starting a single-annotator flow, there will already be a non-zero progress as some of the file will have been consumed in the process of session initialization.
  • It is possible for one annotator to reach the end of their queue before the rest do. When this happens, the Source will have a position at the end of the file, so the Source-based progress bar will show 100% to every annotator.
  • The Source is unaware of the task router and the current sessions. Depending on the Prodigy configuration, it could be possible for new annotators to join at any given time. That also means that a person could join and immediately see significant progress when they arrive at the annotation interface.

To prevent these scenarios, it may be more convenient to set a target via total_examples_target in your prodigy.json file or to use a custom progress callback.
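For instance, assuming roughly 500 input examples (a made-up number — substitute your own dataset size), the prodigy.json override could look like this:

```json
{
  "total_examples_target": 500
}
```

With this set, progress should be computed against a fixed target rather than against the position in the source file.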

Lastly, you may also want to familiarize yourself with the Database components. These allow you to extract data directly from your database.

I ran print-dataset ner_ticker and got the below error:

C:\RTFilingsAI\train\ticker>python -m prodigy print-dataset ner_ticker
✘ Can't load 'ner_ticker' from database SQLite

even though stats returns:

=============================== ✨ Datasets ================================

ner_news_headlines, ner_ticker

You're right, it was appearing after hitting the green button - I don't think it's an issue as it's probably auto-saving. My concern is that db-out is not returning any data (out.jsonl is empty):

python -m prodigy db-out ner_ticker > out.jsonl


I tried the below database code but the examples variable returned an empty list:

from prodigy.components.db import connect

db = connect()  # connects using the settings in prodigy.json
all_dataset_names = db.datasets  # all dataset names in the database
examples = db.get_dataset_examples("ner_ticker")  # came back as []

Can you tell me how to enable the logs so I can capture debugging information?

You can turn on logs by running:

PRODIGY_LOGGING=basic python -m prodigy ...

It's one of the many environment variables detailed in the docs.

Could you share the command you use to annotate data? I just want to rule out that this is a copy/paste issue or that there's a typo.

The app starts out fine and I'm able to see the UI:

C:\RTFilingsAI\train\ticker>python -m prodigy ner.manual ner_ticker blank:en ./ticker_text.jsonl --label ACQUIREE,ACQUIREE_EXCHANGE,ACQUIREE_TICKER,ACQUIROR,ACQUIROR_EXCHANGE,ACQUIROR_TICKER
Using 6 label(s): ACQUIREE, ACQUIREE_EXCHANGE, ACQUIREE_TICKER, ACQUIROR,
ACQUIROR_EXCHANGE, ACQUIROR_TICKER

:sparkles: Starting the web server at http://localhost:8080 ...
Open the app in your browser and start annotating!

Get an error when enabling logging:

C:\RTFilingsAI\train\ticker>PRODIGY_LOGGING=basic python -m prodigy ner.manual ner_ticker blank:en ./ticker_text.jsonl --label ACQUIREE,ACQUIREE_EXCHANGE,ACQUIREE_TICKER,ACQUIROR,ACQUIROR_EXCHANGE,ACQUIROR_TICKER
'PRODIGY_LOGGING' is not recognized as an internal or external command,
operable program or batch file.

Ah! My bad. I didn't notice that you were using Windows. I think that environment variables need to be declared differently on that platform. I found this resource that might prove useful.

Maybe something like this?

$Env:PRODIGY_LOGGING = "basic"

I'm not a Windows user myself so I'm not 100% sure how Windows prefers to receive this. I've also found this other resource ...

This suggests that you might be able to set and use the variable by first running a set command and then running the Prodigy command afterwards.

set PRODIGY_LOGGING=basic
python -m prodigy ... 
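One caveat with the cmd.exe set syntax: everything after the = becomes part of the value, quotes included, so set PRODIGY_LOGGING='basic' would store the literal quotes, which Prodigy likely won't recognize as the "basic" setting. A quick stdlib check (my own sketch, not a Prodigy feature) shows what a child Python process actually sees:

```python
import os

# Prints whatever value the parent shell exported; Prodigy reads the
# variable the same way at startup.
print(repr(os.environ.get("PRODIGY_LOGGING", "<not set>")))
```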

Got this on the console - do you want me to check the logs (if so, please give me the location):

C:\RTFilingsAI\train\ticker>python -m prodigy ner.manual ner_ticker blank:en ./ticker_text.jsonl --label ACQUIREE,ACQUIREE_EXCHANGE,ACQUIREE_TICKER,ACQUIROR,ACQUIROR_EXCHANGE,ACQUIROR_TICKER
12:20:09: RECIPE: Loaded model blank:en
Using 6 label(s): ACQUIREE, ACQUIREE_EXCHANGE, ACQUIREE_TICKER, ACQUIROR,
ACQUIROR_EXCHANGE, ACQUIROR_TICKER
12:20:09: RECIPE: Calling recipe 'ner.manual'
12:20:09: RECIPE: Starting recipe ner.manual
12:20:09: RECIPE: Annotating with 6 labels
12:20:09: get_stream: Loading .jsonl file
12:20:09: get_stream: Rehashing stream
12:20:09: get_stream: Removing duplicates
12:20:09: VALIDATE: Validating components returned by recipe
12:20:09: CONTROLLER: Initialising from recipe
12:20:09: CONTROLLER: Recipe Config
12:20:09: VALIDATE: Creating validator for view ID 'ner_manual'
12:20:09: CONTROLLER: Using no_overlap router.
12:20:09: VALIDATE: Validating Prodigy and recipe config
12:20:09: FILTER: Filtering duplicates from stream
12:20:09: FILTER: Filtering out empty examples for key 'text'
12:20:09: PREPROCESS: Tokenizing examples (running tokenizer only)
12:20:09: DB: Creating unstructured dataset '2023-11-01_12-20-09'
12:20:09: CORS: initialized with wildcard "*" CORS origins

Well, playing around with the different options, I discovered that Prodigy will write the annotations to the database when I use ner.openai.correct (see below). I would still like to know why ner.manual is not writing to the database - it would be nice to have that recipe working, as we would still like to use it.

python -m prodigy ner.openai.correct ner_ticker ./ticker_text.jsonl --label ACQUIREE,ACQUIREE_EXCHANGE,ACQUIREE_TICKER,ACQUIROR,ACQUIROR_EXCHANGE,ACQUIROR_TICKER

hi @ronnysh,

Can you record a video showing (1) starting a new Prodigy session, (2) annotating examples and accepting them (either click the box or press A), and (3) either saving them by clicking the Save button or completing a batch (e.g., accepting more than 10 examples)?

For example, this is what you should see in your logs.
