Getting date-time of an annotated document

Is there a way to extract the exact moment a specific document has been annotated?

Currently, in the output JSONL file, I'm only getting _task_hash and _input_hash, and I couldn't find a way of extracting a date-time from those fields.

Hi! By default, Prodigy doesn't store the timestamps on the individual examples – but it does create separate session datasets for each annotation session, using the timestamp of when the session was started. You can see an overview of all timestamped sessions by running prodigy stats -ls and the timestamp name of the session dataset is also printed when you exit the server.

If you do want to store more fine-grained timestamps, you can do that with a custom stream that adds an entry to all outgoing examples:

import datetime

def add_timestamps_to_stream(stream):
    # Attach the time each example was generated on the server
    for eg in stream:
        eg["timestamp"] = str(datetime.datetime.now())
        yield eg

This will reflect the time the example was generated, though – depending on the batch size and the annotation behaviour, this might not exactly reflect when the example was actually annotated. (Like, in theory, an annotator could open the app and request a batch of examples to annotate, do nothing for 2 hours and then start and submit the answers.) If you want to add timestamps on the client side, you could probably do this, too, by using custom JavaScript and calling window.prodigy.update({ timestamp: new Date().toJSON() }).


Hello @ines, hope you are doing fine.

In my case, my team has asked me for something similar to the original post, so I decided to leave my question here. We have several data annotators, each with their own storage table. However, we want a report of the work done by all of them BY DATE (e.g., "I want to know what the annotation team did on April 1st, 2022"). As you stated before, "Prodigy doesn't store the timestamps on the individual examples – but it does create separate session datasets for each annotation session, using the timestamp of when the session was started", and "You can see an overview of all timestamped sessions by running prodigy stats -ls and the timestamp name of the session dataset is also printed when you exit the server". I have done so, but what I get looks like this:

========================= Sessions ==========================
2022-02-17_01-26-06, 2022-02-17_01-25-31, 2022-02-17_01-27-39

From this result, I have some questions:

  1. The format seems to be "YYYY-MM-DD_hh-mm-ss", is that correct? If so, is that GMT? Where can I set my local time?
  2. What does each "session time" mean? In other words, when does a new session appear in this report?
  3. How could I retrieve the information requested before? In other words, how can I know which user did what, for each session? Is that even feasible?

I'm sorry if they are "too many questions", please let me know if I need to split them in different cases.

Thank you, and have a nice day!

Since this thread was started, we actually included the _timestamp property by default in all examples in v1.11 (see https://prodi.gy/docs/changelog#v1.11.0). So you should be able to get the _timestamp of each individual example, and also the _session_id to map the examples to the respective session.

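The timestamp is generated in JavaScript as follows: Math.floor(new Date().getTime() / 1000). That gives whole seconds since the Unix epoch, so the value itself is independent of any time zone and you can convert it to your local time when reading it.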
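For a per-day report, something along these lines might work as a starting point. It's only a rough sketch, not an official recipe: it assumes you've exported the annotations to JSONL, that each example carries _timestamp (seconds since the Unix epoch) and _session_id, and it groups by UTC date. The function name count_by_day_and_session and the sample records (including the session names) are made up for illustration.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def count_by_day_and_session(jsonl_lines):
    """Count annotated examples per (UTC date, session ID) pair."""
    counts = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        eg = json.loads(line)
        if "_timestamp" not in eg:
            continue  # e.g. examples saved before _timestamp was added
        # _timestamp is in seconds since the Unix epoch; interpret it as UTC
        day = datetime.fromtimestamp(eg["_timestamp"], tz=timezone.utc).date()
        counts[(day.isoformat(), eg.get("_session_id"))] += 1
    return counts


# Hypothetical sample records, standing in for lines of an exported file
sample = [
    '{"text": "a", "_timestamp": 1648771200, "_session_id": "my_dataset-alice"}',
    '{"text": "b", "_timestamp": 1648774800, "_session_id": "my_dataset-alice"}',
    '{"text": "c", "_timestamp": 1648774800, "_session_id": "my_dataset-bob"}',
    '{"text": "d", "_timestamp": 1648857600, "_session_id": "my_dataset-bob"}',
]

counts = count_by_day_and_session(sample)
for (day, session), n in sorted(counts.items()):
    print(day, session, n)
```

To use it on a real export, you could pass an open file handle instead of the sample list, since the function only iterates over lines. If you need local dates rather than UTC, swap timezone.utc for the time zone you care about.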