No _timestamp field

I've annotated some projects using Prodigy v1.17. When I run prodigy progress dataset_name I get the following:

✔ Loaded 3441 annotations from 1 datasets

=================================== Legend ===================================

New      New annotations collected in interval
Total    Total annotations collected   
Unique   Unique examples (not counting multiple annotations of same example)

============================ Annotation Progress ============================

            New   Unique   Total   Unique
--------   ----   ------   -----   ------
May 2022   3441     2379    3441     2379

⚠ No "_timestamp" found in 3441 annotations from datasets:
soft_skills_classification. Maybe the data was created with Prodigy v1.10 or
lower? Using dataset creation time as a fallback.
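The warning describes a fallback. Annotations are grouped by month using their "_timestamp", and when that field is missing the dataset's creation time is used instead. That would explain why all 3441 annotations show up under "May 2022". Below is a minimal sketch of that behaviour. It assumes "_timestamp" holds Unix epoch seconds, and bucket_by_month and fallback_ts are hypothetical names, not Prodigy's internals.

```python
from collections import Counter
from datetime import datetime, timezone


def bucket_by_month(examples, fallback_ts):
    """Count annotations per month, mimicking (not reproducing) the
    fallback described in the warning.

    Assumption: `_timestamp` is a Unix epoch in seconds. Examples that
    lack it are attributed to `fallback_ts` (e.g. dataset creation time).
    """
    counts = Counter()
    missing = 0
    for eg in examples:
        ts = eg.get("_timestamp")
        if ts is None:
            missing += 1
            ts = fallback_ts  # no timestamp: fall back to creation time
        month = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%b %Y")
        counts[month] += 1
    return counts, missing


# Two made-up annotations: one stamped, one without a timestamp.
examples = [{"text": "a", "_timestamp": 1629500000}, {"text": "b"}]
created = 1652000000  # hypothetical dataset creation time (May 2022)
counts, missing = bucket_by_month(examples, created)
print(dict(counts), missing)
```

When every example lacks "_timestamp", every example lands in the creation month, which matches the single "May 2022" row above.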

I've had this issue with textcat and spancat projects. Is there a flag I need to set to enable the timestamps?

Hi @rolisz!

Thanks for the heads-up, this is interesting. The "_timestamp" field should be added by default since v1.11 (August 2021).

Can you show an example of your data?

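from prodigy.components.db import connect

db = connect()  # connect using the database settings from your Prodigy config
all_dataset_names = db.datasets  # names of all datasets in the database
examples = db.get_dataset("dataset_name")  # list of annotation dicts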

Feel free to mask or change any fields. I'm mostly interested in seeing _session_id and the other underscore-prefixed metadata fields.
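To see at a glance which metadata fields the stored examples carry, a small helper can tally the underscore-prefixed keys and count how many examples lack "_timestamp". This is a sketch that operates on a plain list of dicts, such as the one returned by db.get_dataset("dataset_name") above. summarize_meta is a hypothetical helper, and the sample dicts (including the session ID) are made up for illustration.

```python
from collections import Counter


def summarize_meta(examples):
    """Tally underscore-prefixed keys (e.g. _session_id, _timestamp)
    across examples and count how many lack `_timestamp`."""
    key_counts = Counter()
    for eg in examples:
        key_counts.update(k for k in eg if k.startswith("_"))
    missing_ts = sum(1 for eg in examples if "_timestamp" not in eg)
    return key_counts, missing_ts


# Made-up examples; in practice pass db.get_dataset("dataset_name").
examples = [
    {"text": "a", "_input_hash": 1, "_task_hash": 2, "_view_id": "choice"},
    {"text": "b", "_input_hash": 3, "_task_hash": 4, "_view_id": "choice",
     "_session_id": "hypothetical-session", "_timestamp": 1652000000},
]
key_counts, missing_ts = summarize_meta(examples)
for key, n in sorted(key_counts.items()):
    print(f"{key}: {n}/{len(examples)}")
print("missing _timestamp:", missing_ts)
```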

Have you noticed any other patterns with this problem? For example, you mentioned textcat and spancat: does it work for some projects but not for others?

The fact that it's happening points to a bug that our tests aren't catching. Worst case, I'll log this and queue it up. We greatly appreciate you bringing this to our attention!

I'm running the following command:

prodigy textcat.manual nl_seniority nl_ads_classif.jsonl   --label "Entry Level (0-8 months),Junior (8-27 months),Medior (21-54 months),Senior (42-84 months),Mature (72-180 months),Seasoned (168+ months)"

The first example from the database is:

{'text': 'CRM Medewerker\nVoor A.S. ....',
 'meta': {'id': 18180},
 '_input_hash': -860025884,
 '_task_hash': -287355978,
 'options': [{'id': 'Entry Level (0-8 months)',
   'text': 'Entry Level (0-8 months)'},
  {'id': 'Junior (8-27 months)', 'text': 'Junior (8-27 months)'},
  {'id': 'Medior (21-54 months)', 'text': 'Medior (21-54 months)'},
  {'id': 'Senior (42-84 months)', 'text': 'Senior (42-84 months)'},
  {'id': 'Mature (72-180 months)', 'text': 'Mature (72-180 months)'},
  {'id': 'Seasoned (168+ months)', 'text': 'Seasoned (168+ months)'}],
 '_view_id': 'choice',
 'config': {'choice_style': 'multiple'},
 'accept': ['Junior (8-27 months)', 'Medior (21-54 months)'],
 'answer': 'accept'}

Another example is:

prodigy spans.manual de_ad_components_split blank:de de_job_vacancies_split.jsonl --label "Soft skills,Hard skills,Activities,Benefits,Company attributes,Company Location,Education,Industry,Licenses and Courses,Location,Salary,Header"

And the first example is:

{'text': 'Mitarbeiter/in Montage\n....t!',
 'tokens': [{'text': 'Mitarbeiter',
   'start': 0,
   'end': 11,
   'id': 0,
   'ws': False},
  {'text': '/', 'start': 11, 'end': 12, 'id': 1, 'ws': False},
  {'text': 'in', 'start': 12, 'end': 14, 'id': 2, 'ws': True},
  {'text': '!', 'start': 1068, 'end': 1069, 'id': 158, 'ws': False}],
 'spans': [{'start': 73,
   'end': 114,
   'token_start': 15,
   'token_end': 19,
   'label': 'Company attributes'},
  {'start': 958,
   'end': 979,
   'token_start': 143,
   'token_end': 144,
   'label': 'Benefits'}],
 '_input_hash': -1458903816,
 '_task_hash': 1297739242,
 '_view_id': 'spans_manual',
 'answer': 'accept'}

I've looked at multiple projects: in some of them, only 6 out of 1000 examples have a _timestamp, while the other 990+ don't.
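To check the pattern across several projects at once, the following sketch computes timestamp coverage per dataset. It also lists the _session_id values of stamped and unstamped examples, which might show whether the few timestamps line up with particular sessions. timestamp_coverage is a hypothetical helper and the demo data is made up. The commented Prodigy calls mirror the snippet earlier in this thread (v1.11-era API) and are not executed here.

```python
def timestamp_coverage(datasets):
    """For each dataset name -> list of examples, return
    (stamped, total, sessions_with_ts, sessions_without_ts)."""
    report = {}
    for name, examples in datasets.items():
        stamped = [eg for eg in examples if "_timestamp" in eg]
        unstamped = [eg for eg in examples if "_timestamp" not in eg]
        report[name] = (
            len(stamped),
            len(examples),
            sorted({eg.get("_session_id", "<none>") for eg in stamped}),
            sorted({eg.get("_session_id", "<none>") for eg in unstamped}),
        )
    return report


# With Prodigy, the mapping could be built like this (not run here):
#   db = connect()
#   datasets = {name: db.get_dataset(name) for name in db.datasets}
datasets = {
    "demo": [
        {"text": "a", "_session_id": "s1", "_timestamp": 1652000000},
        {"text": "b"},
        {"text": "c"},
    ]
}
for name, (n_ts, n, with_ts, without_ts) in timestamp_coverage(datasets).items():
    print(f"{name}: {n_ts}/{n} stamped; sessions with ts {with_ts}, without {without_ts}")
```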
