What does the "id" field in the "tokens" mean?

Zainpann · August 1, 2018, 7:07am

{
    "text":"Tabdeli Yhi Log Laskengy",
    "_input_hash":-915958241,
    "_task_hash":694961601,
    "tokens":[
        {"text":"Tabdeli","start":0,"end":7,"id":0},
        {"text":"Yhi","start":8,"end":11,"id":1},
        {"text":"Log","start":12,"end":15,"id":2},
        {"text":"Laskengy","start":16,"end":24,"id":3}
    ],
    "answer":"accept"
}

The above is the output of my annotated file. Can you please tell me what “id” means there?

ines · August 1, 2018, 9:09am

Sure! The "id" field on the token is the token ID or index. Those are assigned automatically by Prodigy, and allow mapping annotated spans in the text back to their token positions. The first token will receive the ID 0, the second token the ID 1, and so on.

It’s also identical to spaCy’s Token.i property, for example:

import spacy

nlp = spacy.blank("en")  # tokenizer-only pipeline; any loaded spaCy pipeline works too
doc = nlp("Tabdeli Yhi Log Laskengy")
print([token.i for token in doc])
# [0, 1, 2, 3]
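To illustrate the mapping from a character span back to token positions, here is a minimal sketch in plain Python. The helper name char_span_to_token_ids and the example span are hypothetical and only for illustration; the token dicts are copied from the annotated example above. This is not Prodigy's internal code.

```python
# Token dicts copied from the annotated example in this thread.
tokens = [
    {"text": "Tabdeli", "start": 0, "end": 7, "id": 0},
    {"text": "Yhi", "start": 8, "end": 11, "id": 1},
    {"text": "Log", "start": 12, "end": 15, "id": 2},
    {"text": "Laskengy", "start": 16, "end": 24, "id": 3},
]


def char_span_to_token_ids(tokens, start, end):
    """Return the ids of all tokens that lie fully inside [start, end).

    Hypothetical helper for illustration, not part of Prodigy.
    """
    return [t["id"] for t in tokens if t["start"] >= start and t["end"] <= end]


# Hypothetical span covering "Yhi Log" (characters 8 to 15).
span_token_ids = char_span_to_token_ids(tokens, 8, 15)
print(span_token_ids)  # [1, 2]
```

Because the IDs are just positions in the token list, tokens[1] and tokens[2] give you the matching token dicts directly.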

Zainpann · August 1, 2018, 11:56am

Just wanted to know a couple of things…

What do “_input_hash”: -915958241 and “_task_hash”: 694961601 mean, and how are these values generated?

Also, can we add more sentences to the generated output JSON file?

ines · August 1, 2018, 1:54pm

The input and task hashes are unique IDs that help Prodigy identify annotations that apply to the same input text. My comment on this thread explains this in more detail. You can also find more information on the hashing functions in your PRODIGY_README.html, available for download with the Prodigy package.
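As a rough illustration of the idea only: the sketch below is not Prodigy's hashing code and will not reproduce the values in your file. It assumes that the input hash is derived from the input text alone, while the task hash also covers what is being asked about that input (here a "label" field). The functions toy_hash and toy_set_hashes are made-up names for this example.

```python
import hashlib
import json


def toy_hash(data):
    """Stable signed 32-bit hash of JSON-serializable data (illustrative only)."""
    payload = json.dumps(data, sort_keys=True).encode("utf8")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:4], "big", signed=True)


def toy_set_hashes(task):
    """Return a copy of the task with toy input and task hashes added.

    Assumption: the input hash depends only on the text, and the task hash
    depends on the input hash plus the label being annotated.
    """
    task = dict(task)
    task["_input_hash"] = toy_hash({"text": task.get("text")})
    task["_task_hash"] = toy_hash(
        {"input": task["_input_hash"], "label": task.get("label")}
    )
    return task


# Same text, two different (hypothetical) labels.
task_a = toy_set_hashes({"text": "Tabdeli Yhi Log Laskengy", "label": "A"})
task_b = toy_set_hashes({"text": "Tabdeli Yhi Log Laskengy", "label": "B"})

print(task_a["_input_hash"] == task_b["_input_hash"])  # same input text
print(task_a["_task_hash"] == task_b["_task_hash"])    # different question about it
```

Under these assumptions, two examples with the same input hash are annotations of the same text, and a matching task hash means the same question was asked about it.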

I'm not 100% sure I understand your second question correctly. The output data you're exporting with prodigy db-out contains all annotations stored in the database that have been labelled in the web app. So if you load in more texts, annotate them and then save them to a dataset, they will be included the next time you export that dataset.
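Once exported, the data can be read back with a few lines of Python. This is a minimal sketch assuming the export is JSONL (one JSON object per line); the string below stands in for the exported file and contains the single record from the top of this thread. The helper read_jsonl is a name chosen for this example.

```python
import json

# Stand-in for the exported file: one JSON record per line (JSONL).
exported = (
    '{"text":"Tabdeli Yhi Log Laskengy","_input_hash":-915958241,'
    '"_task_hash":694961601,"tokens":['
    '{"text":"Tabdeli","start":0,"end":7,"id":0},'
    '{"text":"Yhi","start":8,"end":11,"id":1},'
    '{"text":"Log","start":12,"end":15,"id":2},'
    '{"text":"Laskengy","start":16,"end":24,"id":3}],'
    '"answer":"accept"}\n'
)


def read_jsonl(lines):
    """Parse an iterable of JSONL lines into a list of dicts, skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]


records = read_jsonl(exported.splitlines())

# Keep only the texts of examples that were accepted in the web app.
accepted_texts = [r["text"] for r in records if r.get("answer") == "accept"]
print(accepted_texts)  # ['Tabdeli Yhi Log Laskengy']
```

With a real export you would pass an open file object to read_jsonl instead of the split string; every newly annotated and saved example then shows up as one more line.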

If you haven't seen it already, you might also want to check out the "First steps" guide. It explains the most important terms and concepts, and shows a simple Prodigy workflow.
