Identifier for the text

Thank you for the quick reply to my previous question about how to identify unlabeled text and the best way to do it [Identify Unlabelled text].

We were also wondering whether we can attach an identifier (metadata tags) to each text. It would be useful if those identifiers were made available in the SQLite dump. We have tried this, but we are not getting the metadata tags in the dump.

The workaround we are following is to add the metadata to the actual text in the text field. Since the text field also appears in the dump, we can debug from there.

Instead of this metadata-prefix workaround, is there a better way to add a text identifier and make it available in the resulting labeled dump? That would be of great help.

Thank You

I hope I understand your question correctly – but you should be able to attach any custom data to the task dicts you stream in and Prodigy should just pass it through. For example, if the data you load in looks like this:

{"text": "some text", "foo": "bar", "baz": 1}

... the annotated example that gets saved to the database could look like this:

{"text": "some text", "foo": "bar", "baz": 1, "answer": "accept"}

I really apologize for this question. As you said, it does pass the metadata tags through and stores them in the SQLite dump.

I did test it once, but I lost track of the metadata tag that was passed through among the many tokens, spans, and other keys.

Apologies once again, and thank you for the quick reply. I appreciate it.
