Hi all, I’m wondering how to export the output of ner.print-stream to JSONL or a Prodigy dataset.
I would like to use the output as the validation dataset for ner.batch-train.
Thanks in advance,
C.
By default, the ner.print-stream recipe formats its output so that the entities in the stream are printed nicely and in colour. If you have a look at the recipe, though, you’ll see that it’s actually pretty straightforward. If you just want it to output the unformatted JSONL, you could remove the following line:
printers.pretty_print_ner(stream)
…and replace it with:
for eg in stream:
    print(json.dumps(eg))  # needs "import json" at the top of the recipe file
You can then run the recipe in the same way you did before and pipe it to less, or redirect it like this to save it to a file:
ner.print-stream [... your arguments] > your_file.jsonl
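As a rough standalone sketch of what that loop does, assuming the stream yields plain dictionaries: write_jsonl is a hypothetical helper name (not part of Prodigy) and the sample examples are made up for illustration.

```python
import json
import sys


def write_jsonl(stream, out=sys.stdout):
    """Write each example as one JSON object per line; return the count."""
    count = 0
    for eg in stream:
        out.write(json.dumps(eg) + "\n")
        count += 1
    return count


# Stand-in for the recipe's stream (made-up data, for illustration only)
stream = [
    {"text": "Apple is a company", "spans": [{"start": 0, "end": 5, "label": "ORG"}]},
    {"text": "No entities here", "spans": []},
]
write_jsonl(stream)
```

Writing to stdout keeps the shell redirect above working unchanged; passing an open file object as out would write to a file directly instead.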
Thanks a lot @ines! It is exactly what I was looking for.
Just one last question: I would like to assign --answer accept to all examples in the exported JSONL. I can do it with the db-in command, but is there another way to do the same in code? (I couldn’t find the db-in code to look through it.)
Thanks again for your support!
Sure! The examples in the stream are regular Python dictionaries, so when you iterate over the examples, you can do:
eg['answer'] = 'accept'
(The db-in code is in __main__.py btw!)
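A minimal sketch of setting the answer while iterating and printing JSONL, assuming each example is a plain dictionary as described above. add_answer is a hypothetical helper name and the sample data is made up.

```python
import json


def add_answer(stream, answer="accept"):
    """Yield each example with its 'answer' key set to the given value."""
    for eg in stream:
        eg["answer"] = answer
        yield eg


# Stand-in for the recipe's stream (made-up data, for illustration only)
examples = [{"text": "First example"}, {"text": "Second example"}]
for eg in add_answer(examples):
    print(json.dumps(eg))
```

Because add_answer is a generator, it processes one example at a time and does not need the whole stream in memory.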