Export ner.print-stream output

Hi all, I’m wondering how to export the output of ner.print-stream to JSONL or a Prodigy dataset.
I would like to use the output as the validation dataset for ner.batch-train.

Thanks in advance

C.

By default, the ner.print-stream recipe adds formatting to output the entities in the stream nicely and coloured – but if you have a look at the recipe, you’ll see that it’s actually pretty straightforward. If you just want it to output the unformatted JSONL, you could remove the following line:

printers.pretty_print_ner(stream)

…and replace it with:

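import json  # add at the top of the recipe file if json is not already imported

for eg in stream:
    print(json.dumps(eg))  # one JSON object per line, i.e. JSONL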

You can then run the recipe in the same way you did before and pipe it to less, or like this to save it to a file:

ner.print-stream [... your arguments] > your_file.jsonl

Thanks a lot @ines! It is exactly what I was looking for.

Just one last question: I would like to assign --answer accept to all examples in the exported JSONL. I can do it with the db-in command, but is there another way to do the same thing in code? (I couldn’t find the db-in code to look through.)

Thanks again for your support!

Sure! The examples in the stream are regular Python dictionaries, so when you iterate over the examples, you can do:

eg['answer'] = 'accept'

(The db-in code is in __main__.py btw!)
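
If the JSONL file has already been exported, the same idea can be applied afterwards with plain Python. This is only a rough sketch using the standard library: the function name mark_accepted and the file names are made up for illustration and are not part of Prodigy.

```python
import json


def mark_accepted(in_path, out_path):
    """Copy a JSONL file, setting "answer": "accept" on every example.

    Hypothetical helper for illustration; not part of Prodigy.
    Returns the number of examples written.
    """
    count = 0
    with open(in_path, encoding="utf8") as src, \
            open(out_path, "w", encoding="utf8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            eg = json.loads(line)      # each line is one example dict
            eg["answer"] = "accept"    # same assignment as in the reply above
            dst.write(json.dumps(eg) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    import os

    # Assumed file names; adjust to your own export.
    if os.path.exists("your_file.jsonl"):
        n = mark_accepted("your_file.jsonl", "your_file_accepted.jsonl")
        print("Wrote", n, "examples")
```

The resulting file can then be imported with db-in as before, or the assignment can be done directly in the recipe loop so no second pass is needed.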