Incorrect terms.to-patterns example in web documentation

The terms.to-patterns documentation gives the following example command line:

prodigy terms.to-patterns programming_langs --label PROG_LANG --output-file /tmp/patterns.jsonl

However, this does not work.

$ prodigy terms.to-patterns programming_langs --label PROG_LANG --output-file patterns.jsonl
usage: prodigy terms.to-patterns [-h] [-l None] [dataset] [output_file]
prodigy terms.to-patterns: error: unrecognized arguments: --output-file patterns.jsonl

(Where programming_langs is a text file containing the names of programming languages, one per line.)

I actually haven’t been able to figure out how to convert a list of words to a list of patterns using Prodigy recipes. What’s an example of how to do that?

Thanks, will fix this! The problem here is that the output_file argument is positional and not an option, so the correct usage would be:

prodigy terms.to-patterns programming_langs /tmp/patterns.jsonl --label PROG_LANG

You can still omit the argument, though, which will print the individual patterns on the command line, so you can pipe them forward or use less to view them:

prodigy terms.to-patterns programming_langs --label PROG_LANG | less

programming_langs (or the first argument for that matter) should be the name of a dataset containing the terms. This is because the recipe is originally intended to be used with terminology lists created by terms.teach. If you already have a text file, you'll need to add it to a dataset first (which is easy, because db-in supports the same loaders as the other streaming recipes):

prodigy dataset programming_langs "List of programming languages"
prodigy db-in programming_langs prog_langs.txt
prodigy terms.to-patterns programming_langs /tmp/patterns.jsonl --label PROG_LANG

The example using db-in works for me. Thanks.

Hej Ines, I’ve been trying to follow your example but I’m faced with the following error.

'utf-8' codec can't decode byte 0xf0 in position 3404: invalid continuation byte

The file I’m trying to load is a text file with one Danish name per line.

This usually indicates that the file isn't encoded as valid UTF-8. Could you try explicitly changing the encoding to UTF-8? For example, from the command line using a tool like iconv, or in your text editor (e.g. in Visual Studio Code: Change encoding > UTF-8).
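As a rough sketch of the iconv route: the commands below assume the file was saved as ISO-8859-1 (Latin-1), which is only a guess at the source encoding, and the file names names_latin1.txt and names_utf8.txt are made up for illustration. The first line just creates a tiny sample file so the example can be run on its own.

```shell
# Create a one-line sample in ISO-8859-1 (hypothetical file name).
# \370 is the octal escape for byte 0xF8, which is "ø" in Latin-1.
printf 'S\370ren\n' > names_latin1.txt

# Re-encode to UTF-8. -f is the ASSUMED source encoding, -t the target.
iconv -f ISO-8859-1 -t UTF-8 names_latin1.txt > names_utf8.txt

# Sanity check: iconv exits with an error if the input is not valid UTF-8.
iconv -f UTF-8 -t UTF-8 names_utf8.txt > /dev/null && echo "valid UTF-8"
```

If the real source encoding is something else (for example Windows-1252), the -f value would need to change accordingly. The converted file can then be passed to db-in in place of the original.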

Hi Ines, Thanks for the reply. Like you said, I just had to open the text editor and save it as UTF-8.
