Does Prodigy allow loading all files from a filepath

bhanu · March 9, 2018, 11:55am

Hey

I have various txt files in a folder. Is it possible for me to load all the txt files from the folder into my Prodigy annotation dataset for entity annotation?

ines · March 9, 2018, 12:26pm

Out of the box, Prodigy currently supports loading data from single files of various types: for text, that’s .jsonl, .json, .txt and .csv. You can specify the loader via the --loader argument on the command line. If no loader is set, Prodigy will use the file extension to pick the respective loader.

prodigy ner.teach your_dataset en_core_web_sm /path/to/data.txt

So if you have multiple .txt files and want to use them all, the easiest way would be to combine them into one file. Alternatively, you can always write your own loader script.
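
A minimal sketch of the "combine them into one file" option, assuming you want one text per line in the combined file and that blank lines can be dropped. `combine_txt_files` is a hypothetical helper name, not part of Prodigy:

```python
from pathlib import Path


def combine_txt_files(src_dir, out_path):
    """Concatenate every .txt file in src_dir into one file, one text per line.

    Blank lines are skipped. Returns the number of lines written.
    """
    src_dir, out_path = Path(src_dir), Path(out_path)
    count = 0
    with out_path.open("w", encoding="utf8") as out:
        # sorted() keeps the output order stable across runs
        for file_path in sorted(src_dir.glob("*.txt")):
            if file_path.resolve() == out_path.resolve():
                continue  # don't read the output file back in if it lives in src_dir
            with file_path.open("r", encoding="utf8") as f:
                for line in f:
                    line = line.strip()
                    if line:
                        out.write(line + "\n")
                        count += 1
    return count
```

Calling `combine_txt_files('/path/to/directory', '/path/to/combined.txt')` and then passing /path/to/combined.txt as the source to ner.teach should work like any single .txt file. Writing the combined file outside the source folder is the simplest way to keep it from being read back in on later runs.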

If no source argument (file path etc.) is set on the command line, it will default to sys.stdin. This lets you pipe data forward from a different process, like a custom script. For example:

python load_data.py | prodigy ner.teach your_dataset en_core_web_sm

All your custom loader script needs to do is load the data somehow, create annotation tasks in Prodigy’s format (a dictionary with a "text" key) and print the dumped JSON. For example:

# load_data.py
from pathlib import Path
import json

data_path = Path('/path/to/directory')
for file_path in sorted(data_path.glob('*.txt')):  # only .txt files, in a stable order
    with file_path.open('r', encoding='utf8') as f:  # file is closed when the block ends
        for line in f:
            text = line.strip()  # drop the trailing newline and surrounding whitespace
            if text:  # skip blank lines
                task = {'text': text}  # create one task for each line of text
                print(json.dumps(task))  # dump and print the JSON
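
A variation on the script above, sketched under the assumption that you'd like to know which file each example came from: Prodigy tasks can carry an optional "meta" dictionary next to "text", which the annotation interface displays with the task. The keys "source" and "line" inside it are arbitrary choices, and `iter_tasks` / `write_jsonl` are hypothetical helper names:

```python
import json
import sys
from pathlib import Path


def iter_tasks(directory):
    """Yield one task dict per non-empty line of each .txt file in directory.

    "meta" records the file name and 1-based line number of each text;
    the "source"/"line" key names are illustrative, not required by Prodigy.
    """
    for file_path in sorted(Path(directory).glob("*.txt")):
        with file_path.open("r", encoding="utf8") as f:
            for i, line in enumerate(f, start=1):
                text = line.strip()
                if text:
                    yield {"text": text, "meta": {"source": file_path.name, "line": i}}


def write_jsonl(tasks, out=sys.stdout):
    """Write each task as one line of JSON (the format piped into Prodigy)."""
    for task in tasks:
        out.write(json.dumps(task) + "\n")
```

Calling `write_jsonl(iter_tasks('/path/to/directory'))` at the bottom of load_data.py gives the same piped output as before, just with the extra metadata attached to each task.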

This approach works for any file format and data type. For example, you could also load data from a different database or via an API. If you can load your data in Python, you can use it with Prodigy.
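
As an example of a non-file source, here's a minimal sketch that reads texts from an SQLite database with Python's built-in sqlite3 module. The table `documents`, its column `body` and the helper names are hypothetical placeholders for your own schema and query:

```python
import json
import sqlite3


def tasks_from_sqlite(conn, query="SELECT body FROM documents"):
    """Yield {"text": ...} tasks from a query that selects a single text column.

    Empty or NULL values are skipped. "documents"/"body" are placeholder names.
    """
    for (body,) in conn.execute(query):  # each row is a 1-tuple: one selected column
        if body and body.strip():
            yield {"text": body.strip()}


def main(db_path):
    """Print one JSON task per row so the output can be piped into Prodigy."""
    conn = sqlite3.connect(db_path)
    try:
        for task in tasks_from_sqlite(conn):
            print(json.dumps(task))
    finally:
        conn.close()
```

Adding a `main('/path/to/data.db')` call at the end of such a script and piping its output into `prodigy ner.teach your_dataset en_core_web_sm` would work the same way as load_data.py.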

There’s also currently an open feature request for allowing paths to directories instead. If that’s something you’d like Prodigy to support out of the box, you can vote for it on that thread.

bhanu · March 9, 2018, 12:29pm

Thanks a lot for the help.
