but when the Prodigy tool launches, it loads 9,265 documents. I have tried a few things, from creating new .json files to adding flags such as --total-num-tasks 154, and I am still getting the issue. Can someone help me with this?
Can you provide more details about your documents.jsonl file?
Can you run this script on it?
import json

def count_text_keys(jsonl_file_path):
    # Count the lines that parse as JSON and contain a "text" key
    count = 0
    with open(jsonl_file_path, 'r') as file:
        for line in file:
            try:
                data = json.loads(line)
                if 'text' in data:
                    count += 1
            except json.JSONDecodeError:
                # Report lines that are not valid JSON instead of failing
                print(f"Skipping invalid JSON: {line}")
    return count

# Replace with the actual path to your JSONL file
file_path = 'documents.jsonl'
result = count_text_keys(file_path)
print(f'Total dictionaries with key "text" in {file_path}: {result}')
I'm curious: where are you getting 9,265 from? Can you show where you see this number?
Where do you see the flag --total-num-tasks? That's not a built-in flag for ner.manual or any other recipe.
If you're looking to modify your source (input) file, you should apply some filters on the front end with a Python script (e.g., remove certain examples).
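For example, here is a minimal sketch of that kind of pre-filtering, assuming your input is one JSON object per line. The names filter_jsonl and has_text, and the rule "keep records with a non-empty text value", are only illustrations and not part of Prodigy, so swap in whatever criterion fits your data.

```python
import json


def filter_jsonl(input_path, output_path, keep):
    """Write the records for which keep(record) is true; return how many were kept."""
    kept = 0
    with open(input_path, "r", encoding="utf-8") as src, \
            open(output_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
            if keep(record):
                dst.write(json.dumps(record, ensure_ascii=False) + "\n")
                kept += 1
    return kept


def has_text(record):
    # Hypothetical criterion: keep dicts whose "text" is a non-empty string.
    text = record.get("text") if isinstance(record, dict) else None
    return isinstance(text, str) and bool(text.strip())
```

You could then call filter_jsonl('documents.jsonl', 'documents_filtered.jsonl', has_text) and point your recipe at the filtered file (the output file name here is just a placeholder).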
Can you also provide your Prodigy version (run prodigy stats) and any modifications to your prodigy.json file?