Using prodigy with patterns causes error: TypeError: 'tuple' object is not callable

Hi!

I am trying to follow this example:

This is how I call prodigy:

prodigy textcat.teach \
        toy_example \
        blank:en \
        ./test.jsonl \
        --label RELEVANT,IRRELEVANT \
        --patterns ./test-patterns.jsonl

I am getting the following error:

Using 2 label(s): RELEVANT, IRRELEVANT
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/git/eagle-train/venv/lib/python3.10/site-packages/prodigy/__main__.py", line 50, in <module>
    main()
  File "/home/ubuntu/git/eagle-train/venv/lib/python3.10/site-packages/prodigy/__main__.py", line 44, in main
    controller = run_recipe(run_args)
  File "cython_src/prodigy/cli.pyx", line 110, in prodigy.cli.run_recipe
  File "cython_src/prodigy/core.pyx", line 155, in prodigy.core.Controller.from_components
  File "cython_src/prodigy/core.pyx", line 307, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/components/stream.pyx", line 189, in prodigy.components.stream.Stream.is_empty
  File "cython_src/prodigy/components/stream.pyx", line 204, in prodigy.components.stream.Stream.peek
  File "cython_src/prodigy/components/stream.pyx", line 317, in prodigy.components.stream.Stream._get_from_iterator
  File "cython_src/prodigy/components/sorters.pyx", line 129, in prodigy.components.sorters.ExpMovingAverage.__next__
  File "cython_src/prodigy/components/sorters.pyx", line 132, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 14, in genexpr
  File "cython_src/prodigy/util.pyx", line 569, in predict
TypeError: 'tuple' object is not callable

Versions: prodigy-1.14.4 spacy-3.6.1

I've attached the files I'm using:

test-patterns.jsonl
test.jsonl

I'm not sure what I'm doing wrong here.

hi @reinoldus!

Thanks for your question, and welcome back to the forum 🙂

So there are a few problems with your command. To be honest, I think this example likely needs to be updated, so thanks for pointing this out.

I think the root of the problem is your patterns file, i.e., it isn't formatted correctly. If you remove it, the recipe will run, although not really correctly. I think we could show a better warning message for this, so thanks for the heads up. The docs did have this example patterns file that does work.

It doesn't run correctly because you're using a model-in-the-loop recipe (e.g., correct or teach) with a blank model, blank:en, that doesn't have a trained spaCy component. With textcat in particular, you'd need to first train a textcat component with the same labels you're using, and then you can use textcat.teach.
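As a rough pre-flight check, the sketch below tests whether a pipeline has a text classifier that already knows the labels you pass to --label. The helper names `missing_textcat_labels` and `check_pipeline` are hypothetical, not part of Prodigy, and the sketch assumes the classifier uses spaCy's default component names. The spaCy calls (`spacy.load`, `nlp.pipe_names`, `nlp.get_pipe(...).labels`) are standard spaCy v3 APIs.

```python
from typing import Dict, Iterable, List, Sequence

# Assumption: the text classifier uses one of spaCy's default component names.
TEXTCAT_PIPES = ("textcat", "textcat_multilabel")


def missing_textcat_labels(
    pipe_names: Sequence[str],
    pipe_labels: Dict[str, Sequence[str]],
    requested: Iterable[str],
) -> List[str]:
    """Return the requested labels that no text classifier in the pipeline knows."""
    known = set()
    for name in pipe_names:
        if name in TEXTCAT_PIPES:
            known.update(pipe_labels.get(name, ()))
    return [label for label in requested if label not in known]


def check_pipeline(model: str, requested: Iterable[str]) -> List[str]:
    """Load a spaCy v3 pipeline (by name or path) and report labels it cannot score."""
    import spacy  # imported here so the pure helper above works without spaCy

    nlp = spacy.load(model)
    labels = {
        name: nlp.get_pipe(name).labels
        for name in nlp.pipe_names
        if name in TEXTCAT_PIPES
    }
    return missing_textcat_labels(nlp.pipe_names, labels, requested)
```

For example, `check_pipeline("./model_test/model-best", ["RELEVANT", "IRRELEVANT"])` should return an empty list for a pipeline trained on those two labels. A blank:en pipeline has no components at all, so every requested label comes back as missing, which is the situation described above.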

This is where I agree we may need to change the docs to reflect this. Sorry for the confusion.

To give you an example, this works if you switch to ner.teach, use en_core_web_sm, and change the labels to ones that the ner component in that pipeline already predicts:

python -m prodigy ner.teach toy_example en_core_web_sm ./test.jsonl --label PERSON,ORG

One other thing: I noticed that in your test.jsonl file you seem to have put the label in the "meta" field. Were you assuming that was previously labeled data? You can do that, but that's not where Prodigy (and spaCy) look for textcat labels. If you're just training a binary classifier, the label needs to be a top-level string field. The docs had this example. For instance, let's assume this data for binary sentiment ("True" = positive, "False" = negative):

#sent-test.jsonl
{"text": "I'm happy.", "meta": {"title": "XXXXXXXX"}, "label": "True"}
{"text": "I'm really angry.", "meta": {"title": "XXXXXXXX"}, "label": "False"}
{"text": "I'm very very excited.", "meta": {"title": "XXXXXXXX"}, "label": "True"}
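If the examples are generated in Python, a sketch like the one below produces lines of exactly this shape, with the label as a top-level string rather than inside "meta". The function name `textcat_jsonl_lines` and the (text, is_positive, title) row layout are hypothetical; only the standard library `json` module is used.

```python
import json
from typing import Iterable, List, Tuple


def textcat_jsonl_lines(rows: Iterable[Tuple[str, bool, str]]) -> List[str]:
    """Turn (text, is_positive, title) rows into JSONL lines.

    The label is a top-level string ("True" or "False"), as in sent-test.jsonl;
    "meta" only carries display information such as the title.
    """
    lines = []
    for text, is_positive, title in rows:
        record = {"text": text, "meta": {"title": title}, "label": str(is_positive)}
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines


SAMPLE_ROWS = [
    ("I'm happy.", True, "XXXXXXXX"),
    ("I'm really angry.", False, "XXXXXXXX"),
    ("I'm very very excited.", True, "XXXXXXXX"),
]

for line in textcat_jsonl_lines(SAMPLE_ROWS):
    print(line)
```

Writing the printed lines to sent-test.jsonl gives a file that db-in can import as shown next.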

Load it as a dataset using db-in and then you can train:

$ python -m prodigy db-in sent-test sent-test.jsonl
✔ Created dataset 'sent-test' in database SQLite
✔ Imported 3 annotated examples and saved them to 'sent-test' (session
2023-10-19_14-24-07) in database SQLite
Found and keeping existing "answer" in 0 examples

$ python -m prodigy train --textcat sent-test      
ℹ Using CPU
ℹ To switch to GPU 0, use the option: --gpu-id 0

========================= Generating Prodigy config =========================
ℹ Auto-generating config with spaCy
✔ Generated training config

=========================== Initializing pipeline ===========================
[2023-10-19 14:24:11,670] [INFO] Set up nlp object from config
Components: textcat
Merging training and evaluation data for 1 components
  - [textcat] Training: 3 | Evaluation: 0 (20% split)
Training: 3 | Evaluation: 0
Labels: textcat (2)
[2023-10-19 14:24:11,679] [INFO] Pipeline: ['textcat']
[2023-10-19 14:24:11,680] [INFO] Created vocabulary
[2023-10-19 14:24:11,681] [INFO] Finished initializing nlp object
[2023-10-19 14:24:11,685] [INFO] Initialized pipeline components: ['textcat']
✔ Initialized pipeline

============================= Training pipeline =============================
Components: textcat
Merging training and evaluation data for 1 components
  - [textcat] Training: 3 | Evaluation: 0 (20% split)
Training: 3 | Evaluation: 0
Labels: textcat (2)
ℹ Pipeline: ['textcat']
ℹ Initial learn rate: 0.001
E    #       LOSS TEXTCAT  CATS_SCORE  SCORE 
---  ------  ------------  ----------  ------
  0       0          0.25        0.00    0.00
200     200         22.32        0.00    0.00
400     400          6.06        0.00    0.00
600     600          2.72        0.00    0.00
800     800          1.55        0.00    0.00
1000    1000          1.00        0.00    0.00
1200    1200          0.70        0.00    0.00
1400    1400          0.52        0.00    0.00
1600    1600          0.40        0.00    0.00
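The CATS_SCORE and SCORE columns stay at 0.00 presumably because there is nothing to evaluate on: as the "Training: 3 | Evaluation: 0 (20% split)" line shows, a 20% split of 3 examples leaves zero evaluation examples. A sketch of that arithmetic, assuming the evaluation count is rounded down (which matches the log); `split_counts` is a hypothetical helper, not Prodigy's code:

```python
from typing import Tuple


def split_counts(n_examples: int, eval_split: float = 0.2) -> Tuple[int, int]:
    """Return (n_train, n_eval), assuming the eval count is rounded down."""
    n_eval = int(n_examples * eval_split)  # 3 * 0.2 = 0.6 -> 0
    return n_examples - n_eval, n_eval


for n in (3, 10, 50):
    print(n, split_counts(n))
```

With only three examples, the falling loss mostly reflects the model fitting those examples, so this run is a smoke test of the data format rather than a usable classifier.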

Hope this provides more clarity!

Hi,

thank you for your reply, but I don't get it.

First of all, I don't see how my patterns file is incorrect compared to the second line here:

It's exactly the same except for the order of the keys, and key order in a JSON object isn't significant anyway.
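The key-order point is easy to confirm: once parsed, two JSON objects with the same keys and values compare equal regardless of the order the keys were written in. A quick check with Python's `json` module, using a pattern line adapted from the excerpt further down (`same_json` is just an illustrative helper):

```python
import json

# The two lines differ only in key order.
LINE_A = '{"label": "RELEVANT", "pattern": [{"lower": "fool"}]}'
LINE_B = '{"pattern": [{"lower": "fool"}], "label": "RELEVANT"}'


def same_json(a: str, b: str) -> bool:
    """Compare two JSON documents by parsed value, ignoring key order."""
    return json.loads(a) == json.loads(b)


print(same_json(LINE_A, LINE_B))  # True
```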

Regarding the label in "meta": it comes from my data and isn't relevant for the annotation. Someone marks the relevant articles, and I then re-classify the individual paragraphs, so that's where it comes from.

I had already trained a model with the following command:

prodigy train ./model_test --textcat toy_example -L

Now I am trying to run it:

prodigy textcat.teach toy_example \
        ./model_test/model-best \
        ./machine_learning/dataset/dataset.jsonl \
        --label RELEVANT,IRRELEVANT \
        --patterns ./patterns.jsonl

I am still getting the same error:

Using 2 label(s): RELEVANT, IRRELEVANT
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/git/eagle-train/venv/lib/python3.10/site-packages/prodigy/__main__.py", line 50, in <module>
    main()
  File "/home/ubuntu/git/eagle-train/venv/lib/python3.10/site-packages/prodigy/__main__.py", line 44, in main
    controller = run_recipe(run_args)
  File "cython_src/prodigy/cli.pyx", line 129, in prodigy.cli.run_recipe
  File "cython_src/prodigy/core.pyx", line 155, in prodigy.core.Controller.from_components
  File "cython_src/prodigy/core.pyx", line 307, in prodigy.core.Controller.__init__
  File "cython_src/prodigy/components/stream.pyx", line 189, in prodigy.components.stream.Stream.is_empty
  File "cython_src/prodigy/components/stream.pyx", line 204, in prodigy.components.stream.Stream.peek
  File "cython_src/prodigy/components/stream.pyx", line 317, in prodigy.components.stream.Stream._get_from_iterator
  File "cython_src/prodigy/components/sorters.pyx", line 129, in prodigy.components.sorters.ExpMovingAverage.__next__
  File "cython_src/prodigy/components/sorters.pyx", line 132, in __iter__
  File "cython_src/prodigy/components/sorters.pyx", line 14, in genexpr
  File "cython_src/prodigy/util.pyx", line 571, in predict
TypeError: 'tuple' object is not callable

I even changed the pattern file to the example one from here: https://github.com/explosion/prodigy-recipes/blob/master/example-patterns/patterns_insults-INSULT.jsonl

I only changed the label to the one I'm using. Here's an excerpt:

{"label":"RELEVANT","pattern":[{"lower":"wankers"}]}
{"label":"RELEVANT","pattern":[{"lower":"morons"}]}
{"label":"RELEVANT","pattern":[{"lower":"fool"}]}
{"label":"RELEVANT","pattern":[{"lower":"dumbfuck"}]}
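To rule a patterns file in or out quickly, a structural check like the sketch below can help. `check_patterns` is a hypothetical helper, not part of Prodigy. It only checks the basic shape of each line (a "label" plus a "pattern" that is either a phrase string or a non-empty list of token dicts) and does not validate individual token attributes. It also flags labels that aren't among the ones passed to --label, on the assumption that you'd want every pattern to match a label you're annotating.

```python
import json
from typing import Iterable, List


def check_patterns(lines: Iterable[str], labels: Iterable[str]) -> List[str]:
    """Return human-readable problems found in patterns JSONL lines."""
    allowed = set(labels)
    problems = []
    for i, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError as err:
            problems.append(f"line {i}: invalid JSON ({err.msg})")
            continue
        if not isinstance(entry, dict):
            problems.append(f"line {i}: expected a JSON object")
            continue
        label = entry.get("label")
        if not isinstance(label, str) or label not in allowed:
            problems.append(f"line {i}: label {label!r} not in {sorted(allowed)}")
        pattern = entry.get("pattern")
        is_phrase = isinstance(pattern, str) and pattern != ""
        is_tokens = (
            isinstance(pattern, list)
            and len(pattern) > 0
            and all(isinstance(tok, dict) for tok in pattern)
        )
        if not (is_phrase or is_tokens):
            problems.append(f"line {i}: pattern must be a string or a list of token dicts")
    return problems
```

Usage: `check_patterns(open("patterns.jsonl", encoding="utf-8"), ["RELEVANT", "IRRELEVANT"])` returns an empty list when every line has the expected shape, as the excerpt above does.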

Is there also a way to use Prodigy programmatically? Doing everything on the command line is really not my preferred DX.

hi @reinoldus,

Thanks for the reply. Yes, I see your point about the patterns now. We'll look into it soon; it may have been caused by our recent updates to streams.

There's the prodigy.serve function, which starts the Prodigy app from Python. Is this more in line with what you're looking for?
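A minimal sketch of the prodigy.serve route, assuming (as in recent Prodigy 1.x API docs) that `prodigy.serve` accepts the full recipe command as one string plus config overrides such as `port` as keyword arguments; check this against your installed version. `teach_command` and `serve_teach` are hypothetical helpers. The command is built with `shlex.join` so it reads exactly like the CLI call above; paths containing spaces may need extra care depending on how your Prodigy version splits the string.

```python
import shlex
from typing import Optional, Sequence


def teach_command(
    dataset: str,
    model: str,
    source: str,
    labels: Sequence[str],
    patterns: Optional[str] = None,
) -> str:
    """Build the same textcat.teach command used on the CLI in this thread."""
    parts = ["textcat.teach", dataset, model, source, "--label", ",".join(labels)]
    if patterns is not None:
        parts += ["--patterns", patterns]
    return shlex.join(parts)


def serve_teach(command: str, port: int = 8080) -> None:
    """Start the Prodigy app from Python; blocks until the server stops.

    Assumption: prodigy.serve(command_string, **config) as in Prodigy 1.x docs.
    """
    import prodigy  # requires a licensed Prodigy install

    prodigy.serve(command, port=port)
```

Calling `serve_teach(teach_command("toy_example", "./model_test/model-best", "./machine_learning/dataset/dataset.jsonl", ["RELEVANT", "IRRELEVANT"], "./patterns.jsonl"))` would start the same session as the CLI command above. Keeping the command builder separate makes it easy to generate sessions from a script or notebook.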

A much wider range of functionality for using the CLI to start, stop, create, and manage tasks and assets is available in our more advanced tool, Prodigy Teams, which we're starting to run as a beta.

Hey, just a quick update on the issue with the PatternMatcher: it was indeed a bug on our end. We've just released a patch release (1.14.7) that includes a fix for this particular issue.
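For readers who hit the same message on 1.14.4: Python raises `TypeError: 'tuple' object is not callable` whenever code calls something that turns out to be a tuple, so the message points at an internal mix-up rather than at the input files, and upgrading to 1.14.7 or later should resolve it. The toy snippet below only reproduces the error class; it is not Prodigy's code, and all names in it are hypothetical.

```python
def score(texts):
    """Hypothetical stand-in for a model's predict function."""
    return [(0.5, text) for text in texts]


# Bug pattern: a (callable, extra) pair is stored where a plain callable is expected.
predict = (score, "patterns")

try:
    predict(["some text"])
    message = "no error"
except TypeError as err:
    message = str(err)

print(message)  # 'tuple' object is not callable
```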