Confusion with loading a raw dataset for textcat.teach: all answers marked as "accept"?

trevorwelch · November 21, 2019, 9:30pm

I'm planning to run a binary textcat.teach on a corpus of raw texts, and I'm a bit confused by the process.

$ prodigy dataset social-texts

  ✨  Successfully added 'social-texts' to database SQLite.

$ prodigy db-in social-texts ./data/social_text_data_1.jsonl

  ✨  Imported 10000 annotations for 'social-texts' to database SQLite
  Added 'accept' answer to 10000 annotations
  Session ID: 2019-11-21_13-14-54

These are raw texts that don't have any annotations yet. Each line of social_text_data_1.jsonl looks like this:

{"text": "i can't believe the service on American Airlines! It's so terrible @aa #badflights"}
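Each line of a file like this is a standalone JSON object with a "text" key and no "answer" key. Below is a minimal, stdlib-only sketch for producing such a file from a list of raw strings. The helper name `write_raw_jsonl` is hypothetical and is not part of Prodigy.

```python
import json
from pathlib import Path


def write_raw_jsonl(texts, path):
    """Write each raw text as one {"text": ...} JSON object per line (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for text in texts:
            # json.dumps escapes embedded newlines, so every record stays on a single line.
            # ensure_ascii=False keeps emoji and accented characters readable in the file.
            # No "answer" key is written because these are unannotated examples.
            f.write(json.dumps({"text": text}, ensure_ascii=False) + "\n")
    return path
```

For example, `write_raw_jsonl(my_texts, "./data/social_text_data_1.jsonl")`, where `my_texts` is any iterable of strings.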

I'm confused by this message after loading the dataset: Added 'accept' answer to 10000 annotations

Is there a different way to load a corpus of raw texts for annotation that doesn't assume the examples are all 'Accept'?

ines · November 21, 2019, 9:52pm

Hi! I think the solution might be a lot simpler: Prodigy doesn't require you to upload any data before you start annotating – so you can pass your social_text_data_1.jsonl to the textcat.teach recipe as the source argument, and it'll load the data from the file.
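In practice, the call looks roughly like `prodigy textcat.teach social-texts en_core_web_sm ./data/social_text_data_1.jsonl --label LABEL`. Here the spaCy model and `LABEL` are placeholders for your own choices. The file is read as a stream of unannotated tasks, and nothing is stored in the dataset until you answer them. The sketch below is a stdlib-only illustration of lazily reading such a JSONL source. It is not Prodigy's own loader, and `stream_jsonl` is a hypothetical name.

```python
import json


def stream_jsonl(path):
    """Lazily yield one task dict per non-blank line of a JSONL file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline at the end of the file
            try:
                task = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"{path}:{line_no}: invalid JSON ({err})") from err
            if "text" not in task:
                raise ValueError(f"{path}:{line_no}: missing 'text' key")
            # In this sketch, raw tasks are passed through untouched: no "answer" is added here.
            yield task
```

Because this is a generator, lines are read only as tasks are requested, so a large file does not have to fit in memory at once.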

The datasets in the database only store the collected annotations. So the db-in command to import data is mostly intended to load in already annotated examples. That's also why it adds the answer by default.
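The import log in the question (`Added 'accept' answer to 10000 annotations`) follows from this behaviour. The hypothetical, stdlib-only sketch below treats each record as an already-annotated example and fills in a default answer wherever one is missing. `import_as_annotations` and its `default_answer` parameter are illustrative names, not Prodigy's API or implementation.

```python
def import_as_annotations(tasks, default_answer="accept"):
    """Return copies of tasks with an "answer" filled in where missing, plus how many were filled.

    Hypothetical illustration of a db-in-style import, not Prodigy's implementation.
    """
    imported, added = [], 0
    for task in tasks:
        task = dict(task)  # copy so the caller's dicts are not mutated
        if "answer" not in task:
            task["answer"] = default_answer
            added += 1
        imported.append(task)
    return imported, added
```

Fed 10,000 raw `{"text": ...}` lines, none of which has an answer, this marks all 10,000 as "accept". That is exactly the count reported in the log above.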

trevorwelch · November 21, 2019, 10:01pm

Thank you! That makes sense.

EDIT: Moving my follow-up question to another thread.
