How to auto-accept answers based on metadata?

Hi @lazerlightning,

Thanks for your message, and sorry for the delay. Most of the team was offsite last week and we're still catching up.

I'm a bit curious to learn more about your use case.

Maybe I'm missing something, but if you have unlabeled data that you want to "auto accept" based on metadata, why not remove those records from your source (input) so that you only label the records you actually want to annotate?
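To make that suggestion concrete, here is a minimal sketch of splitting the input by metadata before annotation. It assumes each record is a dict with a "meta" dict, as in Prodigy's JSONL input; the function name `split_by_meta` and the sample records are hypothetical and only for illustration.

```python
def split_by_meta(records, key, value):
    """Split records into (matching, remaining) based on record["meta"][key]."""
    matching, remaining = [], []
    for record in records:
        # Records without a "meta" dict or without the key never match.
        if record.get("meta", {}).get(key) == value:
            matching.append(record)
        else:
            remaining.append(record)
    return matching, remaining


# Hypothetical sample input, shaped like lines of a JSONL source file.
records = [
    {"text": "First headline", "meta": {"source": "The New York Times"}},
    {"text": "Second headline", "meta": {"source": "Other"}},
    {"text": "Third headline"},
]

# "skipped" would be handled outside the annotation UI,
# "to_annotate" is what you would actually stream into the recipe.
skipped, to_annotate = split_by_meta(records, "source", "The New York Times")
```

Only `to_annotate` would then be written back out as the source file for labeling.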

For these examples, where do the spans come from? Are you simply accepting them without expecting any spans?

For example, if you labeled the sentence "Uber’s Lesson: Silicon Valley’s Start-Up Machine Needs Fixing" in ner.manual, where the only span is "Uber" as an ORG, the saved data would be:

{
  "text": "Uber’s Lesson: Silicon Valley’s Start-Up Machine Needs Fixing",
  "meta": {
    "source": "The New York Times"
  },
  "_input_hash": 1886699658,
  "_task_hash": -1952856502,
  "_is_binary": false,
  "tokens": [
    {
      "text": "Uber",
      "start": 0,
      "end": 4,
      "id": 0,
      "ws": false
    },
    {
      "text": "’s",
      "start": 4,
      "end": 6,
      "id": 1,
      "ws": true
    },
    {
      "text": "Lesson",
      "start": 7,
      "end": 13,
      "id": 2,
      "ws": false
    },
    {
      "text": ":",
      "start": 13,
      "end": 14,
      "id": 3,
      "ws": true
    },
    {
      "text": "Silicon",
      "start": 15,
      "end": 22,
      "id": 4,
      "ws": true
    },
    {
      "text": "Valley",
      "start": 23,
      "end": 29,
      "id": 5,
      "ws": false
    },
    {
      "text": "’s",
      "start": 29,
      "end": 31,
      "id": 6,
      "ws": true
    },
    {
      "text": "Start",
      "start": 32,
      "end": 37,
      "id": 7,
      "ws": false
    },
    {
      "text": "-",
      "start": 37,
      "end": 38,
      "id": 8,
      "ws": false
    },
    {
      "text": "Up",
      "start": 38,
      "end": 40,
      "id": 9,
      "ws": true
    },
    {
      "text": "Machine",
      "start": 41,
      "end": 48,
      "id": 10,
      "ws": true
    },
    {
      "text": "Needs",
      "start": 49,
      "end": 54,
      "id": 11,
      "ws": true
    },
    {
      "text": "Fixing",
      "start": 55,
      "end": 61,
      "id": 12,
      "ws": false
    }
  ],
  "_view_id": "ner_manual",
  "spans": [
    {
      "start": 0,
      "end": 4,
      "token_start": 0,
      "token_end": 0,
      "label": "ORG"
    }
  ],
  "answer": "accept",
  "_timestamp": 1690324597,
  "_annotator_id": "ner_dataset-ryan",
  "_session_id": "ner_dataset-ryan"
}

Notice how it includes "answer": "accept", but also a lot of other data: the tokens (see add_tokens), the hashes, the timestamp, and the _annotator_id and _session_id.
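If you do want pre-accepted records, one possible approach is to add the answer yourself before importing. The sketch below is only an illustration under assumptions: the helper name `auto_accept` and the sample records are hypothetical, and the records get an empty "spans" list because it is not yet clear where your spans would come from. It does not add tokens, hashes, or timestamps like the example above.

```python
import json


def auto_accept(record, key, value):
    """Return an accepted copy of record if meta[key] == value, else None."""
    if record.get("meta", {}).get(key) != value:
        return None
    accepted = dict(record)            # shallow copy, original stays untouched
    accepted.setdefault("spans", [])   # assumption: no spans unless already present
    accepted["answer"] = "accept"
    return accepted


# Hypothetical sample records.
examples = [
    {"text": "First headline", "meta": {"source": "The New York Times"}},
    {"text": "Second headline", "meta": {"source": "Other"}},
]

candidates = (auto_accept(e, "source", "The New York Times") for e in examples)
accepted = [record for record in candidates if record is not None]

# One JSON object per line, i.e. JSONL.
jsonl = "\n".join(json.dumps(record, ensure_ascii=False) for record in accepted)
```

The resulting JSONL could then presumably be loaded into a dataset with the `prodigy db-in` command, but whether that fits depends on your answers to the questions in this post.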

A few other questions, since this may just be a matter of terminology:

What do you mean by "the correct answers"? Is an answer a correct annotation span?

Can you clarify this a bit more?

Thanks for your help in clarifying!