Integrating Tensorflow's Object Detection API with Prodigy

Hello all! Many of you have been asking about a recipe for image.teach, and I am really excited to share my work integrating Tensorflow's Object Detection API with Prodigy, which I did this summer in collaboration with @honnibal and @ines. You can find the source code in the prodigy-recipes repo. The support is still experimental and feedback is welcome! The point of this post is to act as a guide for the recipe.

NOTE:

Since we cannot control how Tensorflow's Object Detection API will change in the future, I maintain a fork of the repository which ensures that this recipe keeps working even if breaking changes happen. The fork can be found here

A high-level introduction:

Before getting into the semantics of the recipe, let's first understand how the recipe works at a high level. A simple working example will be given in the next section.

To run this recipe you will need Tensorflow Serving.

Run prodigy image.trainmodel -F $PATH_TO/image_train.py --help to see the arguments for the recipe.

Running this recipe will create the following 3 folders if not already present:

  • An export directory where the models used by Tensorflow Serving will be saved. Specified by the export_dir argument.
  • A model directory where trained model checkpoints and Tensorboard events are stored. Specified by the model_dir argument.
  • A data directory where the TF-Records for training are stored. Specified by the data_dir argument.

Recipe Flow:

The general flow of the recipe is as follows:

  1. Create the object detection model as specified in the pipeline.config and wrap it as a custom Tensorflow Estimator.
  2. Check whether the model directory has a checkpoint (when resuming annotations); otherwise, do a dummy training run for 1 step. The dummy one-step training is required because the Tensorflow Estimator does not allow SavedModel creation without a checkpoint in the model directory (see the sketch after this list).
  3. Save the model as a SavedModel in the export directory.
  4. Start Tensorflow Serving and point it to the export directory so that it can automatically load updated models for predictions.
  5. Perform assisted annotation in Prodigy with predictions coming from Tensorflow Serving.
  6. Use the annotations to train the model in the loop and optionally run evaluation; save the trained model as a model.ckpt in the model directory and as a SavedModel in the export directory.
  7. Run the garbage collector.
  8. Tensorflow Serving automatically picks up the most recent model in the export directory and unloads the previous one.
  9. Repeat steps 5 to 8 until satisfied.
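To make steps 2 and 3 concrete, here is a minimal, self-contained sketch of the bootstrap trick using plain TF 1.x Estimator APIs. The toy model_fn, the input functions, and the model_dir/export_dir paths are illustrative stand-ins, not the recipe's actual code; the real recipe builds its Estimator from the pipeline.config instead.

import tensorflow as tf  # TF 1.x

def model_fn(features, labels, mode):
    # Toy stand-in for the detection model built from pipeline.config.
    logits = tf.layers.dense(features["x"], 1)
    if mode == tf.estimator.ModeKeys.PREDICT:
        outputs = {"y": logits}
        return tf.estimator.EstimatorSpec(
            mode, predictions=outputs,
            export_outputs={"serving_default":
                            tf.estimator.export.PredictOutput(outputs)})
    loss = tf.losses.mean_squared_error(labels, logits)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.train.get_or_create_global_step())
    return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)

def train_input_fn():
    # A single dummy example: enough for the one bootstrap step.
    return {"x": tf.constant([[1.0]])}, tf.constant([[1.0]])

def serving_input_fn():
    x = tf.placeholder(tf.float32, [None, 1], name="x")
    return tf.estimator.export.ServingInputReceiver({"x": x}, {"x": x})

estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="model_dir")

# Step 2: the Estimator cannot export a SavedModel without a checkpoint,
# so run one dummy training step if the model directory is empty.
if not tf.train.latest_checkpoint("model_dir"):
    estimator.train(input_fn=train_input_fn, steps=1)

# Step 3: write a SavedModel under the export directory; Tensorflow Serving
# watches this directory and loads new versions automatically.
estimator.export_savedmodel("export_dir", serving_input_fn)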

In a nutshell, predictions happen in Tensorflow Serving while training happens in parallel inside Prodigy. This structure ensures that predictions can run on one hardware resource (CPU/GPU) while training and evaluation run on another (GPU/CPU). A GPU for training and evaluation is highly recommended!
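And the prediction side, in a hedged sketch: a gRPC client asking Tensorflow Serving for detections. This assumes the model was exported to take a batch of encoded image strings under the input key "inputs" and returns the Object Detection API's usual output keys; the host, port, and model name match the example later in this post, and the image file name is made up.

import grpc
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

with open("raccoon.jpg", "rb") as f:  # illustrative image file
    image_bytes = f.read()

request = predict_pb2.PredictRequest()
request.model_spec.name = "faster_rcnn_raccoon"
request.inputs["inputs"].CopyFrom(
    tf.make_tensor_proto([image_bytes], shape=[1]))

result = stub.Predict(request, 30.0)  # 30 second timeout
boxes = tf.make_ndarray(result.outputs["detection_boxes"])
scores = tf.make_ndarray(result.outputs["detection_scores"])
print(boxes[0][:3], scores[0][:3])  # top detections for the first image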

Configuring the recipe:

This section explains how the pipeline.config and the other arguments work together for this recipe. It assumes some prior knowledge of how to set up the pipeline.config for Tensorflow's Object Detection API.

  • When starting this recipe for the first time on a new project, make sure to provide a seed TF Record containing one training example in the train_input_reader config of the pipeline.config. This is required to do a dummy training run for 1 step and save the model as a SavedModel in the export directory (the Tensorflow Estimator does not allow SavedModel creation without a checkpoint in the model directory). This TF Record can be created from a CSV file using the provided create_tfrecord.py script; a sketch of the idea follows below.
train_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/train.record"
  }
}

However, if you are resuming annotations, you can skip the above, provided your model directory already has checkpoints from previous runs.
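The provided create_tfrecord.py handles this from a CSV, but if you want to roll your own seed record, a minimal hedged sketch in TF 1.x might look like the following. The file names, image size, and box coordinates are made up; the feature keys are the ones the Object Detection API's input readers expect.

import tensorflow as tf  # TF 1.x

def _bytes(v):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=v))

def _int64(v):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=v))

def _floats(v):
    return tf.train.Feature(float_list=tf.train.FloatList(value=v))

with open("raccoon-1.jpg", "rb") as f:  # hypothetical seed image
    encoded = f.read()

example = tf.train.Example(features=tf.train.Features(feature={
    "image/encoded": _bytes([encoded]),
    "image/format": _bytes([b"jpeg"]),
    "image/filename": _bytes([b"raccoon-1.jpg"]),
    "image/source_id": _bytes([b"raccoon-1"]),
    "image/height": _int64([480]),  # made-up size
    "image/width": _int64([640]),
    # One box, with corners normalized to [0, 1]:
    "image/object/bbox/xmin": _floats([0.1]),
    "image/object/bbox/ymin": _floats([0.2]),
    "image/object/bbox/xmax": _floats([0.6]),
    "image/object/bbox/ymax": _floats([0.8]),
    "image/object/class/text": _bytes([b"raccoon"]),
    "image/object/class/label": _int64([1]),  # matches label_map.pbtxt
}))

with tf.python_io.TFRecordWriter("train.record") as writer:
    writer.write(example.SerializeToString())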

  • If you also want to run evaluation in parallel (enabled by the run_eval flag argument), you need to provide the eval_input_reader config in the pipeline.config.
eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/validation.record"
  }
}

N samples are drawn from this validation.record (set by the eval_steps argument) and evaluation is run on these examples. All evaluation protocols supported by the Object Detection API are available.
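Roughly, this amounts to drawing eval_steps records from the eval TF Record (with batch_size 1, one evaluation step consumes one example). A hedged sketch of one simple way to take N records in TF 1.x, assuming a local validation.record:

import tensorflow as tf  # TF 1.x

eval_steps = 10  # the recipe's eval_steps argument
dataset = tf.data.TFRecordDataset("validation.record").take(eval_steps)
next_record = dataset.make_one_shot_iterator().get_next()

count = 0
with tf.Session() as sess:
    try:
        while True:
            sess.run(next_record)
            count += 1
    except tf.errors.OutOfRangeError:
        pass
print("evaluation will run on", count, "examples")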

Logging

  • Set Prodigy's logging level to basic to view detailed logs from this recipe.
  • Optionally set Tensorflow logging to 10 (DEBUG) or 20 (INFO) if you want to see detailed Tensorflow logs. This is set by the tf_logging_level argument (see the note below).
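For reference, these numbers are just Python's standard logging levels, which TF 1.x re-exports:

import tensorflow as tf  # TF 1.x

# 10 = DEBUG, 20 = INFO, 30 = WARN, 40 = ERROR
assert tf.logging.DEBUG == 10 and tf.logging.INFO == 20
tf.logging.set_verbosity(tf.logging.INFO)  # roughly what tf_logging_level=20 sets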

Notes and Recommendations

  • Object detection algorithms are extremely resource-hungry! So make sure that you run this recipe with Tensorflow GPU. However, you can choose to run Tensorflow Serving on CPU without much loss in performance.
  • Point TensorBoard to the model directory to view the training progress. The TensorBoard dashboards are really well populated, especially with evaluation enabled.
  • The recipe also supports all of the data augmentations provided by the Object Detection API out of the box. These can be enabled in the pipeline.config. This is especially useful if you are setting the steps_per_epoch argument higher than the number of annotated examples.
  • A custom garbage collector ensures that only the N most recent files/folders are kept in the export and data directories; this is specified by the temp_files_num argument. The number of recent model checkpoints kept in the model directory is governed by the max_checkpoints_num argument. (A sketch of the idea follows this list.)
  • It is recommended to provide the label_map_path in the pipeline.config rather than passing it as an argument to the recipe.
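The garbage collector itself isn't shown in this post, but the "keep only the N most recent entries" idea is simple. A generic hedged sketch (keep_most_recent and the example paths are illustrative, not the recipe's actual code):

import os
import shutil

def keep_most_recent(directory, n):
    """Delete all but the n most recently modified entries in directory."""
    entries = sorted(
        (os.path.join(directory, name) for name in os.listdir(directory)),
        key=os.path.getmtime,
        reverse=True,
    )
    for path in entries[n:]:
        if os.path.isdir(path):
            shutil.rmtree(path)  # e.g. old SavedModel version folders
        else:
            os.remove(path)      # e.g. old TF-Record files

keep_most_recent("export_dir", 5)  # e.g. temp_files_num = 5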

A simple working example:

Let us try teaching an object detector to detect raccoons. This toy dataset can be found in this github repo. The repo already has the images and annotations (which we don't need for training, but do need for evaluation) stored as TF-Records, along with the image files. I will be running the training/evaluation on GPU (a conda environment containing tensorflow-gpu version 1.12.0) and the predictions using Tensorflow Serving (a docker container) running on CPU. Additionally, I am using a manually compiled image of Tensorflow Serving so as to use AVX2 instructions. This optimized image can be downloaded by running docker pull abhijit2592/tensorflow-serving-devel. If your CPU does not support AVX2 instructions, you can also use the official image: docker pull tensorflow/serving:latest-devel.

First, let's setup the following directory structure:

raccoon_detection
├── export_dir
│   ├── serve_models.conf
│   └── serve_models.sh
├── labelmap.pbtxt
├── pipeline.config
├── run_tensorflow_serving.sh
└── run_train.sh

Note: I am manually creating an export_dir because it's easier to set up Tensorflow Serving with docker this way.

Contents of serve_models.sh

#!/bin/bash
nohup tensorflow_model_server \
--port=8500 \
--model_config_file=/tensorflow_servables/serve_models.conf >/tensorflow_servables/serving.log 2>&1

Contents of serve_models.conf

model_config_list: {
  config: {
    name: "faster_rcnn_raccoon",
    base_path: "/tensorflow_servables",
    model_platform: "tensorflow"
  }
}
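To verify that Serving has actually loaded a version of the model, one option is the model-status gRPC API. A hedged sketch, reusing the host, port, and model name from the configs above:

import grpc
from tensorflow_serving.apis import get_model_status_pb2
from tensorflow_serving.apis import model_service_pb2_grpc

channel = grpc.insecure_channel("localhost:8500")
stub = model_service_pb2_grpc.ModelServiceStub(channel)

request = get_model_status_pb2.GetModelStatusRequest()
request.model_spec.name = "faster_rcnn_raccoon"
print(stub.GetModelStatus(request, 10.0))  # state: AVAILABLE once loaded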

Contents of labelmap.pbtxt

item {
  id: 1
  name: 'raccoon'
}

Contents of pipeline.config

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 600
        max_dimension: 1024
      }
    }
    feature_extractor {
      type: 'faster_rcnn_inception_v2'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  data_augmentation_options {
    random_horizontal_flip {
    }
    random_vertical_flip {
    }
  }
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0002
          schedule {
            step: 900000
            learning_rate: .00002
          }
          schedule {
            step: 1200000
            learning_rate: .000002
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  fine_tune_checkpoint: "PATH TO COCO TRAINED CHECKPOINT/model.ckpt"
  from_detection_checkpoint: true
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH TO/test.record"
  }
  label_map_path: "PATH TO/labelmap.pbtxt"
}

eval_config: {
  num_examples: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "PATH TO/test.record"
  }
  label_map_path: "PATH TO/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}

NOTE: Here we set up 2 types of data augmentation, namely random_horizontal_flip and random_vertical_flip. You can also set up other augmentations; an exhaustive list of augmentations can be found here.

Contents of run_tensorflow_serving.sh

#!/bin/bash
docker run -d \
--name raccoon_tfodapi_container \
-v $PATH_TO/export_dir:/tensorflow_servables \
-p 8500:8500 \
abhijit2592/tensorflow-serving-devel:latest \
bash /tensorflow_servables/serve_models.sh

Contents of run_train.sh

#!/bin/bash
PRODIGY_LOGGING=basic prodigy image.trainmodel \
-F $PATH_TO/image_train.py \
odapi_train \
$PATH_TO/raccoon_dataset/images \
$PATH_TO/pipeline.config \
localhost \
8500 \
faster_rcnn_raccoon \
--threshold 0.9 \
--run-eval \
--eval-steps 10 \
--steps-per-epoch 100 \
--tf-logging-level 40

NOTE: Here we are setting --steps-per-epoch to 100. This ensures that even if we annotate only 20 images, the API will sample 100 augmented images from those 20 annotated examples. Thus we can synthetically increase the training set on the fly.

Starting the recipe:

Now we have all the required scripts.

  • First, run bash run_tensorflow_serving.sh from a terminal. This should create a log file named serving.log inside the export_dir and start Tensorflow Serving. You can use this log file to track how subsequent models are loaded and unloaded. Initially the logfile will show No versions of servable faster_rcnn_raccoon found under base path /tensorflow_servables. This is expected, because we don't have any servables inside the export_dir yet. The servables will be created when we run the run_train.sh script.
  • Now we are ready to start Prodigy. Run bash run_train.sh from a terminal. Make sure you are using tensorflow-gpu while running this script, because this part is extremely resource-intensive.
  • Now you can open your browser and start annotating. As soon as you press the save button, you can see that training starts as specified in the pipeline.config file. You can see live log reports in the terminal where you ran the run_train.sh command.
  • After annotating a few samples (say 35-40 images), you should start seeing good predictions from the model.
  • You can also track the training in TensorBoard by running tensorboard --logdir=$PATH_TO/model_dir.

That's all folks! We have integrated Tensorflow's Object Detection API with Prodigy.

Sample results on how training progresses


PS

  • We also have a few other recipes which you might find useful. All of these can be found in prodigy-recipes.
  • We also have a Speed vs Accuracy tradeoff study for a few models from the Tensorflow model zoo here.
  • Also, a few miscellaneous scripts which you might find useful.

Hi,

I tried to use Prodigy to annotate images, then used the Prodigy dataset to create TF-Record files. Everything seems fine, but when I use train_model.py from the Tensorflow API to train a model for a new class, I get low accuracy (around 20%) with high loss and negative recall.
Example:
I have about 30 images of different canes. I start a Prodigy session to annotate the canes, then I get the annotations from the Prodigy dataset; the labelmap and TF-Record were created using parts of this tutorial. The new train_model.py from the TF API is used to train a model from an SSD model checkpoint, but as I said, the accuracy is around 20% even when I add more data and increase the number of steps. Can you help me understand what my problem might be?

Thanks,
Fara

@fara1 Hi, thanks for checking out this code! First of all, 30 images is too small a dataset to get any reliable results. It is true that the code uses augmentations to blow up the training set artificially, but 30 images is still really small for significant results. Maybe try increasing the training set size? Also try transfer learning from the model zoo? Hope this helps!

Great! Is there a similar integration for PyTorch?

I am new to the TF Object Detection API. Do I need to standardize (i.e. resize to the same shape) all the images before I annotate them? Or can the object detection models handle the resizing of the images and the annotated bounding boxes?

Adding a link to the question above, answered in a different thread.