Make Good Choices:

We want to detect obstacles, which is why we called it Obstacle Detector.

Candidate Framework:

TensorFlow wins because of its large community and because it is available on Windows. That matters because one teammate has his RTX 2080 Ti on a Windows system.

Candidate Model:

SSD has more TensorFlow implementations than YOLO, and more pretrained weights as well. However, I will pick YOLO v3 over SSD.

We used SSD because it is supported by the TensorFlow Object Detection API and because it is fast. The API support matters because it greatly shortens development time. We used an SSD with MobileNet V2 as the backbone (from the Model Zoo).

Therefore, the trial-and-error process is as follows:

  1. (DONE) I will work with SSD on TensorFlow on Windows first and see if it works.
    1. It works pretty well... Not the best result, but it manages.
  2. (PASS) Then, I will modify the model's first layer to consume a disparity map as a fourth channel, using the models from step 1.
  3. (CONSIDERABLE) Then, I will work on image segmentation with Mask-RCNN in the TensorFlow Object Detection API, using pretrained weights to classify each pixel as road or non-road. Idea from
  4. (PASS) Then, I will modify the model's first layer to consume a disparity map as a fourth channel, using the models from step 3.
  5. (PASS) Then, I will work on a back-up plan.


Tensorflow Object Detection API:

Model Zoo:

Data labeling software:

Step by Step Tensorflow Object Detection API Tutorial (step by step my ass):

Image Segmentation Idea Source:




Postponed Indefinitely:

Guides Coming Up:

Install Tensorflow Object Detection API:

The procedure is simple. Follow the guide.

However, since I am installing on Windows, a few tweaks are needed.

  1. Dependencies:
  2. COCO API installation
  3. Protobuf Compilation
  4. Add Libraries to PYTHONPATH

Download Data and Prepare Data:

In General:


I edited two scripts: one that converts from Dataturks to Pascal VOC, and one that converts from Pascal VOC to TFRecord.

I thought about combining both scripts into one. However, the script TensorFlow provides uses TensorFlow as an intermediary between the console and the actual script. Therefore, using two scripts does not seem to be a big deal.


Data Preparation General Pipeline:

  1. Upload data onto Dataturks and label each image.
  2. Download the JSON file that has the URL of each uploaded image and its annotation.
  3. Manually write label_map.pbtxt. It maps each label to a number. This is simple.
  4. Use the folder-setup script (Step 2 below) to read in the JSON file for a list of all images and randomly choose train and validation data. The result will be stored in train.txt and val.txt, which are needed for converting Pascal VOC to TFRecord. train.txt and val.txt have one image name per line, with " 1" following the name. I have not yet discovered what that 1 or -1 is for in the PASCAL VOC format, but this number is not used in the conversion to TFRecord. This script will also create folders for storing images and annotations.
  5. Use the Dataturks to Pascal VOC script (Step 3 below) to download the data. The script is written for Linux, so it parses paths with "/". However, I am using Windows. Therefore, there are a few places where "/" needs to change to "\". Otherwise, the images will download into a weird file structure.
  6. Use the Pascal VOC to TFRecord script (Step 4 below) to convert Pascal VOC format to TFRecord. This script differs from the original in that the fields Dataturks does not support are removed from the conversion. The script is also altered to fit our file structure.
  7. Done. You should have train.record and val.record.

The JSON file downloaded from Dataturks will be called "Obstacle Detection Dataset.json".

Step 1: Create Label Map:

    item {
      id: 1
      name: 'Obstacle'
    }
    item {
      id: 2
      name: 'Pothole'
    }
    ......etc......

There is no id 0: an id of 0 is reserved as a placeholder, so ids start from 1.
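For longer label lists, the label map could also be generated rather than typed by hand. The following is only a sketch: `build_label_map` is a hypothetical helper, not part of the Object Detection API, and it simply mirrors the text format shown above, numbering ids from 1.

```python
def build_label_map(names):
    """Return label_map.pbtxt text for the given label names.

    Ids start at 1 because id 0 is reserved as a placeholder.
    """
    entries = []
    for idx, name in enumerate(names, start=1):
        entries.append("item {\n  id: %d\n  name: '%s'\n}\n" % (idx, name))
    return "\n".join(entries)


if __name__ == "__main__":
    # Writes the two labels shown above; extend the list as needed.
    with open("label_map.pbtxt", "w") as f:
        f.write(build_label_map(["Obstacle", "Pothole"]))
```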

Step 2: Make Data Folder Structure for Conversions and Randomly Select Train and Val Data:


Directories are defined in file.

It will create a directory:

    +formatted data
      +annotated
      +imgs

It will also create two files: train.txt and val.txt.
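The random split itself is small enough to sketch. This is not the actual script, just an illustration under assumptions: the function names, the 20% validation fraction, and the fixed seed are all hypothetical choices. The only detail taken from the notes is the file format, one image name per line followed by " 1".

```python
import random


def split_train_val(image_names, val_fraction=0.2, seed=0):
    """Shuffle the names and return (train_names, val_names).

    val_fraction and seed are illustrative defaults, not values from the project.
    """
    names = sorted(image_names)          # sort first so the seed gives a stable result
    rng = random.Random(seed)
    rng.shuffle(names)
    n_val = int(len(names) * val_fraction)
    return names[n_val:], names[:n_val]


def format_list(names):
    """Render names in the train.txt / val.txt format: 'name 1' per line."""
    return "".join("%s 1\n" % name for name in names)
```

Writing `format_list(train_names)` to train.txt and `format_list(val_names)` to val.txt would then give the two files described above.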

Step 3: Convert From Dataturks to Pascal VOC (Also Downloads Images):

python "Obstacle Detection Dataset.json" "formatted data\imgs" "formatted data\annotated"

Modified from

Be aware of the difference between Linux and Windows path representations: Linux uses '/' while Windows uses '\'.
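One way to avoid hand-editing separators is to let Python pick them. The sketch below is an illustration, not the modified script: `file_name_from_url`, `local_image_path`, and `join_for` are hypothetical helpers, and the URL is a made-up example. URLs always use '/', while `os.path.join` uses the separator of the OS the script runs on.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath


def file_name_from_url(url):
    """Take the last URL segment; URLs use '/' on every operating system."""
    return url.rsplit("/", 1)[-1]


def local_image_path(image_dir, url):
    """Join with the separator of the current OS instead of a hardcoded '/'."""
    return os.path.join(image_dir, file_name_from_url(url))


def join_for(flavour, *parts):
    """Show how the same parts render under a given path flavour."""
    return str(flavour(*parts))


if __name__ == "__main__":
    print(local_image_path("imgs", "https://example.com/uploads/img_01.jpg"))
    print(join_for(PureWindowsPath, "formatted data", "imgs", "img_01.jpg"))
    print(join_for(PurePosixPath, "formatted data", "imgs", "img_01.jpg"))
```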

Step 4: Convert From Pascal VOC to TFRecord:

python --data_dir="formatted data" --annotations_dir="formatted data\annotated" --output_path=pascal.record --label_map_path=label_map.pbtxt --set=train

The set parameter can be "train" or "val"; the script reads train.txt or val.txt accordingly to create the training set or the validation set.

Modified from models-master\research\object_detection\dataset_tools\

Be aware of the difference between Linux and Windows path representations: Linux uses '/' while Windows uses '\'.

Fine-tune SSD:

We have the data processed. Now we need to train SSD to fit our data.

Guide is here:

Step 1: Create Training Folder Structure:

The folder structure is like this:

    +data
      -label_map file
      -train TFRecord file (train.record)
      -eval TFRecord file (val.record)
    +models
      +model
        -pipeline config file (pipeline.config from pretrained model)
        -checkpoint files ( model.ckpt.index model.ckpt.meta)
        +train
        +eval

Step 2: Modify pipeline.config:

There is one line in SSD's pipeline.config (a line about batch_norm) that will not pass. Simply delete that line.

Modify the paths to something like this:

    model:
      fine_tune_checkpoint: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/models/model/
    train:
      label_map_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/label_map.pbtxt"
      input_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/train.record"
    val:
      label_map_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/label_map.pbtxt"
      input_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/val.record"

Absolute paths seem necessary. Change them accordingly.
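The paths in the example above are absolute and written with forward slashes even though they are Windows paths. A small helper could produce that form from a path copied out of Explorer. This is a sketch only: `to_config_path` is a hypothetical name, and rejecting relative paths simply reflects the observation above that absolute paths seem necessary.

```python
from pathlib import PureWindowsPath


def to_config_path(windows_path):
    """Return an absolute Windows path written with forward slashes."""
    path = PureWindowsPath(windows_path)
    if not path.is_absolute():
        # Assumption from the notes: relative paths did not seem to work.
        raise ValueError("expected an absolute path, got: %s" % windows_path)
    return path.as_posix()


if __name__ == "__main__":
    # Hypothetical example path.
    print(to_config_path(r"C:\Users\me\train\data\train.record"))
```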

Be aware of the difference between Linux and Windows path representations: Linux uses '/' while Windows uses '\'.

Step 3: Training:

    PIPELINE_CONFIG_PATH={path to pipeline config file}
    MODEL_DIR={path to model directory}
    NUM_TRAIN_STEPS=50000
    SAMPLE_1_OF_N_EVAL_EXAMPLES=1
    python \
        --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
        --model_dir=${MODEL_DIR} \
        --num_train_steps=${NUM_TRAIN_STEPS} \
        --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
        --alsologtostderr

The training script is located in the tensorflow/models/research/object_detection directory.

Change these variables to suit your needs.

Step 3.5: Saving:

By default, training keeps at most 5 checkpoints. To find the optimal checkpoint, we would need to copy each new checkpoint somewhere else. This process is not user-friendly since it is manual.

Therefore, I raised the maximum number of checkpoints kept (max_to_keep=10000 in the code below) by modifying the following code:

In object detection/, we modify line ~490, which creates a saver.

    saver = tf.train.Saver(
        sharded=True,
        keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours,
        save_relative_paths=True,
        max_to_keep=10000)

This change suffices for our needs, but to keep everything consistent, we should also change line ~470 in the same file.

We hardcode this value because it is simple to do. To avoid hardcoding it, we would need to put it in pipeline.config, edit pipeline.proto, recompile the proto, and then it may work.

I have also modified line 62 in to:

config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir, keep_checkpoint_max=10000)

Setting only this does not work. I am not sure whether setting only the saver, without this change, would work.
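The manual alternative mentioned at the top of this step, copying checkpoints elsewhere before they are rotated out, could also be scripted instead of patching the library. The following is a sketch under assumptions: `backup_checkpoints` is a hypothetical helper, it assumes checkpoint files are named model.ckpt-<step>.*, and it does not guard against copying a file that is still being written.

```python
import glob
import os
import shutil


def backup_checkpoints(model_dir, backup_dir):
    """Copy model.ckpt-* files not yet present in backup_dir; return their names."""
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(os.path.join(model_dir, "model.ckpt-*"))):
        name = os.path.basename(src)
        dst = os.path.join(backup_dir, name)
        if not os.path.exists(dst):
            shutil.copy2(src, dst)   # copy2 also preserves the modification time
            copied.append(name)
    return copied
```

Calling this periodically while training runs (for example from a loop with a sleep) would keep every checkpoint without touching max_to_keep.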

Step 4: Open TensorBoard:

tensorboard --logdir=${MODEL_DIR}

Export Model to TFLite:

Step 1: Convert from checkpoints to tflite.pb:

    python \
        --pipeline_config_path=pipeline.config \
        --trained_checkpoint_prefix=model.ckpt-16812 \
        --output_directory=output_tflite \
        --add_postprocessing_op=true

trained_checkpoint_prefix should match three files: .data-00000-of-00001, .index, and .meta.

Using this script ensures that all operations in tflite.pb are supported by TensorFlow Lite.
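Before running the export, it can save a confusing error to check that the prefix really matches all three files. A minimal sketch, assuming the three suffixes listed above; `missing_checkpoint_files` is a hypothetical helper.

```python
import os

# The three suffixes a checkpoint prefix is expected to match (from the note above).
CHECKPOINT_SUFFIXES = (".data-00000-of-00001", ".index", ".meta")


def missing_checkpoint_files(prefix):
    """Return the expected checkpoint files that do not exist for this prefix."""
    return [prefix + suffix
            for suffix in CHECKPOINT_SUFFIXES
            if not os.path.isfile(prefix + suffix)]


if __name__ == "__main__":
    missing = missing_checkpoint_files("model.ckpt-16812")
    print("ok" if not missing else "missing: %s" % missing)
```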

Step 2a: Convert from tflite.pb to .tflite files (Non Quantized Model):

tflite_convert --output_file=ssd.tflite --graph_def_file=output_tflite\tflite_graph.pb --input_arrays=normalized_input_image_tensor --output_arrays=TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 --input_shapes=1,300,300,3 --allow_custom_ops

input_shapes, input_arrays, and output_arrays are described in the file header.

output_arrays is TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 because the TFLite_Detection_PostProcess op has 4 outputs. TFLite_Detection_PostProcess alone only includes the first output. It is similar to a pointer array (CSE 30~~~, thanks, Rick).
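To make the four outputs concrete, here is a sketch of how they might be consumed. It rests on my understanding, which should be checked against the export script's header, that the outputs are, in order, boxes, class indices, scores, and the number of detections, with each box given as normalized [ymin, xmin, ymax, xmax]. `decode_detections` and the 0.5 score threshold are hypothetical.

```python
def decode_detections(boxes, classes, scores, count, width, height, min_score=0.5):
    """Turn the four post-process outputs into a list of pixel-space detections.

    Assumes boxes[i] is normalized [ymin, xmin, ymax, xmax] (an assumption,
    see the lead-in), and that only the first `count` entries are valid.
    """
    results = []
    for i in range(int(count)):
        if scores[i] < min_score:
            continue
        ymin, xmin, ymax, xmax = boxes[i]
        results.append({
            "class_index": int(classes[i]),
            "score": float(scores[i]),
            # Reordered to (xmin, ymin, xmax, ymax) in pixels.
            "box_pixels": (xmin * width, ymin * height, xmax * width, ymax * height),
        })
    return results
```

Note the y-before-x order in the raw box: mixing it up is an easy way to end up with flipped coordinates.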

TFLite_Detection_PostProcess is a custom operation, so adding the allow_custom_ops flag (note the plural) is necessary: tflite_convert assumes native TFLite does not support it. However, it is actually implemented, so we need to bypass this check. There is no other custom operation in the network.

The width and height in input_shapes are defined in pipeline.config:

fixed_shape_resizer { height: 300 width: 300 }
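To avoid a mismatch between the config and the command line, the --input_shapes value could be derived from pipeline.config. This is a rough sketch: `input_shapes_from_config` is a hypothetical helper that uses a regular expression rather than the real protobuf parser, and it assumes a batch size of 1 and 3 channels as in the command above.

```python
import re


def input_shapes_from_config(config_text, channels=3):
    """Build the '1,height,width,channels' string from a fixed_shape_resizer block."""
    match = re.search(r"fixed_shape_resizer\s*\{([^}]*)\}", config_text)
    if match is None:
        raise ValueError("no fixed_shape_resizer block found")
    body = match.group(1)
    height = int(re.search(r"height:\s*(\d+)", body).group(1))
    width = int(re.search(r"width:\s*(\d+)", body).group(1))
    return "1,%d,%d,%d" % (height, width, channels)


if __name__ == "__main__":
    with open("pipeline.config") as f:
        print(input_shapes_from_config(f.read()))
```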

Original Guide is here:

Step 2b: Convert from tflite.pb to .tflite files (Quantized Model):

tflite_convert --output_file=model_quantized.tflite --graph_def_file=output_tflite\tflite_graph.pb --input_arrays=normalized_input_image_tensor --output_arrays=raw_outputs/box_encodings,raw_outputs/class_predictions --input_shapes=1,300,300,3 --inference_type=QUANTIZED_UINT8 --mean_values=128 --std_dev_values=127 --default_ranges_min=0 --default_ranges_max=6

We use dummy quantization because the trained model may have layers that do not have max and min values for quantization. For example, Relu6 does not have such information. Therefore, we specify plausible min and max values.
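The numbers in the command can be read as a simple mapping. As I understand the converter's convention, a quantized input value q corresponds to the real value (q - mean_values) / std_dev_values, so 128 and 127 map the uint8 range to roughly [-1, 1]. The sketch below illustrates that, plus a simplified linear quantization over the default range 0 to 6; both helper names are hypothetical and the activation formula is an illustration, not the converter's exact implementation.

```python
def dequantize_input(q, mean_value=128.0, std_dev_value=127.0):
    """Real value represented by a uint8 input under mean/std-dev quantization."""
    return (q - mean_value) / std_dev_value


def quantize_activation(x, range_min=0.0, range_max=6.0):
    """Simplified linear mapping of [range_min, range_max] onto 0..255."""
    scale = (range_max - range_min) / 255.0
    q = int(round((x - range_min) / scale))
    return max(0, min(255, q))   # clamp values outside the assumed range


if __name__ == "__main__":
    print(dequantize_input(0), dequantize_input(128), dequantize_input(255))
    print(quantize_activation(0.0), quantize_activation(6.0), quantize_activation(10.0))
```

The clamp in `quantize_activation` is why the chosen range matters: anything outside it saturates, which fits Relu6 (bounded to 0..6) but may not fit other layers.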

Original Guide is here:


Guide is here:

This guide explains how to run an example project and how to understand the code.

Step 1: Install iOS Tensorflow Lite in Xcode Project:

TensorFlow Lite is installed through CocoaPods.

Install cocoapods.

In Podfile under the project folder:

    target 'Obstacle-Detection' do
      pod 'Zip', '~> 1.1'
      #pod 'TensorFlow-experimental', '~> 1.1'
      #pod 'TensorFlowLiteGpuExperimental', '0.0.1'
    end

Zip is another framework that Obstacle-Detection uses.

Choose either TensorFlow-experimental or TensorFlowLiteGpuExperimental. If you want to use the GPU to run the network, install TensorFlowLiteGpuExperimental. Be mindful that there are extra steps to configure the project if you want to use the GPU. Read: In this tutorial, TFLITE_USE_CONTRIB_LITE does not exist in the sample code; simply ignore it.

Then, in bash under that project folder:

pod install

Or, if you have installed it before, use:

pod repo update

Step 2: Develop iOS App:

Just use the camera example from TensorFlow Lite, and modify elements as needed.

Click into TFLite methods to see its description.

  1. Copy the converted model over.
    1. Copy the model to the project.
    2. Add the model to Bundle Resources.
  2. List the labels used.
  3. Change parameters of model and its input.
  4. Change some code in runModelOnFrame() to handle the output of the model.

Be mindful that there is an input processing error in the demo code: the x and y values are flipped. The code has to be modified for both the quantized and non-quantized functions. Details are here:

Be mindful that the demo code is missing an input check. In, function evaluateOnBuffer, it calls CFRetain on pixelBuffer directly without checking whether pixelBuffer is NULL. The pixelBuffer returned by CMSampleBufferGetImageBuffer can be NULL, and CFRetain(NULL) will crash the app. Thus, it is necessary to check whether pixelBuffer is NULL before calling CFRetain(pixelBuffer).

Results and a Conclusion Coming Up:

The initial model, trained on 45 images, recognized big boxes and poles decently. Even a slim coke bottle was recognized as an obstacle. Therefore, we can conclude that our architecture can converge.

The complete dataset was then used. With early stopping, the model did not recognize obstacles well; it was worse than the initial model. When overfitted, the model did not recognize anything at all.

It turned out that the data was poorly labeled: some boxes were not tight, some obstacles were not labeled, some edges were labeled with multiple boxes, some boxes did not bound the object, etc.

After all of these problems were solved, we were able to achieve decent detection. Our peak mAP is 0.1705, peak mAP@.50IOU is 0.291, and mAP (large) is 0.1784. Recall that we only have 315 images.
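For readers unfamiliar with the ".50IOU" part of the metric: a detection counts as correct at that threshold only if its intersection-over-union with a ground-truth box is at least 0.5. This is also why loose boxes in the labels hurt. A minimal sketch of the IoU computation, assuming boxes are given as (xmin, ymin, xmax, ymax); the function name is my own.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap; zero if the boxes do not intersect.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, giving an IoU of 1/7, well below the 0.5 threshold.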

There is an interesting observation: a fire hydrant is detected particularly accurately as a fire hydrant by the pretrained model, and also very accurately as an obstacle by ours. However, when I removed the blue channel, the pretrained model categorized it as a bottle...

From this observation, I believe the reason we can reach this result with so few images is that obstacles are a superset of many kinds of objects. The pretrained model has already been trained on lots and lots of objects. Therefore, during fine-tuning, our model does not need to learn new features or new objects; it only needs to recategorize the ones it already knows. This could be why we are able to obtain such accuracy with a limited dataset.

According to Apple, some models can be fine-tuned with as few as 60 images... But, as overachievers, we are looking to expand our dataset to at least 1000 images. (I take this back)

The following are the metrics for our model.

As you can see, the mAP oscillates more and more toward the end of training, while the loss is not decreasing. Moreover, mAP@0.75 is no longer improving, and mAP@0.5 starts to stabilize. The smoothed curves look acceptable, but the raw values are decreasing and oscillate a lot. Also, the validation loss is not decreasing while the training loss continues to decrease, which suggests overfitting. Therefore, I stopped the training.
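I stopped training by eye, but the same judgment could be roughed out in code. The sketch below is hypothetical and was not used in this project: `smooth` is a plain exponential moving average (similar in spirit to TensorBoard's smoothing slider, not a copy of it), and `should_stop` is a simple patience rule on validation loss. The weight of 0.6 and patience of 3 are arbitrary illustrative values.

```python
def smooth(values, weight=0.6):
    """Exponential moving average; higher weight means heavier smoothing."""
    smoothed = []
    last = values[0] if values else 0.0
    for value in values:
        last = weight * last + (1.0 - weight) * value
        smoothed.append(last)
    return smoothed


def should_stop(val_losses, patience=3):
    """True when none of the last `patience` losses beats the earlier best."""
    if len(val_losses) <= patience:
        return False
    best_before = min(val_losses[:-patience])
    return all(loss >= best_before for loss in val_losses[-patience:])
```

Comparing the raw series against `smooth(series)` also shows how smoothing can hide the kind of late oscillation described above.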

Tensorboard Results 1 Tensorboard Results 2 Tensorboard Results 3 Tensorboard Results 4