Make Good Choices:

We want to detect obstacles; that is why we call it the obstacle detector.

Candidate Framework:

Tensorflow wins because of its large community and its availability on Windows. This is a big deal because one teammate has his RTX 2080 Ti on a Windows system.

Candidate Model:

SSD has more Tensorflow implementations than YOLO, and pretrained weights as well. However, I would still pick YOLO v3 over SSD.

We used SSD anyway because it is supported by the Tensorflow Object Detection API and because of its speed. That support is a big deal because the API really shortens development time. We used an SSD with MobileNet V2 as the backbone (from the Model Zoo).

Therefore, the trial-and-error process is as follows:

  1. (DONE) I will work with SSD on Tensorflow on Windows first and see if it works.
    1. It works pretty well... Not the best result, but still manages.
  2. (PASS) Then, I will modify the model's first layer to consume a disparity map as a fourth channel, using the models from 1.
  3. (CONSIDERABLE) Then, I will try image segmentation with Mask R-CNN in the Tensorflow Object Detection API, using pretrained weights, to classify each pixel as road or non-road. Idea from https://stackoverflow.com/questions/6007822/finding-path-obstacles-in-a-2d-image
  4. (PASS) Then, I will modify the model's first layer to consume a disparity map as a fourth channel, using the models from 3.
  5. (PASS) Then, I will work on the back-up plan.

Links:

Tensorflow Object Detection API: https://github.com/tensorflow/models/tree/master/research/object_detection

Model Zoo: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md

Data labeling software: dataturks.com

Step by Step Tensorflow Object Detection API Tutorial (step by step my ass): https://medium.com/@WuStangDan/step-by-step-tensorflow-object-detection-api-tutorial-part-1-selecting-a-model-a02b6aabe39e

Image Segmentation Idea Source: https://stackoverflow.com/questions/6007822/finding-path-obstacles-in-a-2d-image

TODOs:

Postponed:

Postponed Indefinitely:

Guides Coming Up:

Install Tensorflow Object Detection API:

The procedure is simple. Follow the guide.

https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md

However, since I am installing on Windows, a few tweaks are needed along the way.

  1. Dependencies:
  2. COCO API installation
  3. Protobuf Compilation
  4. Add Libraries to PYTHONPATH
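
After working through these four steps, a quick import check confirms the API is usable. A minimal sketch in Python, assuming models/research and models/research/slim have already been added to PYTHONPATH:

# Fails at the second import if PYTHONPATH does not include the Object Detection API.
import tensorflow as tf
from object_detection.utils import label_map_util

print("Tensorflow version:", tf.__version__)
print("Object Detection API is importable.")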

Download Data and Prepare Data:

In General:

Story:

What I did was edit the scripts that convert from Dataturks to Pascal VOC and from Pascal VOC to TFRecord.

I thought about combining both scripts into one. However, the script Tensorflow provides goes through Tensorflow's own command-line flag handling between the console and the actual conversion code. Therefore, keeping them as two scripts does not seem to be much of a problem.

Scripts:

Data Preparation General Pipeline:

  1. Upload data onto dataturks.com and label each image.
  2. Download the json file that has the url of each uploaded image and its annotation.
  3. Manually write label_map.pbtxt. It maps each label to a number. This is simple.
  4. Use mkdir_and_random_train_val.py to read in the json file for a list of all images and randomly choose train and validation data. The result is stored in train.txt and val.txt, which are needed for converting Pascal VOC to TFRecord. train.txt and val.txt have one image name per line, with " 1" following the name. I have not yet discovered what that 1 or -1 is for in PASCAL VOC format, but this number is not used in pascal_to_tf.py. This script also creates the folders for storing images and annotations.
  5. Use dataturks_to_pascal.py to download the data. The script is written for Linux, so it builds paths with "/". However, I am using Windows. Therefore, there are a few places that need "/" changed to "\". Otherwise, the images will be downloaded into a weird file structure.
  6. Use pascal_to_tf.py to convert Pascal VOC format to TFRecord. This script differs from the original in that fields Dataturks does not support have been removed from the conversion. The script is also altered to fit our file structure.
  7. Done. You should have train.record and val.record.

The json file downloaded from Dataturks will be called "Obstacle Detection Dataset.json".

Step 1: Create Label Map:

item {
  id: 1
  name: 'Obstacle'
}
item {
  id: 2
  name: 'Pothole'
}
......etc......

There is no id 0; id 0 is a placeholder (reserved for the background class), so ids start from 1.
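
To double-check the label map before training, it can be loaded with the API's own utility. A minimal sketch, assuming label_map.pbtxt sits in the current directory:

# Prints the label-to-id mapping the API will use during conversion and training.
from object_detection.utils import label_map_util

label_map_dict = label_map_util.get_label_map_dict('label_map.pbtxt')
print(label_map_dict)  # e.g. {'Obstacle': 1, 'Pothole': 2, ...}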

Step 2: Make Data Folder Structure for Conversions and Randomly Select Train and Val Data:

python mkdir_and_random_train_val.py

Directories are defined inside mkdir_and_random_train_val.py.

It will create this directory structure:

+formatted data
  +annotated
  +imgs

It will also create two files: train.txt and val.txt.
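
mkdir_and_random_train_val.py is our own script, so here is only a rough sketch of the kind of thing it does, based on the description above. The split ratio, the Dataturks JSON field name, and the file-name handling are assumptions, not the actual script:

# Rough sketch: make the folders, then randomly split image names into train/val lists.
import json, os, random

JSON_FILE = 'Obstacle Detection Dataset.json'  # Dataturks export, one JSON object per line (assumed)
VAL_FRACTION = 0.2                             # assumed split ratio

os.makedirs(os.path.join('formatted data', 'annotated'), exist_ok=True)
os.makedirs(os.path.join('formatted data', 'imgs'), exist_ok=True)

with open(JSON_FILE, encoding='utf-8') as f:
    # 'content' holds the image URL in Dataturks exports (assumed); keep only the file name.
    names = [json.loads(line)['content'].split('/')[-1] for line in f if line.strip()]

random.shuffle(names)
split = int(len(names) * VAL_FRACTION)
val_names, train_names = names[:split], names[split:]

# One image name per line, followed by " 1", mirroring the Pascal VOC list format.
with open('train.txt', 'w') as f:
    f.writelines(n + ' 1\n' for n in train_names)
with open('val.txt', 'w') as f:
    f.writelines(n + ' 1\n' for n in val_names)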

Step 3: Convert From Dataturks to Pascal VOC (Also Downloads Images):

python dataturks_to_pascal.py "Obstacle Detection Dataset.json" "formatted data\imgs" "formatted data\annotated"

Modified from https://dataturks.com/help/ibbx_dataturks_to_pascal_voc_format.php.

Be aware of the difference between Linux and Windows path representation. Linux uses '/' while Windows uses '\'.
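
Instead of swapping '/' for '\' by hand, the path handling can also be made platform-independent with os.path.join. A minimal sketch of the idea; the variable names are made up, not the ones in dataturks_to_pascal.py:

# os.path.join picks the right separator for the current OS.
import os

image_dir = os.path.join('formatted data', 'imgs')       # -> 'formatted data\imgs' on Windows
file_path = os.path.join(image_dir, 'some_image.jpg')
os.makedirs(os.path.dirname(file_path), exist_ok=True)   # create parent folders safely before saving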

Step 4: Convert From Pascal VOC to TFRecord:

python pascal_to_tf.py --data_dir="formatted data" --annotations_dir="formatted data\annotated" --output_path=pascal.record --label_map_path=label_map.pbtxt --set=train

The set parameter can be "train" or "val"; the script will read train.txt or val.txt accordingly to create the training set or the validation set.

Modified from models-master\research\object_detection\dataset_tools\create_pascal_tf_record.py

Be aware of the difference between Linux and Windows path representation. Linux uses '/' while Windows uses '\'.
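
Once both conversions have run, a quick way to verify a .record file is to iterate over it with the TF 1.x record iterator. A minimal check; point path at whatever --output_path was set to:

# Counts the examples in a TFRecord file and prints the feature keys of the first one.
import tensorflow as tf

path = 'pascal.record'
count = 0
for record in tf.python_io.tf_record_iterator(path):
    if count == 0:
        example = tf.train.Example.FromString(record)
        print(sorted(example.features.feature.keys()))  # expect image/encoded, image/object/bbox/..., etc.
    count += 1
print(count, 'examples in', path)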

Fine-tune SSD:

We have the data processed. Now we need to train SSD to fit our data.

Guide is here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md.

Step 1: Create Training Folder Structure:

The folder structure is like this:

+data
  -label_map file
  -train TFRecord file (train.record)
  -eval TFRecord file (val.record)
+models
  +model
    -pipeline config file (pipeline.config from pretrained model)
    -checkpoint files (model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta)
    +train
    +eval

Step 2: Modify pipeline.config:

There is one line in SSD's pipeline.config (a line about batch_norm) that will not pass. Simply delete that line.

Modify the paths to something like this:

model:
  fine_tune_checkpoint: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/models/model/"

train:
  label_map_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/label_map.pbtxt"
  input_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/train.record"

val:
  label_map_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/label_map.pbtxt"
  input_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/val.record"

Absolute paths seem to be necessary. Change them accordingly.

Be aware of the difference between Linux and Windows path representation. Linux uses '/' while Windows uses '\'.

Step 3: Training:

PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
NUM_TRAIN_STEPS=50000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python model_main.py \
    --pipeline_config_path=${PIPELINE_CONFIG_PATH} \
    --model_dir=${MODEL_DIR} \
    --num_train_steps=${NUM_TRAIN_STEPS} \
    --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES \
    --alsologtostderr

model_main.py is located in the tensorflow/models/research/object_detection directory.

Change these variables to suit your needs.

Step 3.5: Saving:

By default, training keeps at most 5 checkpoints. To find the optimal checkpoint, we would have to keep copying new checkpoints somewhere else, which is not user-friendly since it is manual.

Therefore, I changed the number of checkpoints kept by modifying the following code:

In object_detection/model_lib.py, we modify line ~490, which creates a saver.

saver = tf.train.Saver(
    sharded=True,
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours,
    save_relative_paths=True,
    max_to_keep=10000)

This change suffices for our needs, but to keep everything consistent, we should also change line ~470 in the same file.

We hardcode this value because it is simple to do. If we did not want to hardcode it, we would need to put it in pipeline.config, edit pipeline.proto, recompile the protos, and then it might work.

I have also modified line 62 in model_main.py to:

config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir, keep_checkpoint_max=10000)

Setting only this would not work. I am not sure whether setting only the saver, and not this, would work.

Step 4: Open TensorBoard:

tensorboard --logdir=${MODEL_DIR}

Export Model to TFLite:

Step 1: Convert from checkpoints to tflite.pb:

python export_tflite_ssd_graph.py \
    --pipeline_config_path=pipeline.config \
    --trained_checkpoint_prefix=model.ckpt-16812 \
    --output_directory=output_tflite \
    --add_postprocessing_op=true

trained_checkpoint_prefix should match the prefix of three files: .data-00000-of-00001, .index, and .meta.

Using this script makes sure that all operations in tflite.pb will be supported in Tensorflow Lite.

Step 2a: Convert from tflite.pb to .tflite files (Non Quantized Model):

tflite_convert \
    --output_file=ssd.tflite \
    --graph_def_file=output_tflite\tflite_graph.pb \
    --input_arrays=normalized_input_image_tensor \
    --output_arrays=TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 \
    --input_shapes=1,300,300,3 \
    --allow_custom_ops

input_shapes, input_arrays, and output_arrays are described in the export_tflite_ssd_graph.py file header.

output_arrays is TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 because the TFLite_Detection_PostProcess op has 4 outputs (boxes, classes, scores, and number of detections). TFLite_Detection_PostProcess alone would only include the first output. It is similar to a pointer array (CSE 30~~~, thanks, Rick).

TFLite_Detection_PostProcess is a custom operation, so adding --allow_custom_ops is necessary: tflite_convert assumes native TFLite will not support it, even though it is actually implemented, so we need to bypass this check. There is no other custom operation in the network.

The width and height in input_shapes are defined in pipeline.config:

fixed_shape_resizer {
  height: 300
  width: 300
}

Original Guide is here: https://www.tensorflow.org/lite/convert/cmdline_examples#convert_a_tensorflow_graphdef_.
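
Before moving on to the iOS side, the converted model can be sanity-checked from Python. A minimal sketch, assuming the non-quantized ssd.tflite from this step and a Tensorflow build whose Python TFLite interpreter includes the TFLite_Detection_PostProcess custom op (if it does not, allocate_tensors() will raise an error):

# Loads ssd.tflite, runs a dummy frame, and prints the four post-processed outputs.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='ssd.tflite')
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print(input_details[0]['shape'])  # expect [1 300 300 3]

dummy = np.random.uniform(-1, 1, input_details[0]['shape']).astype(np.float32)
interpreter.set_tensor(input_details[0]['index'], dummy)
interpreter.invoke()

for d in output_details:  # boxes, classes, scores, number of detections
    print(d['name'], interpreter.get_tensor(d['index']).shape)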

Step 2b: Convert from tflite.pb to .tflite files (Quantized Model):

tflite_convert \
    --output_file=model_quantized.tflite \
    --graph_def_file=output_tflite\tflite_graph.pb \
    --input_arrays=normalized_input_image_tensor \
    --output_arrays=raw_outputs/box_encodings,raw_outputs/class_predictions \
    --input_shapes=1,300,300,3 \
    --inference_type=QUANTIZED_UINT8 \
    --mean_values=128 \
    --std_dev_values=127 \
    --default_ranges_min=0 \
    --default_ranges_max=6

We use dummy quantization because the trained model may have layers that do not carry min and max values for quantization. For example, Relu6 does not have such information. Therefore, we specify plausible defaults (0 and 6 match Relu6's output range).

Original Guide is here: https://www.tensorflow.org/lite/convert/cmdline_examples#use_dummy-quantization_to_try_out_quantized_inference_on_a_float_graph_.

Deploy:

Guide is here: https://www.tensorflow.org/lite/guide/ios.

This guide talks about how to run an example project and how to understand the code.

Step 1: Install iOS Tensorflow Lite in Xcode Project:

Tensorflow Lite is installed through CocoaPods.

Install CocoaPods.

In Podfile under the project folder:

target 'Obstacle-Detection' do
    pod 'Zip', '~> 1.1'
    #pod 'TensorFlow-experimental', '~> 1.1'
    #pod 'TensorFlowLiteGpuExperimental', '0.0.1'
end

Zip is another framework that Obstacle-Detection uses.

Choose either TensorFlow-experimental or TensorFlowLiteGpuExperimental. If you want to run the network on the GPU, install TensorFlowLiteGpuExperimental. Be mindful that there are extra steps to configure the project for GPU use. Read: https://www.tensorflow.org/lite/performance/gpu. In that tutorial, TFLITE_USE_CONTRIB_LITE does not exist in our sample code; simply ignore it.

Then, in bash under that project folder:

pod install

Or, if the pods have been installed before, use:

pod repo update

Step 2: Develop iOS App:

Just use the camera example from Tensorflow Lite and modify its elements.

Click into the TFLite methods to see their descriptions.

  1. Copy the converted model over.
    1. Copy the model to the project.
    2. Add the model to Bundle Resources.
  2. List the labels used.
  3. Change the parameters of the model and its input.
  4. Change some code in runModelOnFrame() to handle the output of the model.

Be mindful that there is an input-processing error in the demo code: the x and y values are flipped. The code has to be modified in both the quantized and non-quantized functions. Details are here: https://github.com/tensorflow/tensorflow/issues/25784

Be mindful that the demo code is also missing an input check. In ODModelEvaluator.mm, the function evaluateOnBuffer calls CFRetain on pixelBuffer directly without checking whether pixelBuffer is NULL. The pixelBuffer returned by CMSampleBufferGetImageBuffer can be NULL, and CFRetain(NULL) will crash the app. Thus, a NULL check on pixelBuffer is necessary before calling CFRetain(pixelBuffer).


Results and a Conclusion Coming Up:

The initial model, trained on 45 images, had decent recognition of big boxes and poles. Even a slim coke bottle was recognized as an obstacle. Therefore, we can conclude that our architecture can converge.

The complete dataset was then used. With early stopping, the model did not recognize objects well, worse than the initial model. When overfitted, the model did not recognize anything at all.

It turned out the data was poorly labeled: some boxes were not tight, some obstacles were not drawn at all, some edges were labeled with multiple boxes, some boxes did not bound the object, etc.

After all of these problems were solved, we were able to achieve decent detection. Our peak mAP is 0.1705, peak mAP@.50IOU is 0.291, and mAP (large) is 0.1784. Recall that we only have 315 images.

There is an interesting observation: a fire hydrant is recognized as a fire hydrant with particularly high confidence by the pretrained model, and is also detected very accurately as an obstacle by ours. However, when I removed the blue channel, the pretrained model categorized it as a bottle...

From this observation, I believe we can reach this result with so few images because obstacles are a superset of lots of objects. The pretrained model has been trained on lots and lots of objects. Therefore, during fine-tuning, our model does not need to learn new features or new objects, but only needs to re-categorize the ones it already knows. This could be the reason why we are able to obtain such accuracy with a limited dataset.

According to Apple, some models can be fine-tuned with as few as 60 images... But, as overachievers, we are looking to expand our dataset to at least 1000 images. (I take this back.)

The following are the metrics for our model.

As you can see, the mAP oscillates more and more toward the end of training, while the loss is not decreasing. Moreover, at mAP@0.75 the model is not reaching a higher mAP, and at mAP@0.5 the performance starts to stabilize. The smoothed curves look okay, but the raw values are decreasing and oscillate a lot. Also, the validation loss is not decreasing while the training loss continues to decrease, which points to overfitting. Therefore, I stopped the training.

[Screenshots: Tensorboard Results 1, 2, 3, and 4]