We want to detect obstacles; that is why we call it an obstacle detector.
TensorFlow wins because of its large community and its availability on Windows. Windows support is a big deal because one teammate has his RTX 2080 Ti on a Windows system.
SSD has more TensorFlow implementations than YOLO, and pretrained weights as well. Still, on preference alone I would pick YOLO v3 over SSD.
We used SSD because it is supported by the TensorFlow Object Detection API and because of its speed. The API support is a big deal because it really shortens development time. We used an SSD with MobileNet V2 as the backbone (from the Model Zoo).
Tensorflow Object Detection API: https://github.com/tensorflow/models/tree/master/research/object_detection
Data labeling software: dataturks.com
Step by Step Tensorflow Object Detection API Tutorial (not as step-by-step as the title claims): https://medium.com/@WuStangDan/step-by-step-tensorflow-object-detection-api-tutorial-part-1-selecting-a-model-a02b6aabe39e
Image Segmentation Idea Source: https://stackoverflow.com/questions/6007822/finding-path-obstacles-in-a-2d-image
Label some images on Dataturks, download them, and convert them to Pascal VOC. (Upload images, label them, download them with a script, and convert them to Pascal VOC with a script. Existing scripts should cover this.)
Install the TensorFlow Object Detection API and convert Pascal VOC to TFRecord. (Complicated installation, plus dataset prep, training, testing, and exporting.)
Use an existing SSD TensorFlow implementation with the TensorFlow Object Detection API. (Finished dataset prep, tried to train a model.)
Use Tensorflow Lite and the trained SSD model on iPhone.
Build a UI.
Train another model at 500 images.
Train another model at 750 images.
Train another model at 1000 images.
The procedure is simple. Follow the guide.
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md
However, since I am installing on Windows, a few tweaks are needed.
What I did was edit the scripts that convert from Dataturks to Pascal VOC and from Pascal VOC to TFRecord.
I thought about combining both scripts into one. However, the script TensorFlow provides uses TensorFlow itself as an intermediary between the console and the actual script, so keeping two scripts does not seem like a big deal.
The JSON file downloaded from Dataturks is called "Obstacle Detection Dataset.json".
The label map (label_map.pbtxt, used below) looks like this:
item {
  id: 1
  name: 'Obstacle'
}
item {
  id: 2
  name: 'Pothole'
}
......etc......
There is no id 0; id 0 is reserved as a placeholder, so ids start from 1.
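As a quick sanity check, here is a minimal sketch (assuming the Object Detection API is installed) that loads the label map and prints the name-to-id mapping, which should start at 1:
from object_detection.utils import label_map_util
# Load label_map.pbtxt and print the mapping, e.g. {'Obstacle': 1, 'Pothole': 2, ...}
label_map_dict = label_map_util.get_label_map_dict('label_map.pbtxt')
print(label_map_dict)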
python mkdir_and_random_train_val.py
Directories are defined in mkdir_and_random_train_val.py file.
It will create the following directory structure:
+formatted data
  +annotated
  +imgs
It will also create two files: train.txt and val.txt.
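For reference, here is a hypothetical sketch of what mkdir_and_random_train_val.py does, based on the description above; the directory names match the commands below, but the 80/20 split ratio and the placeholder image names are assumptions:
import os
import random

# Create the "formatted data" layout expected by the later conversion steps.
BASE = "formatted data"
os.makedirs(os.path.join(BASE, "annotated"), exist_ok=True)
os.makedirs(os.path.join(BASE, "imgs"), exist_ok=True)

# Randomly split the labeled images into train/val lists.
names = ["img_0001.jpg", "img_0002.jpg"]  # placeholder; the real script gathers these from the dataset
random.shuffle(names)
split = int(0.8 * len(names))  # assumed train/val ratio
with open("train.txt", "w") as f:
    f.write("\n".join(names[:split]))
with open("val.txt", "w") as f:
    f.write("\n".join(names[split:]))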
python dataturks_to_pascal.py "Obstacle Detection Dataset.json" "formatted data\imgs" "formatted data\annotated"
Modified from https://dataturks.com/help/ibbx_dataturks_to_pascal_voc_format.php.
Be aware of the difference between Linux and Windows path representation: Linux uses '/' while Windows uses '\'.
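A small illustration of the path-separator issue: pathlib (or os.path.join) picks the right separator for the current OS, which avoids hardcoding '/' or '\' in the scripts.
from pathlib import Path
# Prints 'formatted data\annotated' on Windows and 'formatted data/annotated' on Linux.
p = Path("formatted data") / "annotated"
print(p)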
python pascal_to_tf.py --data_dir="formatted data" --annotations_dir="formatted data\annotated" --output_path=pascal.record --label_map_path=label_map.pbtxt --set=train
The --set parameter can be "train" or "val"; the script reads train.txt or val.txt accordingly to create the training set or the validation set.
Modified from models-master\research\object_detection\dataset_tools\create_pascal_tf_record.py
Be aware of the difference between Linux and Windows path representation: Linux uses '/' while Windows uses '\'.
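For orientation, here is a hypothetical sketch of how pascal_to_tf.py handles --set, following the pattern of create_pascal_tf_record.py that it was modified from; the split-file location and argument handling are illustrative, not copied from the actual script:
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--set", choices=["train", "val"], default="train")
args = parser.parse_args()

# The chosen split file (train.txt or val.txt from the earlier step) lists which
# examples end up in the output TFRecord.
with open(args.set + ".txt") as f:
    examples = [line.strip() for line in f if line.strip()]
# Each name in `examples` is then matched to its annotation XML and serialized into the TFRecord.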
Now that the data is processed, we need to train SSD to fit our data.
Guide is here: https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/running_locally.md.
The folder structure is like this:
+data
  -label_map file
  -train TFRecord file (train.record)
  -eval TFRecord file (val.record)
+models
  +model
    -pipeline config file (pipeline.config from the pretrained model)
    -checkpoint files (model.ckpt.data-00000-of-00001, model.ckpt.index, model.ckpt.meta)
    +train
    +eval
There is one line in SSD's pipeline.config (a line about batch_norm) that will not pass. Simply delete that line.
Modify the paths to something like this:
model:
  fine_tune_checkpoint: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/models/model/
train:
  label_map_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/label_map.pbtxt"
  input_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/train.record"
val:
  label_map_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/label_map.pbtxt"
  input_path: "C:/Users/zhuka/iCloudDrive/Desktop/Obstacle Detection/train/data/val.record"
Absolute paths seem necessary. Change them accordingly.
Be aware of the difference between Linux and Windows path representation: Linux uses '/' while Windows uses '\'.
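If you prefer not to edit pipeline.config by hand, here is an optional sketch that edits it programmatically; it assumes the Object Detection API protos are compiled, the paths are placeholders, and the exact field names may differ slightly between API versions:
import tensorflow as tf
from google.protobuf import text_format
from object_detection.protos import pipeline_pb2

# Read the existing pipeline.config.
cfg = pipeline_pb2.TrainEvalPipelineConfig()
with tf.gfile.GFile("pipeline.config", "r") as f:
    text_format.Merge(f.read(), cfg)

# Point the config at the checkpoint, label map, and TFRecord (absolute paths).
cfg.train_config.fine_tune_checkpoint = "C:/path/to/models/model/model.ckpt"
cfg.train_input_reader.label_map_path = "C:/path/to/data/label_map.pbtxt"
cfg.train_input_reader.tf_record_input_reader.input_path[:] = ["C:/path/to/data/train.record"]
# The eval input reader is updated the same way.

# Write the modified config back out.
with tf.gfile.GFile("pipeline.config", "w") as f:
    f.write(text_format.MessageToString(cfg))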
PIPELINE_CONFIG_PATH={path to pipeline config file}
MODEL_DIR={path to model directory}
NUM_TRAIN_STEPS=50000
SAMPLE_1_OF_N_EVAL_EXAMPLES=1
python model_main.py --pipeline_config_path=${PIPELINE_CONFIG_PATH} --model_dir=${MODEL_DIR} --num_train_steps=${NUM_TRAIN_STEPS} --sample_1_of_n_eval_examples=$SAMPLE_1_OF_N_EVAL_EXAMPLES --alsologtostderr
model_main.py is located in the tensorflow/models/research/object_detection directory.
Change these variables to suit your needs.
By default, training keeps at most 5 checkpoints. To find the optimal checkpoint, we would have to keep copying new checkpoints somewhere else by hand, which is not user-friendly.
Therefore, I changed the checkpoint saving to keep many more checkpoints by modifying the following code:
In object_detection/model_lib.py, modify line ~490, which creates a saver.
saver = tf.train.Saver(
    sharded=True,
    keep_checkpoint_every_n_hours=keep_checkpoint_every_n_hours,
    save_relative_paths=True,
    max_to_keep=10000)
This change suffices for our needs, but to keep everything consistent, we should also change line ~470 in the same file.
We hardcode this value because it is simple to do. To avoid hardcoding it, we would need to add it to pipeline.config, edit pipeline.proto, recompile the protos, and then it might work.
I have also modified line 62 in model_main.py to:
config = tf.estimator.RunConfig(model_dir=FLAGS.model_dir, keep_checkpoint_max=10000)
Setting this alone does not work. I am not sure whether setting only the saver, without this, would work.
tensorboard --logdir=${MODEL_DIR}
python export_tflite_ssd_graph.py \
    --pipeline_config_path=pipeline.config \
    --trained_checkpoint_prefix=model.ckpt-16812 \
    --output_directory=output_tflite \
    --add_postprocessing_op=true
trained_checkpoint_prefix should match the prefix shared by three files: .data-00000-of-00001, .index, and .meta.
Using this script ensures that all operations in the exported graph (tflite_graph.pb) are supported by TensorFlow Lite.
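A tiny, purely illustrative check that the three checkpoint files exist for the prefix before exporting (the step number 16812 is just the example from the command above):
import os

prefix = "model.ckpt-16812"
for suffix in (".data-00000-of-00001", ".index", ".meta"):
    path = prefix + suffix
    print(path, "OK" if os.path.exists(path) else "MISSING")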
tflite_convert --output_file=ssd.tflite --graph_def_file=output_tflite\tflite_graph.pb --input_arrays=normalized_input_image_tensor --output_arrays=TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 --input_shapes=1,300,300,3 --allow_custom_ops
input_shapes, input_arrays, and output_arrays are described in the export_tflite_ssd_graph.py file header.
output_arrays is TFLite_Detection_PostProcess,TFLite_Detection_PostProcess:1,TFLite_Detection_PostProcess:2,TFLite_Detection_PostProcess:3 because the TFLite_Detection_PostProcess op has 4 outputs; TFLite_Detection_PostProcess alone would include only the first output. It is similar to an array of pointers (CSE 30, thanks, Rick).
TFLite_Detection_PostProcess is a custom operation, so adding --allow_custom_ops is necessary: tflite_convert assumes that native TFLite will not support it, but the op is actually implemented, so we only need to bypass this check. There are no other custom operations in the network.
input_shape's width and height are defined in pipeline.config:
fixed_shape_resizer {
  height: 300
  width: 300
}
Original Guide is here: https://www.tensorflow.org/lite/convert/cmdline_examples#convert_a_tensorflow_graphdef_.
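For reference, a rough Python-API equivalent of the tflite_convert command above (TF 1.x); the command-line flags map onto converter attributes, including the custom-op bypass:
import tensorflow as tf

# Build a converter from the exported frozen graph.
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="output_tflite/tflite_graph.pb",
    input_arrays=["normalized_input_image_tensor"],
    output_arrays=["TFLite_Detection_PostProcess",
                   "TFLite_Detection_PostProcess:1",
                   "TFLite_Detection_PostProcess:2",
                   "TFLite_Detection_PostProcess:3"],
    input_shapes={"normalized_input_image_tensor": [1, 300, 300, 3]})
converter.allow_custom_ops = True  # TFLite_Detection_PostProcess is a custom op
with open("ssd.tflite", "wb") as f:
    f.write(converter.convert())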
tflite_convert --output_file=model_quantized.tflite --graph_def_file=output_tflite\tflite_graph.pb --input_arrays=normalized_input_image_tensor --output_arrays=raw_outputs/box_encodings,raw_outputs/class_predictions --input_shapes=1,300,300,3 --inference_type=QUANTIZED_UINT8 --mean_values=128 --std_dev_values=127 --default_ranges_min=0 --default_ranges_max=6
We use dummy quantization because the trained model may have layers without recorded min and max values for quantization; for example, Relu6 does not carry such information. Therefore, we specify plausible min and max values ourselves.
Original Guide is here: https://www.tensorflow.org/lite/convert/cmdline_examples#use_dummy-quantization_to_try_out_quantized_inference_on_a_float_graph_.
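Likewise, a rough Python-API counterpart of the dummy-quantization command (TF 1.x); the raw box and class outputs are used here, and the mean/std/default ranges mirror the flags above:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="output_tflite/tflite_graph.pb",
    input_arrays=["normalized_input_image_tensor"],
    output_arrays=["raw_outputs/box_encodings", "raw_outputs/class_predictions"],
    input_shapes={"normalized_input_image_tensor": [1, 300, 300, 3]})
converter.inference_type = tf.lite.constants.QUANTIZED_UINT8
converter.quantized_input_stats = {"normalized_input_image_tensor": (128, 127)}  # (mean, std_dev)
converter.default_ranges_stats = (0, 6)  # plausible min/max for layers like Relu6
with open("model_quantized.tflite", "wb") as f:
    f.write(converter.convert())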
Guide is here: https://www.tensorflow.org/lite/guide/ios.
This guide talks about how to run an example project and how to understand the code.
TensorFlow Lite is installed through CocoaPods.
Install CocoaPods.
In Podfile under the project folder:
target 'Obstacle-Detection' do
  pod 'Zip', '~> 1.1'
  #pod 'TensorFlow-experimental', '~> 1.1'
  #pod 'TensorFlowLiteGpuExperimental', '0.0.1'
end
Zip is another framework that Obstacle-Detection uses.
Choose either TensorFlow-experimental or TensorFlowLiteGpuExperimental. If you want to run the network on the GPU, install TensorFlowLiteGpuExperimental. Be mindful that there are extra steps to configure the project for GPU use; read https://www.tensorflow.org/lite/performance/gpu. In that tutorial, TFLITE_USE_CONTRIB_LITE does not exist in the sample code; simply ignore it.
Then, in bash under that project folder:
pod install
Or, if it has been installed before, use:
pod repo update
Just use the camera example from TensorFlow Lite and modify the elements.
Click into the TFLite methods to see their descriptions.
Be mindful that there is an input processing error in the demo code: the x and y values are flipped. The code has to be modified for both the quantized and non-quantized functions. Details are here: https://github.com/tensorflow/tensorflow/issues/25784
Be mindful that the demo code is missing an input check. In ODModelEvaluator.mm, the evaluateOnBuffer function calls CFRetain on pixelBuffer directly without checking whether pixelBuffer is NULL. The pixelBuffer returned by CMSampleBufferGetImageBuffer can be NULL, and CFRetain(NULL) will crash the app. Thus, a NULL check on pixelBuffer is necessary before calling CFRetain(pixelBuffer).
The initial model, trained on 45 images, had decent recognition of big boxes and poles. Even a slim coke bottle was recognized as an obstacle. Therefore, we can conclude that our architecture can converge.
The complete dataset was then used. With early stopping, the model did not recognize well, worse than the initial model; an overfitted model did not recognize anything at all.
It turns out the data was poorly labeled: some boxes were not tight, some obstacles were not drawn, some edges were labeled with multiple boxes, some boxes did not bound the object, etc.
After all of these problems were fixed, we were able to achieve decent detection. Our peak mAP is 0.1705, peak mAP@.50IOU is 0.291, and mAP (large) is 0.1784. Recall that we only have 315 images.
There is an interesting observation: a fire hydrant is recognized particularly accurately as a fire hydrant by the pretrained model, and it is also recognized very accurately as an obstacle by ours. However, when I remove the blue channel, the pretrained model categorizes it as a bottle...
From this observation, I believe we can reach this result with so few images because obstacles are a superset of many kinds of objects. The pretrained model has been trained on lots and lots of objects, so during fine-tuning our model does not need to learn new features or new objects; it only needs to re-categorize the ones it already knows. This could be why we are able to obtain such accuracy with a limited dataset.
According to Apple, some models can be fine-tuned with as few as 60 images... But, as overachievers, we are looking to expand our dataset to at least 1000 images. (I take this back.)
The following are the metrics for our model.
As you can see, the mAP oscillates more and more toward the end of training, but the loss is not decreasing. Moreover, mAP@0.75 is not improving, and mAP@0.5 performance only starts to stabilize. The smoothed curves look acceptable, but the raw values are decreasing and oscillating a lot. Also, the validation loss is not decreasing while the training loss continues to decrease, which could indicate overfitting. Therefore, I stopped the training.