Google Summer of Code 2021 with OpenCV: Loop closure algorithm based on HF-Net

Loop closure algorithm based on HF-Net for depth fusion

Student: Zihao Mu

Mentor: Rostislav Vasilikhin

Links to accomplished work:

Merged PR: opencv_contrib/pull/3002;
Detailed Tutorial: OpenCV LCD Tutorial (pending);
LCD Sample: large_kinfu_LCD;
Pre-trained and useful model: Model LINK;
Video Demo;

Introduction

Hi, I'm Zihao Mu! I was a student developer for OpenCV in GSoC 2021. The goal of this project is to improve Depth Fusion in the OpenCV RGBD module. In this deep learning era, we can implement more efficient loop closure detection (LCD) methods. The project mainly consists of two parts:

First Part

Deploy HF-Net to OpenCV DNN.

Second Part

Integrate loop closure detection into the existing Large Kinect Fusion class: run loop closure detection, add a new loop detection edge when a loop is found, and finally optimize the whole pose graph.
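To illustrate why a loop detection edge helps, here is a minimal sketch of pose graph optimization on a toy problem. It is not the Large Kinect Fusion implementation: poses are reduced to 1D scalars, the optimizer is plain gradient descent, and every name and number (`optimize_pose_graph`, the drift of 0.1 per step, the loop measurement of 3.0) is an assumption made up for illustration.

```python
# Toy 1D pose graph: poses are scalars, edges are relative measurements.
# All numbers are hypothetical and chosen only for illustration.

def optimize_pose_graph(num_poses, edges, iterations=500, step=0.1):
    """Minimize the sum of squared edge residuals by gradient descent.

    edges: list of (i, j, measured), meaning pose[j] - pose[i] ~ measured.
    Pose 0 is held fixed at 0.0 to remove the global offset ambiguity.
    """
    poses = [0.0] * num_poses
    for _ in range(iterations):
        grad = [0.0] * num_poses
        for i, j, measured in edges:
            residual = (poses[j] - poses[i]) - measured
            grad[j] += 2.0 * residual
            grad[i] -= 2.0 * residual
        for k in range(1, num_poses):  # skip the fixed pose 0
            poses[k] -= step * grad[k]
    return poses


# Odometry edges that overestimate each step by 0.1 (the true step is 1.0).
odometry = [(0, 1, 1.1), (1, 2, 1.1), (2, 3, 1.1)]
# One loop detection edge that relates pose 3 directly back to pose 0.
loop_edge = (0, 3, 3.0)

before = optimize_pose_graph(4, odometry)
after = optimize_pose_graph(4, odometry + [loop_edge])
print("last pose without loop edge:", round(before[3], 3))
print("last pose with loop edge:", round(after[3], 3))
```

Without the loop edge the drift accumulates along the chain; with it, the error is spread over all edges and the last pose moves closer to the true value of 3.0. A real pose graph works on 3D rigid transforms and uses a dedicated solver, so this only shows the principle.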

My Journey

First Period

At this stage, HF-Net could not be run with OpenCV DNN because some layers were not supported (more details in the issue). Fortunately, there is a temporary solution: use the OpenVINO backend. In my case, I compiled the OpenCV nightly build (2021.08.18) with OpenVINO 2021.3. Note that the latest OpenVINO release, 2021.4, does not work. (How to Compile OpenCV with OpenVINO?)

In addition to HF-Net, I also tried other DNN models that are supported by OpenCV DNN, such as DeepLCD[10] and the models in Places365[11]. Sadly, these models work only in constrained outdoor environments, and their performance and accuracy are guaranteed only there. I also tried NetVLAD and NextVLAD[13] in ONNX format. However, they also have missing layers, such as ReduceL2. Additionally, the precision of these two models is not very good, and in loop closure detection precision is more important than recall. In the end, I discarded these models and kept only HF-Net.
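The precision and recall comparison above can be made concrete with a small sketch. The frame pairs below are hypothetical and are not results from any of the models mentioned; `precision_recall` is an illustrative helper, not part of OpenCV.

```python
def precision_recall(detected, true_loops):
    """Return (precision, recall) for a set of detected loop pairs."""
    detected = set(detected)
    true_loops = set(true_loops)
    true_positives = len(detected & true_loops)
    precision = true_positives / len(detected) if detected else 1.0
    recall = true_positives / len(true_loops) if true_loops else 1.0
    return precision, recall


# Hypothetical ground-truth loops, as (old_frame, new_frame) pairs.
true_loops = {(1, 50), (2, 60), (3, 70), (4, 80)}

# A conservative detector: finds only half of the loops, but all are correct.
conservative = {(1, 50), (2, 60)}
# An aggressive detector: finds every loop, but also reports two false ones.
aggressive = {(1, 50), (2, 60), (3, 70), (4, 80), (5, 90), (6, 95)}

print("conservative:", precision_recall(conservative, true_loops))
print("aggressive:", precision_recall(aggressive, true_loops))
```

For LCD the conservative detector is usually the safer choice: a missed loop only leaves some drift uncorrected, while a false loop adds a wrong edge that pose graph optimization will then try to satisfy.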

After GSoC, I will try to add the missing layers so that OpenCV DNN can support HF-Net.

The OpenVINO model and the original model of HF-Net can be found at Model LINK.

Second Period

To add the whole LCD pipeline, I was inspired by the code from [2, 4, 12]. First, I implemented the keyframe strategy in opencv_contrib/modules/rgbd/src/keyframe.hpp. Then, I implemented the LCD class in opencv_contrib/modules/rgbd/src/loop_closure_detection.hpp. For the loop check in loop_closure_detection.cpp, we use not only the DNN feature extracted by HF-Net but also ORB feature matching as a double check. Finally, after the loop check, I integrated the new LCD edge into opencv_contrib/modules/rgbd/src/submap.hpp and opencv_contrib/modules/rgbd/src/large_kinfu.cpp.
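The two-stage loop check described above can be sketched as follows. This is not the code in loop_closure_detection.cpp: it assumes cosine similarity as the descriptor metric, uses tiny 3-element vectors in place of real HF-Net global descriptors, and stands in for ORB matching with a caller-supplied `verify` callback. The function names and the threshold of 0.8 are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def loop_check(query_desc, keyframes, sim_threshold, verify):
    """Return (frame_id, similarity) if a loop is accepted, else None.

    keyframes: list of (frame_id, global_descriptor) stored so far.
    verify: callable(frame_id) -> bool, the second check
            (ORB feature matching in the project, a stub here).
    """
    best_id, best_sim = None, -1.0
    for frame_id, desc in keyframes:
        sim = cosine_similarity(query_desc, desc)
        if sim > best_sim:
            best_id, best_sim = frame_id, sim
    # Stage 1: the best global-descriptor match must be similar enough.
    if best_id is None or best_sim < sim_threshold:
        return None
    # Stage 2: double check with an independent feature matcher.
    if not verify(best_id):
        return None
    return best_id, best_sim


# Toy keyframe database with made-up descriptors.
keyframes = [(12, [1.0, 0.0, 0.0]), (173, [0.0, 1.0, 0.0])]
query = [0.9, 0.1, 0.0]

accepted = loop_check(query, keyframes, 0.8, verify=lambda frame_id: True)
rejected = loop_check(query, keyframes, 0.8, verify=lambda frame_id: False)
print("accepted:", accepted)
print("rejected by second check:", rejected)
```

Running the cheap global-descriptor comparison first and the stricter feature matching only on the best candidate keeps the check fast while still favoring precision.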

If the OpenVINO binary model in Model LINK does not work, you should convert original_frozen_tf.pb to an OpenVINO binary model yourself. (How to convert the TF model to OpenVINO?)

To do so, run the following command:

python3 mo_tf.py --input_model <INPUT_MODEL>.pb --output_dir <OUTPUT_MODEL_DIR>

Experimental Results

Computer Environment

HW:

- CPU: i5-10th Gen
- RAM: 16GB
- GPU: No GPU

SW:

- macOS
- OpenCV 4.5
- OpenVINO 2021.3

Experiment on TUM RGBD Dataset

Used dataset: fr1/room

After running large_kinfu_LCD, three LCD edges were detected.

The terminal output is:

...
# Looping edge 1
loopCheck LCD: Best Frame ID = 12, similarity = 0.846252
LCD: Find a NEW LOOP! from Submap :6 to Submap:0
Current frameID: 1026
...
# Looping edge 2
loopCheck LCD: Best Frame ID = 173, similarity = 0.841523
LCD: Find a NEW LOOP! from Submap :7 to Submap:0
Current frameID: 1207
...
# Looping edge 3
loopCheck LCD: Best Frame ID = 243, similarity = 0.84479
LCD: Find a NEW LOOP! from Submap :8 to Submap:0
Current frameID: 1248

We can now check the matched image pairs:

# Looping edge 1: image-pair <12, 1026>, the detected result is Correct.

# Looping edge 2: image-pair <173, 1207>, the detected result is Correct.

# Looping edge 3: image-pair <243, 1248>, the detected result is Correct.
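The image pairs above can be extracted from the terminal output automatically. The sketch below assumes the log lines always appear in the order shown in the excerpt (best frame, new loop, current frame ID); `parse_loops` and the field names are illustrative and not part of the sample.

```python
import re

# Excerpt of the terminal output shown above.
LOG = """\
loopCheck LCD: Best Frame ID = 12, similarity = 0.846252
LCD: Find a NEW LOOP! from Submap :6 to Submap:0
Current frameID: 1026
loopCheck LCD: Best Frame ID = 173, similarity = 0.841523
LCD: Find a NEW LOOP! from Submap :7 to Submap:0
Current frameID: 1207
loopCheck LCD: Best Frame ID = 243, similarity = 0.84479
LCD: Find a NEW LOOP! from Submap :8 to Submap:0
Current frameID: 1248
"""

BEST = re.compile(r"Best Frame ID = (\d+), similarity = ([0-9.]+)")
LOOP = re.compile(r"from Submap\s*:\s*(\d+) to Submap\s*:\s*(\d+)")
CURRENT = re.compile(r"Current frameID: (\d+)")


def parse_loops(log_text):
    """Collect one record per detected loop from the log text."""
    loops = []
    pending = {}
    for line in log_text.splitlines():
        match = BEST.search(line)
        if match:
            pending = {
                "best_frame": int(match.group(1)),
                "similarity": float(match.group(2)),
            }
            continue
        match = LOOP.search(line)
        if match and pending:
            pending["from_submap"] = int(match.group(1))
            pending["to_submap"] = int(match.group(2))
            continue
        match = CURRENT.search(line)
        if match:
            # Keep the record only if a NEW LOOP line was seen for it.
            if "from_submap" in pending:
                pending["current_frame"] = int(match.group(1))
                loops.append(pending)
            pending = {}
    return loops


loops = parse_loops(LOG)
for loop in loops:
    print("image-pair <%d, %d>" % (loop["best_frame"], loop["current_frame"]))
```

Each record pairs the best matching keyframe with the current frame, which gives the image pairs to inspect by eye.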

Shortcomings of LCD in RGBD

Shortcomings of HF-Net

The current HF-Net is not perfect, but it is good enough.
- The model is large (125 MB) and has a large number of parameters, so it is hard to deploy on some edge devices. An INT8 model may help.
- It is hard to implement with OpenCV DNN. More DNN layers need to be supported. (I will continue this work.)

Shortcomings of the LCD implementation

HF-Net has three outputs: a global descriptor, local descriptors, and keypoints. At this stage, we use only the global descriptor.

No BoW support

After the [PR by Justin], we can try to add BoW, which can speed up the DNN feature matching in LCD.