The next natural step in the pipeline for scene understanding and classification is semantic segmentation, which labels every point in the point cloud and every pixel in the image with its enclosing object or region. Several deep neural networks exist for semantically segmenting indoor and outdoor environments, but they lack scalability to large-scale dense point clouds, efficient downsampling strategies, and effective deep feature aggregation modules. These systems also fail to achieve high mean intersection over union (mIoU) and overall accuracy (OA) across all classes, especially on long-tailed data distributions. With these challenges in mind, it is beneficial to develop a deep neural network scene classification solution that utilizes optical images and clean point clouds to extract characteristics unique to each modality.
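Since mIoU and OA are the evaluation metrics named above, a minimal sketch of how they are typically computed from flat per-point (or per-pixel) label arrays may be useful. This is an illustrative NumPy implementation, not code from the project; the function name and signature are assumptions.

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, num_classes):
    """Per-class IoU, mean IoU (mIoU), and overall accuracy (OA)
    from integer label arrays with one label per point/pixel.
    Illustrative sketch -- not the project's actual evaluation code."""
    # Build the confusion matrix: rows = ground truth, cols = prediction.
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (y_true, y_pred), 1)

    tp = np.diag(cm).astype(np.float64)   # true positives per class
    fp = cm.sum(axis=0) - tp              # false positives per class
    fn = cm.sum(axis=1) - tp              # false negatives per class

    denom = tp + fp + fn
    # Classes absent from both y_true and y_pred get NaN and are
    # excluded from the mean, which matters for long-tailed data.
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    miou = np.nanmean(iou)
    oa = tp.sum() / cm.sum()
    return iou, miou, oa
```

Note that mIoU averages over classes while OA averages over points, which is why a model can score high OA yet low mIoU on long-tailed distributions where rare classes are poorly segmented.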
Contributors: Jacob Yoo