IS-NEAR: Implicit Semantic Neural Engine and Multi-Sensor Data Rendering With 3D Global Feature

3DV 2024

1Huawei 2012 Laboratories   2University of Stuttgart   3Huawei Munich Research Center  

We introduce IS-NEAR, an advanced implicit semantic neural engine designed for the efficient rendering of high-quality images, geometry, and semantics.

Abstract

Data-driven Computer Vision (CV) tasks are still limited by the amount of labeled data. Recently, several semantic NeRFs have been proposed to render and synthesize novel-view semantic labels. Although current NeRF methods achieve spatially consistent color and semantic rendering, their capability for geometric representation is limited. This problem stems from the lack of global information shared among rays in traditional NeRFs, which are trained with independent directional rays. To address this problem, we introduce a point-to-surface global feature into NeRF to associate all rays, which gives each single ray the capability to represent global geometry. In particular, the relative distance of each sampled ray point to the learned global surfaces is computed to weight the geometric density and the semantic-color feature. We also carefully design the semantic loss and back-propagation function to address the problems of unbalanced samples and the disturbance of the implicit semantic field to the geometric field. Experiments validate the 3D scene annotation capability with only a few fed labels. Quantitative results show that our method outperforms state-of-the-art works in efficiency, geometry, color, and semantics on public datasets. The proposed method is also applied to multiple tasks, such as indoor, outdoor, and part-segmentation labeling, texture re-rendering, and robot simulation.


Method



The proposed implicit engine IS-NEAR comprises four modules: a feature module, a geometry module, a semantic module, and a color module. First, hash features are looked up from the feature module and converted into density features. These density features are then weighted by the point-to-surface weights to obtain the semantic-color feature. Next, the semantic-color feature is fed into decoders to predict color and semantic probabilities. Finally, the image, depth, and semantics are obtained by volume rendering.


Results

Synthetic NeRF dataset

Virtual KITTI Dataset with Sparse Views

Application: Texture Re-rendering

BibTeX

@inproceedings{sun2024is,
    title     = {IS-NEAR: Implicit Semantic Neural Engine and Multi-Sensor Data Rendering With 3D Global Feature},
    author    = {Sun, Tiecheng and Zhang, Wei and Dong, Xingliang and Lin, Tao},
    booktitle = {International Conference on 3D Vision (3DV)},
    month     = {March},
    year      = {2024},
}