Robot perception is the capability of a robot to estimate and understand its surroundings well enough to navigate and interact with the environment. At the core of robot perception lies the problem of building an internal model of the robot’s surroundings from onboard sensor data and prior knowledge. Although this internal model can be purely geometric (e.g., a point cloud), as in traditional simultaneous localization and mapping (SLAM), it can also contain higher-level structures, such as objects and other semantic elements of the scene (e.g., buildings, roads, pedestrians).