This paper discusses the visual-inertial structure from motion problem (VI-SfM problem) with special focus on three fundamental issues: observability properties, resolvability in closed form, and data association. Regarding the first issue, after a discussion of the current state of the art, the paper investigates more complex scenarios. Specifically, whereas the common formulation assumes three orthogonal accelerometers and three orthogonal gyroscopes, the analysis is extended to cases with a reduced number of inertial sensors and any number of point features observed by monocular vision. In particular, the minimal case of a single accelerometer, no gyroscope and a single point feature is addressed. Additionally, the analysis accounts for biased measurements and unknown extrinsic camera calibration. The results derived for these new and very challenging scenarios have interesting consequences from both a technological and a neuroscientific perspective. Regarding the second issue, a simple closed-form solution to the VI-SfM problem is presented. This solution expresses the structure of the scene and the motion only in terms of the visual and inertial measurements collected during a short time interval. This allows the introduction of deterministic algorithms that simultaneously determine the structure of the scene and the motion without any initialization or prior knowledge. Additionally, the closed-form solution allows us to identify the conditions under which the VI-SfM problem has a finite number of solutions. Specifically, it is shown that the problem can have a unique solution, two distinct solutions or infinitely many solutions, depending on the trajectory, on the number of point features and their arrangement in 3D space, and on the number of camera images. Finally, the paper discusses the third issue, i.e., the data association problem.
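The closed-form idea can be illustrated with a minimal linear sketch. It is not the monograph's exact formulation: it assumes noise-free, unbiased measurements, coincident camera and IMU frames, and relative rotations already known from gyroscope integration, so that every bearing vector is expressed in the frame of the first image. Under these assumptions the camera position at time t_i is V t_i + g t_i^2 / 2 + S_i, where V is the unknown initial velocity, g the unknown gravity vector and S_i the known double integral of the rotated accelerometer readings. Each observation of feature j then gives lam_0^j mu_0^j - lam_i^j mu_i^j - V t_i - g t_i^2 / 2 = S_i, which is linear in the unknowns V, g and the distances lam. The function name `closed_form_vi_sfm` and its argument layout are hypothetical.

```python
import numpy as np


def closed_form_vi_sfm(times, bearings, S):
    """Linear least-squares sketch of a closed-form VI-SfM solution.

    Hypothetical interface, assuming noise-free, unbiased data:
      times    : (n,) image timestamps, times[0] == 0
      bearings : (n, N, 3) unit bearings mu[i, j] of feature j in image i,
                 already rotated into the frame of image 0 (gyro-derived)
      S        : (n, 3) double integral of the rotated accelerometer
                 readings from 0 to times[i], with S[0] == 0
    Returns V (initial velocity) and g (gravity), both in the frame of
    image 0, and lam, the (n, N) feature distances lam[i, j].
    """
    times = np.asarray(times, dtype=float)
    bearings = np.asarray(bearings, dtype=float)
    S = np.asarray(S, dtype=float)
    n, N, _ = bearings.shape
    n_unknowns = 6 + n * N  # V (3), g (3), one distance per observation
    blocks, rhs = [], []
    for i in range(1, n):
        t = times[i]
        for j in range(N):
            # lam[0,j] mu[0,j] - lam[i,j] mu[i,j] - V t - g t^2 / 2 = S_i
            block = np.zeros((3, n_unknowns))
            block[:, 0:3] = -t * np.eye(3)
            block[:, 3:6] = -0.5 * t**2 * np.eye(3)
            block[:, 6 + j] = bearings[0, j]
            block[:, 6 + i * N + j] = -bearings[i, j]
            blocks.append(block)
            rhs.append(S[i])
    A = np.vstack(blocks)
    b = np.concatenate(rhs)
    # A rank-deficient system means these data do not single out one solution.
    if np.linalg.matrix_rank(A) < n_unknowns:
        raise ValueError("rank-deficient system: no unique solution")
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[0:3], x[3:6], x[6:].reshape(n, N)


# Synthetic check: draw the accelerometer term S at random, then build the
# trajectory and bearings that are consistent with it.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 1.0, 5)
V_true = rng.normal(size=3)
g_true = np.array([0.0, -9.81, 0.0])
features = rng.uniform(-1.0, 1.0, size=(3, 3)) + np.array([0.0, 0.0, 5.0])
S = rng.normal(scale=0.2, size=(5, 3))
S[0] = 0.0
positions = V_true * times[:, None] + 0.5 * g_true * times[:, None] ** 2 + S
diff = features[None, :, :] - positions[:, None, :]  # (n, N, 3)
lam_true = np.linalg.norm(diff, axis=2)
bearings = diff / lam_true[:, :, None]
V, g, lam = closed_form_vi_sfm(times, bearings, S)
```

The sketch stacks all observations into one linear system and solves it by ordinary least squares; the known magnitude of gravity is not exploited here, which keeps the problem linear.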
For data association, starting from basic results in computer vision, it is shown that, by exploiting the information provided by the inertial measurements, a single point correspondence (for planar motion) or two point correspondences (for general 3D motion) are sufficient to characterize the motion between two camera poses. This allows the use of a 1-point RANSAC algorithm (in the planar case) or a 2-point RANSAC algorithm (in the general 3D case) to detect outliers. The paper concludes with a discussion of connections to related research fields in both computer science and neuroscience.
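The minimal solvers behind 1-point and 2-point RANSAC can be sketched as follows, assuming (as an illustration, not necessarily as in the monograph) that the inertial information enters through a relative rotation R obtained by integrating the gyroscope, and that correspondences are given as unit bearing vectors with the convention X2 = R X1 + t. With R known, the epipolar constraint x2^T [t]_x R x1 = 0 reduces to t · n = 0 with n = (R x1) × x2, so two correspondences fix the translation direction as n_a × n_b. For planar motion, taken here to mean rotation about the camera y axis and translation in the x-z plane, a single correspondence suffices. All function names, the threshold and the iteration count are hypothetical choices.

```python
import numpy as np


def epipolar_normals(R, X1, X2):
    """Rows n_i = (R x1_i) x x2_i. With R known, the epipolar constraint
    x2^T [t]_x R x1 = 0 becomes the linear condition t . n_i = 0."""
    return np.cross(X1 @ R.T, X2)


def minimal_translation(normals, planar):
    """Unit translation direction (up to sign) from a minimal sample."""
    if planar:
        # Assumed planar motion: t = (tx, 0, tz), so one normal suffices.
        n = normals[0]
        t = np.array([n[2], 0.0, -n[0]])
    else:
        # General 3D motion: t is orthogonal to both normals.
        t = np.cross(normals[0], normals[1])
    norm = np.linalg.norm(t)
    return None if norm < 1e-12 else t / norm


def ransac_translation(R, X1, X2, planar=False, threshold=1e-3,
                       iterations=100, rng=None):
    """1-point (planar) or 2-point (general) RANSAC on unit bearing vectors
    X1, X2 of shape (M, 3), given the relative rotation R."""
    rng = np.random.default_rng() if rng is None else rng
    normals = epipolar_normals(R, X1, X2)
    sample_size = 1 if planar else 2
    best_t, best_inliers = None, np.zeros(len(X1), dtype=bool)
    for _ in range(iterations):
        idx = rng.choice(len(X1), size=sample_size, replace=False)
        t = minimal_translation(normals[idx], planar)
        if t is None:
            continue  # degenerate sample (e.g. parallel normals)
        # Algebraic epipolar error |x2 . (t x R x1)| for every correspondence.
        inliers = np.abs(normals @ t) < threshold
        if inliers.sum() > best_inliers.sum():
            best_t, best_inliers = t, inliers
    return best_t, best_inliers


def rot_y(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])


# Synthetic planar example: 40 matches, the first 10 replaced by outliers.
rng = np.random.default_rng(1)
points = rng.uniform(-2.0, 2.0, size=(40, 3)) + np.array([0.0, 0.0, 8.0])
R = rot_y(0.1)
t_true = np.array([0.3, 0.0, 1.0]) / np.linalg.norm([0.3, 0.0, 1.0])
X1 = points / np.linalg.norm(points, axis=1, keepdims=True)
P2 = points @ R.T + t_true
X2 = P2 / np.linalg.norm(P2, axis=1, keepdims=True)
X2[:10] = rng.normal(size=(10, 3))
X2[:10] /= np.linalg.norm(X2[:10], axis=1, keepdims=True)
t_est, inliers = ransac_translation(R, X1, X2, planar=True,
                                    threshold=1e-6, rng=rng)
```

Only the direction of the translation is recovered, up to sign, as expected from two-view epipolar geometry; the small sample size is what keeps the number of RANSAC iterations low.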
The term Structure from Motion (SfM) was coined by the computer vision community to define the problem of estimating the three-dimensional structure of the scene and the motion from two-dimensional image sequences. This monograph considers the same estimation problem, but with a sensor suite that also includes inertial sensors (accelerometers and gyroscopes). This problem is referred to as Visual-Inertial Structure from Motion (VI-SfM). The VI-SfM problem has generated particular interest and has been investigated in both computer science and neuroscience. Cameras and inertial sensors require no external infrastructure, which is a key advantage for robots operating in unknown environments where GPS signals are shadowed or unavailable. For this reason, visual and inertial sensing have received great attention from the mobile robotics community in recent years, and many approaches have been introduced.
Observability Properties and Deterministic Algorithms in Visual-Inertial Structure from Motion provides the reader with the state of the art in VI-SfM and also adds a series of new results. These new results significantly advance the current state of the art by establishing new properties related to three fundamental issues: observability properties, resolvability in closed form, and data association. These results are important from a technological point of view. Additionally, they may provide new insight into the process of vestibular and visual integration, which has been investigated in neuroscience.