Inferring a generic 3D scene with multi-view methods has been extensively investigated
since the early days of Computer Vision research. However, performance is
generally low when the observed scene is complex: strong shading variations and illumination
changes can heavily affect the final estimate, especially if the scene is composed of
moving objects with smooth and untextured surfaces. For example, urban street scenes are
characterised by all these difficulties, and classical approaches based solely on geometrical
cues can give poor results. A strategy to overcome the limitations that arise in such
scenarios consists in exploiting semantic information to improve the robustness of geometric
approaches. Semantics can also be used to localise objects inside the scene and to
separate them from the surrounding environment. This thesis proposes novel approaches
for scene understanding using RGB images, in particular for the motion segmentation and
the object localisation problems.
For motion segmentation, two novel frameworks are described: a pair-wise consensus approach and an n-view optimisation-based approach. Both employ a state-of-the-art object detector to derive the semantics. The pair-wise method adopts a RANSAC strategy for fitting the motions, where the selection of the samples is driven by a semantic confidence score. The n-view framework utilises geometrical constraints and known object classes associated with the urban street-level scenario to over-constrain the problem and to better separate long-term trajectories belonging to the background or to objects, reducing the effect of noise.
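The idea of letting a semantic confidence score drive RANSAC sampling can be illustrated with a minimal sketch. It is not the method developed in the thesis: the function name `semantic_ransac`, its parameters, and the use of a plain 2D translation as the motion model are all assumptions made for illustration, and a real pipeline would fit a proper two-view motion model instead.

```python
import numpy as np

def semantic_ransac(pts_a, pts_b, scores, n_iters=200, sample_size=2, tol=1.0, seed=0):
    """Fit a 2D translation between matched points (hypothetical stand-in model).

    pts_a, pts_b : (N, 2) arrays of matched image points in two frames.
    scores       : (N,) detector confidences; higher means more likely on the object.
    """
    rng = np.random.default_rng(seed)
    probs = scores / scores.sum()          # sampling distribution from semantic scores
    flow = pts_b - pts_a                   # per-point displacement between the frames
    best_t = None
    best_inliers = np.zeros(len(pts_a), dtype=bool)
    for _ in range(n_iters):
        # Draw the sample with probability proportional to the semantic score,
        # instead of the uniform sampling used by plain RANSAC.
        idx = rng.choice(len(pts_a), size=sample_size, replace=False, p=probs)
        t = flow[idx].mean(axis=0)         # translation hypothesis from the sample
        inliers = np.linalg.norm(flow - t, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_t, best_inliers = t, inliers
    return best_t, best_inliers
```

Points with high detector confidence are drawn more often, so hypotheses are more likely to be generated from samples that lie on a single object. The consensus step itself is unchanged from standard RANSAC in this sketch.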
The object localisation task is performed by a multi-view technique, which uses the bounding boxes provided by the object detector to estimate the volume occupied by the objects. The method is geometric and has been formulated in closed form for both the perspective and orthographic camera models. Extensive experiments have been performed for all the techniques, showing that the inclusion of high-level reasoning in geometrical approaches leads to better results, especially when dealing with realistic scenarios.