April 12, 2025

Recognizing salient objects and accurately estimating their pose can tremendously benefit autonomous robots in modeling their environment. Such geometric and semantic parsing is central to scene understanding, and these models can serve as backbones for tasks such as Active Perception, Interactive Perception, SLAM (Simultaneous Localization and Mapping), Manipulation, and Visual Navigation. The open challenges in this domain lie in recognizing relevant objects under heavy occlusion and clutter, on top of the separate problem of coping with the high-frequency noise produced by various visual sensors.
I spent a large part of my academic career researching 3D object recognition under heavy clutter and occlusion. These methods used sensory data that provides both texture and depth information, which allowed the recognition systems to be robust to sensing noise and variations in lighting. [Reference 1], [Reference 2].
These 3D recognition systems were developed to serve indoor mobile manipulators. More recently, I have focused on extending these approaches to settings where inter-class variations among objects are not pronounced, or where the environment has a very uniform texture, as encountered in building construction.