Standard frame-based cameras suffer from low dynamic range and motion blur in real applications. On the other hand, event cameras, which are bio-inspired sensors, asynchronously output the polarity of pixel-level log-intensity changes, providing a continuous data stream with high dynamic range even under fast motion. Therefore, event cameras are effective for stereo depth estimation under challenging illumination conditions and/or fast motion. To estimate the disparity map with events, existing state-of-the-art event-based stereo models use the image together with the past events that occurred up to the current image acquisition time. However, not all events contribute equally to the disparity estimation of the current frame, since past events were triggered at different times, under different camera movements, and thus with different disparity values. Therefore, events need to be carefully selected for accurate event-guided disparity estimation.
In this paper, we aim to effectively handle events that continuously occur with different disparity values in the scene depending on the camera's movement. To this end, we first propose a differentiable event selection network that selects the events most relevant to the current depth estimation. Furthermore, we effectively exploit feature-like events triggered around object boundaries, which serve as ideal guides for disparity estimation. For this purpose, we propose a neighbor cross similarity feature (NCSF) that measures the similarity between the two modalities. Finally, our experiments on various datasets demonstrate the superiority of our method in estimating depth from images and event data together.
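To make the two proposed ideas concrete, the following is a minimal NumPy sketch of (i) a soft, differentiable-style event selection step and (ii) a neighbor cross similarity computed between image and event feature maps. All function names, shapes, and the cosine-similarity formulation are illustrative assumptions; the paper's actual network architecture is not specified here.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def select_events(event_feats, image_feat):
    """Soft event selection (illustrative): score each temporal event slice
    against the current image feature and take a softmax-weighted sum, so the
    selection stays differentiable in an autodiff framework."""
    # event_feats: (T, C) features from T temporal event slices (hypothetical)
    # image_feat:  (C,) feature from the current image (hypothetical)
    scores = event_feats @ image_feat        # (T,) relevance of each slice
    weights = softmax(scores)                # soft selection weights, sum to 1
    return weights @ event_feats, weights    # selected feature (C,), weights (T,)

def neighbor_cross_similarity(img_feat, evt_feat, k=1):
    """Toy neighbor cross similarity: cosine similarity between each image
    feature and the event features in its (2k+1)^2 spatial neighborhood."""
    # img_feat, evt_feat: (H, W, C) feature maps from the two modalities
    H, W, _ = img_feat.shape
    img_n = img_feat / (np.linalg.norm(img_feat, axis=-1, keepdims=True) + 1e-8)
    evt_n = evt_feat / (np.linalg.norm(evt_feat, axis=-1, keepdims=True) + 1e-8)
    pad = np.pad(evt_n, ((k, k), (k, k), (0, 0)))  # zero-pad spatial borders
    sims = []
    for dy in range(2 * k + 1):
        for dx in range(2 * k + 1):
            shifted = pad[dy:dy + H, dx:dx + W]     # one neighborhood offset
            sims.append((img_n * shifted).sum(-1))  # per-pixel cosine similarity
    return np.stack(sims, axis=-1)                  # (H, W, (2k+1)^2)
```

The softmax-weighted sum stands in for a learned, differentiable selection mechanism, and the stacked cosine maps stand in for a cross-modal similarity feature; in practice both would be computed on learned CNN features inside an autodiff framework.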