Disentangled Cross-modal Fusion for Event-guided Image Super-Resolution
– Published Date : 2024/06/28
– Category : Super Resolution, Event Camera
– Place of publication : IEEE Transactions on Artificial Intelligence (TAI), 2024
Abstract:
Event cameras detect intensity changes and produce asynchronous events with high dynamic range and no motion blur. Recently, several attempts have been made to super-resolve intensity images guided by events. However, these methods directly fuse the event and image features without distinguishing the modality difference, and they achieve image super-resolution (SR) in multiple steps, leading to error-prone SR results. They also lack quantitative evaluation on real-world data. In this paper, we present an end-to-end framework, called EGI-SR, to narrow the modality gap and effectively integrate event and RGB features for image SR. Specifically, EGI-SR employs three Cross-Modality Encoders (CMEs) to learn modality-specific and modality-shared features from the stacked events and the intensity image. As such, EGI-SR can better mitigate the negative impact of the modality difference and reduce the gap between the event and intensity-image feature spaces. Subsequently, a transformer-based decoder is deployed to reconstruct the SR image. Moreover, we collect a real-world dataset with temporally and spatially aligned event and color image pairs. Extensive experiments on synthetic and real-world datasets show that EGI-SR surpasses existing methods by a large margin.
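
To make the disentangled-fusion idea concrete, here is a minimal PyTorch sketch of the design the abstract describes: modality-specific encoders for the event stack and the RGB image, a weight-shared encoder applied to both modalities, feature fusion, and a transformer layer before upsampling. All names and hyperparameters (`EGISRSketch`, `ConvEncoder`, `event_bins`, `dim`, `scale`) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of disentangled cross-modal fusion for event-guided SR.
# Module names and hyperparameters are assumptions for illustration only.
import torch
import torch.nn as nn


class ConvEncoder(nn.Module):
    """Small conv stack mapping an input tensor to a feature map."""
    def __init__(self, in_ch: int, dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class EGISRSketch(nn.Module):
    """Modality-specific encoders for events and image, one weight-shared
    encoder applied to both, fusion, then a transformer layer and a
    pixel-shuffle upsampler to reconstruct the SR image."""
    def __init__(self, event_bins=5, dim=64, scale=4):
        super().__init__()
        self.event_specific = ConvEncoder(event_bins, dim)  # event-only cues
        self.image_specific = ConvEncoder(3, dim)           # RGB-only cues
        # Shared encoder: the same weights process both modalities,
        # encouraging a common feature space that narrows the modality gap.
        self.shared_event_in = nn.Conv2d(event_bins, dim, 1)
        self.shared_image_in = nn.Conv2d(3, dim, 1)
        self.shared = ConvEncoder(dim, dim)
        self.fuse = nn.Conv2d(4 * dim, dim, 1)
        self.decoder = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=2 * dim, batch_first=True)
        self.upsample = nn.Sequential(
            nn.Conv2d(dim, 3 * scale ** 2, 3, padding=1),
            nn.PixelShuffle(scale),
        )

    def forward(self, image, events):
        # image: (B, 3, H, W) low-res frame; events: (B, bins, H, W) voxel grid
        f_img = self.image_specific(image)
        f_evt = self.event_specific(events)
        s_img = self.shared(self.shared_image_in(image))
        s_evt = self.shared(self.shared_event_in(events))
        f = self.fuse(torch.cat([f_img, f_evt, s_img, s_evt], dim=1))
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)    # (B, H*W, C) for attention
        tokens = self.decoder(tokens)
        f = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.upsample(f)                  # (B, 3, scale*H, scale*W)


if __name__ == "__main__":
    model = EGISRSketch()
    lr = torch.randn(1, 3, 32, 32)       # low-res RGB frame
    evt = torch.randn(1, 5, 32, 32)      # stacked event representation
    print(model(lr, evt).shape)          # torch.Size([1, 3, 128, 128])
```

Splitting the encoders this way is what "disentangled" refers to here: the shared branch aligns the two feature spaces, while the specific branches preserve cues unique to each modality before a single fusion step feeds the decoder.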