KPN-MFI: A Kernel Prediction Network with Multi-frame Interaction for Video Inverse Tone Mapping


Image-based inverse tone mapping (iTM) models have been widely investigated, whereas video-based iTM methods have received little attention. It would be desirable to reuse existing image-based models for the video iTM task. However, directly applying image-based iTM models to video data without modeling spatial-temporal information is nontrivial. Considering both the intra-frame quality and the inter-frame consistency of a video, this article presents a new video iTM method based on a kernel prediction network (KPN), which uses a multi-frame interaction (MFI) module to capture spatial-temporal information in video data. Specifically, a basic encoder-decoder KPN, essentially designed for image iTM, is trained to guarantee the mapping quality within each frame. More importantly, the MFI module is incorporated to capture spatial-temporal context information and preserve inter-frame consistency by exploiting the correlation between adjacent frames. Notably, any existing image iTM model can be readily extended to a video iTM model by incorporating the proposed MFI module. Furthermore, we propose an inter-frame brightness consistency loss function based on the Gaussian pyramid to reduce temporal inconsistency in the output video. Extensive experiments demonstrate that our model outperforms state-of-the-art image-based and video-based methods. The code is available at