Online material for VQA of Inversely Tone Mapped HDR Videos

Our VADIV database can be downloaded from the following link:



Verification Code: yx66

Code & Model:


Verification Code: f1jo

These can also be downloaded from GitHub:

Ⅰ、More details about the ITM-HDR-Videos dataset

Visually, an ITM algorithm changes the dynamic range, color representation, and local detail of the SDR video, and we show visual examples from the ITM-HDR-Videos dataset for each of these aspects. In addition, we show examples of videos with similar MRS but very different MAS. Note that HDR video data cannot be stored correctly on web pages, which are generally rendered in sRGB. We therefore apply the same contrast and saturation stretching to all examples to obtain a visual effect as close as possible to viewing on an HDR display.
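The exact stretch parameters used for the web examples are not listed here, so the following is only an illustrative sketch of such a contrast-and-saturation stretch; the parameter values and the luma weights are placeholders, not the ones used to produce the actual figures.

```python
import numpy as np

def stretch_for_display(img, contrast=1.2, saturation=1.3):
    """Illustrative contrast/saturation stretch for showing HDR-derived
    frames on an sRGB web page. `img` is a float array in [0, 1] with
    shape (H, W, 3); `contrast` and `saturation` are placeholder values.
    """
    # Contrast stretch around mid-grey.
    out = (img - 0.5) * contrast + 0.5
    # Saturation stretch: push each pixel away from its luma
    # (Rec.709 luma weights, since the page is displayed in sRGB).
    luma = out @ np.array([0.2126, 0.7152, 0.0722])
    out = luma[..., None] + saturation * (out - luma[..., None])
    return np.clip(out, 0.0, 1.0)
```

The same `contrast` and `saturation` values would be applied to every example so the ITM algorithms remain visually comparable to each other.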

1、All SDR videos in the ITM-HDR-Videos dataset.

2、More details about generating ITM-HDR videos from the ITM algorithms.

The converters used to generate HDR10 videos differ between image-based and video-based ITM algorithms. As shown in Fig (a), the HDR frames output by video-based algorithms can be packaged directly into an HDR10 video with the FFmpeg tool. In contrast, image-based algorithms output HDR images that record linear luminance values instead of HDR10 pixel values; moreover, the RGB channels of these HDR images are still expressed in the BT.709 color space rather than the BT.2020 space. It is therefore necessary to perform color space conversion, the PQ OETF transformation, and 10-bit quantization on the HDR images before packaging them into an HDR10 video, as shown in Fig (b).
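The three per-frame steps in Fig (b) can be sketched as follows. The BT.709-to-BT.2020 matrix and the PQ constants are the standard values (ITU-R BT.2087 and SMPTE ST 2084); full-range 10-bit quantization is used here to keep the sketch short, whereas a real HDR10 pipeline would typically use narrow-range coding.

```python
import numpy as np

# Standard BT.709 -> BT.2020 conversion matrix for linear RGB.
M_709_TO_2020 = np.array([
    [0.6274, 0.3293, 0.0433],
    [0.0691, 0.9195, 0.0114],
    [0.0164, 0.0880, 0.8956],
])

# SMPTE ST 2084 (PQ) OETF constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_oetf(y):
    """PQ OETF: `y` is linear luminance normalized so 1.0 = 10000 cd/m^2."""
    y = np.clip(y, 0.0, 1.0)
    yp = np.power(y, M1)
    return np.power((C1 + C2 * yp) / (1.0 + C3 * yp), M2)

def hdr_image_to_hdr10(linear_709_rgb):
    """Per-frame conversion of Fig (b): color-space conversion,
    PQ OETF, then (full-range) 10-bit quantization."""
    rgb2020 = linear_709_rgb @ M_709_TO_2020.T
    encoded = pq_oetf(rgb2020)
    return np.round(encoded * 1023).astype(np.uint16)
```

The resulting 10-bit frames can then be packaged into an HDR10 video with FFmpeg, the same final step used for the video-based algorithms in Fig (a).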

3、Different ITM algorithms lead to differences in dynamic range.

4、Different ITM algorithms lead to differences in colorfulness.

5、Different ITM algorithms lead to differences in local details.

6、Example of flickering caused by ITM.

7、The same MRS with different MAS.

The quality of the video on the left is significantly higher than that of the one on the right, which suffers from heavy noise in the dark areas and overexposure in the bright areas.

Ⅱ、More details about OUR VQA MODEL

1、Details about the data preprocessing.

SDR video frames are normalized from [0, 255] to [0, 1] during both pre-training and online learning, while HDR video frames are normalized from [0, 1023] to [0, 1].

2、More details about feature Normalization, Concatenation & Regression.

BN denotes nn.BatchNorm1d(·) and concatenation denotes torch.cat(·) in PyTorch.

The specific implementation of Maximum Normalization is as follows:

The left side of the figure represents the n-th ITMV. The right side of the figure represents a certain feature descriptor. The formula of Maximum Normalization is as follows:
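The exact formula appears in the figure above; as a sketch only, one common reading of maximum normalization is to rescale each feature dimension by its maximum absolute value across the n ITMVs, so every dimension lies in [-1, 1]. The paper's actual formula may differ (e.g. in the set of videos the maximum is taken over).

```python
import numpy as np

def maximum_normalization(features, eps=1e-8):
    """Hypothetical sketch of Maximum Normalization: divide each feature
    dimension by its maximum absolute value over the n ITMVs.
    `features` has shape (n_videos, feature_dim)."""
    max_abs = np.max(np.abs(features), axis=0, keepdims=True)
    return features / (max_abs + eps)  # eps guards all-zero dimensions
```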

3、Computational complexity analysis against the deep learning models used in the comparison methods.

Since our online learning requires 15 iterations, the FLOPs are multiplied by 15. The computational complexity is computed based on a 6-second 4K 50 fps HDR video.
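The totals follow directly from the clip length and the iteration count; the arithmetic below uses a placeholder per-frame cost (not a value from the paper) purely to show how the reported numbers scale.

```python
# Total frames in the test clip: 50 fps x 6 s.
fps, seconds = 50, 6
n_frames = fps * seconds           # 300 frames

# Online learning runs 15 iterations, so per-video FLOPs scale by 15.
online_iterations = 15
per_frame_gflops = 1.0             # placeholder cost, not from the paper
total_gflops = per_frame_gflops * n_frames * online_iterations
```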