End-to-End Blind Image Quality Assessment Using Deep Neural Networks


We propose a multi-task end-to-end optimized deep neural network (MEON) for blind image quality assessment (BIQA). MEON consists of two sub-networks, a distortion identification network and a quality prediction network, that share the early layers. Unlike traditional methods for training multi-task networks, our training process is performed in two steps. In the first step, we train a distortion type identification sub-network, for which large-scale training samples are readily available. In the second step, starting from the pretrained early layers and the outputs of the first sub-network, we train a quality prediction sub-network using a variant of the stochastic gradient descent method. Unlike most deep neural networks, we choose the biologically inspired generalized divisive normalization (GDN) instead of the rectified linear unit (ReLU) as the activation function. We empirically demonstrate that GDN is effective at reducing the number of model parameters and layers while achieving similar quality prediction performance. With modest model complexity, the proposed MEON index achieves state-of-the-art performance on four publicly available benchmarks. Moreover, we demonstrate the strong competitiveness of MEON against state-of-the-art BIQA models using the group maximum differentiation competition methodology.
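
To make the GDN activation mentioned above more concrete, here is a minimal sketch in plain Python. It assumes the commonly cited form of GDN, in which each channel response is divided by the square root of a bias plus a weighted sum of squared responses across channels at the same spatial location. The function name `gdn`, the parameter names `beta` and `gamma`, and the example values are illustrative assumptions, not taken from the MEON implementation.

```python
import math


def gdn(x, beta, gamma):
    """Apply GDN across channels at a single spatial location.

    Assumed form: y_i = x_i / sqrt(beta_i + sum_j gamma[i][j] * x_j ** 2)
    x     : list of channel responses
    beta  : list of positive biases, one per channel (hypothetical values)
    gamma : square list-of-lists of non-negative weights (hypothetical values)
    """
    out = []
    for i, xi in enumerate(x):
        # Pool the squared responses of all channels, weighted by gamma[i].
        pool = beta[i] + sum(gamma[i][j] * x[j] ** 2 for j in range(len(x)))
        out.append(xi / math.sqrt(pool))
    return out


def relu(x):
    """Rectified linear unit, shown only for comparison."""
    return [max(0.0, v) for v in x]


# Two-channel example with identity weights (illustrative numbers only).
x = [3.0, -4.0]
beta = [1.0, 1.0]
gamma = [[1.0, 0.0], [0.0, 1.0]]

y_gdn = gdn(x, beta, gamma)
y_relu = relu(x)
```

In this sketch GDN keeps the sign of each response and compresses its magnitude, whereas ReLU discards the negative response entirely. In a trained network, `beta` and `gamma` would be learned together with the convolution weights.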

Existing System:

Although DNNs have shown great promise in many vision tasks, end-to-end optimization of BIQA is challenging due to the lack of sufficient ground truth samples for training. Note that the largest subject-rated image quality assessment (IQA) database contains only 3,000 annotations, while digital images live in a space of millions of dimensions. Previous DNN-based BIQA methods tackle this challenge in three ways. Methods of the first kind directly inherit the architectures and weights from networks pre-trained for general image classification tasks, followed by fine-tuning. The performance and efficiency of such networks depend highly on the generalizability and relevance of the tasks used for pre-training.

Methods of the second kind work with image patches by assigning the subjective mean opinion score (MOS) of an image to all patches within it. This approach suffers from three limitations. First, the concept of quality without context (e.g., the quality of a single 32 × 32 patch) is not well defined. Second, local image quality within context varies across spatial locations even when the distortion is homogeneously applied. Third, patches with similar statistical behaviors (e.g., smooth and blurred regions) may have substantially different quality [20].

Methods of the third kind make use of full-reference IQA (FR-IQA) models for quality annotation. Their performance is directly affected by that of FR-IQA models, which may be inaccurate across distortion levels and distortion types.
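
The patch-based labelling used by methods of the second kind can be sketched as follows. This is an illustrative example only: the helper names `patch_corners` and `label_patches`, the image size, and the MOS value are assumptions chosen for the demonstration, not part of any cited method.

```python
def patch_corners(height, width, patch, stride):
    """Return top-left corners of all patches that fit fully inside the image."""
    return [
        (row, col)
        for row in range(0, height - patch + 1, stride)
        for col in range(0, width - patch + 1, stride)
    ]


def label_patches(height, width, image_mos, patch=32, stride=32):
    """Give every patch the MOS of the whole image (the source of label noise)."""
    return [
        (corner, image_mos)
        for corner in patch_corners(height, width, patch, stride)
    ]


# A hypothetical 64 x 96 image with an image-level MOS of 55.0.
labelled = label_patches(height=64, width=96, image_mos=55.0)
num_patches = len(labelled)
distinct_labels = {mos for _, mos in labelled}
```

Every patch receives the same label regardless of its content or location, which is why a smooth region and a blurred region in the same image end up with identical training targets.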

Other methods for generating training data involve the creation of synthetic scores and discriminable image pairs (DIP), both of which rely on FR-IQA models and may suffer from similar problems.

Proposed System:

We believe the performance improvement of MEON arises for three reasons: 1) the proposed novel learning framework has the quality prediction subtask regularized by the distortion identification subtask; 2) images instead of patches are used as inputs to reduce the label noise; 3) the pre-training step enables the network to start from a more task-relevant initialization, resulting in a better local optimum.
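
One way the quality prediction subtask could depend on the distortion identification subtask is sketched below. The text above does not spell out how the two outputs are combined, so this example assumes a probability-weighted sum: the identification sub-network yields a probability per distortion type, the quality sub-network yields one score per distortion type, and the final score is their dot product. The function names and the numbers are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw distortion-type logits into probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_quality(distortion_logits, per_distortion_scores):
    """Assumed combination rule: q = sum_i p_i * s_i."""
    probs = softmax(distortion_logits)
    return sum(p * s for p, s in zip(probs, per_distortion_scores))


# Two distortion types judged equally likely (illustrative values only).
quality = predict_quality([0.0, 0.0], [20.0, 40.0])
```

Under this assumption, errors in distortion identification propagate directly into the quality score, which gives an intuition for why pre-training the identification sub-network first can provide a more task-relevant starting point for the second training step.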
