B-spline Texture Coefficients Estimator for Screen Content Image Super-Resolution

Daegu Gyeongbuk Institute of Science and Technology (DGIST)
* Equal Contribution

CVPR 2023 (Highlight)

[Demo figures: interactive before/after comparisons. BTC super-resolves a 60 px input to 270 px and a 70 px input to 210 px.]

Abstract

Screen content images (SCIs) include many informative components, e.g., text and graphics. Such content creates sharp edges or homogeneous areas, making the pixel distribution of SCIs different from that of natural images. Therefore, the edges and textures must be handled properly to minimize information distortion of the contents when a display device's resolution differs from that of the SCI. To achieve this goal, we propose an implicit neural representation using B-splines for screen content image super-resolution (SCI SR) with arbitrary scales. Our method estimates the scaling, translating, and smoothing parameters of B-splines. A subsequent multilayer perceptron (MLP) uses the estimated B-splines to recover the high-resolution SCI. Thanks to the positive constraint and compact support of the B-spline basis, our network outperforms both a transformer-based reconstruction method and an implicit Fourier representation method at almost every upscaling factor. Moreover, our SR results are recognized as the correct text letters with the highest confidence by a pre-trained scene text recognition network.
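The positivity and compact support mentioned above are generic properties of B-spline bases. For illustration (the spline order is not specified in this abstract, so take the cubic case as a representative example), the cubic B-spline is

    \[
    \beta^{3}(x) =
    \begin{cases}
      \frac{2}{3} - |x|^{2} + \frac{|x|^{3}}{2}, & 0 \le |x| < 1,\\
      \frac{(2 - |x|)^{3}}{6}, & 1 \le |x| < 2,\\
      0, & |x| \ge 2,
    \end{cases}
    \]

which is nonnegative everywhere and vanishes outside \([-2, 2]\), so each query point is influenced only by a few nearby basis functions.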

Motivation

Comparison of naturalness value distributions between SCIs and natural images (NIs).


The informative components of SCIs, such as text and graphics, create sharp edges and homogeneous areas, resulting in a pixel distribution distinct from that of NIs (bottom row). Because of these differences, previous super-resolution (SR) methods (e.g., LTE) risk under- or overshooting at the discontinuities in SCIs. It is therefore crucial to handle the edges and textures of SCIs properly, minimizing information distortion and artifacts when reconstructing high-resolution SCIs. To address this challenge, we propose the B-spline Texture Coefficients estimator (BTC).

Qualitative Comparison

for integer scales (\(\times\)3, \(\times\)4, \(\times\)5, and \(\times\)7)

We report the scene text recognition (STR) results of the red-highlighted boxes using a pre-trained STR network.

[Figure: SR crops per method; the Input column shows the LR crop and has no STR output.]

Method         | MetaSR [Hu 2019] | LIIF [Chen 2021] | ITSRN [Yang 2021] | LTE [Lee 2022] | BTC (ours) | GT
Prediction     | Wealoy           | Weakly           | Weakly            | Weakly         | Weakly     | Weakly
Avg. Conf. (%) | 85.46            | 98.48            | 99.69             | 99.64          | 99.69      | 99.80
Prediction     | Decision         | Decision         | Declsion          | Decision       | Decision   | Decision
Avg. Conf. (%) | 93.78            | 92.42            | 96.07             | 96.39          | 96.63      | 99.90
Prediction     | 2008/1           | 2038/1           | 2008/1            | 20S8/1         | 2018/1     | 2018/1
Avg. Conf. (%) | 97.36            | 99.83            | 99.88             | 93.87          | 99.94      | 99.98
Prediction     | Countries        | Cowntries        | Countries         | Gowntries      | Countries  | Countries
Avg. Conf. (%) | 99.91            | 97.82            | 99.87             | 95.48          | 99.91      | 99.91


for fractional scales (\(\times\)1.49 and \(\times\)6.25)

[Figure: visual comparisons of MetaSR [Hu 2019], LIIF [Chen 2021], ITSRN [Yang 2021], LTE [Lee 2022], BTC (ours), and GT at fractional scales.]

Quantitative Comparison

Quantitative comparison on the SCI1K test set, SCID, and SIQAD (PSNR (dB)). All methods are trained on the SCI1K train set. The best and second-best results are in red and blue, respectively. RDN trains a separate model for each scale; MetaSR, LIIF, ITSRN, LTE, and BTC each use one model for all scales, and all five use RDN as their encoder. The number of trainable parameters of RDN is counted without its upsampling layer.

Overall Pipeline of BTC

Step 1. We estimate the scaling (coef), translating (knot), and smoothing (dilation) parameters of the B-splines from the low-resolution (LR) input image.
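As a rough PyTorch sketch (not the paper's exact architecture; the head names and channel sizes below are illustrative assumptions), Step 1 can be read as three convolutional heads over the encoder features:

    import torch
    import torch.nn as nn

    class BSplineParamEstimator(nn.Module):
        """Predicts B-spline parameters from LR encoder features (sketch)."""
        def __init__(self, in_ch=64, hidden=256):
            super().__init__()
            # Three parallel heads: scaling (coef), translating (knot),
            # and smoothing (dilation) parameters per spatial location.
            self.coef_head = nn.Conv2d(in_ch, hidden, 3, padding=1)
            self.knot_head = nn.Conv2d(in_ch, hidden, 3, padding=1)
            self.dilation_head = nn.Conv2d(in_ch, 1, 3, padding=1)

        def forward(self, feat):  # feat: (B, in_ch, h, w) from the encoder
            coef = self.coef_head(feat)          # scaling of each basis
            knot = self.knot_head(feat)          # translation of each basis
            dilation = self.dilation_head(feat)  # basis width / smoothing
            return coef, knot, dilation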

Step 2. We build the B-spline representation from the estimated B-spline feature vectors closest to the query point in the high-resolution (HR) coordinate space.
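A minimal sketch of one plausible reading of Step 2, assuming a separable cubic B-spline basis and per-axis knots; the function names and tensor layout are illustrative assumptions, not the released implementation:

    import torch

    def cubic_bspline(x):
        """Cubic B-spline: nonnegative, compactly supported on [-2, 2]."""
        ax = x.abs()
        y = torch.zeros_like(x)
        inner = ax < 1
        outer = (ax >= 1) & (ax < 2)
        y[inner] = 2 / 3 - ax[inner] ** 2 + ax[inner] ** 3 / 2
        y[outer] = (2 - ax[outer]) ** 3 / 6
        return y

    def bspline_representation(coef, knot_x, knot_y, dilation, rel_coord):
        """Shift the query offset by the estimated knots, scale by the
        dilation, then weight the separable 2-D basis by the coefficients.

        coef, knot_x, knot_y: (B, C) vectors taken from the LR location
        nearest to the query; dilation: (B, 1); rel_coord: (B, 2) offset
        of the HR query point from that LR location.
        """
        bx = cubic_bspline((rel_coord[:, :1] - knot_x) / dilation)
        by = cubic_bspline((rel_coord[:, 1:] - knot_y) / dilation)
        return coef * bx * by  # (B, C) representation fed to the decoder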

Step 3. The subsequent implicit neural representation (INR) decoder infers the RGB values of the query point from the B-spline representation.
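Step 3 can be sketched as a small MLP; the layer widths and the LIIF-style concatenated cell input (the query pixel size) are assumptions for illustration:

    import torch
    import torch.nn as nn

    # Generic INR decoder: maps the B-spline representation (plus cell
    # size) to an RGB value at each query point.
    decoder = nn.Sequential(
        nn.Linear(256 + 2, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, 3),
    )

    representation = torch.randn(8, 256)  # Step-2 output for 8 queries
    cell = torch.randn(8, 2)              # HR pixel size at each query
    rgb = decoder(torch.cat([representation, cell], dim=1))  # (8, 3) RGB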

BibTeX


    @inproceedings{pak2023b,
      title     = {B-spline Texture Coefficients Estimator for Screen Content Image Super-Resolution},
      author    = {Pak, Byeonghyun and Lee, Jaewon and Jin, Kyong Hwan},
      booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
      pages     = {10062--10071},
      year      = {2023}
    }