H.6.5.2. Computer vision
Kourosh Kiani; Razieh Rastgoo; Alireza Chaji; Sergio Escalera
Abstract
Image inpainting, the process of restoring missing or corrupted regions of an image by reconstructing pixel information, has recently seen considerable advancements through deep learning-based approaches. Aiming to tackle the complex spatial relationships within an image, in this paper we introduce a novel deep learning-based pre-processing methodology for image inpainting utilizing the Vision Transformer (ViT). Unlike CNN-based methods, our approach leverages the self-attention mechanism of the ViT to model global contextual dependencies, improving the quality of inpainted regions. Specifically, we replace masked pixel values with those generated by the ViT, using the attention mechanism to extract diverse visual patches and capture discriminative spatial features. To the best of our knowledge, this is the first instance of such a pre-processing model being proposed for image inpainting tasks. Furthermore, we demonstrate that our methodology can be effectively applied with a pre-trained ViT model and a pre-defined patch size, reducing computational overhead while maintaining high reconstruction fidelity. To assess the generalization capability of the proposed methodology, we conduct extensive experiments comparing our approach with four standard inpainting models across four public datasets. The results validate the efficacy of our pre-processing technique in enhancing inpainting performance, particularly in scenarios involving complex textures and large missing regions.
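The patch-replacement idea described in the abstract can be illustrated with a toy sketch: split the image into ViT-style non-overlapping patches, then fill each masked patch with an attention-weighted mixture of the visible patches. Note this is only a structural illustration under loose assumptions — the paper's method uses a pre-trained ViT's learned embeddings and self-attention, whereas the sketch below substitutes raw patch statistics for embeddings, and all function names are hypothetical.

```python
import numpy as np

def patchify(img, p):
    # Split an H x W image into non-overlapping p x p patches, each flattened.
    H, W = img.shape
    return img.reshape(H // p, p, W // p, p).swapaxes(1, 2).reshape(-1, p * p)

def unpatchify(patches, H, W, p):
    # Inverse of patchify: reassemble flattened patches into an H x W image.
    return patches.reshape(H // p, W // p, p, p).swapaxes(1, 2).reshape(H, W)

def vit_style_fill(img, mask, p=4):
    """Replace masked patches with attention-weighted mixtures of visible patches.

    Toy stand-in for the ViT pre-processing step: patch means play the role of
    learned token embeddings, and a softmax over similarity plays the role of
    self-attention.
    """
    patches = patchify(img, p)
    pmask = patchify(mask, p).any(axis=1)   # True where a patch contains masked pixels
    visible = patches[~pmask]
    if visible.size == 0 or not pmask.any():
        return img.copy()
    q = patches[pmask].mean(axis=1, keepdims=True)   # (n_masked, 1) "query" statistics
    k = visible.mean(axis=1, keepdims=True).T        # (1, n_visible) "key" statistics
    scores = -(q - k) ** 2                           # similarity of patch statistics
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)          # softmax attention weights
    patches[pmask] = attn @ visible                  # attention-weighted fill
    return unpatchify(patches, *img.shape, p)
```

Because each filled pixel is a convex combination of visible-patch pixels, the output stays within the range of the observed data; in the actual method, a pre-trained ViT with a pre-defined patch size would produce the replacement values instead.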
H.6.5.2. Computer vision
M. Karami; A. Moosavie nia; M. Ehsanian
Abstract
In this paper, we address the problem of automatic arrangement of cameras in a 3D system to enhance the performance of the depth acquisition procedure. Lacking ground truth or a priori information, a measure of uncertainty is required to assess the quality of reconstruction. The mathematical model of iso-disparity surfaces provides an efficient way to estimate the depth estimation uncertainty, which depends on the baseline length, focal length, panning angle, and pixel resolution in a stereo vision system. Accordingly, we first present analytical relations for fast estimation of the embedded uncertainty in depth acquisition, and then these relations, along with the 3D sampling arrangement, are employed to define a cost function. The optimal camera arrangement is determined by minimizing the cost function with respect to the system parameters under the required constraints. Finally, the proposed algorithm is implemented on several 3D models. The simulation results demonstrate a significant reduction (up to 35%) in depth uncertainty in the resulting depth maps compared with the traditional rectified camera setup.
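The link between camera arrangement and depth uncertainty can be sketched with the standard first-order stereo error model: from z = f·b/d, a disparity error Δd propagates to a depth error Δz ≈ z²·Δd/(f·b). The snippet below uses that relation to score candidate baselines by mean uncertainty over sampled scene depths. It is a simplified stand-in for the paper's cost function, which also accounts for panning angle, the 3D sampling arrangement, and geometric constraints; the function names and the bare-bones cost are assumptions for illustration.

```python
import numpy as np

def depth_uncertainty(z, f, b, pixel_err=1.0):
    # First-order stereo error model: z = f*b/d  =>  dz ~ z**2 * dd / (f*b)
    # z: depth(s), f: focal length (pixels), b: baseline, pixel_err: disparity error (pixels)
    return np.asarray(z, dtype=float) ** 2 * pixel_err / (f * b)

def best_baseline(sample_depths, f, baselines, pixel_err=1.0):
    # Toy cost: mean depth uncertainty over sampled scene points.
    # With this cost alone, longer baselines always win; the paper's full cost
    # adds constraints (e.g. field-of-view overlap) that bound the baseline.
    costs = [depth_uncertainty(sample_depths, f, b, pixel_err).mean()
             for b in baselines]
    i = int(np.argmin(costs))
    return baselines[i], costs[i]
```

For example, at z = 10 m with f = 500 px and b = 0.1 m, a one-pixel disparity error maps to an uncertainty of 10²·1/(500·0.1) = 2 m, and doubling the baseline halves it, which is the trade-off the optimization exploits.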