Earth observation satellites usually acquire a high-resolution image with a very limited number of spectral bands along with a lower-resolution image that accurately encodes the spectral responses of objects in the scene. Satellite image fusion, also known as pansharpening or hypersharpening depending on the characteristics of the data, aims to combine the spatial and spectral information into a single high-resolution multispectral or hyperspectral image. The resulting image is then used in a wide variety of remote sensing applications, for which a low ground sampling distance and a detailed description of the chemical-physical composition of the objects may be required. In this chapter, we review the state of the art of satellite image fusion, emphasizing the crucial role that the modeling of the problem plays in performance and analyzing how deep learning has changed the paradigm. This study makes it possible to understand the evolution of the various approaches and their respective outcomes. We establish a fair comparison process, standardizing a general strategy to train and evaluate fusion methods. Quantitative and qualitative comparisons are conducted on several datasets with distinct resolutions and sensor characteristics. The source code used for the comparison is published freely for non-commercial use.
Pansharpening is the fusion process that combines the geometry of a high-resolution panchromatic image with the spectral information encoded in a low-resolution multispectral image. We introduce a back-projection method to minimize the reconstruction error between the target image and the output produced by the Brovey pansharpening model. We replace the back-projection kernel with a residual network that incorporates a nonlocal module, exploiting self-similarity and built upon the multi-head attention mechanism. Experimental validation shows that our method achieves state-of-the-art results.
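As an illustration of the classical ingredients, the following minimal numpy sketch combines Brovey fusion with a generic back-projection loop. It is only a sketch: the paper replaces the fixed back-projection kernel with a residual nonlocal network, which is not reproduced here, and the Gaussian blur and resampling operators are illustrative assumptions.

import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def brovey(pan, ms_up, eps=1e-8):
    """Brovey fusion: modulate each upsampled MS band by PAN / intensity."""
    intensity = ms_up.mean(axis=-1)
    return ms_up * (pan / (intensity + eps))[..., None]

def back_projection(fused, ms_low, scale, n_iter=10, sigma=1.0):
    """Iteratively reduce the error between the downsampled fused image
    and the observed low-resolution MS data (sizes assumed divisible)."""
    for _ in range(n_iter):
        down = zoom(gaussian_filter(fused, (sigma, sigma, 0)),
                    (1.0 / scale, 1.0 / scale, 1), order=1)
        up = zoom(ms_low - down, (scale, scale, 1), order=1)
        fused = fused + gaussian_filter(up, (sigma, sigma, 0))
    return fused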
This work proposes a learning-based semantic segmentation approach to detect floating plastic litter on the marine surface using Sentinel-2 satellite data. We adopt a convolutional network with a reduced number of parameters and, in addition to the multispectral data, we feed specific spectral indexes to the network to assist plastic segmentation. Our approach compares favorably against other learning-based methods tailored for the same task.
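The abstract does not specify which indexes are fed to the network; as an illustration only, the sketch below computes two indexes commonly used in the marine-litter literature, NDVI and the Floating Debris Index (FDI) of Biermann et al. (2020), from Sentinel-2 bands (B4 red, B6 red edge, B8 NIR, B11 SWIR).

import numpy as np

# Approximate central wavelengths (nm) of the Sentinel-2 bands involved.
L_RED, L_NIR, L_SWIR1 = 665.0, 833.0, 1610.0

def ndvi(b4, b8, eps=1e-8):
    """Normalized Difference Vegetation Index."""
    return (b8 - b4) / (b8 + b4 + eps)

def fdi(b6, b8, b11):
    """Floating Debris Index: NIR reflectance minus a baseline interpolated
    between the red edge and SWIR bands (as in Biermann et al., 2020)."""
    baseline = b6 + (b11 - b6) * (L_NIR - L_RED) / (L_SWIR1 - L_RED) * 10.0
    return b8 - baseline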
Unlike the human vision system, which is able to recognize color under different illumination conditions, the information captured by a digital camera highly depends on the light reflected by the objects in the scene. The Retinex theory assumes that a digital image can be decomposed into illumination and reflectance components. In this work, we propose two variational models to solve the ill-posed inverse problem of estimating illumination and reflectance from a given observation. In both approaches, nonlocal regularization exploiting image self-similarities is used to estimate the reflectance, since it is assumed to contain fine details and texture. The difference between both models comes from the selected prior for the illumination component. The Sobolev norm, which promotes smooth solutions, and the total variation semi-norm, which favours piecewise constant solutions, are independently proposed. A theoretical analysis of the resulting energy functionals is provided in suitable functional spaces.
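Schematically, writing the observation f as the product of illumination l and reflectance r, the two energies can be sketched as follows; the notation is indicative only, with w(x,y) denoting patch-similarity weights for the nonlocal reflectance prior:

E_1(l, r) = \int_\Omega |\nabla l|^2 \, dx + \beta \sum_{x,y} w(x,y)\, |r(x) - r(y)| + \frac{\lambda}{2} \| f - l\, r \|_2^2,
E_2(l, r) = \int_\Omega |\nabla l| \, dx + \beta \sum_{x,y} w(x,y)\, |r(x) - r(y)| + \frac{\lambda}{2} \| f - l\, r \|_2^2,

where the Sobolev prior in E_1 promotes smooth illumination and the total variation prior in E_2 allows piecewise constant illumination.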
Classical variational methods for solving image processing problems are more interpretable and flexible than pure deep learning approaches, but their performance is limited by the use of rigid priors. Deep unfolding networks combine the strengths of both by unfolding the steps of the optimization algorithm used to estimate the minimizer of an energy functional into a deep learning framework. In this paper, we propose an unfolding approach to extend a variational model exploiting self-similarity of natural images in the data fidelity term for single-image super-resolution. The proximal, downsampling and upsampling operators are written in terms of a neural network specifically designed for each purpose. Moreover, we include a new multi-head attention module to replace the nonlocal term in the original formulation. A comprehensive evaluation covering a wide range of sampling factors and noise realizations proves the benefits of the proposed unfolding techniques. The model is shown to better preserve image geometry while being robust to noise.
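The generic unfolding pattern can be sketched in PyTorch as below. This is a minimal stand-in, not the paper's architecture: ProxNet replaces the learned proximal operator, bicubic interpolation replaces the learned downsampling and upsampling operators, and the nonlocal multi-head attention module is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProxNet(nn.Module):
    """Small residual CNN standing in for the learned proximal operator."""
    def __init__(self, ch=3, feat=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, feat, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class UnfoldedSR(nn.Module):
    def __init__(self, scale=2, stages=5):
        super().__init__()
        self.scale = scale
        self.prox = nn.ModuleList(ProxNet() for _ in range(stages))
        self.tau = nn.Parameter(torch.full((stages,), 0.5))
    def resample(self, x, factor):
        return F.interpolate(x, scale_factor=factor, mode='bicubic',
                             align_corners=False)
    def forward(self, y):
        x = self.resample(y, self.scale)                 # initial upsampling
        for k, prox in enumerate(self.prox):
            residual = self.resample(x, 1 / self.scale) - y
            grad = self.resample(residual, self.scale)   # data-term gradient
            x = prox(x - self.tau[k] * grad)             # learned proximal step
        return x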
The rapid expansion of technology has popularized the use of image processing techniques in several fields. However, images captured under low-light conditions impose significant limitations on the performance of these methods. Therefore, improving the quality of these images by discounting the effect of the illumination is crucial. In this paper, we present a low-light image enhancement method based on the Retinex theory. Our approach estimates illumination and reflectance in two steps. First, the illumination is obtained as the minimizer of an energy functional involving total variation regularization, which favours piecewise smooth solutions. Afterwards, the reflectance component is computed as the minimizer of an energy involving contrast-invariant nonlocal regularization and a fidelity term preserving the largest gradients of the input image.
The fusion of multi-source data with different spatial and spectral resolutions is a crucial task in many remote sensing and computer vision applications. Model-based fusion methods are more interpretable and flexible than pure data-driven learning networks; however, their performance depends greatly on the established fusion model and the hand-crafted prior. In this work, we propose an end-to-end trainable model-based network for hyperspectral and panchromatic image fusion. We introduce an energy functional that takes into account classical observation models and incorporates a high-frequency details injection constraint. The resulting optimization function is solved by a forward-backward splitting algorithm and unfolded into a deep-learning framework that uses two modules trained in parallel to ensure both data observation fitting and constraint compliance. Extensive experiments are conducted on the remote-sensing hyperspectral PRISMA dataset and on the CAVE dataset, proving the superiority of the proposed deep unfolding network both qualitatively and quantitatively.
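For reference, the forward-backward splitting step that gets unfolded has the generic form, for an energy E(x) = f(x) + g(x) with f smooth and g admitting a proximal operator:

x^{k+1} = \operatorname{prox}_{\tau g}\big( x^k - \tau \nabla f(x^k) \big).

Roughly speaking, in the unfolded network the two modules trained in parallel take the roles of the gradient (data-fitting) step and the proximal (constraint) step, with the step size τ learned per stage.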
Pansharpening aims to fuse the geometry of a high-resolution panchromatic image with the color information of a low-resolution multispectral image to generate a high-resolution multispectral image. Classical variational methods are more interpretable and flexible than pure deep learning approaches, but their performance is limited by the use of rigid priors. In this paper, we efficiently combine both techniques by introducing a shallow residual network to learn the regularization term of a variational pansharpening model. The proposed energy includes the classical observation model for the multispectral data and a constraint to preserve the geometry encoded in the panchromatic image. The experiments demonstrate that our method achieves state-of-the-art results.
In this work, we introduce a novel variational model for image restoration. In particular, we study the suitability of exploiting self-similarity of natural images in the fidelity term. Traditionally, this cue has been used for the regularization term, promoting the alignment of similarities in the degraded image with similarities in the restored one. In contrast, our proposed nonlocal data-fidelity term penalizes deviations between patches, after they have undergone the degradation process, if they are similar in the degraded image. Experiments on super-resolution, denoising and depth filtering show the competitiveness of this new formulation with respect to traditional nonlocal regularization terms and recent learning-based methods.
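Schematically, with A the degradation operator, f the observation, and w_f(x,y) patch-similarity weights computed on f, a classical nonlocal regularizer and one reading of the nonlocal fidelity idea can be contrasted as follows (indicative notation only, not necessarily the exact terms of the paper):

R_{NL}(u) = \sum_x \sum_y w_f(x,y)\, |u(x) - u(y)|, \qquad F_{NL}(u) = \sum_x \sum_y w_f(x,y)\, \big( (Au)(x) - f(y) \big)^2,

i.e., the nonlocal fidelity generalizes the pointwise requirement (Au)(x) ≈ f(x) to all locations y that look similar to x in the degraded image.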
The fusion of multisensor data has attracted a lot of attention in computer vision, particularly among the remote sensing community. Hyperspectral image fusion consists of merging the spectral information of a hyperspectral image with the geometry of a multispectral one in order to infer an image with high spatial and spectral resolutions. In this paper, we propose a variational fusion model with a nonlocal regularization term that encodes patch-based filtering conditioned to the geometry of the multispectral data. We further incorporate a radiometric constraint that injects the high frequencies of the scene into the fused product with a band-per-band modulation according to the energy levels of the multispectral and hyperspectral images. The proposed approach proves robust to noise and aliasing. The experimental results demonstrate the performance of our method with respect to state-of-the-art techniques on data acquired by commercial hyperspectral cameras and Earth observation satellites.
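The radiometric constraint can be sketched as a band-wise high-frequency injection; in indicative notation, with \tilde{u}_i the upsampled hyperspectral band, P a panchromatic-like image derived from the multispectral data, P_L its low-pass version, and \alpha_i a per-band modulation coefficient computed from the energy levels of the two modalities:

u_i = \tilde{u}_i + \alpha_i \, (P - P_L).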
Local patch-based algorithms for image registration fail to accurately match points in areas not discriminative enough, mainly textureless regions. These methods normally involve a validation process and provide a solution that is not completely dense. In this paper, we propose a novel refinement and completion approach for registration. The proposed model combines single-image nonlocal densification with classical variational image registration. We associate a total variation regularization with a nonlocal term to provide a smooth solution leveraging the image geometry. We show experiments on public stereo and optical flow datasets to filter and densify incomplete depth maps and motion fields. Extensive comparisons against existing and state-of-the-art depth/motion field densification approaches demonstrate the competitive performance of the introduced method. Additionally, we illustrate how our method can deal with other tasks, such as filtering and interpolation of depth maps from RGBD data and depth upsampling.
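Schematically, writing d for the sparse input field, m for the binary mask of validated matches, and w(x,y) for image-driven similarity weights, the combined energy has the following flavor (indicative notation only):

E(u) = \int_\Omega |\nabla u| \, dx + \beta \sum_x \sum_y w(x,y)\, |u(x) - u(y)| + \frac{\lambda}{2} \int_\Omega m \, (u - d)^2 \, dx.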
Demosaicking and denoising are key steps in the camera imaging chain for both images and videos. The reconstruction errors during these stages will have undesirable effects on the final result if not handled properly. Demosaicking induces spatial and color correlation of the noise, which is afterwards enhanced by the processing pipeline. This structured noise generally degrades the image quality and, for dark scenes with a low signal-to-noise ratio, prevents the correct interpretation of the image. When trying to mitigate such structured noise on already processed data, denoising methods attenuate details and texture. We present a video processing chain, consisting of a novel strategy for the removal of noise at the camera sensor and a novel video demosaicking algorithm. In both cases, a spatio-temporal patch-based filter with motion compensation is introduced. The experimental results, including real examples, illustrate the performance of the proposed chain, avoiding the creation of interpolation artifacts and colored spots.
The classical multi-image super-resolution model assumes that the super-resolved image is related to the low-resolution frames by warping, convolution and downsampling. State-of-the-art algorithms either use explicit registration to fuse the information for each pixel in its trajectory or exploit spatial and temporal similarities. We propose to combine both ideas, making use of inter-frame motion and exploiting spatio-temporal redundancy with patch-based techniques. We introduce a non-linear filtering approach that combines patches from several frames not necessarily belonging to the same pixel trajectory. The selection of candidate patches depends on a motion-compensated 3D distance, which is robust to noise and aliasing. The selected 3D volumes are then sliced per frame, providing a collection of 2D patches which are finally averaged depending on their similarity to the reference one. This makes the upsampling strategy robust to flow inaccuracies and occlusions. Total variation and nonlocal regularization are used in the deconvolution stage. The experimental results demonstrate the state-of-the-art performance of the proposed method for the super-resolution of videos and light-field images. We also adapt our approach to multimodal sequences when some additional data at the desired resolution is available.
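The motion-compensated 3D distance can be sketched as follows; in indicative notation, P_t(x) is the patch at position x in frame t and v_{0→t} the flow from the reference frame to frame t:

d^2(x, y) = \sum_{t=-T}^{T} \big\| P_t\big(x + v_{0\to t}(x)\big) - P_t\big(y + v_{0\to t}(y)\big) \big\|_2^2,

after which the selected volumes are sliced per frame and the resulting 2D patches are averaged with weights decreasing in their distance to the reference patch.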
Pansharpening techniques aim at fusing a low-spatial-resolution multispectral (MS) image with a higher-spatial-resolution panchromatic (PAN) image to produce an MS image at high spatial resolution. Despite significant progress in the field, spectral and spatial distortions might still compromise the quality of the results. We introduce a restoration strategy to mitigate artifacts of fused products. After applying the principal component analysis transform to a pansharpened image, the chromatic components are filtered conditioned on the geometry of the PAN. The structural component is then replaced by the locally histogram-matched PAN for spatial enhancement. Experimental results illustrate the efficiency of the proposed restoration chain.
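A minimal sketch of this chain follows, under stated simplifications: global histogram matching stands in for the local matching of the paper, and a Gaussian filter stands in for the PAN-guided filtering of the chromatic components.

import numpy as np
from sklearn.decomposition import PCA
from scipy.ndimage import gaussian_filter
from skimage.exposure import match_histograms

def pca_restore(fused, pan, sigma=1.0):
    """fused: (H, W, B) pansharpened image; pan: (H, W) panchromatic."""
    H, W, B = fused.shape
    pca = PCA(n_components=B)
    pcs = pca.fit_transform(fused.reshape(-1, B)).reshape(H, W, B)
    for i in range(1, B):                        # chromatic components
        pcs[..., i] = gaussian_filter(pcs[..., i], sigma)
    # Replace the structural component by the histogram-matched PAN.
    pcs[..., 0] = match_histograms(pan, pcs[..., 0])
    return pca.inverse_transform(pcs.reshape(-1, B)).reshape(H, W, B)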
Local algorithms for stereo fail to accurately match points in areas not discriminative enough, mainly textureless regions. We propose a method for filtering incorrect depth estimates and filling in those areas which have not been matched. The proposed model combines total variation regularization with a nonlocal term taking advantage of image self-similarity. The method can be easily generalized to deal with other tasks, such as stereo upsampling. The method is compared with state-of-the-art stereo methods, showing competitive results on the Middlebury stereo database.
Demosaicking provokes the spatial and color correlation of noise, which is afterwards enhanced by the imaging pipeline. The correct removal of noise before or simultaneously with the demosaicking process is not usually considered in the literature. We present a novel joint demosaicking and denoising algorithm for image sequences. The proposed algorithm uses a spatio-temporal patch method modifying all pixels, including those of the Bayer CFA. However, only original values are considered for averaging. The experiments, including real examples, illustrate how a joint denoising and demosaicking algorithm avoids the creation of artifacts and colored spots in the final image.
We propose a new convex variational model for hyperspectral and multispectral image fusion. Our approach introduces nonlocal regularization conditioned to the geometry of the multispectral image and incorporates a constraint forcing the fusion product and the multispectral data to share modulated high frequencies. The proposed method is compared with state-of-the-art fusion techniques, showing competitive results for several quality metrics on different data.
Most satellites decouple the acquisition of a panchromatic image at high spatial resolution from the acquisition of a multispectral image at lower spatial resolution. Pansharpening is a fusion technique used to increase the spatial resolution of the multispectral data while simultaneously preserving its spectral information. In this paper, we consider pansharpening as an optimization problem minimizing a cost function with a nonlocal regularization term. The energy functional to be minimized decouples for each band, thus permitting the application to misregistered spectral components. This requirement is achieved by dropping the commonly used assumption that relates the spectral and panchromatic modalities by a linear transformation. Instead, a new constraint that preserves the radiometric ratio between the panchromatic and each spectral component is introduced. An exhaustive performance comparison of the proposed fusion method with several classical and state-of-the-art pansharpening techniques illustrates its superiority in preserving spatial details, reducing color distortions, and avoiding the creation of aliasing artifacts.
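The radiometric-ratio constraint can be written schematically as follows, with P the panchromatic image, u_i the fused band, and the tilde denoting a low-pass version at the resolution of the multispectral data:

\frac{u_i(x)}{P(x)} = \frac{\tilde{u}_i(x)}{\tilde{P}(x)},

which is imposed band per band and therefore tolerates misregistration between spectral components.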
The goal of super-resolution is to fuse several low-resolution images of the same scene into a single one with increased resolution. The classical formulation assumes that the super-resolved image is related to the low-resolution frames by warping, convolution and subsampling. Algorithms divide into those using explicit registration and those avoiding it. The former combine, for each pixel, the information along its estimated trajectory. The latter exploit both spatial and temporal redundancy. We propose to combine both ideas, making use of optical flow and exploiting spatio-temporal redundancy with patch-based techniques. The proposed non-linear filtering takes into account patch similarities, automatically correcting flow inaccuracies and avoiding the need for occlusion detection. Total variation and nonlocal regularization are used for the deconvolution stage.
Optical flow methods try to estimate a dense correspondence field describing the motion of the objects in an image sequence. We introduce novel nonlocal regularizing constraints for variational optical flow computation. While the use of similarity weights has so far been restricted to the regularization term, the proposed data terms make it possible to implicitly use the image geometry in order to regularize the flow and better locate motion discontinuities. The experimental results illustrate the superiority of the new constraints with respect to the classical brightness constancy assumption as well as to nonlocal regularization strategies.
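One plausible schematic form of such a nonlocal data term, with w(x,y) image-driven similarity weights, contrasts with classical brightness constancy as follows (indicative notation only, not necessarily the exact terms of the paper):

classical: \big| I_1(x) - I_2(x + u(x)) \big|, \qquad nonlocal: \sum_y w(x,y) \, \big| I_1(y) - I_2(y + u(x)) \big|,

i.e., the flow at x must also explain the motion of pixels that look similar to x, which implicitly regularizes the flow with the image geometry.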
Common satellite imagery products consist of a panchromatic image at high spatial resolution and several misregistered spectral bands at lower resolution. Pansharpening is the fusion process by which a high-resolution multispectral image is inferred. We propose a variational model in which pansharpening is defined as an optimization problem minimizing a cost function with nonlocal regularization. We incorporate a new term preserving the radiometric ratio between the panchromatic and each spectral band. The resulting model is channel-decoupled, thus permitting the application to misregistered spectral data. The experimental results illustrate the superiority of the proposed method in preserving spatial details, reducing color artifacts, and avoiding aliasing.
Even after two decades, the total variation (TV) remains one of the most popular regularizations for image processing problems and has sparked a tremendous amount of research, particularly on moving from scalar to vector-valued functions. In this paper, we consider the gradient of a color image as a three-dimensional matrix or tensor with dimensions corresponding to the spatial extent, the intensity differences between neighboring pixels, and the spectral channels. The smoothness of this tensor is then measured by taking different norms along the different dimensions. Depending on the types of these norms, one obtains very different properties of the regularization, leading to novel models for color images. We call this class of regularizations collaborative total variation (CTV). On the theoretical side, we characterize the dual norm, the subdifferential, and the proximal mapping of the proposed regularizers. We further prove, with the help of the generalized concept of singular vectors, that an l^∞ channel coupling makes the most prior assumptions and has the greatest potential to reduce color artifacts. Our practical contributions consist of an extensive experimental section, where we compare the performance of a large number of collaborative TV methods for inverse problems such as denoising, deblurring, and inpainting.
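A minimal numpy sketch of evaluating one family of collaborative norms follows; it couples the derivative axis first, then the color axis, and finally sums over pixels (an outer l^1), which is one of the orderings studied. The function names and the choice of forward differences are illustrative.

import numpy as np

def gradient(u):
    """Forward-difference gradient of an (H, W, C) image -> (H, W, 2, C)."""
    g = np.zeros(u.shape[:2] + (2,) + u.shape[2:])
    g[:-1, :, 0] = u[1:] - u[:-1]        # vertical differences
    g[:, :-1, 1] = u[:, 1:] - u[:, :-1]  # horizontal differences
    return g

NORMS = {'l1': lambda a, ax: np.abs(a).sum(ax),
         'l2': lambda a, ax: np.sqrt((a ** 2).sum(ax)),
         'linf': lambda a, ax: np.abs(a).max(ax)}

def ctv(u, derivative='l2', channel='linf'):
    """Collaborative TV value of u for one (derivative, channel, pixel)
    norm ordering; channel='linf' gives the strongest color coupling."""
    g = gradient(u)                             # (H, W, 2, C)
    per_channel = NORMS[derivative](g, 2)       # collapse derivative axis
    per_pixel = NORMS[channel](per_channel, 2)  # collapse color axis
    return per_pixel.sum()                      # outer l1 over pixels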
In this paper, we propose a novel framework for restoring color images using nonlocal total variation (NLTV) regularization. We observe that the discrete local and nonlocal gradient of a color image can be viewed as a 3D matrix or tensor with dimensions corresponding to the spatial extent, the differences to other pixels, and the color channels. Based on this observation, we obtain a new class of NLTV methods by penalizing the l^p,q,r norm of this 3D tensor. Interestingly, this unifies several local color total variation (TV) methods in a single framework. We show in several numerical experiments on image denoising and deblurring that a stronger coupling of different color channels – particularly, a coupling with the l^∞ norm – yields superior reconstruction results.
Image restoration is the problem of recovering an original image from an observation of it in order to extract the most meaningful information. In this paper, we study this problem from a variational point of view through the minimization of energies composed of a quadratic data-fidelity term and a nonsmooth nonconvex regularization term. In the discrete setting, the existence of a minimizer is proved for arbitrary linear operators. For this kind of problem, fully segmented solutions can be found by minimizing objective nonconvex functionals. We propose a dual formulation of the model by introducing an auxiliary variable with a double role. On the one hand, it marks the edges and ensures their preservation from smoothing. On the other hand, it makes the criterion half-linear in the sense that the dual energy depends linearly on the gradient of the image to be recovered. This leads to the design of an efficient optimization algorithm with wide applicability to several image restoration tasks such as denoising and deconvolution. Finally, we present experimental results and compare them with TV-based image restoration algorithms.
In this paper, we propose a new dual algorithm for the minimization of discrete nonconvex functionals, called half-linear regularization. Our approach alternates the calculation of an explicit weight with the minimization of a convex functional with respect to the solution. This minimization corresponds to a weighted total variation problem, which is solved via Chambolle's well-known algorithm. Finally, we present experimental results by applying it to image restoration problems such as denoising and deconvolution.
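The alternation can be sketched as follows, in indicative notation with φ the nonconvex potential, A the degradation operator, and λ the fidelity weight:

b^k = \varphi'\big( |\nabla u^k| \big), \qquad u^{k+1} = \arg\min_u \int_\Omega b^k \, |\nabla u| \, dx + \frac{\lambda}{2} \, \| A u - f \|_2^2,

where the second step is a weighted total variation problem amenable to Chambolle-type minimization.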
Most common cameras use a CCD sensor device measuring a single color per pixel. The other two color values of each pixel must be interpolated from the neighboring pixels in the so-called demosaicking process. State-of-the-art demosaicking algorithms take advantage of interchannel correlation, locally selecting the best interpolation direction. These methods give impressive results except when local geometry cannot be inferred from neighboring pixels or channel correlation is low. In these cases, they create interpolation artifacts. We introduce a new algorithm involving nonlocal image self-similarity in order to reduce interpolation artifacts when local geometry is ambiguous. The proposed algorithm introduces a clear and intuitive manner of balancing how much channel correlation should be exploited. Comparisons show that the proposed algorithm achieves state-of-the-art results on several image databases.
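For reference, the sketch below shows the most naive baseline, bilinear interpolation of an RGGB Bayer mosaic by normalized convolution; the paper's nonlocal, channel-correlation-aware method goes far beyond this and is not reproduced here.

import numpy as np
from scipy.ndimage import convolve

def bilinear_demosaick(cfa):
    """Bilinear baseline for an RGGB Bayer mosaic (H, W) -> (H, W, 3)."""
    H, W = cfa.shape
    masks = [np.zeros((H, W)) for _ in range(3)]
    masks[0][0::2, 0::2] = 1                            # red locations
    masks[1][0::2, 1::2] = 1; masks[1][1::2, 0::2] = 1  # green locations
    masks[2][1::2, 1::2] = 1                            # blue locations
    k = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], float) / 4.0
    channels = [convolve(cfa * m, k, mode='mirror') /
                convolve(m, k, mode='mirror') for m in masks]
    return np.stack(channels, axis=-1)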
Pansharpening refers to the fusion process of inferring a high-resolution multispectral image from a high-resolution panchromatic image and a low-resolution multispectral one. In this paper, we propose a new variational method for pansharpening which incorporates a nonlocal regularization term and two fidelity terms, one describing the relation between the panchromatic image and the high-resolution spectral channels and the other preserving the colors from the low-resolution modality. The nonlocal term is based on the image self-similarity principle applied to the panchromatic image. The existence and uniqueness of a minimizer for the described functional is proved in a suitable space of weighted integrable functions. Although quite successful in terms of relative error, state-of-the-art pansharpening methods introduce noticeable color artifacts. These spectral distortions can be significantly reduced by involving image self-similarity. Extensive comparisons with state-of-the-art algorithms are performed.
Denoising is the problem of removing the inherent noise from an image. The standard noise model is additive white Gaussian noise, where the observed image f is related to the underlying true image u by the degradation model f = u + η, and η is assumed to be independently and identically distributed at each pixel as a zero-mean Gaussian random variable. Since this is an ill-posed problem, Rudin, Osher and Fatemi introduced the total variation as a regularizing term. It has proved to be quite efficient for regularizing images without smoothing the boundaries of the objects. This paper focuses on a simple description of the theory and on the implementation of Chambolle’s projection algorithm for minimizing the total variation of a grayscale image. Furthermore, we adapt the algorithm to the vectorial total variation for color images. The implementation is described in detail and its parameters are analyzed and varied to come up with a reliable implementation.
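A compact numpy implementation of Chambolle's projection algorithm for the grayscale case is sketched below (the fixed-point iteration of Chambolle, 2004, with step size τ = 1/8; the vectorial extension for color images is not shown).

import numpy as np

def grad(u):
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]   # forward differences, x
    gy[:-1, :] = u[1:, :] - u[:-1, :]   # forward differences, y
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad."""
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[:, 0] = px[:, 0]; dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]; dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

def rof_chambolle(f, lam, n_iter=100, tau=0.125):
    """Minimize TV(u) + ||u - f||^2 / (2 * lam) via the dual variable p."""
    px, py = np.zeros_like(f), np.zeros_like(f)
    for _ in range(n_iter):
        gx, gy = grad(div(px, py) - f / lam)
        denom = 1.0 + tau * np.sqrt(gx ** 2 + gy ** 2)
        px, py = (px + tau * gx) / denom, (py + tau * gy) / denom
    return f - lam * div(px, py)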