A Nonlocal Denoising Algorithm for ManifoldValued Images Using Second Order Statistics
Abstract
Nonlocal patchbased methods, in particular the Bayes’ approach of Lebrun, Buades and Morel [41], are considered as stateoftheart methods for denoising (color) images corrupted by white Gaussian noise of moderate variance. This paper is the first attempt to generalize this technique to manifoldvalued images. Such images, for example images with phase or directional entries or with values in the manifold of symmetric positive definite matrices, are frequently encountered in realworld applications. Generalizing the normal law to manifolds is not canonical and different attempts have been considered. Here we focus on a straightforward intrinsic model and discuss the relation to other approaches for specific manifolds. We reinterpret the Bayesian approach of Lebrun et al. [41] in terms of minimum mean squared error estimation, which motivates our definition of a corresponding estimator on the manifold. With this estimator at hand we present a nonlocal patchbased method for the restoration of manifoldvalued images. Various proof of concept examples demonstrate the potential of the proposed algorithm.
sectioning \setkomafonttitle \setkomafontdescriptionlabel \subtitleExtended Version
1 Introduction
In many situations where measurements are taken the obtained data are corrupted by noise,
and typically one uses a stochastic model to describe the recorded data.
If there are several, independent factors that may have an influence on the data acquisition,
the central limit theorem suggests to model the noise as additive white Gaussian noise.
This is also the standard noise model one encounters in image analysis, see, e.g., [31].
One might of course wonder whether this noise modeling is realistic and in fact,
in many situations the image formation process already suggests a nonGaussian model,
e.g. Poisson noise in the case where images are obtained based on photon counting with a CCD device.
But also in these cases, in order to benefit from the rich knowledge and all the appealing properties of the normal distribution,
one often tries to transform the image in such a way that the assumption of Gaussian white noise is at least approximately fulfilled.
For instance, for the Poisson noise this can be achieved by the so called Anscombe transform [4].
Much effort has been spent on the denoising of images corrupted with white Gaussian noise
and a huge amount of methods have been proposed in the literature.
Among others we mention variational models with total variation regularizers [60] and many extensions thereof,
denoising based on sparse representations over learned dictionaries [26],
nonlocal means [30, 63, 80, 77] and their generalizations [17, 23, 38, 63],
the piecewise linear estimator from Gaussian mixture models (PLE, EPLE) [80, 73]
and SURE guided Gaussian mixture models [74],
patchordering based wavelet methods [56],
the expected patch loglikelihood (EPLL) algorithm [81]
or better
its multiscale variant [52],
BM3D [21] and BM3DSAPCA [22],
and
the nonlocal Bayes’ algorithm of Lebrun et al. [40, 41].
The latter can be viewed as an optimized reinterpretation of the two step image denoising method (TSID) [64, 78]
in a Bayesian framework.
For a recent review of the denoising problem and the different denoising principles
we refer to [42].
Currently, nonlocal patchbased methods
achieve the best results and the quality of the denoised images
has become excellent for moderate noise levels.
Even more, based on experiments with a set of 20.000 images containing about patches
the authors of [47] conjecture that for natural images,
the recent patchbased denoising methods might already be close to optimality.
Their conjecture points in the same direction as the paper of Chatterjee et al. [16], who raised the question „Is denoising dead?”.
The situation described above completely changes when dealing with manifoldvalued images instead of realvalued ones,
a situation which is frequently encountered in applications.
For instance, images with values on the circle (periodic data)
appear in interferometric synthetic aperture radar [13, 24],
in applications involving the phase of Fourier transformed data [10],
or when working with the huecomponent of an image in the HSV color space.
Spherical data play a role when dealing with 3D directional information [39, 70]
or
in the representation of a color image in the chromaticitybrightness (CB) color space [15].
SO(3)valued data appear in electron backscattered tomography [8, 9].
Finally, to mention only a few examples,
images with symmetric positive definite matrices as values are handled in
DTMRI imaging [18, 28, 69, 75, 77]
or when covariance matrices are associated to image pixels [68].
Recently, some methods for the denoising of manifoldvalued images have been suggested,
among them variational approaches using embeddings in higher dimensional spaces [59] or
based on (generalized) TVregularization [7, 11, 46, 66, 76].
In this paper, we aim at generalizing the nonlocal patchbased denoising of Lebrun et al. [40, 41] to manifoldvalued images.
However, for general manifolds, already the question of how to define Gaussian white noise (or, more general, a normal distribution)
is not canonically solved.
Different approaches have been proposed in the literature,
either by making use of characterizing properties of the realvalued normal distribution as for instance in [51, 53]
or by restricting to particular manifolds such as spheres, see, e.g. [48],
the simplex [49], or symmetric positive definite matrices [61].
In this paper, we adopt a simple model for a normal distribution and in particular for Gaussian white noise on a manifold
and discuss its relationship to existing models.
We review the minimum mean squared error estimator in the Euclidean setting,
which coincides with those of the Bayesian approach in [41]
under the normal distribution assumption.
This motivates our definition of a corresponding estimator on the manifold and gives rise
to a nonlocal patchbased method for the restoration of manifoldvalued images.
The outline of this paper is as follows:
in Section 2 we reinterprete the nonlocal Bayes algorithm of Lebrun et al. [40, 41]
in a minimum mean square error estimation setting. This review in the Euclidean setting is necessary to understand
its generalization to manifoldvalued images.
In Section 3 we introduce the notation on manifolds.
Then, in Section 4 we detail the nonlocal patchbased denoising algorithm for manifold valued images.
This requires to precise what we mean by the normal law on the manifolds of interest.
We discuss the relation between this model and other existing ones.
In Section 5 we provide several numerical examples to demonstrate that our denoising approach
is indeed computationally manageable.
Examples, yet academical, for cyclic and directional data, and for images with values in the manifold of symmetric positive definite matrices
show the potential of nonlocal techniques for manifoldvalued images.
Specific realworld applications are not within the scope of this paper.
Finally, we draw conclusions
and initiate further directions of research in Section 6.
2 Nonlocal PatchBased Denoising of RealValued Images
In this section we consider the nonlocal Bayesian image denoising method of Lebrun et al. [40, 41]. In contrast to these authors we prefer to motivate the method by a minimum mean square estimation approach. One reason is that the best linear unbiased estimator in (9) has a similar form as the MMSE, but does not rely on the assumption that the random variables are jointly normally distributed. This leaves potential for future work, e.g. when extending the model to other distributions than the normal distribution.
2.1 Minimum MeanSquare Estimator
Let be a probability space and and two random vectors. We wish to estimate given , i.e., we seek an estimator such that approximates . A common quality measure for this task is the mean square error , which gives rise to the definition of the minimum mean square estimator (MMSE)
where denotes the algebra generated by and stands for all measurable random variables , see, e.g., [43]. Under weak additional regularity assumptions on the estimator , the LehmannScheffé theorem [44, 45] states that the general solution of the minimization problem is determined by
In general it is not possible to give an analytical expression of the MMSE. One exception constitutes of the normal distribution. Recall that a random vector is normally distributed with mean and covariance matrix , , if and only if there exists a random vector , whose components are independent realvalued standard normally distributed random variables and a matrix , such that , where is the rank of the covariance matrix . If has full rank, then the probability density function (pdf) of with respect to the Lebesgue measure is given by
(1) 
where denotes the determinant of . In view of the next section it is useful to recall some properties of the normal distribution.
Remark 2.1.
(Properties of Gaussian distribution on )

The Gaussian density function in (1) maximizes the entropy
over all density functions on with fixed mean and covariance matrix . 
Let , , be i.i.d. realizations of an absolutely continuous distribution having first and second moments, denoted by and . Then the likelihood function reads as and the maximum likelihood (ML) estimator is defined as
It holds that
(2) if and only if the density function is of the form (1), see, e.g., [25, 65]. For the normal distribution the ML estimator of the covariance matrix reads as
(3) 
The density function of the standard normal distribution with the identity matrix is the kernel of the heat equation.
In order to compute the MMSE estimator for Gaussian random variables we need to determine the conditional distribution of given . It is well known that, if and are jointly normally distributed, i.e.,
then the conditional distribution of given is normally distributed as well and reads as
where
As a consequence we obtain for normally distributed random vectors the MMSE estimator
(4) 
In our situation (denoising) fits into the above framework if we set
(5) 
where we assume that and are independent and is known. Then , and by the independence of and further and
(6) 
Now, the MMSE of given in (4) becomes
(7)  
(8) 
Two remarks may be useful to see the relation to other estimators.
Remark 2.2.
(Relation between MMSE and BLUE)
The estimator of the general form
(9) 
makes also sense for more general distributions. It is known as best linear unbiased estimator (BLUE), as it is an unbiased estimator which has minimum meansquare error among all affine estimators. For jointly normally distributed and it coincides with .
Remark 2.3.
(Relation between MMSE and MAP)
The MMSE can also be derived in a Bayesian framework under a Gaussian prior (see, e.g. [27]), which is detailed in the following.
Let , , , where and are independent.
This implies and , so that the respective densities are given by
and
By Bayes’ formula we have
and therewith, the maximum a posteriori (MAP) estimate reads as
Setting the gradient to zero results in
Observing that and , we obtain finally
In practical applications, the parameters and are unknown and need to be estimated using realizations (observations) of . Here we use the ML estimators given in (2) and (3). Note that the ML estimator for the covariance matrix is slightly biased. Instead we could also deal with an unbiased estimator by replacing the averaging factor by . However, the numerical difference is negligible for large .
Summarizing our findings, we obtain the following empirical estimator
(10) 
Remark 2.4 (Positive definiteness of the empirical covariance matrix).
Equation (10) contains via (6) the hidden assumption that However, based on the empirical covariance matrix it is not necessarily ensured that is positive semidefinite and thus a valid covariance matrix. There are different ways to overcome this problem, e.g. replacing negative eigenvalues by a small positive value as for instance proposed in [57], compare also the discussion in [40, Section 3.5] or [79, page 406]. In our numerical experiments we did not observe that this issue had negative impacts on the results.
2.2 Denoising Using the MMSE Approach
Next we describe how the results of the previous section can be used for image denoising. To this aim, let be a discrete grayvalue image, defined on a grid . By a slight abuse of notation we also write , where for the columnwise reshaped version of the image. It will be always clear from the context to which notation we refer. We assume that the image is corrupted with white Gaussian noise, i.e.,
where
is now a realization of . Based on we wish to reconstruct the original image .
We use the fact that natural images are to some extend selfsimilar,
i.e., small similar patches may be found several times in the image, and that for these patches locally
a normality assumption holds approximately true, see, e.g. [81].
To formalize this idea, consider
an
neighborhood (patch) centered at , where , .
After vectorization this corresponds to a realization of an dimensional
normally distributed random vector , where
.
This patch is referred to as a reference patch in the following.
Similar patches are interpreted as other realizations of
.
There are several strategies to define similar patches.
Take for example, for a fixed ,
the nearest patches with respect to the Euclidean distance
in a search window around , where
, .
Let denote the set of centers of patches similar to .
Then the estimates of the expectation value (2) and the covariance (3) become
The obtained estimates are then used to restore the reference patch and all its similar patches with (10) as:
(11) 
Proceeding as above for all pixels yields a variable number of estimates for each pixel. Therewith, the final estimate at pixel is obtained as an average over all patches containing the pixel (aggregation). There are some finetuning steps that were partly also considered in [40, 41]. This is summarized in the following remark.
Remark 2.5.
(Fine tuning steps)

Boundaries: Special attention has to be paid to patches at the boundaries of an image. There are at least two possibilities: Either, one extends the image, e.g. by mirroring, or one considers only patches lying completely inside the image together with appropriately smaller search windows. To our opinion the second strategy is preferable since it does not introduce artificial information. However, it leads to less estimates at the boundaries of the image, but we observed that this does not yield visible artifacts in practice.

Flat areas: Flat areas, where differences between patches are only caused by noise, require a special consideration, as it is very likely that the estimated covariance matrix will not have full rank. In this case, the patches are better denoised by only using their mean. Flat areas might be detected using the empirical variance of the patches, which is close to .

Second step: The similarity of patches and the covariance structure of the patches can be better estimated using the first step denoised image as an oracle image for a second step.

Acceleration: To speed up the denoising procedure, each patch that has been used (and therefore denoised at least once) in a group of similar patches is not considered as reference patch anymore. Nevertheless, it may be denoised several times by being potentially chosen in other groups.
The whole denoising procedure is given in Algorithm 1.
We would like to point out the differences between the two steps, which look at first glance very similar: Step 2 uses the
denoised image from Step 1 in order to find similar patches and to estimate the covariance matrix, but reuses the original noisy image for the other computations, i.e, for the mean patch and the restored image.
The overall approach can be generalized to images with values in , , in a straightforward way, dealing now with dimensional random vectors, where . In particular, RGBcolor images () can be denoised in this way. At this point, considering the three color channels independently does usually not yield good results as there is a significant correlation between the red, the green, and the blue color channel. This correlation is correctly taken into account in the threevariate setting. As an alternative, Lebrun et al. [41] suggested to work in the socalled color space [50], which is a variant of the YUV space where transform from the RGB space is orthogonal and thus does not change the noise statistics. This color system separates geometric from chromatic information and thereby decorrelates the color channels, so that treating them independently does not create noticeable color artifacts as it would be the case in the RGB space.
3 Random Points on Manifolds
Instead of valued images we are now interested in images having values in a dimensional manifold . We start by introducing the necessary notation in Riemannian manifolds. In our numerical examples we will deal with images having components on the sphere equipped with the Euclidean metric of the embedding spaces , , and the manifold of positive definite matrices , , with the affine invariant metric. For these manifolds the specific expressions of the following quantities are given in Appendix B. Further, we will consider the open probability simplex , , with the RaoFisher metric obtained from the categorial distribution and the hyperbolic manifold , , equipped with the Minkowski metric. Besides many textbooks on differential geometry the reader may have a look into Pennec’s paper [53] to get an overview. We adapted our notation to this paper.
Manifolds
If not stated otherwise, let be a complete, connected dimensional Riemannian manifold. All of the previously mentioned manifolds are complete, except for the probability simplex. Observe that we will work with patches of dimensional manifolds such that we finally deal with product manifolds of dimension with the usual product metric. By we denote the tangent space of at and by the Riemannian metric. Let , , , be the geodesic starting from with . Since is complete, the exponential map with
is welldefined for every . The exponential map realizes a local diffeomorphism (exponential chart) from a “sufficiently small neighborhood” of the origin of into a neighborhood of . To precise how large this “small neighborhood” can be chosen, we follow the geodesic from to infinity. It is either minimizing all along or up to a finite time and not any longer afterwards. In the latter case, is called cut point and the corresponding tangent vector is called tangential cut point. The set of all cut points of all geodesics starting from is the cut locus and the set of corresponding vectors the tangential cut locus. Then the open domain around bounded by the tangential cut locus is the maximal domain for which the exponential chart at is injective. It is connected and starshaped with respect to and . This allows to define the inverse exponential map as
For the sphere the cut locus of is just its antipodal point . Thus the tangential cut locus is the ball with radius around and its boundary. For Hadamard manifolds which are complete, simplyconnected manifolds with nonpositive sectional curvature [6], as or , we have that .
The Riemannian metric yields a distance function
on the manifold by
and a measure written
in local coordinates by
, where
.
and
Random Points
Let be a probability space and the Borel algebra on (with respect to ). A measurable map is called a random point on . We consider only absolutely continuous random points with probability density , i.e., for all and . The variance of with respect to a given point is defined as
(12) 
and local minimizers of are called Riemannian centers of mass [36]. For a discussion of the existence and uniqueness of global minimizers, known as Fréchet expectation or means of see, e.g., [1, 36, 37]. For Hadamard manifolds with curvature bounded from below the Riemannian center of mass exists and is unique. For the spheres , if the support of is contained in a geodesic ball of radius , then the Riemannian center of mass is unique within this ball and it is the global minimizer of (12). In the following we assume that the variance is finite and the cut locus has a probability measure zero at any point . Then a necessary condition for to be a Riemannian center of mass is
(13) 
For Hadamard manifolds with curvature bounded from below this condition is also sufficient.
Assuming that the mean is known,
we define the covariance matrix of (with respect to ) by
(14) 
In practice, typically and are unknown and need to be estimated. Given observations of a random point , we estimate the mean point by
(15) 
which is according to [12] a consistent estimator of and can be computed by a gradient descent algorithm, see, e.g. [2]. An estimator for the covariance matrix reads as
(16) 
4 Nonlocal PatchBased Denoising of ManifoldValued Images
In this section we propose an NLMMSE denoising algorithm for manifoldvalued images.
To this end, we have to specify what we mean by “normally distributed” random points on manifolds.
In contrast to the vector space setting, there does not exist a canonical definition
of a normally distributed random vector on a manifold
since various properties characterizing the normal distribution
on as those in Remark 2.1,
cannot be generalized to the manifold setting in a straightforward way.
Here we rely on a simple approach which transfers normally distributed zero mean random vectors
on tangent spaces via the exponential map to the manifold.
Based on this definition we will see how the NLMMSE from Section 2
carries over to manifoldvalued images.
4.1 Gaussian Random Points
In the following, we describe the Gaussian model used in this paper for Hadamard manifolds and spheres and discuss its relation to other models for small variances.
For each tangent space with fixed orthonormal basis we can identify the element with the local coordinate vector , which establishes an isometry between and . Note that also the expressions in (14) and (16) are basis dependent, but assuming a fixed basis the relation skipped for simplicity of notation. Now, let and let be the linear isometric mapping
(17) 
Let . Since is continuous, we have for any that is a Borel set and for any integrable function it holds
(18) 
where . Conversely, for any Borel set and any integrable function we see that and
(19) 
where If is standard normally distributed on with pdf , then
(20) 
is a random point on . For Hadamard manifolds, we have so that for ,
(21) 
Thus, has the pdf
(22) 
Note that by incorporating the factor into the density function we avoid problems as discussed in [35]. By construction and (13) it follows directly that the mean of is and the covariance (14) is . We consider as normally distributed on and write . In other words, is normally distributed on with mean and covariance if is standard normally distributed on .
If as it is the case for spheres, we assume that up to a set of Lebesgue measure zero , where is an index set and . Further, we suppose that there are diffeomorphisms such that for it holds . Then, in order to obtain the pdf of in (20), we have to replace in (22) by the wrapped function
(23) 
Now, we follow the same lines as in the Euclidean setting and agree that is normally distributed with mean and positive definite covariance if , respectively,
(24) 
and write .
The following proposition shows how the pdf
of a normally distributed random point (20) looks for
various onedimensional manifolds.
Proposition 4.1.
The pdf of a random point is given by

the lognormal distribution for ,
with respect to the measure on ;

the wrapped Gaussian distribution for ,
(25) with respect to the parameterization
and the Lebesgue measure ; 
the wrapped, even shifted Gaussian distribution for ,
(26) with respect to the parameterization , , and Lebesgue measure .
The proof of the proposition is given in the Appendix A.
The above definition (24) of normally distributed random points has the advantage that it adopts the affine invariance of the Gaussian distribution known from the Euclidean setting via the tangent space. Moreover, it is easy to sample from the distribution.
Remark 4.2 (Sampling from ).
Sampling of a distributed random variable can be performed as follows: i) sample from in , ii) apply which by (17) requires only the knowledge of an orthogonal basis in , and iii) map the result by to .
For the onedimensional manifolds in Proposition 4.1, the pdfs in (i)  (iii) are the kernels of the heat equations with the corresponding LaplaceBeltrami operators. This is in general not true for higher dimensions [33]. However, numerical experiments show that samples from the GaussWeierstrass kernel on [29, p. 112] and from the heat kernel on [67, p. 107] are very similar. For kernel density estimations on special Hadamard spaces we refer also to [19] and for kernels in connection with dithering on the sphere we refer to [32].
Neither the maximizing entropy nor the ML estimation property from Remark 2.1 generalize to the above setting. In [53] Pennec showed that under certain conditions on the pdf of a random point on that maximizes the entropy given prescribed mean value and covariance is of the form
(27) 
For the concrete definition of and the noise simulation according to this model in the case of see [61] and the Appendix C. For the models coincide by Proposition 4.1. For this distribution it is clear that the Riemannian center of mass is given by and that the ML estimator for is the empirical Karcher mean. Figure 1, left shows a image of realizations of normally distributed noise on by our model , where . For comparison, in Figure 1, right the same is shown for realizations of , also for . The noise looks visually very similar, which is also confirmed by Table 1. The first two rows of Table 1 present the estimated mean based on (15), the covariance matrix based on (16) and the estimated standard deviation for our noise model and those of Said et al..
Example 4.3.
Let be the hyperbolic manifold equipped with the Minkowski metric . The distance reads as and
(28)  
(29) 
We parametrize as
(30) 
First, we compute the pdf (22) of an distributed random point, where . We obtain