Implementation#
This Python library was developed based on the existing Matlab implementation for 1d and 2d, which was used as the primary reference (albeit possibly in earlier versions previously stored at the same locations), and the original paper as a secondary source, mostly to understand the nomenclature and general idea behind the method.
The Matlab implementation was first rewritten in Python to give the
exact same result, including intermediate steps, for the same test
case: a univariate normal sampling. See the verification
folder in the source-code repository for details.
Subsequently, the Python code was refactored in order to blend in
better with the existing software ecosystem, namely by leveraging
SciPy’s forward and backward discrete cosine transformation for one
or n dimensions, dct
/dctn
and idct
/idctn
,
as well as NumPy’s histogram
and
histogram2d
, instead of the custom versions
the Matlab reference employs.
The reference uses a cosine transformation with a weight for the very first component that is different from the one in any of the four types of the transformation supported by SciPy. There is an easy work-around for that, which is used in the current code. It should however be possible to rewrite the algorithm in a more elegant way, one that avoids the work-around altogether.
The Matlab implementation also bins the data somewhat differently in 1d vs. the 2d case. This minor inconsistency was removed. The change is arguably insignificant as far as the final results are concerned, but is a deviation nonetheless. In practical use, based on a handful of tests, both implementations yield indiscernible results.
The 2d density is returned in matrix index order, also known as
Cartesian indexing, in which the first index (the matrix row) refers
to the x-coordinate and the second index (the matrix column) to y.
This is consistent with results returned by other kernel density
estimations, such as SciPy’s, as well as NumPy’s 2d-histogram function.
When saving or displaying the 2d density as an image, a different
memory layout is expected and the index order has to be reversed: y
before x. This comes down to a simple transposition, i.e. adding
.T
in the code.
In very broad strokes, the method is this:
Bin the data on a regular grid.
Transform to Fourier space.
This leaves Gaussian kernels intact.
Gaussians are also elementary solutions to the diffusion equation.
Leverage this to define condition for optimal smoothing.
Find optimum by iteration in Fourier space.
Smooth transformed data with optimized Gaussian kernel.
Reverse transformation to obtain density estimation.