Thursday, May 21, 2015

Image convolution with the DFT multiplication property

Determine if image needs to be processed in sections or at once.
If in sections, need to keep track of boundaries (history).

Using FFT

The result of multiplying the DFTs of two matrices is equivalent to taking the circular convolution of one matrix with the other. The larger matrix is the image, the smaller one is the kernel.
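
The property can be checked directly with NumPy (a sketch; the brute-force loop below is just the definition of 2D circular convolution, with indices wrapping modulo the image dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 10, (4, 5))   # "image"
h = rng.integers(0, 10, (3, 3))   # "kernel"

# Brute-force 2D circular convolution: indices wrap around both axes.
y_ref = np.zeros_like(x)
K, L = x.shape
for i in range(K):
    for j in range(L):
        for k in range(h.shape[0]):
            for l in range(h.shape[1]):
                y_ref[i, j] += h[k, l] * x[(i - k) % K, (j - l) % L]

# Same result from multiplying the DFTs; fft2's s argument zero-pads
# the kernel to the image's dimensions.
Y = np.fft.fft2(x) * np.fft.fft2(h, s=x.shape)
y_fft = np.real(np.fft.ifft2(Y)).round().astype(int)

assert np.array_equal(y_ref, y_fft)
```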

Circular convolution in 2D wraps around in the horizontal, vertical, and diagonal directions. It is equivalent to linear convolution of the following extended image with a 3x3 kernel:

  78 | 2345678
  89 | 3456789
  67 | 1234567
  78 | 2345678
  89 | 3456789

The original image is the 3x7 block in the lower right (below the first two rows and to the right of the bar); everything else is a wrap-around copy of its bottom rows and right-most columns. If this image were circularly convolved with this kernel:

1 1 1
1 1 1
1 1 1

The resulting image would be:

array([[39, 27, 36, 45, 54, 63, 51],
       [39, 27, 36, 45, 54, 63, 51],
       [39, 27, 36, 45, 54, 63, 51]])

Performing the convolution with the original image in the frequency domain gives:

array([[51, 39, 27, 36, 45, 54, 63],
       [51, 39, 27, 36, 45, 54, 63],
       [51, 39, 27, 36, 45, 54, 63]])
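
Both results above can be reproduced directly (a sketch using SciPy alongside NumPy; `scipy.ndimage.convolve` with `mode='wrap'` performs center-anchored circular convolution, while the DFT product gives the corner-anchored version):

```python
import numpy as np
from scipy import ndimage

x = np.array([[1, 2, 3, 4, 5, 6, 7],
              [2, 3, 4, 5, 6, 7, 8],
              [3, 4, 5, 6, 7, 8, 9]])
h = np.ones((3, 3), dtype=int)

# Center-anchored circular convolution (the usual image-processing convention).
y_center = ndimage.convolve(x, h, mode='wrap')

# Corner-anchored circular convolution via the DFT multiplication property.
Y = np.fft.fft2(x) * np.fft.fft2(h, s=x.shape)
y_corner = np.real(np.fft.ifft2(Y)).round().astype(int)

print(y_center)  # rows of [39, 27, 36, 45, 54, 63, 51]
print(y_corner)  # rows of [51, 39, 27, 36, 45, 54, 63]
```
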

Note that the columns are rotated one position to the right. Linear convolution of images is usually performed with the center element of the kernel starting in the upper left corner of the image, so that the output image has the same dimensions as the input. The boundaries are often zero-padded.

This is only a practical convention, however. True convolution starts with the upper left corner of the kernel - the kernel is flipped in both directions - in the upper left corner of the image. This is analogous to 1D convolution. We have no control over how the convolution is performed when applying the DFT multiplication property; we're constrained by the mathematics: y = IFFT(X*H). So we have to set up the input matrices correctly to get the desired result.

In this case, the linear convolution was performed with the kernel hanging off the edge by one element in every direction. Or in other words, so that the center element (the anchor) was aligned with each boundary element.

On the other hand, multiplication in the frequency domain is equivalent to convolving with the kernel only overlapping the top row or left-most column on the boundary, and with no boundary overlap on the bottom row and right-most column. Again, this is analogous to 1D convolution. Of course, this is circular convolution, so this is just one way of visualizing it. The kernel is really only overlapping the image at all times, and there is no boundary padding.

If linear convolution were performed this way, there would be a pronounced boundary effect only on the top and left edges of the output, resulting in a biasing effect. Typically, we want to distribute the boundary effect across all edges.

So how do we set up the input matrices for convolution via the Y = X*H property? We want the output image to be the same as for linear convolution with a zero-padded image, e.g.:

0 0 0 0 0 0 0 0 0
0 1 2 3 4 5 6 7 0
0 2 3 4 5 6 7 8 0
0 3 4 5 6 7 8 9 0
0 0 0 0 0 0 0 0 0
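
In NumPy, this symmetric zero-padding is a one-liner (a sketch; `np.pad` defaults to constant zeros):

```python
import numpy as np

x = np.array([[1, 2, 3, 4, 5, 6, 7],
              [2, 3, 4, 5, 6, 7, 8],
              [3, 4, 5, 6, 7, 8, 9]])

# Pad (M-1)/2 = (N-1)/2 = 1 row/column of zeros on every edge
# for a 3x3 kernel, reproducing the padded image shown above.
xp = np.pad(x, 1)
print(xp.shape)  # (5, 9)
```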

We will also need to slice the output image, in order to remove unwanted boundary content and preserve the dimensions. This is directly analogous to removing filter delay from 1D signals.

First of all, the kernel needs to be zero-padded so that it has the same dimensions as the input image; otherwise, their DFTs couldn't be multiplied. Some scientific computing packages (e.g., MATLAB and NumPy) will zero-pad for you, according to the FFT order(s) passed to them.

The simplest way to zero-pad is to place the kernel in the upper left corner of a zeroed buffer. This is what the automatic zero-padding in NumPy will do. That way, after the matrix is flipped, convolution will start with the upper left corner of the kernel. There are other types of padding that can be used, but we'll use zeros for simplicity.
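
In NumPy, placing the kernel in the upper-left corner of a zeroed buffer is exactly what `fft2` does when given an FFT size via its `s` argument (a sketch; the 8x8 size is an arbitrary assumption):

```python
import numpy as np

h = np.ones((3, 3))
shape = (8, 8)  # assumed padded-image dimensions

# Explicit zero-padding: kernel in the upper-left corner of a zeroed buffer.
hp = np.zeros(shape)
hp[:3, :3] = h

# np.fft.fft2(h, s=shape) pads with trailing zeros, i.e. the same thing.
assert np.allclose(np.fft.fft2(hp), np.fft.fft2(h, s=shape))
```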

We need to zero-pad the input image in order to preserve the input dimensions in the output. The only difference here from direct convolution is that both image dimensions should be zero-padded to be powers of two, for FFT optimization.

Note that the result of DFT multiplication will be circular convolution with the zero-padded image, not the original, as the algorithm doesn't "know" what part is the original. Remember that circular convolution stops at the bottom and right edges, rather than sliding off them like linear convolution. So we'll want to zero-pad the bottom and right edges the same as for direct convolution: so that the center element of the kernel will overlap the edge elements.

A note here regarding terminology. We'll define "direct convolution" as convolution performed directly, i.e. without any transforms. In "linear convolution" the kernel slides off the edge of the image, without wrapping around. Unless otherwise stated, direct convolution is assumed to be linear.

So for an MxN kernel, the bottom and right edges should be padded with (M-1)/2 and (N-1)/2 (assuming M and N odd) rows/columns of zeros, respectively. We would then pad the upper and left boundaries to make the dimensions powers of two. This will make the slicing algorithm much simpler than if we tried to make the padding the same on all edges.

With linear convolution, we'd only want to remove (M-1)/2 rows and (N-1)/2 columns from the top and left of the output, respectively. This is, of course, analogous to removing filter delay from a 1D signal. It actually is filter delay, we just don't call it "delay" because it's an image instead of a time-varying signal.

For a KxL image and an MxN kernel (M and N odd), we start with minrows, the number of rows in the image plus the zero-padding on the top and bottom edges:

    minrows = (M-1)/2 + K + (M-1)/2 = K + M - 1

Not surprisingly, this is the formula for the length of a 1D linear convolution. The vertical "delay" is then the extra power-of-two padding plus twice the 1D linear-convolution delay:

    nextpowof2(K+M-1) - (K+M-1) + 2*(M-1)/2

    = nextpowof2(K+M-1) - K

Likewise, the horizontal "delay" will be:

    nextpowof2(L+N-1) - (L+N-1) + 2*(N-1)/2 = nextpowof2(L+N-1) - L
Again, this is only for odd kernel dimensions.
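
Putting the padding, FFT multiplication, and slicing together gives the following sketch (the function and helper names are my own; it assumes odd kernel dimensions and uses the delay formulas derived above):

```python
import numpy as np

def nextpow2(n):
    """Smallest power of two >= n (n >= 1)."""
    return 1 << (n - 1).bit_length()

def fft_convolve(x, h):
    """Same-size linear convolution via the DFT multiplication property.

    Equivalent to center-anchored direct convolution with zero-padded
    boundaries, for a KxL image and an MxN kernel with M, N odd.
    """
    K, L = x.shape
    M, N = h.shape
    P, Q = nextpow2(K + M - 1), nextpow2(L + N - 1)

    # Pad (M-1)/2 rows / (N-1)/2 columns of zeros on the bottom and right,
    # then extend the top and left so the dimensions are powers of two.
    xp = np.pad(x, ((P - K - (M - 1) // 2, (M - 1) // 2),
                    (Q - L - (N - 1) // 2, (N - 1) // 2)))

    # The kernel is zero-padded into the upper-left corner by fft2 itself.
    y = np.real(np.fft.ifft2(np.fft.fft2(xp) * np.fft.fft2(h, s=(P, Q))))

    # Remove the vertical/horizontal "delay": nextpowof2(K+M-1) - K rows
    # from the top, nextpowof2(L+N-1) - L columns from the left.
    return y[P - K:P, Q - L:Q]
```

For the 3x7 example image and the all-ones 3x3 kernel, this returns the same output as direct center-anchored linear convolution with zero-padded boundaries.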

Note that if we hadn't zero-padded the input image before computing the FFT, part of the right and bottom edges of the output image would have been "chopped off", when compared to direct convolution. The image also would have been shifted down and to the right, with some unintended boundary effect at the top and left. This boundary effect results from the kernel wrapping around the input image.

Understanding circular convolution on images

To better understand how circular convolution works in 2D, recall that circular convolution is based on the idea of periodicity. So imagine an infinite number of adjacent copies of the image:

  1234567 | 1234567 | 1234567
  2345678 | 2345678 | 2345678
  3456789 | 3456789 | 3456789
  1234567 | 1234567 | 1234567
  2345678 | 2345678 | 2345678
  3456789 | 3456789 | 3456789
  1234567 | 1234567 | 1234567
  2345678 | 2345678 | 2345678
  3456789 | 3456789 | 3456789

If we convolve a kernel with the center matrix, overlapping values from adjacent matrices will create the wrap-around effect discussed previously.
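
This tiling picture can be checked numerically (a sketch; `sliding_window_view` requires NumPy >= 1.20, and the all-ones kernel makes the kernel flip irrelevant, so box sums suffice):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

x = np.array([[1, 2, 3, 4, 5, 6, 7],
              [2, 3, 4, 5, 6, 7, 8],
              [3, 4, 5, 6, 7, 8, 9]])

# A 3x3 grid of copies of the image, as in the diagram above.
tiled = np.tile(x, (3, 3))

# 3x3 box sums over every window of the tiled image; windows near the
# center copy's edges reach into the neighboring copies -- the wrap-around.
sums = sliding_window_view(tiled, (3, 3)).sum(axis=(2, 3))

# Corner-anchored windows over the center copy (which occupies rows 3-5,
# columns 7-13 of the tiled image)...
y_tiled = sums[1:4, 5:12]

# ...match circular convolution computed on the original image alone.
Y = np.fft.fft2(x) * np.fft.fft2(np.ones((3, 3)), s=x.shape)
y_circ = np.real(np.fft.ifft2(Y)).round().astype(int)
assert np.array_equal(y_tiled, y_circ)
```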

Friday, April 3, 2015

Sailfish OS: A truly open mobile OS?

Like many others, I've often wished that there was a mobile OS that was truly open, or at least closer to it than Android. Something with a user experience more like what you get with desktop Linux.

There is, of course, Ubuntu Touch, but I just haven't been all that impressed with it from a technical standpoint. Moreover, it seems that the Ubuntu project is endlessly setting ambitious milestones that it seldom reaches.

While researching the subject, I came across Sailfish OS, which seemed quite promising. Jolla, the creator of Sailfish OS, makes quite a point of how much more "open source" their OS is than other, more popular mobile OSes. They even go so far as to call Sailfish an "open source" OS.

Technically, it's quite appealing. It has a swipe-driven interface that does more than just take aspects of other mobile OSes and put them in an open source package. It also solves a critical problem faced by anyone wanting to adopt an alternative mobile OS, by adding a compatibility layer for Android apps. So what's not to like?

Well, it turns out that, in spite of Jolla's marketing claims, it's not really open source. Here's a graphic showing the proprietary and open source components of Sailfish OS. It turns out that it's really just a proprietary user interface built on top of an open source plumbing layer, which isn't entirely open source either. Even the parts that are open source mostly come from outside of Jolla, most notably the Linux kernel.

So Sailfish, it appears, is really no more "open source" than Android. Perhaps even less so. Here's a discussion in Jolla's user forums that shows the frustration of would-be Sailfish adopters, resulting from Jolla's reticence regarding their plans for open sourcing their product. The comments in that thread focus on open source as a sort of philosophical ideal. However, I think they overlook a much more important aspect of open source software: security.

One of the greatest benefits of open source software is that potential security flaws are scrutinized by so many different people. It's not dependent on the competence of developers within a particular company to evaluate vulnerabilities. It also makes it much more difficult for government entities to coerce developers into creating "back doors" in their software.

What this means for Sailfish is that it's potentially much less secure than Android or iOS, because of the relatively small number of in-house developers reviewing the code.

Moreover, Sailfish is still a young project undergoing heavy development. But, if something doesn't work, all users can do is sit around and wait for the developers to fix it. What this all means is that Sailfish fails to provide its users with two of the most critical benefits of an open source project.

Perhaps the one strong draw Sailfish still has is that its app ecosystem is built around Qt and HTML5, instead of Java. However, this is only a partial victory, as Android apps can also be developed with these toolkits, albeit with Java dependencies.

So, what does this mean for the future of open source OSes on mobile platforms? Sailfish is still, potentially, a strong contender for the hearts and minds of open source enthusiasts, if they'll provide a clear roadmap for open-sourcing their product, be more upfront in explaining why some components haven't been open-sourced, and do a better job of promoting community involvement.

There's also Tizen which, in spite of the "black mark" it bears because of its association with big industry, seems closer to being truly open source. However, one major component, the SDK, has a closed-source license.

Of course, there are at least two truly open source alternatives: Ubuntu Touch and Plasma Active. However, they are in many ways technically inferior to the others, at least for now.

The specialized hardware and heated patent wars in the mobile world also present a significant barrier to the propagation of open source on such platforms. So it may just be a matter of giving it time, and waiting for things to cool down. As the mobile form factor becomes increasingly ubiquitous, it will likely become more difficult for companies to justify their territorial claims.

For now, Samsung's dominance in the mobile hardware world probably gives Tizen the better chance of becoming a successful mobile OS. But still, it perhaps won't be long before we see popular community-driven mobile distributions as functional as Arch Linux, Fedora, or Debian are on the desktop. And to that end, I think Ubuntu at least deserves some kudos for its pioneering efforts.