I have always been very excited by the idea of computer vision, and I find the concept of teaching a computer to sift through a grid of pixels and gain some sort of insight into their meaning quite astounding. When we as humans look at a scene, we can immediately identify the objects that interest us. We look at a boat and can tell at once that it is a boat and not, for example, a chair, even if we have never seen that particular boat before. Computers struggle with this kind of visual finesse, although there are examples (see ASIMO) that already perform very well.
Back to the point: it struck me that humans find it incredibly easy to pick out the important features in any image. Take, for example, the image to the left. We can clearly see the line where the water meets the land at the far edge of the lake; we can see trees, a ridge and its reflection, and clouds. These geometries are made visible to us by contrast and colour differences between regions of the image. I decided to think of a way for a computer to do this. I'm sure the method has been done before under a different name, but I couldn't find any material on it, so I've decided to call it "Regional Differencing".
The method is fairly simple, but I am happy with the results. First the image is split into square regions of a constant size, and each region is filled with the average colour of the pixels it contains. The result is a kind of pixelation, as can be seen in the picture to the right. Then, taking the colour value of every single pixel in the original image, we compare it to the average colour of the region it falls in.
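To make the two steps concrete, here is a minimal sketch in plain Python. It is not the code behind the application: the function names block_averages and regional_difference are made up for illustration, the image is assumed to be a list of rows of (r, g, b) tuples, and the comparison is assumed to be a straight-line (Euclidean) distance in RGB, since any sensible measure of colour difference would do.

```python
from math import sqrt

def block_averages(image, block):
    """Map each (block_row, block_col) to the mean RGB colour of that square region."""
    sums = {}
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            acc = sums.setdefault((y // block, x // block), [0, 0, 0, 0])
            acc[0] += r
            acc[1] += g
            acc[2] += b
            acc[3] += 1  # number of pixels in this region
    return {key: (s[0] / s[3], s[1] / s[3], s[2] / s[3]) for key, s in sums.items()}

def regional_difference(image, block):
    """For every pixel, return its distance from the average colour of its region."""
    averages = block_averages(image, block)
    diff = []
    for y, row in enumerate(image):
        out_row = []
        for x, (r, g, b) in enumerate(row):
            ar, ag, ab = averages[(y // block, x // block)]
            # Assumption: Euclidean distance in RGB as the "difference".
            out_row.append(sqrt((r - ar) ** 2 + (g - ag) ** 2 + (b - ab) ** 2))
        diff.append(out_row)
    return diff
```

A pixel in a perfectly uniform region scores zero, while a pixel that stands out from its neighbours scores high, which is what makes the odd areas visible.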
The result of this is quite interesting because areas that don't fit in with their surroundings are clearly distinguishable in the image. In this article I'd like to argue that this is part of the reason why humans are so good at picking out the geometries mentioned above. Humans can happily distinguish between different regions, even where differences are relatively subtle. Combining this ability with memory of collections of shapes, we can identify objects very successfully.
So let's take a look at some of the results (as always, click the first one to launch the application and try it out yourself):
The three images above show three different threshold levels for a constant pixel size, but if you run the app yourself you can also vary the pixel size, which yields interesting results. At high thresholds, it is interesting that the first objects to be made out are the trees and the line at the far end of the lake, whereas the last are the clouds. Looking back, I wonder whether I found myself looking at these points before anything else.
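For anyone curious how a threshold might be applied, here is a small sketch. It assumes that a pixel is highlighted when its difference from the regional average is at least the threshold; the function name threshold_mask and the numbers in the example grid are invented purely for illustration and are not taken from the app.

```python
def threshold_mask(diff, threshold):
    """Return a grid of 1s and 0s: 1 where the difference reaches the threshold."""
    return [[1 if value >= threshold else 0 for value in row] for row in diff]

# A made-up grid of per-pixel differences, just to show the effect.
diff = [
    [0.0, 12.0, 40.0],
    [5.0, 90.0, 200.0],
]

for t in (10, 50, 150):
    mask = threshold_mask(diff, t)
    highlighted = sum(sum(row) for row in mask)
    print(t, highlighted)
# prints:
# 10 4
# 50 2
# 150 1
```

As the threshold rises, only the pixels that differ most from their surroundings survive, which fits with the strongest features being the first ones to be made out.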
There are many other ways to produce this kind of data, including the Sobel operator method shown in various previous posts, but sometimes it is nice to take a new approach.