Saturday, 17 July 2010

Real Time Feature Point Tracking in AS3

It's a bit late, so instead of writing a long, informative post about how feature point tracking works I thought I'd just upload some of my results from this evening. I think it's quite exciting stuff. The image shows feature point tracking working relatively well on my pupils, and also on other facial features (the last two images). I am very pleased with the result.
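The post doesn't go into the algorithm, but one common way to track a feature point between frames is template matching: take a small patch around the point in the previous frame and search nearby in the current frame for the best-matching patch. This is only an illustrative Python/NumPy sketch of that idea, not the AS3 code used here:

```python
import numpy as np

def track_point(prev, curr, pt, patch=7, search=15):
    """Track a feature point between two greyscale frames by finding
    the patch in `curr` that best matches the patch around `pt` in
    `prev` (minimum sum of squared differences)."""
    r = patch // 2
    y, x = pt
    template = prev[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    best_ssd, best_pt = np.inf, pt
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ny, nx = y + dy, x + dx
            if ny - r < 0 or nx - r < 0:
                continue  # candidate window runs off the top/left edge
            cand = curr[ny - r:ny + r + 1, nx - r:nx + r + 1].astype(float)
            if cand.shape != template.shape:
                continue  # candidate window runs off the bottom/right edge
            ssd = np.sum((cand - template) ** 2)
            if ssd < best_ssd:
                best_ssd, best_pt = ssd, (ny, nx)
    return best_pt
```

Run on consecutive webcam frames, this returns the new location of each tracked point; shrinking the search radius is what makes it feasible in real time.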


A related project I have also been working on this evening is optical flow, which I am close to getting working. Here is a preview of what is to come:


Both methods are used to track objects in moving images, and both have been implemented in real time. I will post sample movies and some explanations shortly so you can play around with them yourself.
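For the optical flow half, a standard real-time method is Lucas-Kanade, which solves a small least-squares system over a window around each pixel. The post doesn't say which method was used, so treat this single-point Python/NumPy version as an illustration only:

```python
import numpy as np

def lucas_kanade(prev, curr, y, x, win=7):
    """Estimate the optical flow (vx, vy) at one pixel by solving the
    Lucas-Kanade least-squares system over a small window."""
    prev = prev.astype(float)
    curr = curr.astype(float)
    # spatial gradients (averaged over both frames) and temporal derivative
    Iy, Ix = [(p + c) / 2 for p, c in zip(np.gradient(prev), np.gradient(curr))]
    It = curr - prev
    r = win // 2
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    # brightness constancy: Ix*vx + Iy*vy = -It, solved in least squares
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (vx, vy), *_ = np.linalg.lstsq(A, b, rcond=None)
    return vx, vy
```

Calling this on a grid of pixels gives the kind of dense flow field shown in the preview image.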


Friday, 2 July 2010

Computer Vision and the Hough Transform

I was working on computer vision yesterday and went to my girlfriend's graduation ceremony today, so I haven't had a chance to upload. I decided to take another look at computer vision, and in this case a slightly more intelligent look than previously. The method I examined is known as the Hough transform. I think the simplest and most effective way to explain how the Hough transform works is with a bulleted list, but before I begin you should know that it is used to extract shape features from an image: in this case, the lines present in the image.

  • We start with an image in which we want to detect lines:

  • First of all, apply a Canny edge detector to the image, isolating the edge pixels:

  • At this stage, to reduce the computation required in the next step, I decided to sample the edge pixels instead of using all of them. The results for sampling rates s = 0.2 and s = 0.9 are shown below:

  • Then, for every possible pair of sampled edge pixels, calculate the angle of the line through them, as well as that line's perpendicular distance from the origin. Recall that a line can be defined by an angle and a distance from an origin. Plot a cumulative bitmap graph of angle against distance; the way I did this was to increase the pixel value at a location each time that angle/distance combination occurred. The resulting bitmap is known as the Hough transform:

  • By increasing the contrast and playing around with thresholds in the Hough transform, it is clear that 8 distinct regions are visible. We know the detector has done its job, as there are 8 lines in the original image:

  • Remember that each point gives us a line's angle and distance from the origin, so let's use the Flash drawing API to draw some lines:

As you can see, the result isn't too bad at all: we get 8 lines, although they are a bit diffuse. The next step is to look into clustering algorithms that pick a single peak pixel from the final Hough transform for each line. In this version of the program a line is drawn for every pixel above the threshold, which gives a range of angles and positions for each line. Pretty close though.

So that's how a Hough transform works for line detection. Just for fun, here's a real-time Hough transform on a webcam stream. Just click the image below to launch it.
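The pair-voting scheme from the list above fits in a few lines. This is a Python/NumPy sketch rather than the AS3 BitmapData version, and the bin counts and rho range are arbitrary choices:

```python
import numpy as np
from itertools import combinations

def pairwise_hough(points, theta_bins=180, rho_bins=200, rho_max=200.0):
    """Vote for the (angle, distance) of the line through every pair of
    edge pixels; peaks in the accumulator correspond to lines."""
    acc = np.zeros((theta_bins, rho_bins), dtype=int)
    for (x1, y1), (x2, y2) in combinations(points, 2):
        # angle of the line's normal, wrapped into [0, pi)
        theta = (np.arctan2(y2 - y1, x2 - x1) + np.pi / 2) % np.pi
        # perpendicular distance of the line from the origin
        rho = x1 * np.cos(theta) + y1 * np.sin(theta)
        ti = int(theta / np.pi * theta_bins) % theta_bins
        ri = int((rho + rho_max) / (2 * rho_max) * rho_bins)
        if 0 <= ri < rho_bins:
            acc[ti, ri] += 1
    return acc
```

Every pair of points lying on the same image line votes into the same accumulator cell, which is why distinct bright regions appear, one per line.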


Thursday, 18 February 2010

Face Recognition Algorithms

Over the last few days I have been looking at various algorithms and methods used in face recognition. The first method I looked at uses a colour space known as TSL (tint, saturation, lightness; closely related to HSL: hue, saturation, luminosity). This colour space was developed with the intention of being far closer to the way the human brain sees colour than RGB and other computationally convenient systems. Taking a webcam stream and converting every pixel from [R,G,B] to [T,S,L], I found that although in certain lighting conditions the system can differentiate between skin and other surfaces, the method is far from ideal. For example, the cream walls in my house are often identified as skin, which clearly shows an issue. It is also clear from the images that areas of my face that are in shade are not recognised as skin. These issues are best addressed using methods that do not depend on colour.
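For reference, here is one published definition of the RGB to TSL conversion, sketched in Python rather than AS3 (the exact constants vary slightly between papers, so treat them as approximate):

```python
import math

def rgb_to_tsl(R, G, B):
    """Convert an RGB pixel to TSL (tint, saturation, lightness).
    Constants follow one published definition; papers differ slightly."""
    total = R + G + B
    if total == 0:
        return 0.0, 0.0, 0.0
    # normalised chromaticities, centred on the white point (1/3, 1/3)
    r = R / total - 1 / 3
    g = G / total - 1 / 3
    if g > 0:
        T = math.atan(r / g) / (2 * math.pi) + 1 / 4
    elif g < 0:
        T = math.atan(r / g) / (2 * math.pi) + 3 / 4
    else:
        T = 0.0
    S = math.sqrt(9 / 5 * (r * r + g * g))
    L = 0.299 * R + 0.587 * G + 0.114 * B
    return T, S, L
```

Skin detection then reduces to thresholding T and S per pixel, which is exactly why lighting changes and cream-coloured walls cause false positives.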

In my research I found two viable methods for face recognition: eigenfaces and fisherfaces.

Eigenfaces seemed to be the more commonly used method, so I decided to follow that route first (albeit roughly; as I'm sure you all know by now, I like to do things my own way when I program). In order to recognise a face in an image, the computer has to be trained to know what a face looks like. The first step involved writing a class to import .pgm files from the CMU face database. Here is a sample of what the faces looked like when imported. All images are in the frontal pose, with similar lighting conditions and varying facial expressions.

The next three steps involved creating an average face, then taking the resulting image and calculating its vertical and horizontal gradients. The average face simply sums the pixel values at each location across all faces and takes the mean. To calculate a gradient, the difference between two adjacent pixels is taken in either the vertical or the horizontal direction. Computers find it easy to detect vertical and horizontal lines (I have already written some basic shape detection software which uses these kinds of algorithms), so I thought it might be a good idea to use these as comparisons with candidate faces. I planned on using a kind of probability test, with a threshold on the likelihood that any part of the image is a face, comparing it to the mean face, the horizontal gradient face, and the vertical gradient face.
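Those three steps are tiny in code. A NumPy sketch (the original works on AS3 BitmapData, so this is only the idea):

```python
import numpy as np

def mean_and_gradients(faces):
    """Average a stack of aligned greyscale face images and return the
    mean face plus its horizontal and vertical gradient images
    (differences of adjacent pixels, as described above)."""
    mean = np.mean(np.stack(faces).astype(float), axis=0)
    grad_h = np.diff(mean, axis=1)  # difference between horizontal neighbours
    grad_v = np.diff(mean, axis=0)  # difference between vertical neighbours
    return mean, grad_h, grad_v
```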

The three faces found for this database are shown below. Clearly one could instead take the mean face of all people wearing glasses, or all men, or all women, and this would affect its final appearance. It could therefore, in theory, be simple to build gender testing into webcam software (assuming a complete lack of androgyny, which clearly there is not...), though a probabilistic approach could still be taken.

The mean face looks incredibly symmetric and smooth. This is perfection, folks, and it's kind of frightening! The idea behind using these faces for recognition is relatively simple. Scan across an image, taking the difference between the overlaid mean face and the region of the image being scanned. If the difference is below a threshold the images are similar, which means it is likely that there is a face where you are checking. To make sure, compute the horizontal and vertical gradients of the region and compare them with those of the mean face. If they too are similar to within a certain threshold, it is very likely you have found a face!
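The scan described above can be sketched as a sliding-window comparison. Again this is Python/NumPy for illustration, with the threshold as a free parameter:

```python
import numpy as np

def find_face_candidates(image, mean_face, threshold):
    """Slide the mean face over a greyscale image and return top-left
    positions where the mean absolute pixel difference falls below
    `threshold` (candidate face locations)."""
    h, w = mean_face.shape
    H, W = image.shape
    hits = []
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            window = image[y:y + h, x:x + w].astype(float)
            diff = np.mean(np.abs(window - mean_face))
            if diff < threshold:
                hits.append((y, x))
    return hits
```

The gradient check is the same loop run with the gradient images in place of the mean face, keeping only positions that pass both tests.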

I'll come back to you when I have some working flash files and source code!

Wednesday, 13 May 2009

Shape Recognition

I find some of the concepts behind artificial intelligence, such as shape recognition and human interaction, amazing. A few months ago I saw a TV programme by James May called Big Ideas, which talked about some of the world's most intelligent human-like robots. One of the most influential pioneers in the field is Professor Ishiguro from Osaka, Japan, who has produced some unbelievably lifelike robots, one of which can be seen standing next to him in the photo below:

But which one is he?! Anyway, on James May's Big Ideas one robot struck me in particular: a robot called Asimo. Asimo was originally created by Honda and there are several versions of the creature around. One of these has the ability to walk and move around almost exactly like a human being, including walking up and down stairs and even running! This is all well and good, but how am I going to integrate that into a Flash experiment?! The Asimo on which my next project is based has the ability to see. It does this through a complex form of shape analysis of the data captured by two cameras in its head. Not only can Asimo "see" preprogrammed shapes, but it can learn patterns in what it sees and use those patterns to work out what things it may never have seen before might be. It can also link objects it sees to words and phrases which describe them, so there is at least a small degree of human interaction.

To the Flash project! A lot of people have a webcam these days, so I thought it might be cool to try something similar. Nothing too complicated at first.

Now, the experiment requires a few green shapes, so it might be a good idea to print out the following (or, if you own a mobile phone with a large screen, just load the images onto that; that should work as well). The three shapes the camera can detect are below: a circle, a triangle and a rectangle. I chose these because virtually every other shape we see can be made up of some combination of them.


Once you have printed out these images, and ideally separated them, we can begin. Below is the Flash movie:

click here to launch it

Obviously this is no Asimo. Firstly, it is colour-blind, and anyone wearing a green jumper who has been accused of being a rectangle should be aware of this by now. Secondly, it can't actually learn the shapes; it simply uses an algorithm to determine each shape by calculating the number of corners it has, or in the case of the circle, by colour subtraction.
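To give a flavour of this kind of test, here is a crude stand-in in Python that classifies a blob by how much of its bounding box the green pixels fill: a rectangle fills nearly all of it, a circle about pi/4, a triangle roughly half. This is not the corner-counting algorithm the movie uses, and the thresholds are illustrative guesses:

```python
def classify_shape(pixel_count, bbox_w, bbox_h):
    """Rough shape test from the fill ratio of a detected blob inside
    its bounding box.  Thresholds are illustrative guesses."""
    ratio = pixel_count / (bbox_w * bbox_h)
    if ratio > 0.9:
        return "rectangle"  # a rectangle fills its bounding box
    if ratio > 0.65:
        return "circle"     # a circle fills about pi/4 (~0.79) of it
    if ratio > 0.35:
        return "triangle"   # a triangle fills roughly half of it
    return "unknown"
```

A test like this breaks down for rotated shapes, which is one reason corner counting is the more robust choice.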

Having said that, it's not a bad shot. A nice little feature I've added is that the program can tell when you hold up too many green shapes, producing a rather disgruntled tone saying: "Please only show me one shape at a time". In order to have a fully functional shape detector, however, an algorithm would have to be devised to find co-ordinates for all the lines and vertices in an image, calculate all the colour-bounding shapes and so on. Not so easy in Flash, but it might be doable.

Oh, and the voice is recorded from TextEdit's speech function on my computer and isn't my own.

Enjoy!