Computer Vision
Ongoing Projects
The ongoing projects are listed on the Research page. Click here to jump to the page.
Sub-Pixel Edge Localization and Orientation Correction
The traditional approach to edge detection using image derivatives (e.g. Canny) localizes edges at the maxima of the gradient
magnitude |grad I| along the gradient direction grad I/|grad I|, which gives the condition grad(|grad I|) . grad I/|grad I| = 0.
In Cartesian coordinates, this condition can be written as F(x,y) = Ix^2*Ixx + 2*Ix*Iy*Ixy + Iy^2*Iyy = 0,
which involves derivatives up to second order.
However, the edge orientation is usually taken to be simply orthogonal to the image gradient, which involves only first-order derivatives.
This is why the edge orientations computed by the gradient operator are incorrect. The tangent computation needs to involve derivatives one order higher than those used to localize the edges, i.e. third-order derivatives.
The tangent to the edge contour can be correctly computed by the gradient of F(x,y) at the zero level set. The orientation of the edge thus involves up to third-order derivatives. For this reason we have chosen to call our edge detector the "Third-order orientation detector". In the image above, edgels computed by the traditional method (in red) are compared to those computed by the third-order orientation operator (in green). Notice the consistency of the third-order edges with respect to the edge curves.
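The computation described above can be sketched numerically. The following is a minimal illustration, not the Matlab toolbox itself: it assumes plain finite differences (np.gradient) on an already smooth image, whereas a practical detector would more likely use Gaussian derivative filters and sub-pixel interpolation. The function name third_order_tangent and the synthetic test image are hypothetical.

```python
import numpy as np

def third_order_tangent(image):
    """Return F and the unit tangent field (tx, ty) of the level sets of F."""
    I = np.asarray(image, dtype=float)
    # np.gradient returns the derivative along axis 0 (y, rows) first,
    # then along axis 1 (x, columns).
    Iy, Ix = np.gradient(I)
    Ixy, Ixx = np.gradient(Ix)
    Iyy, _ = np.gradient(Iy)
    # Edge localization function: edges lie on the zero level set of F.
    F = Ix**2 * Ixx + 2 * Ix * Iy * Ixy + Iy**2 * Iyy
    # grad F is normal to the level sets of F, so the tangent is grad F
    # rotated by 90 degrees. This step brings in third-order derivatives of I.
    Fy, Fx = np.gradient(F)
    norm = np.hypot(Fx, Fy) + 1e-12  # avoid division by zero in flat regions
    return F, -Fy / norm, Fx / norm


# Synthetic image (assumption): a smooth circular edge of radius 25 centred at (50, 50).
yy, xx = np.mgrid[0:101, 0:101]
r = np.hypot(xx - 50.0, yy - 50.0)
image = np.tanh((r - 25.0) / 4.0)
F, tx, ty = third_order_tangent(image)
```

Edgels would then be placed where F changes sign (and the gradient magnitude is large enough), with (tx, ty) sampled there as the edge direction.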
Download the Matlab Toolbox for the Third-order edge detector.
Vehicle Category Recognition from Aerial Video Using Shape and Appearance Information
The goal of the project was to identify the category (SUV, pickup truck, sedan, etc.) of vehicles on a highway from video acquired by a UAV
(a hot-air balloon in our case).
The video had to be registered to the ground plane first before any motion segmentation could be done to extract the contour of the vehicles. The video to the right shows the results of motion segmentation on the registered video.
The shapes of motion boundaries were to be used to identify the vehicles. However, we discovered quickly that from typical aerial vantage points
the shapes of the vehicle contours are not very diagnostic of the category of the vehicles even when motion segmentation is excellent.
Using the appearance information alone (for example, correlation with previously stored images of vehicles) also proved ineffective. There were so many different makes and models of vehicles that the intra-category variations tended to be of the same order of magnitude as the inter-category variations.
The solution that we came up with was to augment the shape of the vehicle contours with the appearance information. The idea was to
construct a dense correspondence between the interior of the two shapes being matched based on the shape (contour) correspondence so
that the appearance could be compared using a mutual information paradigm. To put it simply, by warping one vehicle's image to align
its contour with another vehicle's contour, we were able to reduce the differences between the appearances of vehicles of different
sizes/makes of the same category while exaggerating the differences between the vehicles of different categories. This gave us the
necessary leverage to tease the categories apart.
We used the medial axis description of the vehicle shapes and an in-house algorithm to find alignments between these medial axes to ultimately find a dense alignment between the image pixels of the vehicles.
Given a dense alignment, the mutual information between the pixel intensities, as well as between the gradient directions of edges, was used
to compare the two vehicles.
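As a rough illustration of the comparison step, mutual information between two already aligned images can be estimated from their joint histogram. This is a generic sketch, not the project's code: the function name mutual_information, the bin count and the random test patches are assumptions made for the example.

```python
import numpy as np

def mutual_information(a, b, bins=16):
    """Mutual information (in nats) between two aligned arrays of equal size."""
    joint, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    pxy = joint / joint.sum()            # joint distribution of paired values
    px = pxy.sum(axis=1, keepdims=True)  # marginal distribution of a
    py = pxy.sum(axis=0, keepdims=True)  # marginal distribution of b
    nz = pxy > 0                         # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))


rng = np.random.default_rng(0)
patch = rng.random((64, 64))
unrelated = rng.random((64, 64))
mi_same = mutual_information(patch, patch)           # high: identical content
mi_unrelated = mutual_information(patch, unrelated)  # near zero: independent content
```

The same function could be applied to quantized gradient directions instead of intensities, since it only needs paired samples.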
For generic recognition systems, the appearances of vehicles are viewpoint dependent, which makes this framework unrealistic. However, in our case, we had active telemetry from the UAV giving us complete information about the viewing directions, lighting conditions, etc., so that our vehicle database could be organized accordingly.
We implemented the project in several stages. The first stage was simply to prove the concept that shape augmented with appearance was indeed sufficient to categorize the vehicles. To this end we developed a synthetic vehicle database by photo-realistically rendering several dozen makes and models of vehicles from 3D models under different viewpoints and illumination conditions. This gave us a test bed where there were no errors due to registration or segmentation.
The next stage was to generate a synthetic highway scene by animating the vehicles. This gave us a more realistic simulation of the contours derived from motion segmentation.

The third stage was to use real videos of highways from static cameras. To this end,
we installed several cameras atop a high-rise building next to a local highway.
The fourth and final stage was to use video obtained from an aerial vehicle. We collected video from a hot-air balloon
and used a GPS tracking device to obtain telemetry.
As expected, performance dropped as we increased the difficulty of the setup. The incremental nature of the experiments did, however, allow us to debug and optimize the different components of the algorithm separately. Overall, we managed to obtain recognition rates of around 75-85% on our test videos.
Browse the Synthetic Vehicle Database online or download it.
Publication:
Augmenting Shape with Appearance in Vehicle Category Recognition, O. C. Ozcanli, A. Tamrakar, B. B. Kimia, J. L. Mundy. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, pp. 935-942. [PDF][PPT].
Blood Vessel Segmentation in Retinal Images
This is an implementation of the following paper, done as a course project on Medical Imaging:
Staal et al., Ridge-Based Vessel Segmentation in Color Images of the Retina, IEEE Transactions on Medical Imaging, vol. 23, no. 4, April 2004.
The main idea in the paper was to form image patches from the ridges detected in the retinal image. A two-stage classifier was then used to classify the image pixels as vessel or non-vessel. The first stage of the classifier determined the likelihood of an image patch being a vessel, while the second stage determined the likelihood of each pixel being a vessel given the likelihood of the patch it was on. I used AdaBoost for classification instead of the kNN classifier used by the authors and got similar results.
You can view my final presentation on this project.
Computational Geometry
Shock (Medial Axis) Computation from a Set of Circular Arc Segments
The computation of shock graphs (aka medial axis, aka Voronoi diagram) from a set of points, line segments and circular arc segments is a very challenging computational geometry problem. It is very hard, in general, to perform geometric computations on a finite-precision system. Furthermore, there are always degeneracies due to particular geometric configurations that have to be dealt with in a special manner, no matter the precision of the system.
There are several novel features in this algorithm, the most important of which is the explicit use of degeneracies. Most computational geometry algorithms shy away from degeneracies and often add random noise to the input to push it out of degenerate configurations. My algorithm, on the other hand, embraces degeneracies by actively looking for them and forcing even near-degenerate configurations into degenerate ones.
It took me over a year to develop this algorithm and I have yet to publish it.
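The general idea of forcing near-degenerate configurations into degenerate ones can be illustrated with a simple geometric predicate. This is not the unpublished algorithm described above, only a generic sketch of tolerance-based snapping; the function name orientation and the tolerance eps are assumptions.

```python
def orientation(p, q, r, eps=1e-9):
    """Side of point r relative to the directed line p -> q.

    Returns +1 (left), -1 (right) or 0 (collinear). Values within a relative
    tolerance of zero are snapped to exactly 0, so a near-degenerate triple
    takes the same code path as a truly degenerate one.
    """
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    # Scale the tolerance by the size of the coordinates involved.
    scale = max(abs(q[0] - p[0]), abs(q[1] - p[1]),
                abs(r[0] - p[0]), abs(r[1] - p[1]), 1.0)
    if abs(cross) <= eps * scale * scale:
        return 0
    return 1 if cross > 0 else -1
```

For example, orientation((0, 0), (1, 0), (2, 1e-12)) is reported as collinear rather than as a point marginally to the left, which avoids inconsistent decisions caused by round-off.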
GUI Environments
Brown Eyes GUI Environment for Video Processing Applications
The Brown Eyes video processing environment is the shared GUI environment for our entire vision lab. I was involved in its original design and development
and I still use it as my main development platform. It is built on the VXL libraries.
The two main philosophies behind the design are modularity and complete separation of the data structures/algorithms from the visualization elements. Thanks to its modularity, it's very easy to share code/visualization tools between different people in the lab as well as experiment with different algorithms on the fly. Standard data structures have been defined to wrap people's data structures and algorithms so that they can be used by others with a minimum of hassle.
The visualization of the data and of the intermediate results of algorithms is facilitated by the tableaux interfaces, which can be customized to display the same data in a multitude of ways according to the users' needs. Layered tableaux facilitate the comparison of results from different algorithms. Various tools are also available and customizable for interacting with the data visually, which makes visual debugging easy and useful.
Robotics
The LEMS ROVER
The LEMS-ROVER is a mobile stereo video collection robot. I designed and built it in January of 2005 to study active differential imagery.
It is built on a Pioneer 3-DX mobile platform with an ultrasonic array for obstacle avoidance. The imaging system rests on an X-Z-R platform from Velmex Inc, which consists of two linear motorized tracks (for left-right and up-down motion) and a motorized rotational platform to pan the stereo imaging system. The stereo imaging system itself consists of a pair of PointGrey Scorpion Cameras with Fujinon D16x7.3A-R11 motorized zoom lenses. They are each attached to independent Pan/Tilt Units from Directed Perception.
Each of these elements is independently controllable with a great degree of precision via a custom GUI. They can also be controlled from a gamepad controller. As expected, every single button and key combination on the gamepad was used up!
ALVIN : A Vision-based Autonomous Robot
This project started off as part of my senior design project. Three other seniors and I decided to build a robot to enter into the
DARPA AUVSI IGVC Challenge.
The main challenge was to autonomously navigate an outdoor obstacle course, complete with potholes and construction barrels, using a vision system.
We had hardware failures the first year and could not compete. The following year the Trinity Robotics Team took up the project and built a new chassis for the robot. Thus was born ALVIN II (pictured here).
ALVIN II relied on a single calibrated camera to detect obstacles and measure distances to them on the ground plane. We also employed a
kinematics simulator and a 3D obstacle course simulator while writing the control software (see below for more detail).
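Measuring distances on the ground plane with one calibrated camera can be sketched with a pinhole model. The following is an illustrative example rather than ALVIN II's actual code: it assumes a flat ground plane, a camera at a known height with a known downward tilt, and no lens distortion. The function and parameter names are hypothetical.

```python
import math

def ground_distance(v, cy, focal_px, cam_height, tilt_rad):
    """Distance along the ground to the point imaged at pixel row v.

    v          : pixel row of the obstacle's ground contact point (increasing downward)
    cy         : pixel row of the principal point
    focal_px   : focal length in pixels
    cam_height : camera height above the ground plane
    tilt_rad   : downward tilt of the optical axis from horizontal, in radians
    """
    # Angle of the viewing ray below the horizontal.
    angle = tilt_rad + math.atan2(v - cy, focal_px)
    if angle <= 0:
        raise ValueError("ray does not intersect the ground plane")
    return cam_height / math.tan(angle)
```

With the camera 1 unit above the ground and tilted down by 45 degrees, the principal point row maps to a distance of 1 unit, and rows lower in the image map to closer points.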
For more information, our design report and presentation to the 2001 IGVC are available for download.
3D Obstacle Course Simulator and Robot-Kinematics Simulator for ALVIN
This piece of software was born because by the time I was done building the robot, ALVIN, it was already January and there was a lot of snow on the ground.
I needed to write the control software and test it before the snow melted. So I came up with the idea for this closed-loop simulator.
There were three computers involved: the one on board the robot, one to simulate the kinematics of the robot and one to simulate the 3D obstacle course for the robot to navigate. In the picture above, the central monitor shows the navigation software running on the robot's computer. On the right monitor is the robot kinematics simulator (a screen shot is shown in the picture below). And on the left is the 3D scene of the obstacle course generated using OpenGL. The virtual camera of the simulated world is placed where the robot's camera would have been. I had the robot's camera looking at the 3D scene on the monitor (as shown in the picture).
The main computer on the robot communicated with the motor controllers via the serial port, so I could easily re-route these control signals to the
robot kinematics simulator. The kinematics of the robot were pretty simple and could be simulated fairly easily. The kinematics simulator would thus interpret
the control signals from the robot, compute the current location and heading of the robot in the simulated obstacle course and
subsequently pass this information to the 3D simulator. The 3D simulator would then render the scene from the current position of the robot in the 3D world.
Since the camera on the robot was looking at this scene in real time, the control software on the robot could
operate seamlessly with the simulation.
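The pose update performed by such a kinematics simulator might look roughly like the sketch below. It assumes a differential-drive robot and simple Euler integration, neither of which is stated above, and the names step_pose, v_left, v_right and wheel_base are hypothetical.

```python
import math

def step_pose(x, y, heading, v_left, v_right, wheel_base, dt):
    """Advance a differential-drive pose by one time step (Euler integration)."""
    v = 0.5 * (v_left + v_right)              # forward speed
    omega = (v_right - v_left) / wheel_base   # turn rate, counter-clockwise positive
    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    heading = (heading + omega * dt) % (2 * math.pi)
    return x, y, heading


# Drive straight for one second, then spin in place for one second.
pose = (0.0, 0.0, 0.0)
for _ in range(100):
    pose = step_pose(*pose, v_left=1.0, v_right=1.0, wheel_base=0.5, dt=0.01)
straight = pose
for _ in range(100):
    pose = step_pose(*pose, v_left=-0.25, v_right=0.25, wheel_base=0.5, dt=0.01)
```

In a closed-loop setup like the one described, the wheel speeds would come from the decoded motor commands, and the resulting pose would be sent to the 3D renderer each cycle.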
Autonomous Fire Fighting Robots: Otbot, Bob and Mini-Bob
Every year Trinity College in Hartford, Connecticut hosts the largest autonomous robotics contest, called
the Trinity College Fire Fighting Home Robot Contest. Hundreds of robots from all over
the world compete in various challenges centered around designing autonomous fire-fighting robots that reside in your fire closet at home.
These robots are activated by a fire alarm or smoke alarm and go around the house looking for a fire to put out. The idea is that these small household
robots could effectively put out a house fire while it's still small, or at least keep it in check until the fire-fighters arrive.
For the competition, a small maze like the one shown on the right simulates the hallways and rooms and a candle simulates the fire.
When triggered by a fire alarm, the robot autonomously navigates the hallways searching for the candle. When it finds the flame, it puts it out
from a safe distance (see picture below of Bob closing in on the candle).
The robots employed modulated infrared sensors and ultrasonic sensors for gauging distances from the walls. The flame sensors were also
infrared sensors, albeit passive ones. Most of our robots were designed around the Motorola HC-11 or the HC-332 microcontrollers.
I was with the Trinity Robotics Team under the guidance of Prof. David Ahlgren all four years of college and was the chief engineer my senior year. Here are a few pictures and 3D renderings of some of the robots that we designed and built.








