Videndo Aedificare is a project I’ve been working on as a part of my coursework for CS612, Advanced Topics in Computer Vision. The name means “By seeing, to build” (according to Google translate) and that is exactly what it attempts. The goal of the project is to build a rudimentary system that takes a real time webcam feed and builds a 3D model of the viewed scene.
Introduction
The project was inspired by a paper by Pan, Reitmayr and Drummond called ProFORMA: Probabilistic Feature-based On-line Rapid Model Acquisition. Actually, it might be more honest to say that the project was inspired by the video that the ProFORMA authors posted on YouTube.
For a discussion of ProFORMA and feature based approaches see my article, Feature Based Approaches: ProFORMA.
Videndo Aedificare (VA) does not use a feature based approach. I had wanted to at first, but decided that implementing a state of the art structure from motion system is beyond what is feasible in a semester project for a one person team (read: was talked down by my professor). Videndo Aedificare is built on my 3D graphics/computer vision toolkit, ZeroRay (which is a topic on my article todo list for this blog that keeps getting put off!). Its primary goal is to present a framework for exploring real time structure from motion algorithms. It provides a neat API to subscribe listeners to a connected webcam and classes to display results, either 2D images (which may be intermediate results for debugging), or 3D scenes. Videndo Aedificare uses the Ogre open source rendering engine to provide 3D views. Camera listeners implement a receiveFrame method, by which they are passed the current camera frame, and given time to operate on it. Often camera listeners have their own views to display results side by side with the raw camera feed.
Simple Builder

VA's Simple Builder. The webcam video stream picturing my monitor wearing a festive hat (left) beside the 3D rendering of the model constructed from the scene (right).
As a proof of concept, and to test my framework, I implemented the most naive scene reconstruction algorithm I could think of. It assumed that the intensity of a pixel was inversely proportional to the distance of that point to the camera. In other words, bright pixels are close to the camera, dark ones are far away. Simple Builder generates a polygonal mesh from each frame by first making the image greyscale and then interpreting it as a height map. The maximum and minimum heights are parameters and the greyscale values are interpreted between them. The visual effect is rather interesting. (more…)