A scene rendered by ZeroRay illustrating reflection and refraction. The glass orbs both reflect and transmit light through them.
ZeroRay began its life as a ray tracer. Building a ray tracer from scratch was one of the large projects for a graduate-level computer science course I took called Image Computation. I ended up using ZeroRay in a lot of other projects and it has evolved into a ray tracer/computer vision (CV) toolkit with a suite of demo applications that are both interesting and illustrative of how to use ZeroRay.
I have released ZeroRay as open source software, under the BSD license. The BSD License is pretty much the most liberal open source license. The basic gist is that you can use the code for anything you want, be it open or closed source, commercial or non-commercial. All you have to do is give the author credit for using their library. You can visit the ZeroRay project page at SourceForge. At the time of this writing, I haven’t prepared any convenient downloadable packages, so you will have to check the code out of the SVN repository. To do so, click Develop on the project page and follow the instructions there.
If you’re wondering, “What’s a Ray Tracer?” check out my previous article What is Ray Tracing?
Back to what exactly ZeroRay is. ZeroRay is a software library (or more precisely a set of software libraries). That means it is a set of tools that are used by programmers to create applications. A simple application could be one that, when executed, draws a pre-set picture and saves it to an image file. A more complex application could be a program that allows you to set up a 3D scene by dragging objects into a low-quality rendering of the scene and then invokes the ZeroRay ray tracer to generate a high-quality version of the image. As a bit of an aside: ray tracing is a way to generate very high quality computer graphics; however, it is significantly more computationally intensive than the techniques used in real-time renderers, like the sort that are used in video games. To make the ZeroRay ray tracer widely useful to computer artists (who generally are not programmers), it would need an application similar to the second example. But ZeroRay is more than a ray tracer.
“PolyViz is a tool for visualizing n-dimensional polyhedra. Motivated by the difficulty of reasoning about the iteration spaces of nested loops with many levels, PolyViz allows users to visualize polyhedral representations of those iteration spaces.”
- PolyViz Sourceforge project description
I started writing PolyViz for an advanced high performance computing (HPC) course last spring (early 2010). The course focused on the polyhedral model, a way of thinking about tightly nested loops in computer programs as geometric shapes. In the polyhedral model, each loop has an index variable, like in a standard for loop, such as i, j or k. Each index variable iterates over a range of values; this set of values is called its domain. Each index variable is interpreted as a dimension of the polyhedron. Values that are included in the domain are within the polyhedron; all other values are not.
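A minimal sketch of the idea in Python (an illustration only, not code from PolyViz): a two-level loop nest whose inner bound depends on the outer index, so the iteration space is the set of integer points inside a triangle. The function names `in_domain` and `iteration_space` are made up for this example.

```python
def in_domain(i, j, n):
    """Constraints describing the polyhedron: 0 <= i < n and 0 <= j <= i."""
    return 0 <= i < n and 0 <= j <= i


def iteration_space(n):
    """Enumerate the points visited by the loop nest, in execution order."""
    points = []
    for i in range(n):            # outer loop: dimension i
        for j in range(i + 1):    # inner loop: dimension j, bounded by i
            points.append((i, j))
    return points


points = iteration_space(4)
```

Every point the loops visit satisfies `in_domain`, and every integer point that satisfies `in_domain` is visited; the loop nest and the set of inequalities are two descriptions of the same shape.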
PolyViz on Sourceforge – See the files section to download source packages or the manual, which is very good and covers background materials and the mathematics involved, along with instructions on how to use the software.
I think Google Chrome is a great browser, but even great browsers sometimes fail. In fact, I just had two tabs crash on me. They happened to both be Google Docs tabs so Google can’t pin the blame on anyone else. However, instead of just sitting there dumbly, Chrome informed me that the tabs weren’t responding and gave me the option to close them. This is part of Google’s whole strategy for Chrome. The renderer for every tab is a separate process, so that if something goes wrong, each one can be killed individually rather than crashing the whole browser.
While being able to kill the tabs was nice, it’s not what I’m writing about. What I’m writing about is Chrome’s brilliant strategy of, when it fails, being too cute for you to be mad at it. It’s a master stroke. Software that exploits people’s emotions to get people to like it even when it misbehaves! Software has finally evolved to the level of a two year old. The image above is a screenshot of the page Chrome showed me after I chose to kill the tabs. How can you be mad when looking into those poor pixel-art eyes?
Apache logo, courtesy of The Apache Software Foundation
For an unrelated project, I’m writing a section about open source licenses and I happened across language in the Apache license that gives users the right to “publicly display [and] publicly perform” the work. Immediately after reading those words, I sat there for a good five minutes thinking up creative ways to turn the source code of the most popular open-source web server in the world into a performance piece. Unfortunately none of my ideas for a song and dance rendition of the Apache source code were very good (in fact, the very concept is wretched. It’d be intolerable to sit through). Though a copyleft, source-code based, visual art installation could be pretty cool… with pages of code pasted to the walls, floor and ceiling. Errant CRTs jutting from the mass of lacquered paper, scanning over the codebase in amber on black. The guts of the enabler of the modern web on display in all their hideous monotony.
A screenshot of the Google N-Gram Viewer in action. Plot shows the relative frequency of the words internet, computer, telephone and robot from 1850 to 2000.
For those unfamiliar with the project, Google Books is an attempt to digitize every book ever written. The project began in 2002 and many libraries, universities and publishers got on board. In 2004 the project stirred up a bit of controversy with lawsuits against Google charging that Google Books (then called Google Print) violated the copyrights of the books it scanned.
History aside, the reason I’m writing about this is that Google has released a load of data on the frequency of occurrence of words and phrases in the entire body of books it has scanned. The goal of publishing the data seems to be to allow academics to research the evolution of language, as used in books. However, beyond just making the raw data available, Google has provided a neat little webapp that is easy enough for anyone to use. It’s fun to play with! As pictured above, I made a plot of a few technology related words (with rather predictable results).
An image of the statue of Athena rendered by ZeroRay.
There are two approaches to the computer graphics problem:
For each object in a scene, project it onto the screen, then color in all the pixels that it covers.
For each pixel on the screen figure out which object in the scene it points at, then color the pixel the color of that object.
The first approach is the one used by mainstream, real-time graphics toolkits like OpenGL and DirectX. The second approach is ray tracing. This may seem like a fine distinction, but this choice determines what sort of things end up being hard or easy later.
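The two loop orders can be contrasted with a toy sketch in Python. This is an illustration under heavy simplifying assumptions, not how OpenGL, DirectX or ZeroRay work internally: the "objects" are flat, screen-aligned rectangles with a single depth, and the "ray" for each pixel points straight into the screen. All names (`SCENE`, `render_object_order`, `render_image_order`) are hypothetical.

```python
# Each object is an axis-aligned rectangle facing the camera:
# (x0, y0, x1, y1, depth, color). Smaller depth means closer.
SCENE = [
    (0, 0, 3, 3, 5.0, "R"),
    (2, 1, 5, 4, 2.0, "G"),
]
WIDTH, HEIGHT = 6, 4
BACKGROUND = "."


def render_object_order(scene):
    """Approach 1: loop over objects, color the pixels each one covers."""
    image = [[BACKGROUND] * WIDTH for _ in range(HEIGHT)]
    depth = [[float("inf")] * WIDTH for _ in range(HEIGHT)]
    for x0, y0, x1, y1, z, color in scene:
        for y in range(max(y0, 0), min(y1, HEIGHT)):
            for x in range(max(x0, 0), min(x1, WIDTH)):
                if z < depth[y][x]:      # keep the closest object per pixel
                    depth[y][x] = z
                    image[y][x] = color
    return image


def render_image_order(scene):
    """Approach 2: loop over pixels, find the closest object each one sees."""
    image = []
    for y in range(HEIGHT):
        row = []
        for x in range(WIDTH):
            hits = [(z, color) for x0, y0, x1, y1, z, color in scene
                    if x0 <= x < x1 and y0 <= y < y1]
            row.append(min(hits)[1] if hits else BACKGROUND)
        image.append(row)
    return image
```

Both functions produce the same picture for this scene. The difference is which loop is on the outside: the first needs a depth buffer to resolve overlaps, while the second asks a visibility question per pixel, which is the question a ray tracer keeps asking for reflections and refractions too.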
This article is a blog conversion of a manual I wrote for a journalism course this semester. Here’s the original PDF: How To Make Your Mac Read To You.
Introduction
It’s late at night. Your eyes are blurry from hours of reading at your computer screen but you have pages more to go before tomorrow’s deadline. Wouldn’t it be nice if you could close your eyes and have your computer read those last few pages to you?
You just finished that paper you had so much trouble writing. You don’t have time to find someone to proofread it before tomorrow and you’ve been staring at it for so long you know you’ll read right over any mistakes. Wouldn’t it be nice if your computer could read it to you out loud, making those silly grammatical mistakes sound obvious?
If you’re using Apple’s OS X, you can do both of those things easily. Apple was one of the early adopters of speech synthesis in 1984 and support for text-to-speech has been in their operating systems ever since. OS X has been shipped with all Macintosh computers since 2002. Unfortunately Apple is fond of moving the location of speech-related menu items between versions, making users find them again. This document will teach you how to assign speech actions to a quick key combination in OS X 10.6 “Snow Leopard” and how to use the command line tool “say” to create audio files of text to listen to at your leisure.
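As a rough sketch of the second technique, here is one way to drive `say` from a Python script. The `-f` (read text from a file), `-o` (write audio to a file) and `-v` (choose a voice) options belong to the `say` tool itself; the function names and the file names in the usage note are made up for illustration, and the command only runs on a Mac.

```python
import subprocess


def build_say_command(text_file, audio_file, voice=None):
    """Build the argument list for the macOS `say` tool.

    -f reads the text to speak from a file, -o writes the speech to an
    audio file instead of playing it, and -v selects a voice.
    """
    command = ["say", "-f", text_file, "-o", audio_file]
    if voice is not None:
        command += ["-v", voice]
    return command


def read_to_file(text_file, audio_file, voice=None):
    """Run `say`; this only works on a Mac, where the tool is installed."""
    subprocess.run(build_say_command(text_file, audio_file, voice), check=True)
```

For example, `read_to_file("paper.txt", "paper.aiff")` would ask `say` to turn a text file into an AIFF recording, assuming both paths are valid on your machine.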
An example of a modern, real-time, feature-based, structure from motion system is ProFORMA by Pan, Reitmayr and Drummond. To clear up the acronym, ProFORMA stands for Probabilistic Feature-based On-line Rapid Model Acquisition. Which is a mouthful. The video is impressive, though it is worth noting that the example model, with its simple geometry and texture rich surface, is ideal for the system [2].
Like most state-of-the-art structure from motion techniques, Pan, Reitmayr and Drummond’s approach is feature based [1]. Image features are an entire area of research in computer vision. The premise is that rather than operating on the entire image, in the form of raw pixels, it is smarter to pick “interesting” parts of the image and just consider the information around those spots. The process of finding interesting spots is called feature (or interest point) detection. There are many types of feature detectors (bearing all sorts of different names). Some look for bright or dark blobs, some look for points with a lot of local texture and some decide whether a point is interesting on some other criteria entirely. ProFORMA uses the FAST corner detector as its feature detector.
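To make "feature detection" concrete, here is a simplified sketch of the segment test at the heart of FAST, written in plain Python. It is not ProFORMA's code and it leaves out what makes real FAST fast and robust (the quick rejection pre-test and non-maximum suppression). The threshold and arc length defaults are illustrative assumptions, and the image is assumed to be a list of rows of grey values.

```python
# Offsets (dx, dy) of the 16 pixels on a radius-3 circle, in clockwise order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def longest_circular_run(flags):
    """Length of the longest run of True values, allowing wrap-around."""
    if all(flags):
        return len(flags)
    best = run = 0
    for flag in flags + flags:    # doubling the list handles wrap-around
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def is_corner(image, x, y, threshold=20, arc_length=9):
    """Segment test: is there a long enough arc of the circle that is
    entirely brighter or entirely darker than the centre pixel?"""
    centre = image[y][x]
    ring = [image[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [value > centre + threshold for value in ring]
    darker = [value < centre - threshold for value in ring]
    return max(longest_circular_run(brighter),
               longest_circular_run(darker)) >= arc_length


def detect_corners(image, threshold=20, arc_length=9):
    """Return (x, y) for every pixel that passes the segment test."""
    height, width = len(image), len(image[0])
    return [(x, y)
            for y in range(3, height - 3)
            for x in range(3, width - 3)
            if is_corner(image, x, y, threshold, arc_length)]
```

A pixel on a straight edge sees only about half the circle differ from it, so it fails the test; a pixel at the tip of a corner sees a much longer arc differ and passes.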
An example of a background/foreground segmentation from Janusz Konrad at Boston University.
After working for quite a while on the “motion detection” algorithms described in my last article, I was clued in to background subtraction. Or rather, it finally hit me why, when explaining what I was working on, people kept saying, “Oh, you’re doing background subtraction.”
Background subtraction is the keyword for a relatively well explored nook of computer vision. The motivation for background subtraction research is that, in an image, there is usually a part of the image that you care about (the foreground), and a part that you don’t care about (the background) and it would be nice to focus only on the parts that we care about. There are many justifiable ways to divide a single image into foreground and background sections if we use the criterion that the foreground is “things we care about” and the background is “things we don’t care about.” The colloquial distinction is that the foreground is usually closer to the camera, in focus, and more interesting than the background. The last is where subjectivity enters the equation. In the field of background subtraction and in the context of video, the consensus is that the background is the part of the image that does not belong to a sizable moving object. This is still a bit vague, but different algorithms have different ideas about what constitutes a background.
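One of the simplest ways to turn that definition into code is a running-average background model: keep a slowly updated average of past frames and call any pixel that differs from it by more than a threshold "foreground". The sketch below is a generic illustration of that idea in plain Python, not a specific published algorithm; frames are assumed to be lists of rows of grey values, and the learning rate and threshold are arbitrary example values.

```python
def update_background(background, frame, learning_rate=0.05):
    """Blend the new frame into the running-average background model."""
    return [[(1 - learning_rate) * b + learning_rate * f
             for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]


def foreground_mask(background, frame, threshold=30):
    """Mark pixels that differ from the background model by more than threshold."""
    return [[abs(f - b) > threshold for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]
```

The learning rate controls how quickly something that stops moving gets absorbed into the background, which is one of the places where different algorithms make different choices.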
Videndo Aedificare is a project I’ve been working on as a part of my coursework for CS612, Advanced Topics in Computer Vision. The name means “By seeing, to build” (according to Google Translate) and that is exactly what it attempts. The goal of the project is to build a rudimentary system that takes a real-time webcam feed and builds a 3D model of the viewed scene.
Introduction
The project was inspired by a paper by Pan, Reitmayr and Drummond called ProFORMA: Probabilistic Feature-based On-line Rapid Model Acquisition. Actually, it might be more honest to say that the project was inspired by the video that the ProFORMA authors posted on YouTube.
Videndo Aedificare (VA) does not use a feature based approach. I had wanted to at first, but decided that implementing a state of the art structure from motion system is beyond what is feasible in a semester project for a one person team (read: was talked down by my professor). Videndo Aedificare is built on my 3D graphics/computer vision toolkit, ZeroRay (which is a topic on my article todo list for this blog that keeps getting put off!). Its primary goal is to present a framework for exploring real time structure from motion algorithms. It provides a neat API to subscribe listeners to a connected webcam and classes to display results, either 2D images (which may be intermediate results for debugging), or 3D scenes. Videndo Aedificare uses the Ogre open source rendering engine to provide 3D views. Camera listeners implement a receiveFrame method, by which they are passed the current camera frame, and given time to operate on it. Often camera listeners have their own views to display results side by side with the raw camera feed.
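The listener arrangement described above is essentially the observer pattern. The following Python sketch shows the shape of it; only the `receiveFrame` method name comes from the description, while `Camera`, `subscribe`, `publish` and `BrightnessLogger` are hypothetical stand-ins and not Videndo Aedificare's actual API.

```python
class CameraListener:
    """Interface for anything that wants to see each camera frame."""

    def receiveFrame(self, frame):
        raise NotImplementedError


class Camera:
    """Hypothetical stand-in for the webcam wrapper."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def publish(self, frame):
        # Pass the current frame to every subscribed listener in turn.
        for listener in self._listeners:
            listener.receiveFrame(frame)


class BrightnessLogger(CameraListener):
    """Example listener: records the mean intensity of each frame."""

    def __init__(self):
        self.means = []

    def receiveFrame(self, frame):
        pixels = [value for row in frame for value in row]
        self.means.append(sum(pixels) / len(pixels))
```

A listener that builds a 3D model would have the same shape as `BrightnessLogger`, just with more work done inside `receiveFrame`.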
Simple Builder
VA's Simple Builder. The webcam video stream picturing my monitor wearing a festive hat (left) beside the 3D rendering of the model constructed from the scene (right).
As a proof of concept, and to test my framework, I implemented the most naive scene reconstruction algorithm I could think of. It assumed that the intensity of a pixel was inversely proportional to the distance of that point to the camera. In other words, bright pixels are close to the camera, dark ones are far away. Simple Builder generates a polygonal mesh from each frame by first making the image greyscale and then interpreting it as a height map. The maximum and minimum heights are parameters and the greyscale values are interpolated between them. The visual effect is rather interesting.
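A rough Python sketch of the height-map step, under stated assumptions: pixels are (r, g, b) tuples in the range 0 to 255, greyscale is a plain channel average, and grey values are mapped linearly between the two height parameters. The function names are invented for this example, and the sketch stops at vertices; connecting neighbouring vertices into triangles is left out.

```python
def to_grey(pixel):
    """Average the red, green and blue channels (one simple greyscale choice)."""
    r, g, b = pixel
    return (r + g + b) / 3.0


def height_map(image, min_height=0.0, max_height=1.0):
    """Map grey values 0..255 linearly onto min_height..max_height."""
    scale = (max_height - min_height) / 255.0
    return [[min_height + to_grey(pixel) * scale for pixel in row]
            for row in image]


def grid_vertices(heights):
    """One (x, y, z) vertex per pixel; z is the height toward the camera."""
    return [(x, y, z) for y, row in enumerate(heights) for x, z in enumerate(row)]
```

Black pixels land at `min_height` and white pixels at `max_height`, so bright regions of the frame bulge toward the viewer in the resulting mesh.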