<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cutthroat Studios Developer&#039;s Blog</title>
	<atom:link href="http://www.cutthroatstudios.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cutthroatstudios.com/blog</link>
	<description>Articles, explanations and observations</description>
	<lastBuildDate>Sat, 19 Feb 2011 10:24:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>ZeroRay &#8211; Ray Tracer, Computer Vision Toolkit</title>
		<link>http://www.cutthroatstudios.com/blog/2011/02/zeroray/</link>
		<comments>http://www.cutthroatstudios.com/blog/2011/02/zeroray/#comments</comments>
		<pubDate>Sat, 19 Feb 2011 10:22:46 +0000</pubDate>
		<dc:creator>Jess</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[3D Graphics]]></category>
		<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Ray Tracer]]></category>
		<category><![CDATA[Ray Tracing]]></category>
		<category><![CDATA[Sourceforge]]></category>
		<category><![CDATA[ZeroRay]]></category>

		<guid isPermaLink="false">http://www.cutthroatstudios.com/blog/?p=87</guid>
		<description><![CDATA[ZeroRay began its life as a ray tracer. Building a ray tracer from scratch was one of the large projects for a graduate level computer science course I took called Image Computation. I ended up using ZeroRay in a lot of other projects and it has evolved into a ray tracer/computer vision (CV) toolkit with [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_223" class="wp-caption alignleft" style="width: 235px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/2010-12-21_ShoeboxBilinear.jpg"><img class="size-medium wp-image-223" title="2010-12-21_ShoeboxBilinear" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/2010-12-21_ShoeboxBilinear-225x300.jpg" alt="A scene rendered by ZeroRay" width="225" height="300" /></a><p class="wp-caption-text">A scene rendered by ZeroRay illustrating reflection and refraction. The glass orbs both reflect and transmit light through them.</p></div>
<p>ZeroRay began its life as a ray tracer. Building a ray tracer from scratch was one of the large projects for a graduate level computer science course I took called Image Computation. I ended up using ZeroRay in a lot of other projects and it has evolved into a ray tracer/computer vision (CV) toolkit with a suite of example applications that are both interesting and illustrate how to use ZeroRay.</p>
<p>I have released ZeroRay as open source software, under the BSD license. The BSD license is pretty much the most liberal open source license. The basic gist is that you can use the code for anything you want, be it open or closed source, commercial or non-commercial. All you have to do is give the author credit for using their library. You can visit the <a href="http://sourceforge.net/projects/zeroray" target="_blank">ZeroRay project page</a> at SourceForge. At the time of this writing, I haven&#8217;t prepared any convenient downloadable packages, so you will have to check the code out of the svn repository. To do so, click <em>Develop</em> on the project page and follow the instructions there.</p>
<p>If you&#8217;re wondering, &#8220;What&#8217;s a Ray Tracer?&#8221; check out my previous article <a href="http://www.cutthroatstudios.com/blog/2010/12/what-is-ray-tracing/">What is Ray Tracing?</a></p>
<p>Back to what exactly ZeroRay is. ZeroRay is a software library (or more precisely a set of software libraries). That means it is a set of tools that are used by programmers to create applications. A simple application could be one that, when executed, draws a pre-set picture and saves it to an image file. A more complex application could be a program that allows you to set up a 3D scene by dragging objects into it, shows a low quality rendering of the scene, and then invokes the ZeroRay ray tracer to generate a high quality version of the image. As a bit of an aside: ray tracing is a way to generate very high quality computer graphics; however, it is significantly more computationally intensive than the techniques used in real-time renderers, like the sort that are used in video games. To make the ZeroRay ray tracer widely useful to computer artists (who generally are not programmers), it would need an application similar to the second example. But ZeroRay is more than a ray tracer.<span id="more-87"></span></p>
<h2>More Than a Ray Tracer</h2>
<p>There are a number of high-quality, mature ray tracers in the world. Why build another? Well, as I mentioned, I started it as a project for a class, but what ZeroRay really offers is its combination of 3D rendering and a computer vision (CV) toolkit. Both libraries are a part of ZeroRay and use the same set of foundational classes: vectors, matrices, images, etc. The image class is full-featured and implements a number of useful algorithms as well as simple tools for annotating or marking images, as is often needed to test CV algorithms. The CV toolkit also implements convolution/correlation, providing both brute force algorithms and Fast Fourier Transform (FFT) based algorithms built with <a href="http://www.fftw.org/" target="_blank">FFTW</a>.</p>
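<p>To make the brute force flavour of correlation concrete, here is a minimal sketch in plain Python. It is not ZeroRay code: the function names <code>correlate</code> and <code>best_match</code> and the tiny test image are made up for illustration, and images are simply lists of rows.</p>

```python
def correlate(image, template):
    """Brute-force cross-correlation of a 2D template over a 2D image.

    Both arguments are lists of rows of numbers. The result holds one score
    per offset at which the template fits entirely inside the image.
    """
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    scores = []
    for y in range(ih - th + 1):
        row = []
        for x in range(iw - tw + 1):
            total = 0.0
            # Multiply overlapping pixels and add them up.
            for ty in range(th):
                for tx in range(tw):
                    total += image[y + ty][x + tx] * template[ty][tx]
            row.append(total)
        scores.append(row)
    return scores


def best_match(scores):
    """Return the (x, y) offset with the highest correlation score."""
    best = max(
        (value, x, y)
        for y, row in enumerate(scores)
        for x, value in enumerate(row)
    )
    return best[1], best[2]


# Hypothetical 5x5 image that is black except for a small 2x2 pattern.
template = [[1, 2], [3, 4]]
image = [[0] * 5 for _ in range(5)]
for ty in range(2):
    for tx in range(2):
        image[1 + ty][2 + tx] = template[ty][tx]

offset = best_match(correlate(image, template))
```

<p>The four nested loops are why the brute force version gets slow for large templates, and why an FFT based version is worth having. Plain correlation like this also favours bright regions, so in practice the scores are usually normalised in some way.</p>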
<p>ZeroRay is not nearly as fully-featured as <a href="http://opencv.willowgarage.com/wiki/" target="_blank">OpenCV</a>, the current go-to CV toolkit, but ZeroRay provides a well-designed, modern, object-oriented foundation for CV. It contains all the building blocks that a student of CV would have to spend several months writing before they started their work.</p>
<div id="attachment_259" class="wp-caption alignright" style="width: 310px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2011/02/opalescent_1440x927.jpg"><img class="size-medium wp-image-259" title="opalescent_1440x927" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2011/02/opalescent_1440x927-300x193.jpg" alt="ZeroRay Opalescent Render" width="300" height="193" /></a><p class="wp-caption-text">A ZeroRay Rendering of glass spheres lit by various coloured light sources.</p></div>
<p>The Image class is especially useful. It is implemented in a speed-optimized but still relatively straightforward manner. Users can load an image and begin working with pixel data directly without having to read a chapter in the manual first (as one might to begin working with OpenCV images). ZeroRay reads and saves in PNG format. In memory, the Image class (zr::Image) can represent many different types of pixel data: greyscale, RGB and RGBA, all in discrete (8 bits per channel) or floating point (32 or 64 bits per channel) format. Floating point images are important because many CV algorithms and transformations need to be performed with much more numerical precision than images are usually stored in. Without this extra precision, iterated calculations would result in very large rounding errors as results are rounded off at each step of the way.</p>
<h2>Applications Module</h2>
<p>The applications module contains a set of example programs that use either the zeroray (ray tracing) or zrvision (CV) libraries. However, these aren&#8217;t just simple example programs; each one is a project I completed for my CV coursework. They&#8217;re pretty cool and show what can be done with ZeroRay.</p>
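<p>The rounding problem is easy to demonstrate. The following is a small, self-contained Python sketch (not ZeroRay code; the function name and the numbers are made up) that darkens a single pixel value by 10% twenty times and then brightens it back again, once rounding to a whole number after every step, as integer pixel storage would, and once keeping the value in floating point.</p>

```python
def darken_then_brighten(value, steps, quantize):
    """Scale a pixel value by 0.9 `steps` times, then undo it again.

    With quantize=True the value is rounded to a whole number after every
    step, which mimics storing the intermediate result in an integer image.
    """
    for factor in (0.9, 1 / 0.9):
        for _ in range(steps):
            value = value * factor
            if quantize:
                value = float(round(value))
    return value


original = 200.0
as_integer = darken_then_brighten(original, 20, quantize=True)
as_float = darken_then_brighten(original, 20, quantize=False)

print("integer storage:", as_integer)
print("floating point: ", as_float)
```

<p>The floating point result lands back on the starting value to within a tiny error, while the version that rounds at each step drifts noticeably away from it.</p>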
<p><strong>zrExample &#8211; </strong>This is the ray tracer demo. It has a handful of scenes (defined in code) that it can render either to the screen or to a file, at arbitrarily large resolutions. This is the place to look to learn how to set up and render a scene with the ray tracer. It is easy to modify one of the pre-defined scenes, re-compile and render the new scene.</p>
<p><strong>AeroQuest &#8211; </strong>The task was to take a shaky, hand-held video of a plane landing and stabilise the picture so that the plane stays in the center by using correlation. The end result is a command line executable that takes in a set of images and a template bounding box in the first frame, and outputs a set of images with the template held in the center. Or at least it tries to. The technique fails if the object in the template changes size or shape dramatically (this is an expected shortcoming).</p>
<p><strong>WineAndCheese &#8211; </strong>A classifying algorithm that scores a set of images on a &#8220;Grapes vs. Cheese&#8221; scale. For a given set of pictures where some are grapes and some are cheese, it gives all the grape images a higher &#8220;grape score&#8221; than the cheese images. It works by using a circular Hough transform to count the number of circles in an image. The more circles, the higher the grape score. It is not expected to work generally on arbitrary images of grapes or cheese.</p>
<p><strong>VidendoAedificare &#8211; </strong>See my previous article <a href="http://www.cutthroatstudios.com/blog/2010/12/videndo-aedificare/">Videndo Aedificare</a>.</p>
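<p>For the curious, the voting idea behind the circular Hough transform used in WineAndCheese can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the WineAndCheese code: the radius is known in advance, the edge pixels come from a synthetic circle rather than an edge detector, and all of the names are made up.</p>

```python
import math


def circle_edge_points(cx, cy, radius):
    """Pixel coordinates on a circle (a stand-in for an edge detector)."""
    points = set()
    for step in range(360):
        angle = math.radians(step)
        points.add((round(cx + radius * math.cos(angle)),
                    round(cy + radius * math.sin(angle))))
    return sorted(points)


def hough_votes(edge_points, width, height, radius):
    """Let every edge pixel vote for the centres it could belong to.

    An edge pixel lies on a circle of the given radius around a centre
    exactly when that centre lies on a circle of the same radius around
    the edge pixel, so each pixel votes along such a circle.
    """
    votes = [[0] * width for _ in range(height)]
    for px, py in edge_points:
        candidates = set()
        for step in range(360):
            angle = math.radians(step)
            candidates.add((round(px - radius * math.cos(angle)),
                            round(py - radius * math.sin(angle))))
        for cx, cy in candidates:
            if 0 <= cx < width and 0 <= cy < height:
                votes[cy][cx] += 1
    return votes


def strongest_centre(votes):
    """Return the (x, y) cell that collected the most votes."""
    best = max(
        (count, x, y)
        for y, row in enumerate(votes)
        for x, count in enumerate(row)
    )
    return best[1], best[2]


edges = circle_edge_points(10, 10, 5)
votes = hough_votes(edges, 20, 20, 5)
centre = strongest_centre(votes)
```

<p>Counting circles then comes down to counting the distinct peaks in the vote grid, repeated over a range of radii when the size of the circles is not known ahead of time.</p>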
<p><img class="alignnone" title="-Jess" src="/skins/sig-jess.gif" alt="-Jess" width="80" height="55" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cutthroatstudios.com/blog/2011/02/zeroray/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PolyViz Polyhedral Visualizer</title>
		<link>http://www.cutthroatstudios.com/blog/2011/01/polyviz-polyhedral-visualizer/</link>
		<comments>http://www.cutthroatstudios.com/blog/2011/01/polyviz-polyhedral-visualizer/#comments</comments>
		<pubDate>Mon, 24 Jan 2011 23:19:47 +0000</pubDate>
		<dc:creator>Jess</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[3D Graphics]]></category>
		<category><![CDATA[HPC]]></category>
		<category><![CDATA[Ogre]]></category>
		<category><![CDATA[Polyhedral Model]]></category>
		<category><![CDATA[PolyViz]]></category>
		<category><![CDATA[Visualization]]></category>
		<category><![CDATA[wxWidgets]]></category>

		<guid isPermaLink="false">http://www.cutthroatstudios.com/blog/?p=250</guid>
		<description><![CDATA[&#8220;PolyViz is a tool for visualizing n-dimensional polyhedra. Motivated by the difficulty of reasoning about the iteration spaces of nested loops with many levels, PolyViz allows users to visualize polyhedral representations of those iteration spaces.&#8221; - PolyViz Sourceforge project description I started writing PolyViz for an advanced high performance computing (HPC) course last spring (early [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_251" class="wp-caption alignleft" style="width: 310px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2011/01/2010-11-17_Viewer.png"><br />
<img class="size-medium wp-image-251" title="PolyViz Viewer" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2011/01/2010-11-17_Viewer-300x222.png" alt="PolyViz Viewer" width="300" height="222" /></a><p class="wp-caption-text">The PolyViz Viewer displaying three polyhedra.</p></div>
<p>&#8220;PolyViz is a tool for visualizing n-dimensional polyhedra. Motivated by the difficulty of reasoning about the iteration spaces of nested loops with many levels, PolyViz allows users to visualize polyhedral representations of those iteration spaces.&#8221;</p>
<p>- PolyViz Sourceforge project description</p>
<p>I started writing PolyViz for an advanced high performance computing (HPC) course last spring (early 2010). The course focused on the polyhedral model, a way of thinking about tightly nested loops in computer programs as geometric shapes. In the polyhedral model, each loop has an index variable, like in a standard for loop, such as i, j or k. Each index variable iterates over a range of values; this set of values is called its domain. Each index variable is interpreted as a dimension of the polyhedron. Points whose values are included in the domain are within the polyhedron; all other points are not.</p>
<p><a href="https://sourceforge.net/projects/polyviz/" target="_blank"><strong>PolyViz on Sourceforge</strong></a> &#8211; See the files section to download source packages or the manual, which is very good and covers background materials and the mathematics involved, along with instructions on how to use the software.<span id="more-250"></span></p>
<p>For simple loops this usually results in some manner of rectangular shape (though it may be 3, 4 or higher dimensional depending on how deep the loop nest goes). However, for loops with more complex domains, the polyhedrons become more complex themselves.</p>
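<p>As a small illustration of loops as shapes, this Python sketch (names made up for the example) lists the iteration points of a rectangular loop nest and of a triangular one, where the inner bound depends on the outer index, and draws each as a crude text plot.</p>

```python
def rectangular_domain(n, m):
    """Iteration points of: for i in range(n): for j in range(m)."""
    return [(i, j) for i in range(n) for j in range(m)]


def triangular_domain(n):
    """Iteration points of: for i in range(n): for j in range(i + 1)."""
    return [(i, j) for i in range(n) for j in range(i + 1)]


def text_plot(points):
    """Draw a 2D set of points, one row of text per value of i."""
    members = set(points)
    rows = max(i for i, _ in points) + 1
    cols = max(j for _, j in points) + 1
    return "\n".join(
        "".join("#" if (i, j) in members else "." for j in range(cols))
        for i in range(rows)
    )


rect = rectangular_domain(4, 3)
tri = triangular_domain(4)
print(text_plot(rect))
print()
print(text_plot(tri))
```

<p>The first plot is a filled rectangle and the second a triangle. Add a third or fourth loop and the same idea gives a solid in 3 or more dimensions, which is where a text plot stops being useful.</p>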
<p>The polyhedral model shows its worth when one tries to optimise a loop nest. There are many program transformations that may improve the performance of a loop nest, such as shifting the order of loops or skewing the loop in some way. These transformations are especially important when trying to parallelize a loop nest. Traditionally, these transformations are applied manually to the loop code by skilled programmers. In the polyhedral model, many of these transformations can be expressed as functions that can be applied to the polyhedral domain to arrive at the modified program. Of course, the program must be translated into and out of the polyhedral model in order to leverage this mathematical elegance.</p>
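<p>Here is a rough Python sketch of what &#8220;a transformation is a function on the domain&#8221; means. It uses a textbook style example that I am assuming for illustration: a 2-deep loop nest where each iteration depends on its neighbours one step back in i and one step back in j. The function names are made up.</p>

```python
def interchange(point):
    """Loop interchange: swap the roles of the two loops."""
    i, j = point
    return (j, i)


def skew(point):
    """Loop skewing: (i, j) -> (i + j, j)."""
    i, j = point
    return (i + j, j)


# A 4x4 rectangular iteration space.
domain = [(i, j) for i in range(4) for j in range(4)]
skewed_domain = [skew(p) for p in domain]

# Assumed dependences, written as distance vectors: iteration (i, j) needs
# the results of (i - 1, j) and (i, j - 1).
dependences = [(1, 0), (0, 1)]

# Skewing is linear, so distance vectors transform the same way as points.
skewed_dependences = [skew(d) for d in dependences]
print(skewed_dependences)
```

<p>After skewing, every dependence has a positive first component. That means all dependences are carried by the new outer loop, so the iterations of the new inner loop no longer depend on one another and could be run in parallel. The rectangle has also become a parallelogram, which is exactly the sort of shape that is hard to picture from loop bounds alone.</p>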
<p>This is where PolyViz can help. It can be very difficult, even for experts, to look at loop bounds in a nest of for loops and imagine the shape of the implied polyhedral iteration space. However, it is necessary to do this in order to reason about the effects of program transformations on important program attributes like data dependencies.</p>
<p>PolyViz allows programmers to type in the bounds of a polyhedron, in the form of a series of inequalities, and instantly view a model of the polyhedron. If the polyhedron is high dimensional (more than 3 dimensions), the user chooses which dimensions he or she would like to view directly and which will be parameterized. Parameterized dimensions are fixed at a given value. The result is essentially a rendering of a 3 dimensional slice of a higher dimensional object. See the manual for a great analogy (involving a koosh ball) that explains this concept.</p>
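<p>The slicing idea can be sketched in Python as follows. To be clear, this is not PolyViz&#8217;s input syntax or API. I am assuming a made-up representation in which each inequality is a pair of coefficients and a bound, read as &#8220;coefficients dot point is at most bound&#8221;, and the 4-dimensional shape is an arbitrary example.</p>

```python
from itertools import product


def contains(constraints, point):
    """True if the point satisfies every inequality  a . x <= b."""
    return all(
        sum(a * x for a, x in zip(coeffs, point)) <= bound
        for coeffs, bound in constraints
    )


def slice_points(constraints, fixed, free_range, dims):
    """Integer points of the polyhedron with some dimensions held fixed.

    `fixed` maps a dimension index to its parameter value. Every other
    dimension is scanned over `free_range`.
    """
    axes = [
        [fixed[d]] if d in fixed else list(free_range)
        for d in range(dims)
    ]
    return [p for p in product(*axes) if contains(constraints, p)]


# Hypothetical 4-dimensional domain: 0 <= i, j, k, l and i + j + k + l <= 3.
simplex = [
    ((-1, 0, 0, 0), 0),
    ((0, -1, 0, 0), 0),
    ((0, 0, -1, 0), 0),
    ((0, 0, 0, -1), 0),
    ((1, 1, 1, 1), 3),
]

# View dimensions i, j, k directly and parameterize l, fixing it at 1.
slice_at_l1 = slice_points(simplex, fixed={3: 1}, free_range=range(4), dims=4)
print(len(slice_at_l1), "points in the slice")
```

<p>Changing the fixed value of l gives a different 3 dimensional slice of the same 4 dimensional object, which is the effect you get when adjusting a parameterized dimension.</p>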
<p>Jon Roelofs and I co-wrote the PolyViz library software as part of the HPC course. I wrote the PolyViz Viewer GUI application last summer for Prof. Sanjay Rajopadhye with the support of the National Science Foundation.</p>
<p>PolyViz uses <a href="http://www.ogre3d.org/" target="_blank">Ogre</a>, an exceptional open source real-time 3D graphics engine, to provide visualizations. The PolyViz Viewer uses <a href="http://www.wxwidgets.org/" target="_blank">wxWidgets</a> to provide a native GUI experience on OS X, Windows and Linux (GTK).</p>
<p><img class="alignnone" title="-Jess" src="/skins/sig_jess.gif" alt="-Jess" width="80" height="55" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cutthroatstudios.com/blog/2011/01/polyviz-polyhedral-visualizer/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>When Chrome Fails</title>
		<link>http://www.cutthroatstudios.com/blog/2011/01/when-chrome-fails/</link>
		<comments>http://www.cutthroatstudios.com/blog/2011/01/when-chrome-fails/#comments</comments>
		<pubDate>Wed, 05 Jan 2011 02:52:14 +0000</pubDate>
		<dc:creator>Jess</dc:creator>
				<category><![CDATA[Off Topic]]></category>
		<category><![CDATA[Aw Snap!]]></category>
		<category><![CDATA[Browser]]></category>
		<category><![CDATA[Chrome]]></category>
		<category><![CDATA[Google Chrome]]></category>

		<guid isPermaLink="false">http://www.cutthroatstudios.com/blog/?p=239</guid>
		<description><![CDATA[I think Google Chrome is a great browser, but even great browsers sometimes fail. In fact, I just had two tabs crash on me. They happened to both be Google Docs tabs so Google can&#8217;t pin the blame on anyone else. However, instead of just sitting there dumbly, Chrome informed me that the tabs weren&#8217;t [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_246" class="wp-caption alignnone" style="width: 588px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2011/01/2011-01-04_ChromeFailure.png"><img class="size-full wp-image-246" title="2011-01-04_ChromeFailure" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2011/01/2011-01-04_ChromeFailure.png" alt="Aw, Snap! Chrome tab crashes." width="578" height="177" /></a><p class="wp-caption-text">Aw, Snap! Chrome tab crashes.</p></div>
<p>I think Google Chrome is a great browser, but even great browsers sometimes fail. In fact, I just had two tabs crash on me. They happened to both be Google Docs tabs so Google can&#8217;t pin the blame on anyone else. However, instead of just sitting there dumbly, Chrome informed me that the tabs weren&#8217;t responding and gave me the option to close them. This is part of Google&#8217;s whole strategy for Chrome. The renderer for every tab is a separate process, so that if something goes wrong, they can be killed individually rather than crashing the whole browser.</p>
<p>While being able to kill the tabs was nice, it&#8217;s not what I&#8217;m writing about. What I&#8217;m writing about is Chrome&#8217;s brilliant strategy of, when it fails, being too cute for you to be mad at it. It&#8217;s a master stroke. Software that exploits people&#8217;s emotions to get people to like it even when it misbehaves! Software has finally evolved to the level of a two year old. The image above is a screenshot of the page Chrome showed me after I chose to kill the tabs. How can you be mad when looking into those poor pixel-art eyes?</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cutthroatstudios.com/blog/2011/01/when-chrome-fails/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Apache, The Performance Server</title>
		<link>http://www.cutthroatstudios.com/blog/2010/12/apache-the-performance-server/</link>
		<comments>http://www.cutthroatstudios.com/blog/2010/12/apache-the-performance-server/#comments</comments>
		<pubDate>Sun, 26 Dec 2010 20:37:06 +0000</pubDate>
		<dc:creator>Jess</dc:creator>
				<category><![CDATA[Off Topic]]></category>
		<category><![CDATA[Apache]]></category>
		<category><![CDATA[Apache License]]></category>
		<category><![CDATA[License]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Performance]]></category>

		<guid isPermaLink="false">http://www.cutthroatstudios.com/blog/?p=236</guid>
		<description><![CDATA[For an unrelated project, I&#8217;m writing a section about open source licenses and I happened across language in the Apache license that gives users the right to &#8220;publicly display [and] publicly perform&#8221; the work. Immediately after reading those words, I sat there for a good five minutes thinking up creative ways to turn the source [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_237" class="wp-caption alignright" style="width: 213px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/The-Apache-Software-Foundation.gif"><img class="size-full wp-image-237 " title="The Apache Software Foundation" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/The-Apache-Software-Foundation.gif" alt="The Apache Software Foundation" width="203" height="61" /></a><p class="wp-caption-text">Apache logo, courtesy of The Apache Software Foundation</p></div>
<p>For an unrelated project, I&#8217;m writing a section about open source licenses and I happened across language in the Apache license that gives users the right to &#8220;publicly display [and] publicly perform&#8221; the work. Immediately after reading those words, I sat there for a good five minutes thinking up creative ways to turn the source code of the most popular open-source web server in the world into a performance piece. Unfortunately none of my ideas for a song and dance rendition of the Apache source code were very good (in fact, the very concept is wretched. It&#8217;d be intolerable to sit through). Though a copyleft, source-code based, visual art installation could be pretty cool&#8230; with pages of code pasted to the walls, floor and ceiling. Errant CRTs jutting from the mass of lacquered paper, scanning over the codebase in amber on black. The guts of the enabler of the modern web on display in all their hideous monotony.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cutthroatstudios.com/blog/2010/12/apache-the-performance-server/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Books</title>
		<link>http://www.cutthroatstudios.com/blog/2010/12/google-books/</link>
		<comments>http://www.cutthroatstudios.com/blog/2010/12/google-books/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 14:54:35 +0000</pubDate>
		<dc:creator>Jess</dc:creator>
				<category><![CDATA[Off Topic]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Books]]></category>
		<category><![CDATA[Language]]></category>
		<category><![CDATA[N-Gram]]></category>
		<category><![CDATA[Statistics]]></category>
		<category><![CDATA[Word Frequency]]></category>

		<guid isPermaLink="false">http://www.cutthroatstudios.com/blog/?p=231</guid>
		<description><![CDATA[For those unfamiliar with the project, Google Books is an attempt to digitize every book ever written. The project began in 2002 and many libraries, universities and publishers got on board. In 2004 the project stirred up a bit of controversy with lawsuits against Google charging that Google Books (then called Google Print) violated the [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_232" class="wp-caption alignleft" style="width: 310px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/ComputerIntenetTelephoneRobot.png"><img class="size-medium wp-image-232" title="ComputerIntenetTelephoneRobot" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/ComputerIntenetTelephoneRobot-300x112.png" alt="Google N-Gram Viewer" width="300" height="112" /></a><p class="wp-caption-text">A screenshot of the Google N-Gram Viewer in action. Plot shows the relative frequency of the words internet, computer, telephone and robot from 1850 to 2000.</p></div>
<p>For those unfamiliar with the project, Google Books is an attempt to digitize every book ever written. The project began in 2002 and many libraries, universities and publishers got on board. In 2004 the project <a href="http://seattletimes.nwsource.com/html/businesstechnology/2002622398_paul14.html" target="_blank">stirred up a</a> <a href="http://www.macworld.com/article/47564/2005/10/googleprint.html" target="_blank">bit of</a> <a href="http://news.cnet.com/Googles-battle-over-library-books/2100-1025_3-5907506.html" target="_blank">controversy</a> with lawsuits against Google charging that Google Books (then called Google Print) violated the copyrights of the books it scanned.</p>
<p>History aside, the reason I&#8217;m writing about this is that Google has released a load of data on the frequency of occurrence of words and phrases in the entire body of books it has scanned. The goal of publishing the data seems to be to allow academics to research the evolution of language, as used in books. However, beyond just making the raw data available, Google has provided a <a href="http://ngrams.googlelabs.com/" target="_blank">neat little webapp</a> that is easy enough for anyone to use. It&#8217;s fun to play with! As pictured above, I made a plot of a few technology related words (with rather predictable results).</p>
<p><img class="alignnone" title="Jess" src="/skins/sigJess.gif" alt="-Jess" width="80" height="55" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cutthroatstudios.com/blog/2010/12/google-books/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What is Ray Tracing?</title>
		<link>http://www.cutthroatstudios.com/blog/2010/12/what-is-ray-tracing/</link>
		<comments>http://www.cutthroatstudios.com/blog/2010/12/what-is-ray-tracing/#comments</comments>
		<pubDate>Tue, 21 Dec 2010 07:55:11 +0000</pubDate>
		<dc:creator>Jess</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[3D Graphics]]></category>
		<category><![CDATA[Camera Model]]></category>
		<category><![CDATA[Computer Graphics]]></category>
		<category><![CDATA[Lighting]]></category>
		<category><![CDATA[Pinhole Camera]]></category>
		<category><![CDATA[Ray Tracer]]></category>
		<category><![CDATA[Ray Tracing]]></category>
		<category><![CDATA[Recursion]]></category>
		<category><![CDATA[Reflection]]></category>
		<category><![CDATA[Refraction]]></category>
		<category><![CDATA[ZeroRay]]></category>

		<guid isPermaLink="false">http://www.cutthroatstudios.com/blog/?p=212</guid>
		<description><![CDATA[There are two approaches to the computer graphics problem: For each object in a scene, project it onto the screen, then color in all the pixels that it covers. For each pixel on the screen figure out which object in the scene it points at, then color the pixel the color of that object. The [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 310px"><a href="/images/Athene_1440x900.jpg"><img title="Athene" src="/images/thumb_Athene.jpg" alt="Statute of Athena rendered by ZeroRay." width="300" height="188" /></a><p class="wp-caption-text">An image of the statute of Athena rendered by ZeroRay.</p></div>
<p>There are two approaches to the computer graphics problem:</p>
<ol>
<li>For each object in a scene, project it onto the screen, then color in all the pixels that it covers.</li>
<li>For each pixel on the screen figure out which object in the scene it points at, then color the pixel the color of that object.</li>
</ol>
<p>The first approach is the one used by mainstream, real-time graphics toolkits like OpenGL and DirectX. The second approach is ray tracing. This may seem like a fine distinction, but this choice determines what sort of things end up being hard or easy later.<span id="more-212"></span></p>
<h2>Camera Models</h2>
<p><strong>Inquisitive Reader:</strong> So how does that #2 thing you said make a picture?</p>
<p><strong>Jess: </strong>I&#8217;ll tell you how.</p>
<p><strong>IR: </strong>So you keep saying.</p>
<p><strong>J: </strong>Ahem. Well. To begin, you must have a model of a camera. By model, I mean a geometrical way of describing how an image is actually formed. In real life, this is somewhat complicated. But since we&#8217;re not in real life, we can assume a very simple model: <a href="http://en.wikipedia.org/wiki/Pinhole_camera" target="_blank">the pinhole camera</a>.</p>
<div class="wp-caption alignleft" style="width: 230px"><a href="http://en.wikipedia.org/wiki/File:Pinhole-camera.svg"><img title="Pinhole Camera Model" src="http://upload.wikimedia.org/wikipedia/commons/thumb/3/3b/Pinhole-camera.svg/220px-Pinhole-camera.svg.png" alt="Pinhole Camera Model" width="220" height="150" /></a><p class="wp-caption-text">An illustration of a pinhole camera courtesy of Wikipedia.</p></div>
<p><strong>IR: </strong>That&#8217;s a nice picture, and accurate I&#8217;m sure, however, I couldn&#8217;t help but notice that the tree inside the box is upside down.</p>
<p><strong>J: </strong>Quite observant of you! The place where the image of the tree is formed is called the <strong>image plane</strong>. We&#8217;ll cleverly do away with this problem by placing the image plane in front of the pinhole by the same distance that it is behind in the illustration. This way, the light rays will not have crossed paths at the pinhole yet, and the image will not be upside down.</p>
<p><strong>IR:</strong> Wait, that doesn&#8217;t seem quite right&#8230; I&#8217;m sure there&#8217;s a problem here but I can&#8217;t put my finger on it.</p>
<p><strong>J: </strong>The reason we can get away with this is because, as I mentioned before, we aren&#8217;t in real life. In real life the light-sensitive imaging surface must be contained in a dark box, so that light coming from other directions doesn&#8217;t affect the image. Since we are simulating the light ourselves, we can choose to only simulate the light in front of the image plane.</p>
<p><strong>IR:</strong> Simulating light&#8230; maybe we&#8217;re finally getting to the good part.</p>
<p><strong>J:</strong> That we are. We call the point where the rays come together in the illustration the <strong>focal point</strong>. To generate an image we create rays that begin at the focal point and go through each point on the image plane. We then <em>trace</em> them back out into the scene and see what they hit. This isn&#8217;t really simulating light, as I said earlier, but working backwards from the camera and saying, &#8220;Now if light were to hit the image plane here, where must it have come from?&#8221; Where the light came from determines what color it is, ergo what color we will make a given spot on the image.</p>
<p><strong>J:</strong> To make an image out of all of this, we map our image onto the image plane. Since an image is essentially a grid of pixels, we imagine that grid superimposed on the image plane. Then we cast rays from the focal point through each grid cell, moving over the entire image plane, each ray determining the color of one pixel.</p>
<p><strong>IR: </strong>It still seems as if there are a few things missing here. How do you figure out what the ray hits in the scene? How do you know what color light the ray should be anyway? How do lights and shadows work? I&#8217;m not at all satisfied by your hand waving.</p>
<p><strong>J:</strong> That&#8217;s it. I&#8217;m tired of your carping. I&#8217;ll answer your questions, but I&#8217;ll do so in non-dialog form.</p>
<p><strong>IR:</strong> No! Wait! I.. [zop!]</p>
<p>With a spray of orange sparks and a puff of acrid smoke, he was gone, and I was left to my narrative, uninterrupted.</p>
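<p>Before moving on, here is the ray casting step from the conversation above as a minimal Python sketch. The setup is an assumption made for the example, not ZeroRay&#8217;s actual camera code: the focal point sits at the origin, the camera looks down the positive z axis, and the image plane is centred on that axis at a distance of <code>focal_length</code> in front of the focal point.</p>

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def camera_rays(width, height, focal_length=1.0, plane_width=2.0):
    """Yield (px, py, origin, direction), one ray per pixel.

    The pixel grid is superimposed on the image plane and each ray goes
    from the focal point through the centre of one grid cell.
    """
    plane_height = plane_width * height / width
    origin = (0.0, 0.0, 0.0)
    for py in range(height):
        for px in range(width):
            # Centre of this pixel's cell, in image plane coordinates.
            # Pixel rows count downwards, so y is flipped.
            x = ((px + 0.5) / width - 0.5) * plane_width
            y = (0.5 - (py + 0.5) / height) * plane_height
            yield px, py, origin, normalize((x, y, focal_length))


rays = list(camera_rays(3, 3))
for px, py, origin, direction in rays:
    print(px, py, [round(c, 3) for c in direction])
```

<p>A shorter focal length (or a wider image plane) spreads the rays out more, which gives a wider field of view.</p>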
<h2>Intersecting the Scene</h2>
<div id="attachment_217" class="wp-caption alignright" style="width: 310px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/mc_wireframe.jpeg"><img class="size-medium wp-image-217" title="mc_wireframe" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/mc_wireframe-300x225.jpg" alt="Master Chief Wireframe" width="300" height="225" /></a><p class="wp-caption-text">A wireframe render of the Master Chief model from Halo 2.</p></div>
<p>Once you have a ray, travelling from the focal point, through the image plane, out into the scene, the next thing you need to know is what object the ray hits. The simplest way to determine this is to check each object in the scene and see if the ray intersects it.</p>
<p>Every type of object that exists in the scene must have a function to test whether a ray intersects it and to find where the intersection occurs. <a href="/blog/2010/12/zeroray/">ZeroRay</a> supports a number of primitive shapes: planes, spheres, rectangular prisms and billboards, to be precise. Furthermore, it supports any mesh made up of triangles. This is pretty much the same set of objects supported by a non-ray-tracing graphics engine. Most of the objects you see in computer graphics are meshes of triangles (sometimes quadrilaterals are allowed, since they are easily broken into two triangles along a diagonal). Some advanced rendering engines support more exotic primitives, but we&#8217;re not going to talk about them.</p>
<p>We are only interested in the intersection nearest to the camera. Any intersections that are further away will be blocked from the camera&#8217;s view by the closer object. The object of nearest intersection is the object that determines the color of the point in the image (so far, at least; there are other considerations that are left for later).</p>
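<p>To make the &#8220;check each object and keep the nearest hit&#8221; idea concrete, here is a minimal sketch in Python. It is not ZeroRay code: it handles spheres only, the function names and the scene tuples are made up for illustration, and it assumes the ray direction is a unit vector.</p>

```python
import math

def intersect_sphere(origin, direction, center, radius):
    """Return the distance along the ray to the nearest hit, or None on a miss.

    Assumes `direction` is a unit vector (an assumption of this sketch).
    """
    # Vector from the sphere center to the ray origin.
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None  # the ray misses the sphere entirely
    root = math.sqrt(disc)
    # Try the nearer root first and ignore hits behind the ray origin.
    for t in ((-b - root) / 2.0, (-b + root) / 2.0):
        if t > 1e-9:
            return t
    return None

def nearest_hit(origin, direction, spheres):
    """Test every sphere in the scene and keep the closest intersection."""
    best = None
    for name, center, radius in spheres:
        t = intersect_sphere(origin, direction, center, radius)
        if t is not None and (best is None or t < best[0]):
            best = (t, name)
    return best

# Two spheres on the same line of sight; the camera looks down the z axis.
spheres = [("far", (0.0, 0.0, 10.0), 1.0), ("near", (0.0, 0.0, 5.0), 1.0)]
hit = nearest_hit((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), spheres)
print(hit)
```

<p>The ray passes through both spheres, but only the one called &#8220;near&#8221; is reported, because its intersection distance is smaller.</p>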
<h2>Lights and Materials</h2>
<p>Now that we know which object controls the color of a pixel, how do we figure out what the color is? Every object also has a property called the <strong>surface material</strong>, so named because it represents what type of material the object is made of. The surface material and lighting are the basic contributors to the color that we will ascribe to the ray. A surface material itself has a number of properties, the simplest of which is diffuse color. Diffuse color is the color that light takes on when it is scattered diffusely (in all directions) after hitting the object. Another color property is specular color. For more details on diffuse vs specular lighting see my article <a href="http://www.cutthroatstudios.com/blog/2008/04/dynamicskies-normalmapping/"><strong>Dynamic Skies and Normal Mapping</strong></a> (scroll down to the section entitled <em>Some Background on Lighting in 3D Graphics</em>).</p>
<p>The surface material tells us the basic color of the object. To produce more realistic renderings we need to take lighting into account. A light is a very simple construct. It has simply a position and a color (and maybe a few attenuation properties, i.e. how strong its light is some distance away). The way the light is used is a bit more complex. Once we&#8217;ve determined the point of our camera ray intersection, we move on to lighting calculations. For each light in the scene, we cast a ray from the light to the camera ray intersection point. We find the nearest intersection in the scene for the light rays. If the nearest intersected object isn&#8217;t the same object as the camera ray intersection object, then that point is shadowed from that light by some other object. If the point is shadowed by another object, that light has no contribution to that point&#8217;s color. If the object is the same, then we know that there is nothing blocking that light and we perform the calculation described in the <strong>Dynamic Skies and Normal Mapping</strong> article to determine how much light that point receives.</p>
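<p>The shadow test described above can be sketched roughly as follows. This is an illustration rather than ZeroRay&#8217;s implementation: the scene contains only spheres, and the names <code>hit_distance</code> and <code>is_lit</code> are hypothetical. The actual brightness calculation for a lit point is left out.</p>

```python
import math

def hit_distance(origin, direction, center, radius):
    """Distance along a unit-length ray to a sphere, or None on a miss."""
    oc = [o - c for o, c in zip(origin, center)]
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 1e-9 else None

def is_lit(light_pos, point, hit_name, spheres):
    """Cast a ray from the light toward the point and see what it meets first."""
    to_point = [p - l for p, l in zip(point, light_pos)]
    length = math.sqrt(sum(v * v for v in to_point))
    direction = [v / length for v in to_point]
    nearest = None
    for name, center, radius in spheres:
        t = hit_distance(light_pos, direction, center, radius)
        if t is not None and (nearest is None or t < nearest[0]):
            nearest = (t, name)
    # Lit only if the first object the light ray meets is the one the camera saw.
    return nearest is not None and nearest[1] == hit_name

spheres = [("ball", (0.0, 0.0, 5.0), 1.0), ("blocker", (0.0, 3.0, 5.0), 1.0)]
point_on_ball = (0.0, 1.0, 5.0)  # the camera ray is assumed to have hit here

overhead = is_lit((0.0, 10.0, 5.0), point_on_ball, "ball", spheres)
from_side = is_lit((4.0, 5.0, 5.0), point_on_ball, "ball", spheres)
print(overhead, from_side)
```

<p>The overhead light is hidden behind the blocker, so it contributes nothing to that point, while the light off to the side reaches the ball unobstructed.</p>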
<h2>Reflection, Refraction: Recursion</h2>
<div id="attachment_220" class="wp-caption alignright" style="width: 310px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/2010-12-20_ReflectionExample.png"><img class="size-medium wp-image-220" title="2010-12-20_ReflectionExample" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/2010-12-20_ReflectionExample-300x239.png" alt="ZeroRay Reflection Example" width="300" height="239" /></a><p class="wp-caption-text">An early ZeroRay render showing the checkerboard floor reflecting on the shiny planet orbs.</p></div>
<p>The beauty of the ray tracing approach to graphics is that we can perform this whole process again to calculate reflections and refractions. We can use the angle of the surface the ray hit and the angle of incidence to determine which direction the ray would reflect off the surface. Since we&#8217;re tracing the light backwards, what we&#8217;re actually doing is figuring out the direction that light must have come from, if it were to have hit that point from somewhere else and reflected into the camera. So we send our algorithm off in that direction as if this ray were the original ray from the camera. Once we get a color back from that calculation, we add it to the color at the original point subject to a reflectivity parameter. More reflective objects show the reflected light brighter while less reflective objects show the reflected light more mutely.</p>
<p>This process of setting up a calculation that then performs the same calculation within itself is called recursion, and it is a very powerful method of computation. If we allowed infinite levels of reflection our ray tracer would never finish calculating the color of any point, so we usually set some hard limit on the number of recursive reflection calculations (like 3 or so).</p>
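<p>Here is a small sketch of both ideas: the mirror direction and the depth-limited recursion. The reflection formula r = d &#8211; 2(d&#183;n)n is standard; everything else is a toy assumption made for illustration, in particular the scene (two mirrors facing each other, so a ray would bounce forever) and the brightness and reflectivity values of 0.5.</p>

```python
def reflect(direction, normal):
    """Mirror a direction about a unit surface normal: r = d - 2 (d . n) n."""
    d_dot_n = sum(d * n for d, n in zip(direction, normal))
    return tuple(d - 2.0 * d_dot_n * n for d, n in zip(direction, normal))

MAX_DEPTH = 3  # hard limit on recursive reflection calculations

def trace(direction, depth=0):
    """Toy scene: two facing mirrors, so every ray hits a mirror.

    Each mirror has a local brightness of 0.5 and a reflectivity of 0.5
    (both values are made up for this example).
    """
    local, reflectivity = 0.5, 0.5
    if depth >= MAX_DEPTH:
        return local  # stop recursing and use only the local color
    # The mirror that gets hit faces back toward the incoming ray.
    normal = (0.0, 0.0, -1.0) if direction[2] > 0 else (0.0, 0.0, 1.0)
    bounced = reflect(direction, normal)
    # Add the reflected contribution, scaled by the reflectivity parameter.
    return local + reflectivity * trace(bounced, depth + 1)

brightness = trace((0.0, 0.0, 1.0))
print(reflect((1.0, -1.0, 0.0), (0.0, 1.0, 0.0)), brightness)
```

<p>Without the <code>MAX_DEPTH</code> check, the call to <code>trace</code> would recurse until Python gave up. With it, each extra bounce adds a smaller and smaller contribution and the calculation ends after three reflections.</p>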
<div id="attachment_223" class="wp-caption alignleft" style="width: 235px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/2010-12-21_ShoeboxBilinear.jpg"><img class="size-medium wp-image-223" title="2010-12-21_ShoeboxBilinear" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/2010-12-21_ShoeboxBilinear-225x300.jpg" alt="A scene rendered by ZeroRay" width="225" height="300" /></a><p class="wp-caption-text">A scene rendered by ZeroRay illustrating reflection and refraction. The glass orbs reflect and transmit light through them.</p></div>
<p>Another way we can exploit this same trick is if we have objects that transmit light through themselves, like glass. We can use the ray angles to calculate the ray that the light would travel after passing through the object and recursively calculate the contribution of light passing through the object to the color of the point.</p>
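<p>The direction of the transmitted ray is usually computed with Snell&#8217;s law. The sketch below uses the common vector form of that law; it is not taken from ZeroRay, the function name is hypothetical, and the refractive index of 1.5 for glass is a typical textbook value used here as an assumption. Both the incoming direction and the normal are assumed to be unit vectors, with the normal pointing back toward the side the ray comes from.</p>

```python
import math

def refract(direction, normal, eta):
    """Bend a unit direction through a surface using Snell's law.

    `eta` is the ratio n1 / n2 of the refractive indices on the incoming
    and outgoing sides. Returns None on total internal reflection.
    """
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # no transmitted ray exists at this angle
    k = eta * cos_i - math.sqrt(1.0 - sin2_t)
    return tuple(eta * d + k * n for d, n in zip(direction, normal))

normal = (0.0, 0.0, -1.0)

# Air into glass at 30 degrees from the normal.
incoming = (0.5, 0.0, math.sqrt(3.0) / 2.0)
into_glass = refract(incoming, normal, 1.0 / 1.5)

# Glass into air at 60 degrees from the normal: total internal reflection.
steep = (math.sqrt(3.0) / 2.0, 0.0, 0.5)
out_of_glass = refract(steep, normal, 1.5)
print(into_glass, out_of_glass)
```

<p>The returned direction can be handed to the same recursive tracing function used for reflections, with the result weighted by how transparent the material is.</p>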
<p><img class="alignnone" src="/skins/sigJess.gif" alt="-Jess" width="80" height="55" /></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cutthroatstudios.com/blog/2010/12/what-is-ray-tracing/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>How to Make Your Mac Read to You</title>
		<link>http://www.cutthroatstudios.com/blog/2010/12/make-your-mac-read-to-you/</link>
		<comments>http://www.cutthroatstudios.com/blog/2010/12/make-your-mac-read-to-you/#comments</comments>
		<pubDate>Thu, 16 Dec 2010 10:28:03 +0000</pubDate>
		<dc:creator>Jess</dc:creator>
				<category><![CDATA[Off Topic]]></category>
		<category><![CDATA[OS X]]></category>
		<category><![CDATA[Say]]></category>
		<category><![CDATA[Shortcut]]></category>
		<category><![CDATA[Speak Text]]></category>
		<category><![CDATA[Speech Synthesis]]></category>
		<category><![CDATA[Universal Access]]></category>
		<category><![CDATA[Voice]]></category>

		<guid isPermaLink="false">http://www.cutthroatstudios.com/blog/?p=198</guid>
		<description><![CDATA[This article is a blog conversion of a manual I wrote for a journalism course this semester. Here&#8217;s the original PDF: How To Make Your Mac Read To You. Introduction It&#8217;s late at night. Your eyes are blurry from hours of reading at your computer screen but you have pages more to go before tomorrow&#8217;s [...]]]></description>
			<content:encoded><![CDATA[<p>This article is a blog conversion of a manual I wrote for a journalism course this semester. Here&#8217;s the original PDF: <a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/MakeYourMacReadToYou.pdf">How To Make Your Mac Read To You</a>.</p>
<h2>Introduction</h2>
<p style="padding-left: 30px;"><em><span style="color: #000000;">It&#8217;s late at night. Your eyes are blurry from hours of reading at your computer screen but you have pages more to go before tomorrow&#8217;s deadline. Wouldn&#8217;t it be nice if you could close your eyes and have your computer read those last few pages to you?</span></em></p>
<p style="padding-left: 30px;"><em><span style="color: #000000;">You just finished that paper you had so much trouble writing. You don&#8217;t have time to find someone to proofread it before tomorrow and you&#8217;ve been staring at it for so long you know you&#8217;ll read right over any mistakes. Wouldn&#8217;t it be nice if your computer could read it to you out loud, making those silly grammatical mistakes sound obvious?</span></em></p>
<p><span style="color: #000000;">If you</span>&#8217;re using Apple&#8217;s OS X, you can do both of those things easily. Apple was one of the early adopters of speech synthesis in 1984 and support for text to speech has been in their operating systems ever since. OS X has been shipped with all Macintosh computers since 2002. Unfortunately Apple is fond of moving the location of speech related menu items between versions, making users find them again. This document will teach you how to assign speech actions to a quick key combination in OS X 10.6 &#8220;Snow Leopard&#8221; and how to use the command line tool &#8220;say&#8221; to create audio files of text to listen to at your leisure.<span id="more-198"></span></p>
<h2>Universal Access</h2>
<p>Many of OS X&#8217;s speech features are presented within the context of <em>Universal Access</em>, which comprises extensions to the user interface that aid users with vision disabilities. The Universal Access tools are designed for users with disabilities, so they assume that if you want text on the screen spoken to you, you will want <strong>all</strong> of the text on the screen spoken to you all of the time. VoiceOver is a feature within Universal Access that narrates whatever you pass your mouse cursor over. Since we just want the computer to speak some specific things, we will not use VoiceOver or Universal Access, but they are worth addressing because many sources that you may come across regarding Apple&#8217;s speech tools will direct you to Universal Access.</p>
<h2>Speak Selected Text Shortcut</h2>
<div id="attachment_200" class="wp-caption alignright" style="width: 277px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/GetToSystemPrefs.png"><img class="size-full wp-image-200 " title="Get To System Prefs" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/GetToSystemPrefs.png" alt="How to get to the System Preferences on OS X" width="267" height="270" /></a><p class="wp-caption-text">A screen shot showing how to access the System Preferences on OS X</p></div>
<p>Next we will create a keyboard shortcut to speak highlighted text in any application. First, open the System Preferences. You can do this by opening the Apple menu at the top left of the screen and selecting &#8220;System Preferences&#8230;&#8221; (see the first screenshot). Now select the &#8220;Speech&#8221; icon from the system preferences. You will now be confronted with the Speech preferences dialog. Check the third checkbox (see the screenshot of the Speech preferences dialog) and press the &#8220;Set Key&#8230;&#8221; button directly to the right of the third checkbox. Finally, press a key combination that you would like to use for this action, preferably one that is not used by any other common action. Control-S is a simple choice that is not commonly used elsewhere (not to be confused with Command-S which is very commonly used).</p>
<p>You now have a keyboard shortcut to speak any text you like. If you are reading this document on your computer try it by highlighting this sentence and pressing your chosen keyboard shortcut. Here&#8217;s a fun sentence to try: &#8220;I am a computer. Boop beep boop be doop beep!&#8221; If you want to stop the computer in the middle of speaking press the shortcut again.</p>
<div id="attachment_201" class="wp-caption alignnone" style="width: 678px"><a href="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/SpeechPrefs.png"><img class="size-full wp-image-201" title="Speech Prefs" src="http://www.cutthroatstudios.com/blog/wp-content/uploads/2010/12/SpeechPrefs.png" alt="OS X Speech preferences dialog" width="668" height="487" /></a><p class="wp-caption-text">The Speech Preferences dialog. See the third checkbox labeled &quot;Speak selected text when the key is pressed.&quot;</p></div>
<h2>Advanced Tools</h2>
<h3>An Introduction to Say</h3>
<p>Now that we&#8217;ve covered the basics, there are some more advanced tools that can be useful. The program &#8220;say&#8221; is included with OS X. It is designed to be run at the command line and can convert text you type directly or the entire contents of a text file to speech that is either immediately played or saved to a file. To use a command line tool you will need to use a program called Terminal. To find it go to the Applications folder, then the Utilities folder, then double click on Terminal. Or go to the spotlight search bar on the top right of the screen and type Terminal. When the Terminal app appears on the search list, press enter.</p>
<p>Once we have a terminal window open we can play with say. Try typing:</p>
<pre>say "I can make you say whatever I want."</pre>
<p>You should hear the computer speak the text in quotes. The default mode, as you just heard, is to speak the text directly through your speakers (or your selected sound device if you have changed it). Let&#8217;s say we want to make an audio file to listen to later, or to send to someone else, or perhaps because we&#8217;d like to be able to pause and resume the speech. We can do that using the following command:</p>
<pre>say "I can make you say whatever I want." -o speech.wav</pre>
<p>The text in quotes will be spoken into a file called speech.wav. You can also use Apple&#8217;s AAC compression file format by using the extension m4a rather than wav. The output audio file is saved in the folder that you executed &#8220;say&#8221; in. If you are not sure where that is, type:</p>
<pre>pwd</pre>
<p>This will print out Terminal&#8217;s <em>working directory</em>, the address of the folder that Terminal is currently in. The address is a list of all the folders leading to the working directory separated by slashes, much like what you see in web addresses after the .com (or .org, or .net).</p>
<h3>&#8216;Say&#8217; an Entire Document</h3>
<p>Making your computer speak a short sentence that you have to type at the command line is not much of a convenience. However, you can tell &#8220;say&#8221; to create an audio file of an entire text file. This is much more useful as you can convert the entire document and sit back and listen. If you have to get up you can pause it; if you miss something you can rewind a bit and hear it again. The one snag is that &#8220;say&#8221; can only handle simple text files. It will not work on PDF or Microsoft Word documents. If you have a PDF or Word document that you want to have spoken to you, you can open the document and copy the text into TextEdit (in the Applications folder). You may need to clean up the pasted text, as many PDF documents have headers and/or footers that will get pasted in the middle of the text of your document. Having a sentence interrupted to hear that you&#8217;re on page three of seven can be quite distracting. You can manually scan and delete unwanted sections in TextEdit and save the cleaned up file.</p>
<p>This next part may be a bit unfamiliar to those who do not often use the command line. You will have to change Terminal&#8217;s working directory to the folder where you have saved the document you want to &#8220;say&#8221;. To do this, you can use the &#8220;cd&#8221; (change directory) command to move to the desired folder. You can enter the entire path at once such as:</p>
<pre>cd Documents/SayStuff</pre>
<p>or you can move incrementally with the commands:</p>
<pre>cd Documents</pre>
<pre>cd SayStuff</pre>
<p>Both of these sequences will bring you to the same directory. Once you are in the folder with your text document, use &#8220;say&#8221; like this:</p>
<pre>say -f MyDocument.rtf -o MyDocument.m4a</pre>
<p>The file name after -f is the name of your text document (the extension may be other than rtf). The file name after -o is the name that the audio output file will be given. The output file name does not need to be similar to the input text document&#8217;s name. For long documents &#8220;say&#8221; may take a while to process.</p>
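<p>If you find yourself converting documents often, the same command can be wrapped in a small script. The Python sketch below is only an illustration: the function names are made up, and it uses nothing beyond the <code>-f</code> and <code>-o</code> options shown above. It only calls &#8220;say&#8221; when the program is actually installed.</p>

```python
import shutil
import subprocess

def build_say_command(text_file, audio_file):
    """Build the argument list for: say -f TEXT_FILE -o AUDIO_FILE"""
    return ["say", "-f", text_file, "-o", audio_file]

def convert(text_file, audio_file):
    """Run `say` if it is available; return True if the command was run."""
    if shutil.which("say") is None:
        return False  # not on a Mac, or `say` is not on the PATH
    subprocess.run(build_say_command(text_file, audio_file), check=True)
    return True

# Show the command that convert() would run for the example document.
print(" ".join(build_say_command("MyDocument.rtf", "MyDocument.m4a")))
```

<p>Calling <code>convert("MyDocument.rtf", "MyDocument.m4a")</code> from the folder that holds the document does the same thing as typing the command by hand.</p>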
<h2>Conclusion</h2>
<p>OS X has an array of text to speech tools that are useful even to those without vision impairments. Those tools are not always easy to locate and many users are unaware of their existence. Now that you know how to speak highlighted text and create audio files of documents it&#8217;s time to give your eyes a rest and let your ears do the walking.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cutthroatstudios.com/blog/2010/12/make-your-mac-read-to-you/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Feature Based Approaches: ProFORMA</title>
		<link>http://www.cutthroatstudios.com/blog/2010/12/feature-based-approaches-proforma/</link>
		<comments>http://www.cutthroatstudios.com/blog/2010/12/feature-based-approaches-proforma/#comments</comments>
		<pubDate>Tue, 14 Dec 2010 23:39:02 +0000</pubDate>
		<dc:creator>Jess</dc:creator>
				<category><![CDATA[Research]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[FAST Corner Detector]]></category>
		<category><![CDATA[ProFORMA]]></category>
		<category><![CDATA[Structure from Motion]]></category>
		<category><![CDATA[Videndo Aedificare]]></category>

		<guid isPermaLink="false">http://www.cutthroatstudios.com/blog/?p=178</guid>
		<description><![CDATA[An example of a modern, real-time, feature-based, structure from motion system is ProFORMA by Pan, Reitmayr and Drummond. To clear up the acronym, ProFORMA stands for Probabilistic Feature-based On-line Rapid Model Acquisition. Which is a mouthful. The video is impressive, though it is worth noting that the example model, with its simple geometry and texture rich surface, is [...]]]></description>
			<content:encoded><![CDATA[<p>An example of a modern, real-time, feature-based, structure from motion system is ProFORMA by Pan, Reitmayr and Drummond. To clear up the acronym, ProFORMA stands for Probabilistic Feature-based On-line Rapid Model Acquisition. Which is a mouthful. The video is impressive, though it is worth noting that the example model, with its simple geometry and texture rich surface, is ideal for the system <a href="/blog/2010/12/feature-based-approaches-proforma/#2">[2]</a>.</p>
<p><object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="350" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="src" value="http://www.youtube.com/v/vEOmzjImsVc&amp;feature" /><embed type="application/x-shockwave-flash" width="425" height="350" src="http://www.youtube.com/v/vEOmzjImsVc&amp;feature"></embed></object></p>
<p>Like most state-of-the-art structure from motion techniques, Pan, Reitmayr and Drummond&#8217;s approach is feature based <a href="/blog/2010/12/feature-based-approaches-proforma/#1">[1]</a>. Image features are an entire area of research in computer vision. The premise is that rather than operating on the entire image, in the form of raw pixels, it is smarter to pick &#8220;interesting&#8221; parts of the image and just consider the information around those spots. The process of finding interesting spots is called feature (or interest point) detection. There are many types of feature detectors (bearing all sorts of different names). Some look for bright or dark blobs, some look for points with a lot of local texture and some decide whether a point is interesting on some other criteria entirely. ProFORMA uses the <a title="FAST Corner Detector" href="http://mi.eng.cam.ac.uk/~er258/work/fast.html" target="_blank">FAST corner detector</a> as its feature detector.<span id="more-178"></span></p>
<p>There are two fundamental components to every feature:</p>
<ol>
<li>Feature location</li>
<li>Feature descriptor</li>
</ol>
<p>Feature location is discovered during feature detection.  The feature descriptor is some way of describing what that feature is, as opposed to simply where it is. The most obvious feature descriptor is simply the pixels around the feature. Computer vision researchers have developed an array of less obvious feature descriptors that have neat properties, but a discussion of them is beyond the scope of this article. ProFORMA uses the 5&#215;5 grid of pixels centered on the feature location as its feature descriptor (mean and variance normalized pixels that is).</p>
<p>Why features? ProFORMA needs feature detection because it has to find the same spots on the model from frame to frame, even though those spots will not be at the same pixel coordinates. Finding the same spots and seeing how much they have moved in pixel coordinates is how it determines how the model must be moving. Now, say that we&#8217;ve found 50 features in each of frame 1 and frame 2. Some of the features have moved. Since they&#8217;re not at the same positions in the image, how do we know which feature in frame 2 was feature x in frame 1? This is where feature descriptors come in. We can compare the feature descriptor of feature x from frame 1 with the feature descriptor of every feature in frame 2. When we find one that matches we can assume that it is the same feature as in frame 1 and we can measure how far that spot on the model moved between frames. In this way we can figure out how each matched point on the model moved between frames. From this, given what we know about the shape of the model, we can infer the motion of the entire model.</p>
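<p>As a rough illustration of descriptor matching, the Python sketch below normalizes each patch by its mean and variance and pairs every feature in frame 1 with the most similar patch in frame 2. It is a simplification and not ProFORMA&#8217;s actual matcher: the function names are hypothetical, the patches are tiny flat lists rather than 5&#215;5 grids, similarity is a plain sum of squared differences, and the best candidate is always accepted with no threshold.</p>

```python
import math

def normalize(patch):
    """Subtract the mean and divide by the standard deviation of a flat pixel list."""
    mean = sum(patch) / len(patch)
    var = sum((v - mean) ** 2 for v in patch) / len(patch)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero on perfectly flat patches
    return [(v - mean) / std for v in patch]

def match_features(frame1, frame2):
    """For each (location, patch) in frame1, find the most similar patch in frame2.

    Returns a list of (location_in_frame1, location_in_frame2) pairs.
    """
    descriptors2 = [(loc, normalize(patch)) for loc, patch in frame2]
    matches = []
    for loc1, patch1 in frame1:
        d1 = normalize(patch1)
        best = min(
            descriptors2,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(d1, item[1])),
        )
        matches.append((loc1, best[0]))
    return matches

# Frame 2 is brighter overall and the features have shifted by a few pixels.
frame1 = [((10, 10), [1, 2, 3, 4]), ((30, 5), [9, 1, 9, 1])]
frame2 = [((32, 6), [19, 11, 19, 11]), ((12, 11), [11, 12, 13, 14])]
matches = match_features(frame1, frame2)
print(matches)
```

<p>Because of the normalization, the overall change in brightness between the two frames does not affect the comparison. The difference between the two locations in each pair is how far that feature moved.</p>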
<p><strong>Author&#8217;s Note: </strong>This article is one of three that compose my report on the Videndo Aedificare project. The other two are <a href="/blog/2010/12/background-subtraction/">Background Subtraction</a> and <a href="/blog/2010/12/videndo-aedificare/">Videndo Aedificare</a>.</p>
<h2>References</h2>
<p><a name="1"></a>[1] Q. Pan, G. Reitmayr, and T. Drummond. ProFORMA: Probabilistic Feature-based On-line Rapid Model Acquisition. In Proc. 20th British Machine Vision Conference (BMVC), London, September 2009. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.157.7093&amp;rep=rep1&amp;type=pdf" target="_blank">Download</a></p>
<p><a name="2"></a>[2] Pan, Qi. &#8220;ProFORMA: Probabilistic Feature-based On-line Rapid Model Acquisition.&#8221; 18 July 2009. Online video clip. YouTube. <a href="http://www.youtube.com/watch?v=vEOmzjImsVc" target="_blank">http://www.youtube.com/watch?v=vEOmzjImsVc</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cutthroatstudios.com/blog/2010/12/feature-based-approaches-proforma/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Background Subtraction</title>
		<link>http://www.cutthroatstudios.com/blog/2010/12/background-subtraction/</link>
		<comments>http://www.cutthroatstudios.com/blog/2010/12/background-subtraction/#comments</comments>
		<pubDate>Sun, 12 Dec 2010 06:09:05 +0000</pubDate>
		<dc:creator>Jess</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Background Model]]></category>
		<category><![CDATA[Background Subtraction]]></category>
		<category><![CDATA[Computer Vision]]></category>
		<category><![CDATA[Motion Detection]]></category>
		<category><![CDATA[Running Gaussian Average]]></category>
		<category><![CDATA[Videndo Aedificare]]></category>

		<guid isPermaLink="false">http://www.cutthroatstudios.com/blog/?p=156</guid>
		<description><![CDATA[After working for quite a while on the &#8220;motion detection&#8221; algorithms described in my last article, I was clued in to background subtraction. Or rather, it finally hit me why, when explaining what I was working on, people kept saying, &#8220;Oh, you&#8217;re doing background subtraction.&#8221; Background subtraction is the keyword for a relatively well explored [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignleft" style="width: 290px"><a href="http://iss.bu.edu/jkonrad/Research/VSNs/Background_Subtraction/background_subtraction.html" target="_blank"><img class="  " title="Background Subtraction" src="/images/2010-12-11_BGSubtractionBU.jpg" alt="Sidewalk Background Subtraction" width="280" height="356" /></a><p class="wp-caption-text">An example of a background/foreground segmentation from Janusz Konrad at Boston University.</p></div>
<p>After working for quite a while on the &#8220;motion detection&#8221; algorithms described in <a title="Videndo Aedificare" href="/blog/2010/12/videndo-aedificare/">my last article</a>, I was clued in to background subtraction. Or rather, it finally hit me why, when explaining what I was working on, people kept saying, &#8220;Oh, you&#8217;re doing background subtraction.&#8221;</p>
<p>Background subtraction is the keyword for a relatively well explored nook of computer vision. The motivation for background subtraction research is that, in an image, there is usually a part of the image that you care about (the foreground), and a part that you don&#8217;t care about (the background) and it would be nice to focus only on the parts that we care about. There are many justifiable ways to divide a single image into foreground and background sections if we use the criterion that the foreground is &#8220;things we care about&#8221; and the background is &#8220;things we don&#8217;t care about.&#8221; The colloquial distinction is that the foreground is usually closer to the camera, in focus, and more interesting than the background. The last is where subjectivity enters the equation. In the field of background subtraction and in the context of video, the consensus is that the background is the part of the image that does not belong to a sizable moving object. This is still a bit vague, but different algorithms have different ideas about what constitutes a background.<span id="more-156"></span></p>
<h2>Running Gaussian Average</h2>
<p>The simplest background model I found uses a running Gaussian average. That’s fancy talk for saying that, at every frame, each pixel of the background model is set to a weighted sum of the pre-existing background pixel and the same pixel in the new frame. In pseudocode:</p>
<p><code> imgBG := the background image (blank to start)<br />
a := weight that balances fast updating and background stability<br />
For every frame:<br />
&nbsp;&nbsp;&nbsp;&nbsp;imgF := the current frame<br />
&nbsp;&nbsp;&nbsp;&nbsp;For every pixel p:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;imgBG[p] = a * imgBG[p] + (1 - a) * imgF[p]<br />
</code></p>
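<p>The same update as a short runnable Python sketch. The three-pixel &#8220;frames&#8221; and the weight of 0.5 are made up for illustration; a real implementation would apply the update to every pixel and color channel of a full image.</p>

```python
def update_background(background, frame, a):
    """Blend one new frame into the background model, pixel by pixel."""
    return [a * bg + (1.0 - a) * px for bg, px in zip(background, frame)]

background = [0.0, 0.0, 0.0]        # blank to start, as in the pseudocode
frames = [[100.0, 50.0, 0.0]] * 3   # a static three-pixel "video"
for frame in frames:
    background = update_background(background, frame, a=0.5)
print(background)
```

<p>With a static scene the model moves halfway toward the observed values on every frame. A weight closer to 1 makes the background more stable but slower to adapt, and a weight closer to 0 does the opposite.</p>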
<p>Wren, Azarbayejani, Darrell and Pentland created a person tracker called Pfinder in the mid-1990s that utilized a running Gaussian average for background subtraction <a href="#1">[1]</a>. Pfinder located and tracked people and parts of people, such as their heads and hands, for gesture recognition. They even had an interface to <a href="http://en.wikipedia.org/wiki/Wolfenstein_3D" target="_blank">Wolfenstein3D</a> where the user held a toy gun and moved their body to control the game (think Microsoft Kinect circa 1996).</p>
<h2>Temporal Median Filter</h2>
<p>The temporal median filter is almost the same as the background model I use in Videndo Aedificare (what I called the Evolving Background Model, EBM, in <a href="/blog/2010/12/videndo-aedificare/">the last post</a>). In similar terms, my EBM would be called a temporal mean filter, as I used the arithmetic mean rather than the median.</p>
<p>A temporal median filter, as described by Cucchiara et al. in the 2003 paper Detecting Moving Objects, Ghosts, and Shadows in Video Streams, takes N frames of previous video and, for each pixel, finds the median of all N values <a href="#2">[2]</a>. This median becomes the value of the background model at each point. Rather than simply considering the last N frames, Cucchiara samples the video stream every 10 frames. In other words, when a new frame is added to the background model at time t, the set of frames considered consists of the frames</p>
<p>{t, t &#8211; 10, t &#8211; 20, &#8230; , t &#8211; 10 * (N &#8211; 1)}.</p>
<p>As I mentioned, this is similar to how my background model works, but there are a few differences. Instead of finding the median (the value in the middle) of the N frames, I calculated the mean (average) of all N frames. Rather than adding a new frame to the background model every 10 frames, or even every K frames, I added a frame to the background model every 1 second. I chose to use wall-clock time rather than frames so that the real-time response of my algorithm would be more independent of the speed of the computer running it.</p>
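<p>The difference between the two filters is easy to see on a toy example. The Python sketch below is an illustration only and comes from neither paper nor from my own code: frames are short lists of grayscale values, the function names are hypothetical, and sampling is done by frame count rather than by wall-clock time.</p>

```python
from collections import deque
from statistics import mean, median

def sample_history(video, step, n):
    """Keep every `step`-th frame, remembering only the most recent `n` samples."""
    history = deque(maxlen=n)
    for t, frame in enumerate(video):
        if t % step == 0:
            history.append(frame)
    return list(history)

def background_from_history(frames, combine):
    """Combine the stored frames pixel by pixel using `combine` (median or mean)."""
    return [combine(values) for values in zip(*frames)]

history = [
    [10, 10, 10],
    [10, 200, 10],  # something bright passes over the middle pixel
    [10, 10, 10],
]
median_bg = background_from_history(history, median)
mean_bg = background_from_history(history, mean)
print(median_bg, mean_bg)

# Sampling every 10th frame of a 25-frame video, keeping the last 2 samples.
sampled = sample_history([[t] for t in range(25)], step=10, n=2)
print(sampled)
```

<p>The median ignores the one bright outlier completely, while the mean is pulled toward it. That is the main practical difference between the temporal median filter and a temporal mean filter like mine.</p>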
<h2>Pixel Comparison and Color Spaces</h2>
<div class="wp-caption alignright" style="width: 242px"><a href="/images/2010-12-11_SpinningCellphone.jpg"><img class=" " style="margin: 10px;" title="Spinning Cellphone" src="/images/2010-12-11_SpinningCellphone.jpg" alt="A spinning cellphone being detected" width="232" height="184" /></a><p class="wp-caption-text">A spinning cellphone being detected. I feature examples with my cell phone because it spins nicely, allowing me to get my hands out of the frame while it is still in motion.</p></div>
<p>Another difference between my method and most in the literature is in how pixels are compared. All of the examples in the literature that operated on color images used a color space distance to compare pixels. That is, they treated a color value as a 3D point in space and, when comparing two pixels, measured the straight-line (Euclidean) distance between the two points. For instance, RGB color space is just like a regular 3D space with the axes labelled r, g, b rather than the conventional x, y, z. In this way, they can consider two pixels different if their distance is above a certain threshold. I didn&#8217;t think of doing this! Instead, I compare each color channel pairwise between two pixels and decide that the pixels are different if the difference between ANY of the color channels is above a threshold. This is slightly faster than computing Euclidean distance. The difference in the computation amounts to this: the straight-line color space distance effectively draws a sphere about a point and says that any point within this sphere is not different enough to consider a change, while all the points outside of it are different. My distance measure effectively draws an axis-aligned cube about the point, in color space, and says any pixel falling within the cube is similar enough, while all points outside are different.</p>
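<p>Both comparisons fit in a few lines of Python. This is a sketch with made-up function names and an arbitrary threshold of 10, not the code from Videndo Aedificare.</p>

```python
import math

def differs_euclidean(p, q, threshold):
    """Sphere test: straight-line distance in color space exceeds the threshold."""
    return math.dist(p, q) > threshold

def differs_per_channel(p, q, threshold):
    """Cube test: ANY single channel differs by more than the threshold."""
    return any(abs(a - b) > threshold for a, b in zip(p, q))

p = (100, 100, 100)
q = (108, 108, 108)  # every channel is off by 8

sphere_says = differs_euclidean(p, q, 10)
cube_says = differs_per_channel(p, q, 10)
print(sphere_says, cube_says)
```

<p>The two tests disagree on this pair. Each channel differs by only 8, so the pixel is inside the cube, but the combined distance is about 13.9, which puts it outside the sphere. The corners of the cube reach further than the sphere does, so the per-channel test is slightly more tolerant of changes that affect all channels at once.</p>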
<h2>The Wrap Up</h2>
<p>Being dense and re-inventing a few background subtraction techniques ended up buying me a better understanding of this problem. If I had gone straight to the literature I would have picked out an algorithm straight from someone else&#8217;s paper. It might have saved me a lot of time, but not only would I not have learned as much, I probably would have picked an overly complicated algorithm. Beyond those that I wrote about above, there are some pretty complicated background subtraction algorithms based on techniques such as mixtures of Gaussians, kernel density estimation or eigenbackgrounds <a href="#3">[3]</a>. Some of these more complicated techniques are very effective and I probably would have thought that the simpler techniques would be insufficient. As it turned out, my simple temporal mean filter was sufficient for my problem. Upon reading the literature it was validating to discover that many people had arrived at similar conclusions (and in papers less than ten years old!).</p>
<p><strong>Author&#8217;s Note: </strong>This article is one of three that compose my report on the Videndo Aedificare project. The other two are <a href="/blog/2010/12/videndo-aedificare/">Videndo Aedificare</a> and <a href="/blog/2010/12/feature-based-approaches-proforma/">Feature Based Approaches: ProFORMA</a>.</p>
<h2>References</h2>
<p><!-- p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 11.0px Monaco} p.p2 {margin: 0.0px 0.0px 0.0px 0.0px; font: 11.0px Monaco; min-height: 15.0px} --><a name="1"></a>[1] C. Wren, A. Azarbayejani, T. Darrell, and A.P. Pentland, &#8220;Pfinder: real-time tracking of the human body,&#8221; IEEE Trans. on Patfern Anal. and Machine Intell., vol. 19, no. 7, pp. 780-785, 1997. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.214&amp;rep=rep1&amp;type=pdf" target="_blank">Download</a></p>
<p><a name="2"></a>[2] R. Cucchiara, C. Grana, M. Piccardi, and A. Prati, &#8220;Detecting moving objects, ghosts, and shadows in video streams,&#8221; IEEE Trans. on Pattern Anal. and Machine Intell., vol. 25, no. 10, pp. 1337-1342, 2003. <a href="http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.4666&amp;rep=rep1&amp;type=pdf" target="_blank">Download</a></p>
<p><a name="3"></a>[3] M. Piccardi, &#8220;Background subtraction techniques: a review,&#8221; IEEE Intl. Conf. on Systems, Man and Cybernetics, pp. 3099-3104, 2004. <a href="http://profs.sci.univr.it/~cristanm/teaching/sar_files/lezione4/Piccardi.pdf" target="_blank">Download</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cutthroatstudios.com/blog/2010/12/background-subtraction/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Videndo Aedificare</title>
		<link>http://www.cutthroatstudios.com/blog/2010/12/videndo-aedificare/</link>
		<comments>http://www.cutthroatstudios.com/blog/2010/12/videndo-aedificare/#comments</comments>
		<pubDate>Wed, 08 Dec 2010 20:13:25 +0000</pubDate>
		<dc:creator>Jess</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Background Detection]]></category>
		<category><![CDATA[Feature]]></category>
		<category><![CDATA[Feature Detection]]></category>
		<category><![CDATA[Ogre]]></category>
		<category><![CDATA[ProFORMA]]></category>
		<category><![CDATA[Structure from Motion]]></category>
		<category><![CDATA[Videndo Aedificare]]></category>
		<category><![CDATA[ZeroRay]]></category>

		<guid isPermaLink="false">http://www.cutthroatstudios.com/blog/?p=145</guid>
		<description><![CDATA[Videndo Aedificare is a project I&#8217;ve been working on as a part of my coursework for CS612, Advanced Topics in Computer Vision. The name means &#8220;By seeing, to build&#8221; (according to Google translate) and that is exactly what it attempts. The goal of the project is to build a rudimentary system that takes a real [...]]]></description>
			<content:encoded><![CDATA[<p>Videndo Aedificare is a project I&#8217;ve been working on as a part of my coursework for CS612, Advanced Topics in Computer Vision. The name means &#8220;By seeing, to build&#8221; (according to Google Translate) and that is exactly what it attempts. The goal of the project is to build a rudimentary system that takes a real-time webcam feed and builds a 3D model of the viewed scene.</p>
<h2>Introduction</h2>
<p>The project was inspired by a paper by Pan, Reitmayr and Drummond called <em>ProFORMA: Probabilistic Feature-based On-line Rapid Model Acquisition</em>. Actually, it might be more honest to say that the project was inspired by the <a title="ProFORMA Video" href="http://www.youtube.com/watch?v=vEOmzjImsVc&amp;fmt=22" target="_blank">video that the ProFORMA authors posted on YouTube</a>.</p>
<p>For a discussion of ProFORMA and feature based approaches see my article, <a href="/blog/2010/12/feature-based-approaches-proforma/">Feature Based Approaches: ProFORMA</a>.</p>
<p>Videndo Aedificare (VA) does <em>not</em> use a feature based approach. I had wanted to at first, but decided that implementing a state-of-the-art structure from motion system is beyond what is feasible in a semester project for a one-person team (read: was talked down by my professor). Videndo Aedificare is built on my 3D graphics/computer vision toolkit, <a href="http://sourceforge.net/projects/zeroray/" target="_blank">ZeroRay</a> (which is a topic on my article to-do list for this blog that keeps getting put off!). Its primary goal is to present a framework for exploring real-time structure from motion algorithms. It provides a neat API to subscribe listeners to a connected webcam and classes to display results, either 2D images (which may be intermediate results for debugging) or 3D scenes. Videndo Aedificare uses the <a title="Ogre3D" href="http://www.ogre3d.org/" target="_blank">Ogre open source rendering engine</a> to provide 3D views. Camera listeners implement a receiveFrame method, through which they are passed the current camera frame and given time to operate on it. Often camera listeners have their own views to display results side by side with the raw camera feed.</p>
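<p>The listener arrangement described above can be sketched roughly as follows. This is a Python illustration of the pattern, not the actual VA API: only the receiveFrame name comes from the project, while the Camera and FrameCounter classes and the subscribe and publish methods are hypothetical stand-ins.</p>

```python
class Camera:
    """Hypothetical stand-in for a webcam that pushes frames to its subscribers."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def publish(self, frame):
        # Hand the current frame to every subscribed listener in turn.
        for listener in self._listeners:
            listener.receiveFrame(frame)


class FrameCounter:
    """Example listener: counts frames and remembers the most recent one."""

    def __init__(self):
        self.count = 0
        self.latest = None

    def receiveFrame(self, frame):
        self.count += 1
        self.latest = frame


camera = Camera()
counter = FrameCounter()
camera.subscribe(counter)
camera.publish([[0, 0], [0, 0]])      # a tiny all-black greyscale frame
camera.publish([[255, 255], [255, 255]])
print(counter.count)
```

<p>A builder or background model would take the place of FrameCounter and do its per-frame work inside receiveFrame.</p>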
<h2>Simple Builder</h2>
<div class="wp-caption aligncenter" style="width: 546px"><img title="VA Simple Builder" src="/images/2010-12-06_SimpleBuilder.png" alt="Videndo Aedificare Simple Builder" width="536" height="200" /><p class="wp-caption-text">VA&#39;s Simple Builder. The webcam video stream picturing my monitor wearing a festive hat (left) beside the 3D rendering of the model constructed from the scene (right).</p></div>
<p>As a proof of concept, and to test my framework, I implemented the most naive scene reconstruction algorithm I could think of. It assumed that the brighter a pixel was, the closer that point was to the camera. In other words, bright pixels are close to the camera, dark ones are far away. Simple Builder generates a polygonal mesh from each frame by first making the image greyscale and then interpreting it as a height map. The maximum and minimum heights are parameters and the greyscale values are interpolated between them. The visual effect is rather interesting.<span id="more-145"></span></p>
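<p>As a rough illustration of the idea, the greyscale-to-height step might look like the Python sketch below. The function names are made up, the greyscale conversion is assumed to be a plain average of the three channels, and the mapping between the minimum and maximum heights is assumed to be linear.</p>

```python
def to_grey(pixel):
    """Greyscale value of an (r, g, b) pixel; a plain channel average is assumed here."""
    r, g, b = pixel
    return (r + g + b) / 3.0

def height_map(frame, min_height, max_height):
    """Map every pixel to a height: black lands on min_height, white on max_height."""
    span = max_height - min_height
    return [[min_height + span * to_grey(p) / 255.0 for p in row] for row in frame]

frame = [
    [(0, 0, 0), (255, 255, 255)],
    [(51, 51, 51), (102, 102, 102)],
]
for row in height_map(frame, 0.0, 10.0):
    print(row)
```

<p>Each height would then become the displacement of one vertex in the mesh, which is why unfiltered camera noise shows up directly as speckle in the geometry.</p>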
<p>Since each frame is changed into geometry in real time, the grey patch you see on the right hand side of the image above would shimmer and shift as you watched it. The human visual system infers a lot of information even from the height map. My first reaction to the Simple Builder was &#8220;Wow&#8230; that looks good! How did such a stupid idea work?!&#8221; But it actually didn&#8217;t. Rotating the 3D view around, you can see that the height map only rarely reflects the true geometry of the scene, even in a simplified sense. The one exception was skin. Skin often had the correct contour, as in the hand in the image above, but as you can see the wall in the background appears in the foreground of the height map. The speckle texture is an effect of camera noise, as no noise reduction was performed on the image before extracting the height map.</p>
<h2>Scene Builder</h2>
<p>The other builder that I worked on, I creatively called <em>Scene Builder</em>. My roadmap was to have it proceed in the following manner:</p>
<ol>
<li>Background Detection</li>
<li>Locate Motion/Object</li>
<li>Refine Object</li>
</ol>
<p>I ended up spending quite a while on background detection, as it seemed crucial to have a robust way to locate the portion of the image that was moving. The reason that I bring the word &#8220;robust&#8221; into it is that the video stream captured by the iSight camera in my MacBook Pro was very noisy. I&#8217;m not sure how noisy it is compared to cameras in general, or even other webcams, but I was surprised by the amount of Gaussian (random) noise in my images. The noise could be reduced by good lighting conditions. Unfortunately, good lighting conditions seemed to be synonymous with daylight and I often ended up working on this project at night. Needless to say, developing a noise-tolerant algorithm became a priority.</p>
<p>With the goal of computing which regions had changed between two frames I needed two things: a way to obtain an image that contained only the background and a way to compare two images that returned a region of pixels. I&#8217;ll talk about the comparison method first.</p>
<h2>Image Comparison</h2>
<p>At first, I chose the simplest algorithm I could think of: a toleranced pixel-by-pixel comparison of the pixel intensities, in which I saved the minimum and maximum coordinates of the pixels that differed by at least the tolerance, epsilon. This yields an axis-aligned bounding box around the changed region. To be noise tolerant, this technique required an epsilon of around 54 to get the detected region to stop jumping at shadows (er, at noise; it technically should jump at shadows). Since pixel values range between 0 and 255, an epsilon of 54 is 21% of the total range.</p>
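<p>A minimal Python sketch of this comparison, assuming greyscale images stored as lists of rows; the function name and the sample values are illustrative rather than taken from ZeroRay.</p>

```python
def changed_region(background, frame, epsilon):
    """Return (min_x, min_y, max_x, max_y) around the changed pixels, or None if nothing changed."""
    box = None
    for y, (bg_row, fr_row) in enumerate(zip(background, frame)):
        for x, (bg, fr) in enumerate(zip(bg_row, fr_row)):
            if abs(bg - fr) >= epsilon:  # different by at least the tolerance
                if box is None:
                    box = [x, y, x, y]
                else:
                    box = [min(box[0], x), min(box[1], y), max(box[2], x), max(box[3], y)]
    return None if box is None else tuple(box)

background = [[100] * 4 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][1] = 200   # changed pixel at x=1, y=1
frame[2][3] = 160   # changed pixel at x=3, y=2
frame[0][0] = 110   # mild noise, below the tolerance

print(changed_region(background, frame, 54))
```

<p>With epsilon set to 54 the noisy pixel at the origin is ignored and the box spans only the two genuinely changed pixels.</p>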
<p>I was dissatisfied with how large the tolerance had to be so, after receiving a suggestion, I added the requirement that, for a pixel to be &#8220;moving&#8221;, all of its neighboring pixels must also have changed. This is reasonable because it is unlikely that a real object moving in front of the camera will only affect one pixel. A one-pixel change is much more likely to be noise. Even if the change is due to a real object, it is so tiny that we probably aren&#8217;t interested in it anyway (this is not a <a title="Flydra" href="http://www.its.caltech.edu/~astraw/research/flydra/" target="_blank">fly tracking algorithm</a>). This change vastly improved the noise robustness of the comparison. Previously I&#8217;d had to use pixel intensity, as it was more noise tolerant than my intended comparison: comparing each colour channel and considering the two pixels different if any of the differences between channels was at least epsilon. With the neighbor-consulting comparison, the stability actually improved by looking at all channels rather than the intensity. I was also able to drop epsilon back from 54 to around 10 to 15.</p>
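<p>Here is one way the neighbor-consulting comparison could be sketched in Python. It assumes colour images stored as rows of (r, g, b) tuples, takes &#8220;neighbors&#8221; to mean the eight surrounding pixels, and never marks border pixels as moving; those choices, and all of the names, are assumptions made for the example.</p>

```python
def pixel_differs(p, q, epsilon):
    """Two pixels differ when any colour channel differs by at least epsilon."""
    return any(abs(a - b) >= epsilon for a, b in zip(p, q))

def moving_mask(background, frame, epsilon):
    """Mark a pixel as moving only if it and all eight of its neighbours have changed."""
    height, width = len(frame), len(frame[0])
    raw = [[pixel_differs(background[y][x], frame[y][x], epsilon)
            for x in range(width)] for y in range(height)]
    mask = [[False] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            mask[y][x] = all(raw[y + dy][x + dx]
                             for dy in (-1, 0, 1) for dx in (-1, 0, 1))
    return mask

grey = (100, 100, 100)
background = [[grey] * 5 for _ in range(5)]
frame = [row[:] for row in background]
for y in (1, 2, 3):
    for x in (1, 2, 3):
        frame[y][x] = (130, 100, 100)  # a 3x3 object, changed in the red channel only
frame[0][4] = (160, 160, 160)          # a lone noisy pixel

mask = moving_mask(background, frame, 15)
print(mask[2][2], mask[0][4])
```

<p>Only the centre of the 3x3 object survives the test; the lone noisy pixel is rejected even though it changed by far more than epsilon.</p>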
<div class="wp-caption aligncenter" style="width: 543px"><img class="     " title="Image Comparison" src="/images/2010-12-07_ImageComparison01_small.png" alt="Image Comparison" width="533" height="200" /><p class="wp-caption-text">The evolving background model&#39;s detection of a box thrown into the field of view. The red rectangle is oversized because the box had just travelled into the frame from the upper left and bounced once.</p></div>
<h2>Background Models</h2>
<p>This section describes the series of background models I created as I explored this problem. This is a background subtraction problem. For a discussion of background subtraction in general and how my approaches relate to others in the literature see my article <a href="/blog/2010/12/background-subtraction/">Background Subtraction</a>.</p>
<p><strong>The Simple Background Model</strong></p>
<p>The background model went through several iterations. My first attempt, which I called the Simple Background Model (SBM), took N frames during an initialization period. During this period the user was required to present the camera with only the background. This led to a somewhat awkward testing workflow which began with me hiding under my desk during the initialization period and then popping up to wave my hands at the camera. This earned me a lot of weird looks and some interesting conversations at the office. The frames captured during the initialization phase were stored, and at the end of the phase they were averaged (using the arithmetic mean) into a single background image. This accumulation and averaging removed a lot of the noise from the background model.</p>
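<p>The averaging step can be sketched in a few lines of Python. Greyscale frames stored as lists of rows are assumed to keep the example short, and the function name is made up.</p>

```python
def average_frames(frames):
    """Per-pixel arithmetic mean of a list of equally sized greyscale frames."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]

# Three noisy captures of a background whose true value is 100 everywhere.
captures = [
    [[98, 100, 103]],
    [[101, 100, 97]],
    [[101, 100, 100]],
]
print(average_frames(captures))
```

<p>Random noise pushes individual captures up or down, so the mean lands much closer to the true background than any single frame does.</p>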
<p>Surprisingly, this simple model performed quite well. It found a rectangle around any new objects introduced into the scene and the rectangle was stable around the new object (it didn&#8217;t jitter or jump around). The problem that arose was that this model was very sensitive to tiny changes in the &#8220;background&#8221; of the scene over time. The SBM considered any difference from the background an &#8220;object&#8221; (very loosely). For example, in the above screenshot, the papers at the far side of the desk moved very gradually as air currents in the room shifted. This is a very tiny change, but the model is sensitive to any change of even a few pixels. So after a few minutes, what was logically still the background had changed enough that the SBM thought it wasn&#8217;t background. I needed a more sophisticated model of the background of an image.</p>
<p><strong>The Clever Background Model</strong></p>
<p>For the next background model, I thought I was being very clever. It turned out that the clever idea I&#8217;d thought of didn&#8217;t work. So I scrapped the whole idea and went back to the simple idea that did work, the SBM, and tried to figure out how to make it evolve over time.</p>
<p><strong>The Evolving Background Model</strong></p>
<p>I called the third background model the Evolving Background Model (EBM). It operated on the same principles as the SBM; it captured a series of images and averaged them. However, the EBM kept a queue of images that it maintained through the entire run. After a brief initialization period where it filled its frame queue, it took another image every few frames. When it took a new image, it threw out its oldest image and multiplied every pixel in the new image by 1 / N, where N is the number of images it was keeping. That way, when asked to make a comparison, it simply summed all N images together to arrive at the average background image. As images are constantly being replaced every few frames, the background model adapts to changes in the background. In fact, if you sit in front of the camera and hold very very still, it will begin to classify you as part of the background. This seemed great. No more hiding under the desk, no more weird looks around the office. The problem was that now I was detecting motion, not just a change from the background. I thought this was what I wanted, but it turned out that it had its own pitfalls.</p>
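<p>The queue-based model might be sketched in Python as below. The class name, constructor parameters and background method are hypothetical, greyscale frames are assumed, and the real EBM may well differ in its details; only the pre-scaling by 1 / N and the oldest-out replacement follow the description above.</p>

```python
from collections import deque

class EvolvingBackgroundModel:
    """Keeps the last `capacity` sampled frames, each pre-scaled by 1 / capacity."""

    def __init__(self, capacity, update_interval):
        self.capacity = capacity
        self.update_interval = update_interval
        self.frames = deque(maxlen=capacity)  # appending to a full deque drops the oldest
        self.seen = 0

    def receiveFrame(self, frame):
        # Take every frame while filling the queue, then one every update_interval frames.
        filling = len(self.frames) != self.capacity
        if filling or self.seen % self.update_interval == 0:
            scale = 1.0 / self.capacity
            self.frames.append([[v * scale for v in row] for row in frame])
        self.seen += 1

    def background(self):
        # Summing the pre-scaled frames gives the mean; only meaningful once the queue is full.
        first = self.frames[0]
        return [[sum(f[y][x] for f in self.frames) for x in range(len(first[0]))]
                for y in range(len(first))]

model = EvolvingBackgroundModel(capacity=2, update_interval=1)
for value in (100, 100, 200, 200):
    model.receiveFrame([[value]])
    print(model.background())
```

<p>Once the queue is full, the background drifts from 100 towards 200 as the newer frames displace the older ones. A larger update_interval slows that drift, which corresponds to the slow update rate discussed under Jitter Solutions.</p>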
<p>The reason I&#8217;m trying to detect foreground objects in the video feed is that I want to determine their size and location so that I can begin to generate a 3D model of them. The problem with detecting motion is that if a new object is placed in the scene, unless it is constantly perturbed, after a time it is no longer detected (since it isn&#8217;t moving). It becomes part of the background. This effect causes the detected region to jitter and move a lot, even if you hold up an object and rotate it in front of the camera. Some regions of the object have the same appearance and as you move the object it is possible for a solid coloured region to pass through the same pixel for several frames, resulting in a conclusion that that pixel is not moving. The EBM does very well at detecting motion and telling you where it is, but not at forming a neat, stable, bounding box around the object that is causing that motion.</p>
<h2>Jitter Solutions</h2>
<p>My first idea to solve this problem was to keep a running average of the detected region, or of the generated model geometry. But the more I thought about it, the more I began to think that this was a bad solution. Introducing a running average in my background model was going to necessitate introducing a running average into my object location? I would be chasing my tail. Going back to the SBM results in a stable detected object region, but reintroduces the problems of small background changes throwing off the detection. The compromise that I settled on was using the EBM with slow update rates, for instance, giving the EBM a new frame only every second rather than every few frames. This slows the jitter of the detected region without giving up the ability to adapt to gradual background changes.</p>
<h2>Fitting Geometry</h2>
<div class="wp-caption aligncenter" style="width: 543px"><img title="Scene Builder" src="/images/2010-12-07_SceneBuilder01_small.png" alt="Screenshot of the SceneBuilder" width="533" height="200" /><p class="wp-caption-text">Scene Builder detecting a spinning cell phone (left) and fitting a cylinder between the camera and the background and texturing it with a portion of the live video (right).</p></div>
<p>I was very interested in this part of this project, but I ended up spending much of my time thinking about background models instead. Currently, the SceneBuilder generates a cylinder in front of the background image that roughly occludes the detected area. The cylinder is then textured with the part of the image in the detected region.</p>
<h2>Future Extensions</h2>
<ul>
<li>The texture that is placed on the object cylinder needs to be deformed so that the image does not appear distorted by the curved surface.</li>
<li>Detect features within the changed region and match them between frames to determine whether the object has rotated.</li>
<li>If the object has rotated, estimate the pose and add the new frame&#8217;s changed region&#8217;s texture to the part of the cylinder that was previously not exposed to the camera. In this way, texture the entire object as it is rotated.</li>
</ul>
<p><strong>Author’s Note: </strong>This article is one of three that compose my report on the Videndo Aedificare project. The other two are <a href="http://www.cutthroatstudios.com/blog/2010/12/background-subtraction/">Background Subtraction</a> and <a href="/blog/2010/12/feature-based-approaches-proforma/">Feature Based Approaches: ProFORMA</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cutthroatstudios.com/blog/2010/12/videndo-aedificare/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

