In this project I implemented methods to accurately model the flow of light in the real world using a path tracing algorithm. The core routines include ray generation and scene intersection, a bounding volume hierarchy (BVH), direct illumination, global illumination, and adaptive sampling. It was very rewarding to write an algorithm and then see it properly generate rays and create normal maps and BVHs. But then once I started to get my results from the direct illumination function, and next with global illumination, I was blown away at each step when my renders (which took a while!) were done. I enjoyed individually rendering a scene with direct lighting only, and indirect lighting only. Viewing these two outputs made the entire flow of things make way more sense. Plus, the indirect-lighting-only scene looks awesome (even though unrealistic).
The ray generation and primitive intersection parts of a rendering pipeline start with the pinhole camera model.
I started off by implementing the sampling loop: I loop num_samples times, and in each iteration I take the original given pixel and add to it an offset from a random sample (between 0 and 1). This way I get different random points within my pixel (which ranges over [x, x+1] and [y, y+1]). Next I call the camera's generate_ray(newPoint.x, newPoint.y), passing in newPoint, which has the offset applied and is normalized over the sampleBuffer's width and height to give a point between (0, 0) and (1, 1). This way, inside of the camera class, we can easily manipulate the point regardless of its original mapped space. generate_ray() returns a ray, which I then use as an argument to call est_radiance_global_illumination, which evaluates that ray's radiance and returns the Spectrum. In order to get the average, I accumulate a sum of all the sample rays' radiances, and after my loop I divide the sum by the number of samples and return the average.
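As a sketch, the averaging loop might look like the following. Spectrum and the radiance call here are minimal stand-ins for the project's actual types, not the real API; the stub radiance function is constant so the averaging behavior is easy to see:

```cpp
#include <cassert>
#include <cstdlib>

// Minimal stand-in for the project's Spectrum type (illustrative only).
struct Spectrum {
    double r, g, b;
    Spectrum operator+(const Spectrum& o) const { return {r + o.r, g + o.g, b + o.b}; }
    Spectrum operator/(double k) const { return {r / k, g / k, b / k}; }
};

// Stub for est_radiance_global_illumination: returns a constant, so the
// averaging behavior below is easy to check.
Spectrum est_radiance(double /*x*/, double /*y*/) { return {1.0, 0.5, 0.25}; }

// Average num_samples jittered samples inside pixel (x, y), mirroring the
// loop described in the text.
Spectrum raytrace_pixel(int x, int y, int num_samples, int width, int height) {
    Spectrum sum{0, 0, 0};
    for (int i = 0; i < num_samples; ++i) {
        double dx = std::rand() / (double)RAND_MAX;  // random offset in [0, 1]
        double dy = std::rand() / (double)RAND_MAX;
        double nx = (x + dx) / width;                // normalize to [0, 1]
        double ny = (y + dy) / height;
        sum = sum + est_radiance(nx, ny);            // accumulate radiance
    }
    return sum / num_samples;                        // return the average
}
```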
Camera::generate_ray(), as mentioned above, gets the x and y coordinates of a pixel to cast a ray through. I first mapped the [0, 1] normalized space to the space [-1, 1], to take into account the camera looking along the z axis, by multiplying each coordinate by 2 and subtracting 1. Next I mapped those coordinates to the ray's direction in camera space using the following two formulas:

x *= tan(radians(hFov) * 0.5)
y *= tan(radians(vFov) * 0.5)

where hFov and vFov are the field of view angles. Finally, to convert this vector into world space, we apply the given c2w transform matrix and then normalize it. This is our ray's direction vector. Before returning the ray, I set its segment to begin at the provided nClip camera parameter and end at fClip.
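The camera-space part of this mapping can be sketched as below. This is only a fragment: the real Camera::generate_ray then applies the c2w matrix, normalizes, and sets the ray segment to [nClip, fClip]; Vec3 and the function name here are illustrative stand-ins:

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

const double PI = 3.14159265358979323846;

// Map a normalized image point in [0,1]^2 to a camera-space ray direction,
// following the two formulas above (hFov and vFov in degrees).
Vec3 camera_space_dir(double x, double y, double hFov, double vFov) {
    double sx = 2.0 * x - 1.0;                // [0,1] -> [-1,1]
    double sy = 2.0 * y - 1.0;
    sx *= std::tan(PI / 180.0 * hFov * 0.5);  // scale by tangent of half-FOV
    sy *= std::tan(PI / 180.0 * vFov * 0.5);
    return {sx, sy, -1.0};                    // camera looks down the -z axis
}
```

The image center (0.5, 0.5) maps to the straight-ahead direction (0, 0, -1), which is a quick sanity check for the sign conventions.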
In order to get something to show up on the screen I had to implement the triangle intersection Triangle::intersect(). I used the Möller-Trumbore algorithm, which computes a ray-triangle intersection quickly. Given a ray and 3 points, it defines 5 variables: e1, the difference between p2 and p1; e2, the difference between p3 and p1; s, the difference between the ray's origin and p1; s1, the cross product of the ray's direction and e2; and s2, the cross product of s and e1. With these 5 terms defined, I can then calculate the variables t (the intersection distance) and b1 and b2 (values used for the barycentric coordinate interpolation). I calculate these values by doing the following transformations:

t = dot(s2, e2) / dot(s1, e1)
b1 = dot(s1, s) / dot(s1, e1)
b2 = dot(s2, d) / dot(s1, e1)

where d is the ray's direction. With the computed values for t, b1, and b2, I finally have to check whether my t value is actually within the bounds of the triangle. I do this by making sure that t is within the beginning and end of the ray's segment, that b1 and b2 are between 0 and 1, and that b1 + b2 is less than 1. If all of these conditions hold, then the ray does in fact intersect the triangle.
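Putting the five terms and the bounds checks together, a self-contained sketch of the Möller-Trumbore test could look like this (Vec3 and the helper functions are stand-ins for the project's vector class):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
    Vec3 operator-(const Vec3& o) const { return {x - o.x, y - o.y, z - o.z}; }
};
double dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
Vec3 cross(const Vec3& a, const Vec3& b) {
    return {a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x};
}

// Möller-Trumbore ray-triangle intersection, using the five terms from the
// text. Returns true and sets t if the ray (o + t*d, t in [t_min, t_max])
// hits triangle (p1, p2, p3).
bool intersect_triangle(const Vec3& o, const Vec3& d, double t_min, double t_max,
                        const Vec3& p1, const Vec3& p2, const Vec3& p3, double& t) {
    Vec3 e1 = p2 - p1, e2 = p3 - p1, s = o - p1;
    Vec3 s1 = cross(d, e2), s2 = cross(s, e1);
    double denom = dot(s1, e1);
    if (std::fabs(denom) < 1e-12) return false;  // ray parallel to triangle
    double inv = 1.0 / denom;
    t         = dot(s2, e2) * inv;               // intersection distance
    double b1 = dot(s1, s)  * inv;               // barycentric coordinates
    double b2 = dot(s2, d)  * inv;
    // segment-bounds and inside-triangle checks described above
    return t >= t_min && t <= t_max &&
           b1 >= 0 && b1 <= 1 && b2 >= 0 && b2 <= 1 && b1 + b2 <= 1;
}
```

For example, a ray from (0.25, 0.25, 1) pointing down -z hits the unit triangle in the xy-plane at t = 1 with b1 = b2 = 0.25.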
Here are some normal-shaded .dae scenes produced by the implementation above:




Although the ray generation method for creating normal maps is cool, it's not very fast on its own. Rendering images with thousands or more triangles takes very long. Therefore I implemented the bounding volume hierarchy technique, which speeds up the ray tracing process by creating a tight (but not too tight) bound around the object I am tracing.
To construct the BVH I followed these steps:

- If the node contains at most max_leaf_size primitives, this means that we are at a leaf node, so I simply create a vector with all of the primitives, assign it to the new node, and return that node.
- Otherwise I need to split. I use the bounding box's bbox.extent property and compare its x, y, and z lengths. My split axis becomes the biggest axis, and the split point itself is the center of that extent. Next I loop through all of my primitives once again and compare their bounding box's centroid on the selected axis with the calculated split point. If it is smaller than the split point, I insert the primitive into a vector I named left, and if it is equal to or bigger than the split point, I insert it into the vector right.
- There is an edge case where the left or right vector ends up empty. In order to account for this scenario I placed all this logic inside of a while loop that keeps looping until both the left and right vectors are non-empty. If either vector is empty, I compute a new split point at the mean of the vector that has all of the primitives in it, and I set the vector with content back to empty.
- I recurse on the left and right vectors and assign the resulting children to node->l and node->r. Finally I return the node.
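The split-and-recurse logic above can be sketched in a simplified form. Here each primitive is reduced to just its centroid along the chosen split axis (a stand-in for the project's Primitive and BBox types), so only the partitioning control flow is visible:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Each "primitive" is reduced to its centroid along the split axis.
struct BVHNode {
    std::vector<double> prims;             // leaf payload (centroids)
    BVHNode *l = nullptr, *r = nullptr;    // children for interior nodes
};

BVHNode* construct_bvh(std::vector<double> prims, size_t max_leaf_size) {
    BVHNode* node = new BVHNode();
    if (prims.size() <= max_leaf_size) {   // leaf: store primitives, return
        node->prims = prims;
        return node;
    }
    auto mm = std::minmax_element(prims.begin(), prims.end());
    double lo = *mm.first, hi = *mm.second;
    if (lo == hi) {                        // all centroids equal: force a leaf
        node->prims = prims;
        return node;
    }
    double split = (lo + hi) / 2.0;        // center of the extent
    std::vector<double> left, right;
    while (true) {
        left.clear(); right.clear();
        for (double c : prims)
            (c < split ? left : right).push_back(c);
        if (!left.empty() && !right.empty()) break;
        // In the full project the split point comes from the bounding box's
        // extent, which can leave one side empty; the fix is to move the
        // split to the mean of the non-empty side and try again.
        const std::vector<double>& full = left.empty() ? right : left;
        double sum = 0.0;
        for (double c : full) sum += c;
        split = sum / full.size();
    }
    node->l = construct_bvh(left, max_leaf_size);
    node->r = construct_bvh(right, max_leaf_size);
    return node;
}
```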
The following pictures show a summary of the steps in splitting the bounding volume:






I also implemented a ray-bounding box intersect function that returns true if the ray intersects the box and false if it does not. The implementation uses the ray-plane intersection algorithm for a plane perpendicular to a particular axis: for the plane x = x0, for example, t = (x0 - o.x) / d.x.
Therefore I calculated the min and max t for each axis using the algorithm above. I also made sure that the min was in fact smaller than the max, and if they were inverted I swapped them. Then I set my intersection interval's min to the max of {xmin, ymin, zmin} and its max to the min of {xmax, ymax, zmax}. If the interval's min is greater than or equal to its max, then there is no intersection and I return false. Otherwise an intersection has been found and I return true. The following image depicts this final intersection result clearly:
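A compact sketch of this slab test follows (Vec3 and the function name are stand-ins; for simplicity the test case uses nonzero direction components so all divisions stay finite):

```cpp
#include <algorithm>
#include <cassert>

struct Vec3 { double x, y, z; };

// Slab-method ray/AABB test: intersect the ray with each pair of axis-aligned
// planes, swap so tmin <= tmax per axis, then take the max of the mins and
// the min of the maxes, as described in the text.
bool bbox_intersect(const Vec3& o, const Vec3& d, const Vec3& bmin, const Vec3& bmax) {
    double t0 = -1e30, t1 = 1e30;
    const double origin[3] = {o.x, o.y, o.z};
    const double dir[3]    = {d.x, d.y, d.z};
    const double lo[3]     = {bmin.x, bmin.y, bmin.z};
    const double hi[3]     = {bmax.x, bmax.y, bmax.z};
    for (int a = 0; a < 3; ++a) {
        double tmin = (lo[a] - origin[a]) / dir[a];   // ray-plane intersection
        double tmax = (hi[a] - origin[a]) / dir[a];
        if (tmin > tmax) std::swap(tmin, tmax);       // fix inverted order
        t0 = std::max(t0, tmin);                      // max of the mins
        t1 = std::min(t1, tmax);                      // min of the maxes
    }
    return t0 < t1;   // the three per-axis intervals overlap
}
```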
And here are some images that I was able to render in less than 2 seconds each (on my personal MacBook Pro):


Here is a table that compares the times in seconds of different processes done before and after BVH was implemented on the cow geometry:
Process                            | Before BVH   | After BVH
Collecting primitives              | 0.0004s      | 0.0004s
Building BVH from 5856 primitives  | 0.0011s      | 0.0282s
Rendering                          | 191.3375s    | 0.2567s
Rays traced                        | 479977 rays  | 471153 rays
Average intersections per ray      | 4008.378     | 3.0121
This table makes the difference between using a bounding volume hierarchy and not using one easy to see. It makes sense that collecting the primitives takes the same amount of time, since the number of primitives has not changed between the two implementations. It is also no surprise that building the BVH from the primitives is quicker before the full BVH splitting is implemented. Things get interesting when we look at the difference in rendering time. Once BVH was implemented, rendering became roughly 745 times faster! That is an incredible improvement. Another improvement was the number of rays traced, which decreased by about 8,000 rays. Lastly, another important observation is the average number of intersections per ray, which dropped by a factor of about 1,330! Another impressive improvement from the simple addition of a BVH.
Direct illumination is when a scene is lit up by only the light sources in the scene. In order to achieve this I started off by implementing a BSDF (bidirectional scattering distribution function), which represents the color of a material and what radiance is scattered into the outgoing direction for a ray of incoming light that hits the object. I specifically implemented the diffuse case, which is when light is scattered in all directions uniformly, in such a way that the incoming direction of the light and the outgoing direction from the object do not affect the final color of the material. Therefore all the diffuse function does is return the reflectance divided by pi.
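Since the diffuse BSDF ignores the incoming and outgoing directions, its evaluation function is essentially a one-liner; here is a sketch with a stand-in Spectrum type (not the project's actual class):

```cpp
#include <cassert>
#include <cmath>

const double PI = 3.14159265358979323846;

// Stand-in Spectrum type (illustrative only).
struct Spectrum { double r, g, b; };

// Diffuse (Lambertian) BSDF: the scattered radiance is the same for every
// incoming/outgoing direction pair, so f() just returns the reflectance
// divided by pi. The 1/pi factor keeps the surface from reflecting more
// energy than it receives when integrated over the hemisphere.
struct DiffuseBSDF {
    Spectrum reflectance;
    Spectrum f() const {   // wo, wi omitted: they don't affect the result
        return {reflectance.r / PI, reflectance.g / PI, reflectance.b / PI};
    }
};
```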
With a material available to us, next I implemented two direct lighting estimations: uniform hemisphere sampling and importance sampling. In a general sense, uniform hemisphere sampling estimates the direct lighting on a point by sampling uniformly in a hemisphere, while importance sampling instead samples the lights directly.
Hemisphere sampling takes samples uniformly around a point of interest. It does this by creating a number of sample rays originating at the hit point, with directions drawn uniformly from the hemisphere around the surface normal, and then checks whether each ray intersects the scene. One thing to keep in mind is that it is necessary to offset the origin of the ray from the hit point by a small constant so that the ray does not frequently intersect the same triangle at the same spot again (this is due to floating point imprecision). If a ray does intersect something, I calculate that material's emitted light and transform that radiance into an irradiance. Finally I divide the result by the pdf (following Monte Carlo integration) and return.
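The Monte Carlo shape of this estimator can be isolated from the scene details. In the sketch below I assume a hypothetical environment of constant incoming radiance 1 and a diffuse BSDF with albedo 1, for which the analytic answer is exactly 1.0; the estimator (BSDF times radiance times cosine, divided by the uniform pdf of 1/(2*pi)) should approach it:

```cpp
#include <cassert>
#include <cmath>
#include <random>

const double PI = 3.14159265358979323846;

// Uniform-hemisphere Monte Carlo estimator with scene queries stubbed out:
// every sampled direction sees radiance 1.0. For a uniform direction on the
// hemisphere, the z coordinate (cos theta) is itself uniform on [0, 1].
double estimate_uniform_hemisphere(int num_samples, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> u(0.0, 1.0);
    const double pdf = 1.0 / (2.0 * PI);   // uniform over the hemisphere
    const double f   = 1.0 / PI;           // diffuse BSDF, albedo 1
    double sum = 0.0;
    for (int i = 0; i < num_samples; ++i) {
        double cos_theta = u(rng);          // z of a uniform hemisphere dir
        double L_i = 1.0;                   // stub: radiance from this dir
        sum += f * L_i * cos_theta / pdf;   // Monte Carlo estimator term
    }
    return sum / num_samples;
}
```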
We no longer have only a normal map! But our image is still a bit grainy. Although we could keep increasing the number of camera rays per pixel and the number of samples per area light until convergence, there is a better method: importance sampling over the lights. For this method I sum over every light source in the scene, sample the surface of each light, compute the incoming radiance from those sampled directions, then convert it into outgoing radiance using the BSDF (which in our case is diffuse) at the surface. The biggest difference to take away is that we sample each light directly instead of sampling uniformly over a hemisphere. First I loop over every light in the scene. If the light is a point light, every sample from a given point falls in the same direction, so I only need one sample. But if the light is not a point light, then we loop over the number of samples to create each ray. For each sample we cast a shadow ray from the intersection point towards the light, and check if it intersects the scene anywhere. If it does not find an intersection, then the light is reaching the sampled point, and I accumulate the calculated diffuse BSDF contribution, which I return after the loop.
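The control flow of that loop can be sketched as below. The Light struct and field names are stand-ins for the project's light API, and the BSDF, cosine, and pdf factors are folded into a single per-sample value so only the sampling structure (one sample for point lights, ns for area lights, skip occluded shadow rays) is shown:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal stand-in for the project's light API (assumed names).
struct Light {
    double radiance;   // per-sample contribution (BSDF/cos/pdf folded in)
    bool   is_delta;   // point light: every sample is identical
    bool   occluded;   // stub for "the shadow ray hit the scene"
};

// Skeleton of importance-sampled direct lighting: loop over lights, use one
// sample for point (delta) lights and ns samples otherwise, and only count
// samples whose shadow ray reaches the light unoccluded.
double direct_lighting_importance(const std::vector<Light>& lights, int ns) {
    double L_out = 0.0;
    for (const Light& light : lights) {
        int samples = light.is_delta ? 1 : ns;   // point light needs one sample
        double sum = 0.0;
        for (int i = 0; i < samples; ++i) {
            if (light.occluded) continue;        // shadow ray hit something
            sum += light.radiance;               // unoccluded contribution
        }
        L_out += sum / samples;                  // average over the samples
    }
    return L_out;
}
```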
Here are the results of both methods:


It is clear from these two images above that importance sampling does in fact reduce noise, and it also resolves issues with point light sources. In uniform hemisphere sampling, a single sample point casts many rays in uniformly random directions, and maybe one will hit the light source. This causes noise because a neighboring sample point will also cast many rays, yet its rays might not actually hit the light. Therefore, two points near each other might end up with one being lit and the other not.
What importance sampling does instead (and better) is take samples directly toward the lights, and check whether the shadow ray in the direction from the point to the light hits any object causing an occlusion. There is less randomness to this method, which results in less noise. Here are two more nice-looking importance sampling images:


And one more example of uniform hemisphere sampling for comparison:
Here are four scenes that compare the noise levels in soft shadows when rendering 1, 4, 16, and 64 light rays. These images use 1 sample per pixel and importance-sampled direct lighting:




With only 1 camera ray per pixel it is easier to see the difference between different numbers of samples per area light. With 1 sample per area light the noise level is pretty high and the shadow under the bunny is pretty hard. Under the bunny it's pitch black and even kind of hard to distinguish where its feet are. With 4 light rays you can start to see better shadows, but the further the shadow gets from the object, the noisier and more random the "dark" shadow pixels become. Using 16 light rays produces a smoother image. It is still grainy, giving a sandy look to the image's texture. The shadow under the bunny is less harsh, although it still gets noisier further from the object. In the last image, with 64 samples per area light, the scene's shadows are beautifully smooth! The shadow cast under the bunny smoothly dissolves as it gets further away from it, and is not harsh.
Indirect lighting creates a more realistic looking scene because it accounts for light that bounces off of objects and colors them appropriately. I implemented four functions:

- zero_bounce_radiance: returns the light that results from no bounces of light. This simply returns the light emitted at the intersection of the ray and the surface.
- one_bounce_radiance: selects whether to set the Spectrum using the uniform hemisphere sampling or the importance sampling direct lighting function.
- est_radiance_global_illumination: simply returns the sum of zero_bounce_radiance and at_least_one_bounce_radiance.
- at_least_one_bounce_radiance: handles most of the indirect lighting logic, so I will explain it in more detail below.
at_least_one_bounce_radiance does exactly what its name says: it calculates the lighting for at least one bounce of light. It takes a sample from the surface BSDF at the intersection point and uses Russian roulette to decide whether to randomly terminate the ray or not. This decision also takes into account whether the depth of the ray is greater than 1, and whether it equals max_ray_depth. If these conditions hold, then we recurse through our function, passing in the newly created bounce ray and its intersection point. The result of this recursion is added to our output, scaled by the sampled BSDF value and a cosine factor, and divided by the BSDF pdf.
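The recursion with Russian roulette can be sketched as follows. All scene queries are stubbed out: each bounce contributes a fixed direct-lighting value of 1.0, the BSDF*cosine/pdf weight is a constant w, and the depth/termination conventions are illustrative rather than the project's exact API. Dividing the recursive term by the continuation probability is what keeps Russian roulette an unbiased estimator:

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Control-flow sketch of at_least_one_bounce_radiance with stubbed scene
// queries: the direct-lighting term is a constant 1.0 and the BSDF weight
// (sample value * cosine / pdf) is the constant w.
double at_least_one_bounce(int depth, int max_depth, double w,
                           double p_continue, std::mt19937& rng) {
    double L = 1.0;                        // stub for one_bounce_radiance
    if (depth >= max_depth) return L;      // out of allowed bounces
    std::uniform_real_distribution<double> u(0.0, 1.0);
    if (u(rng) < p_continue) {             // Russian roulette: keep bouncing?
        // Recurse with the new bounce ray; divide by the continuation
        // probability so the estimator stays unbiased.
        L += w * at_least_one_bounce(depth + 1, max_depth, w, p_continue, rng)
               / p_continue;
    }
    return L;
}
```

With p_continue = 1 (never terminate early), w = 0.5, and max_depth = 3, the recursion unrolls to 1 + 0.5*(1 + 0.5*1) = 1.75.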
Here are two scenes that compare direct-only lighting versus indirect-only lighting:


In the direct lighting scene we get the effect we had with the direct lighting equations before implementing the indirect ones. There is no color bouncing off of the walls onto the spheres, and the shadows under the spheres are pitch black and thick. In indirect lighting the shadows are much softer and do reflect the colors of the walls. You can see red and blue bouncing off the left and right sides of the spheres in the indirect lighting scene. The ceiling in indirect lighting is now apparent, because light bounces off the objects and hits the ceiling as well, whereas in the direct scene no light rays are cast onto the ceiling. And by joining the two you get:
Now let's do a max_ray_depth comparison:





Since the indirect lighting function is implemented in such a way that a ray's depth is initialized to the max ray depth and we only add another light bounce to the output if this depth is greater than 1, it makes sense that at max ray depths of 0 and 1 the images above show no global illumination: the scene is illuminated only as the direct lighting does. The other max ray depths (2, 3, and 100) are very similar looking. It seems that at this point, since recursion is expensive, you could probably set the max ray depth to a small value and still get a very similar high-quality result.







The results above are interesting but not unfamiliar. It makes sense that the fewer samples per pixel you have, the more "random" the color bouncing becomes. So at 1 sample per pixel you can see the static-looking image, with many colorful pixels scattered around (but still in a way that you can distinguish the image). As the samples-per-pixel rate goes up, the better the estimate of what color a pixel should be, based on the bounced-back rays.
Here are some more images displaying the global illumination effect:
Adaptive sampling allows for quicker results with less noise, because it uses an algorithm to test batches of samples. We know by now that we can reduce noise by increasing the number of samples per pixel, but the pixels in a scene don't all converge at the same number of samples. Some pixels converge faster (at lower sampling rates) while others require more samples. Adaptive sampling exploits exactly this by figuring out which areas of the scene require more samples.
This algorithm runs on each pixel and uses confidence-interval statistics to detect whether the pixel has converged as we trace the ray samples through it. Although the theory and numbers behind it take a bit more reading and studying to fully understand, the algorithm is simple to use.
It is implemented inside of the raytrace_pixel function from Part 1. In addition to what I was doing before in this method, I now also keep track of a sample_batch_count and a sample_count variable. These track how many samples I have accumulated in the current batch, and how many samples I have accumulated in total; I increment them on each iteration over num_samples.
I also keep track of two accumulators of each current sample's illuminance: s1 += illuminance and s2 += illuminance * illuminance. Inside my for loop I check whether batch_count equals samplesPerBatch (which is set from the command line). If batch_count equals samplesPerBatch, then I calculate a mean, a variance, and a convergence variable I. With n the number of samples so far, these equations are as follows:

mean = s1 / n
variance = (s2 - s1^2 / n) / (n - 1)
I = 1.96 * sqrt(variance / n)
If the convergence variable I is <= maxTolerance * mean (maxTolerance is also set from the command line), then I break out of my for loop, because I have reached convergence in the number of samples for this specific pixel. Isn't this amazing and magical?
Lastly, I made sure to update sampleCountBuffer[index] = sampleCount so that the colors appear correctly in the outputted sampling-rate image. I also return the average of my samples divided by the actual sampleCount counter I accumulated, as opposed to num_samples as it was previously, since now I have a chance of exiting the loop early.
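The convergence check can be sketched as below. The illuminance here is a stub (a constant), so the variance is zero and the pixel "converges" after the first batch; variable names like samplesPerBatch and maxTolerance mirror the text, not necessarily the project's exact signatures:

```cpp
#include <cassert>
#include <cmath>

// Adaptive-sampling loop: accumulate s1 and s2, and every samplesPerBatch
// samples compute mean, variance, and I = 1.96 * sqrt(variance / n); stop
// early once I <= maxTolerance * mean. Returns the sample count actually
// used, which is what the final average is divided by.
int adaptive_sample_count(int num_samples, int samplesPerBatch, double maxTolerance) {
    double s1 = 0.0, s2 = 0.0;
    int n = 0, batch_count = 0;
    for (int i = 0; i < num_samples; ++i) {
        double illum = 0.5;            // stub for this sample's illuminance
        s1 += illum;
        s2 += illum * illum;
        ++n; ++batch_count;
        if (batch_count == samplesPerBatch) {
            batch_count = 0;
            double mean = s1 / n;
            double var  = (s2 - s1 * s1 / n) / (n - 1);
            double I    = 1.96 * std::sqrt(var / n);
            if (I <= maxTolerance * mean) break;   // pixel has converged
        }
    }
    return n;
}
```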
Here are my outputs for 64 samples per batch and 0.05 max tolerance:




One interesting thing to note about the images above is that since the 4096 samples/pixel render produces a ceiling with more colors in the rate image than the 2048 samples/pixel render, we can conclude that the ceiling needs more than 2048 but fewer than 4096 samples per pixel in order to converge.