CIS 451 |
Vector Processing |
Fall 2019 |
The objective of this lab assignment is to allow students to examine source code for vector processing and observe the performance benefits.
Consider the (obviously broken) recursive function below:
float chaos(x, init) { return chaos(x*x + init); }
From a programming perspective, this function is broken because it has no base case and will never terminate. However,
from a mathematical perspective, some of the inputs have well-defined outputs. If we were to run this broken code, we
might find that the values stop changing much at some point. For example, chaos(0, 0)
will limit to 0,
but chaos(-1.0, 0)
just keeps growing.
It turns out that for inputs 0.25 ≤ c < 2
, this function has a limit. If we plot this function it
looks like this:
Notice that the plot is reasonably smooth for 0.25 ≤ c < 1.5
; but, when c
reaches
1.5, the values start jumping all over the place. In this interval, we say that this function is chaotic.
(Yes, there really is a mathematical definition of chaos.)
If we mapped outputs to colors, we could make an interesting drawing; however, since the function is one-dimensional, it would just be stripes. So, lets change the function so that it operates on a complex numbers. That way, we can define a two-dimensional image.
This is the new function:
float mandelbrot(z, init) { return mandelbrot(z*z + init); }
This time, instead of plotting the limit, we are going to keep track of how fast it grows --- how many
iterations we can go before norm(z) > 2.0
.
The result is this familiar picture:
Brighter colors represent slower divergence. The bright green represents points that diverge very slowly: It
takes many iterations for the point to reach a distance of 2.0 from the origin. Dark green colors diverge very
quickly: They reach a distance of 2.0 from the origin within a few iterations. Black points are those that do
not diverge at all. They are still withing 2.0 of the origin after the maximum number of iterations.
(See math.hws.edu/eck/js/mandelbrot/MB-info.html
for more information.)
Download mandelbrotComplex.cpp
and figure out how it works. (This is the simplest
implementation. It uses a class to represent complex numbers. That class overloads the arithmetic operators.)
dx
and dy
represent?*image++ = count;
do? Explain how this line works.
(Be certain to explain all the parts of *image++
.)mandelbrotComplex
with the -O0
flag. Then run the program and report how long it
takes. (Note: The flag is a capital O
then a zero. The -O
flag is for
Optimization.
Contrast this with the lower-case o
flag, which is for output.) g++ -O0 mandelbrotComplex.cpp
time ./a.out
mandelbrotComplex
with the -O3
, run it, and report how long it takes.
Now, download mandelbrot.c
and figure out how the function madelbrot_c_regular
works. The benefit of this C version is
that it results in much simpler assembly code; however, we must (1) store the real and imaginary parts of the complex
numbers separately, and (2) write our own code to implement complex operations.
tzr
used for? Why is it needed?mandelbrot.c
with the -O0
flag. Then run the program and report how long
it takes.gcc -mfma -O0 mandelbrot.c
time ./a.out
mandelbrot.c
with the -O3
, run it, and report how long it takes.
Intel provides a set of instructions that can process several elements in an array at once. The machines in our labs can process up to 8 floating point values at once.
Look at mandelbrot_c_avx
. Notice that the inner for
loop (the one that uses
col
as the loop index) increments by 8. This is because each iteration of this loop processes 8
points at once. The functions beginning with __mm256_
call Intel's special vector instructions.
Thoroughly examine this code until you understand how it works.
_mm256_broadcast_ss
function do?incr
array?_mm256_xor_ps(dx_ymm0, dx_ymm0)
? test = _mm256_movemask_ps(norm_z_ymm15) &
255U;
Is it necessary to the correct operation of the algorithm? If so, what would go wrong if it were
missing. If not, why is it optional; and, what is the benefit of keeping it?
main
to call mandelbrot_c_avx
, then compile the program, run it, and report
how long it takes.-O3
, run it, and report how long it takes.
Compile mandelbrot.c
down to assembly:
gcc -mfma -S mandelbrot.c
.
Find the assembly code for mandelbrot_c_avx
(look for the label).
mandelbrot_c_regular
.
If you are curious, you can comment out mandelbrot_c_avx
then run
gcc -mno-avx -S mandelbrot.c
to see what Intel's older style of floating
point assembly looked like.
Updated Monday, 18 November 2019, 8:41 PM