Tuesday, 16 July 2013

Random Floats in GLSL 330

I had some fun putting together a random number generator in GLSL.

I wanted a set of functions with the following specs:
  • Outputs a float in the half-open range [0:1)  (a fraction)
  • Can produce every representable value in that range, given sufficient input variation
  • Multidimensional variation: accepts between 1 and 4 arbitrary float inputs
  • Uniform distribution
  • No apparent patterns
  • Self-contained: no texture lookup
  • Portable: identical results on all GPUs
  • Low overhead
Fairly stringent requirements.  I think I managed to pull it off though.

The key to making a decent set of functions was realising that in a shader - which has no state - a random function is actually a kind of hash function.  It returns a uniformly distributed output which varies wildly given small changes in input.

But we're dealing with floats.  Floats which, according to the GLSL spec, explicitly lack precision guarantees.  Writing a hashing function in terms of floats isn't going to work very well and isn't likely to be portable either.  Also, writing a good hashing function is a difficult task in itself.  I'd rather use an existing integer hash with well-defined properties.

That's what we're going to do.  So that we only have to hash integers, we're going to move the bits of the float directly into an unsigned integer, hash it, and move the bits back into a float - accompanied by a little bit-twiddling of course.  For this purpose GLSL 330 has floatBitsToUint and uintBitsToFloat.  This way the PRNG will be capable of dealing with any input value without range or precision being a concern.

The GLSL spec guarantees that float is in IEEE-754 binary32 format and also that uint is a 32-bit unsigned integer.  Which means we can do bit-twiddling tricks without worrying about compatibility issues.

Here's the algorithm I came up with:
  1. Hash all inputs together into a uint
  2. AND away everything but the low 23 bits
  3. Take the bit pattern of the float 1.0, whose 23 mantissa bits are all zero
  4. OR the hashed bits into that pattern and reinterpret it as a float in [1:2)
  5. With float arithmetic, subtract 1.0 to get the desired [0:1) range
This probably sounds quite confusing if you're not intimately familiar with the binary32 format.  I'll explain it more thoroughly after I post the actual code.

Here's our integer hash function, a single iteration of Bob Jenkins' one-at-a-time (OAT) algorithm.  This is what we're going to mix up our bits with.
uint hash( uint x ) {
    x += ( x << 10u );
    x ^= ( x >>  6u );
    x += ( x <<  3u );
    x ^= ( x >> 11u );
    x += ( x << 15u );
    return x;
}
Now for the bit-twiddling random() function.  Let's keep it simple and stick with one input for now:
float random( float f ) {
    const uint mantissaMask = 0x007FFFFFu;  // low 23 bits
    const uint one          = 0x3F800000u;  // bit pattern of 1.0

    uint h = hash( floatBitsToUint( f ) );
    h &= mantissaMask;                      // keep 23 random mantissa bits
    h |= one;                               // [1:2) once reinterpreted

    float  r2 = uintBitsToFloat( h );
    return r2 - 1.0;                        // [0:1)
}
What the hell is going on here?  To explain this I first have to explain some aspects of the floating-point format:
The mantissa of a binary32 float is a 23-bit binary fraction in the half-open range [0:1).  There's an implicit leading 1 in the mantissa, so the effective value of the mantissa is always 1.0 + fraction.  The absolute value of a float can be calculated as:
(1.0 + fraction) * power(2,exponent)
This means that the float value 1.0 has an (unbiased) exponent of zero and a mantissa of zero.  Zero mantissa means 23 zero bits.  As the value increases from 1.0 towards 2.0 only those mantissa bits change - they count upwards the same as a 23-bit unsigned integer.  By exploiting this property we can easily construct a float in that range using only bitwise operations.  Stick those bits back inside a float, subtract 1.0, and you've got a float in the desired range of [0:1).
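To make this concrete, here's the bit-level view of the two endpoints (sign, exponent and mantissa fields separated):
0x3F800000  =  0 01111111 00000000000000000000000  =  1.0
0x3FFFFFFF  =  0 01111111 11111111111111111111111  ≈  1.99999988
ORing 23 random mantissa bits into the bit pattern of 1.0 therefore always produces a float in [1:2).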

To get multidimensional variants of this function we simply combine more inputs into the hash function.



Let's try our one-dimensional variant first:
float r = random( gl_FragCoord.x );
fragment = vec4( r,r,r, 1.0 );
This should vary the luminance randomly over the X axis.  Result:


Already we can see some good indications.  The average luminance is about 50%, which suggests the output covers the full range without being biased towards either end.

But it's not possible to tell whether there are nasty patterns using only one-dimensional input.  Let's make some multi-dimensional variants of the hash function:
uint hash( uvec2 v ) {
    return hash( v.x ^ hash(v.y) );
}

uint hash( uvec3 v ) {
    return hash( v.x ^ hash(v.y) ^ hash(v.z) );
}

uint hash( uvec4 v ) {
    return hash( v.x ^ hash(v.y) ^ hash(v.z) ^ hash(v.w) );
}
The way I combined the inputs is totally arbitrary and rather lazy.  I just experimented with them until I found something that worked on my test inputs.  There's probably a better and faster way of doing this, like unrolling the OAT algorithm separately for each one.

Making versions of random() which use these functions is very simple, so I'll spare you most of the wall of text - the two-dimensional one below shows the pattern.
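Here's a minimal sketch of it; the vec3 and vec4 versions are identical apart from the input type.  (This relies on GLSL 330's vector overload of floatBitsToUint, which turns a vec2 into a uvec2 for the matching hash() above.)
float random( vec2 v ) {
    uint h = hash( floatBitsToUint( v ) );  // uvec2 overload of hash()
    h &= 0x007FFFFFu;                       // keep 23 mantissa bits
    h |= 0x3F800000u;                       // bit pattern of 1.0
    return uintBitsToFloat( h ) - 1.0;      // [1:2) -> [0:1)
}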



Anyway, let's try some 2D randomness:
float r = random( gl_FragCoord.xy );
fragment = vec4( r,r,r, 1.0 );

Great - not a pattern to be seen.  Bad RNGs create visual artifacts which spoil the effect, typically diagonal lines.  Something that looks exactly like static is the best possible outcome.



Our noise doesn't change over time though.  If you want the value to differ each frame then you need another input: time.  You can pass this into the shader as a uniform.  With both spatial and temporal inputs it now looks like this:
float r = random( vec3(gl_FragCoord.xy, time) );
fragment = vec4( r,r,r, 1.0 );

Wonderful.  (It looks much better in practice at 60 Hz.  There's only so much I can do with a GIF.)



It would be interesting to see how it holds up compared to other techniques.  I've never seen anyone generate 'random' floats this way before.  Knowing computer science, somebody probably invented this 50 years ago though.



What about performance?  It seems pretty good, but I'm no expert on measuring GPU perf.  My GeForce 660 Ti manages ~3400 FPS at 1920x1200x32, but this is probably a useless figure.  GPUs can execute multiple instruction streams in parallel, interleave them to hide latency, have very deep pipelines, and are often limited by memory bandwidth rather than computational power.  You'd have to compare it with other random number techniques in an actual application to get useful answers.




Monday, 15 July 2013

Why Alpha Premultiplied Colour Blending Rocks

Blending with alpha-premultiplied colours solves a few annoying rendering problems, gives you some fancy rendering features for free and is very easy to use.  More people should use it!  So here's how to use it, how it works and the problems it solves.

There are a number of posts about this online already, but they tend to be overly technical and lacking in code examples.  Hopefully my explanations will be clearer and more practical.  I'll also provide all the code you need to get started - which isn't much.

Why you should care:
  • Conventional alpha blending doesn't work for transparency compositing and can be affected by weird filtering artifacts.
  • Alpha-premultiplied colour blending on the other hand is great: it's easy to implement, intuitive to use, optionally gives you free additive blending without the need for blend mode changes, provides a variable level of additivity and opacity at the same time, and is not affected by filtering problems.

Conventional Alpha Blending

In conventional alpha blending the alpha channel is considered the colour's opacity level.  The following glBlendFunc setup is used:
Source:      GL_SRC_ALPHA
Destination: GL_ONE_MINUS_SRC_ALPHA
Which performs this operation:
src   = [pixel we're blending into the buffer]
dest  = [pixel already in the buffer]
pixel = (src * src.a) + (dest * (1.0 - src.a));
This isn't quite opacity.  It would be more accurate to say that the alpha value represents the proportion of the destination colour that will be replaced by the source colour.  In other words it's a component-wise lerp between the destination and source colour parameterised by the source alpha.

And this works - to a point.

But what happens when we introduce translucent colours?  Consider drawing 50% opacity pure red to a 100% opacity pure black RGBA buffer.  If your brain works like mine then you expect the final colour to be fully opaque dark red.  Here's what actually happens:
([1,0,0,0.5] * 0.5)  +  ([0,0,0,1] * (1-0.5));
== [0.5,0,0,0.25]  +  [0,0,0,0.5];
== [0.5,0,0,0.75];
Crap.  The buffer is now translucent, not what we wanted at all.  It gets worse: what happens when we draw more stuff onto this buffer?  Let's blend in some more of that red:
([1,0,0,0.5] * 0.5)  +  ([0.5,0,0,0.75] * (1-0.5));
== [0.5,0,0,0.25]  +  [0.25,0,0,0.375];
== [0.75,0,0,0.625];
Now we're really in trouble.  The buffer is even more translucent than before.   Every time we blend translucent objects into the buffer it gets worse!  The end result: a goddamn mess.

This problem doesn't rear its ugly head until you begin to work with transparency compositing.  Your typical default framebuffer doesn't have an alpha channel to make translucent.  It effectively has 100% constant opacity so things more or less work correctly.  But all hell breaks loose the moment you start drawing to an offscreen buffer with an alpha channel.  Transparency compositing just doesn't work at all in this setup.

Another tangentially related problem is that some image programs set transparent pixels to all-zeroes on all channels.  Some compressed texture formats require this as well.  This causes ugly dark halos on partially transparent objects when they're linearly filtered; the filter will lerp between transparent black pixels and the correct colour, making the result darker and more transparent at the same time.  Not good.

There are ways around most of these problems.  But we can cure these ills in one fell swoop by...



Blending With Alpha-Premultiplied Colour


Lerping between colours gave us nasty results.  The solution to this problem is to change our concept of colour a little and use alpha premultiplied colours.  Converting a 'normal' RGBA colour to an alpha premultiplied one is simple:
pixel.rgb *= pixel.a;
Note that this isn't something which is done when blending.  It's a preprocessing step you perform on your textures long before rendering starts.

APM'd colours work a bit differently than normal ones:
  • Each colour is defined by all four channels.  Changing the alpha changes the colour; it's not independent anymore.
  • There's now only one 0% opacity colour: zero on all channels.
  • To modulate the opacity, all channels are multiplied by the desired opacity level.
  • When the alpha is lower than a colour channel, that channel contributes additively in proportion to the difference.  When the alpha is zero the blend is purely additive, as the examples below show.
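For example, here are some conventional colours next to their premultiplied equivalents (rgb *= a):
normal [1.0, 0, 0, 1.0]      ->  APM [1.0, 0, 0, 1.0]   (opaque red: unchanged)
normal [1.0, 0, 0, 0.5]      ->  APM [0.5, 0, 0, 0.5]   (50% opaque red)
normal [any, any, any, 0.0]  ->  APM [0,   0, 0, 0  ]   (transparent, whatever the RGB)
                                 APM [1.0, 0, 0, 0.0]   (purely additive red - no conventional equivalent)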

To blend these types of colours you need this glBlendFunc setup:
Source:      GL_ONE
Destination: GL_ONE_MINUS_SRC_ALPHA
Which performs this blend op:
pixel = src + (dest * (1.0-src.a));
Okay, let's break this down.  In this scheme the source is always purely additive and the destination contributes in proportion to the inverse of the source alpha.  In plain English: the source A determines the proportion of light the source colour blocks out, while the source RGB determines how much light the source colour additively contributes.

Let's try drawing red to our black backbuffer again using this setup:
[0.5,0,0,0.5]  +  ([0,0,0,1] * (1-0.5));
== [0.5,0,0,0.5]  +  [0,0,0,0.5];
== [0.5,0,0,1];
Exactly what we wanted.  Let's add more red:
[0.5, 0,0,0.5]  +  ([0.5,0,0,1] * (1-0.5));
== [0.5, 0,0,0.5]  +  [0.25,0,0,0.5];
== [0.75,0,0,1];
50% opacity red moved the buffer 50% of the way towards full red both times, without reducing the opacity level.  Good.

You might be thinking that this all sounds a little complicated and you just want to think about colours in 'normal' terms.  Don't worry, you can still do that.  In practice you hardly have to care about these details at all.  Alpha premultiply your textures and colours before blending, set the appropriate blending modes and you're done.  You don't even have to interact with premultiplied colours manually in your program logic: you can use normal colours and APM them in the vertex shader, or abstract the operation away inside your colour class.  No problem.
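For instance, the premultiply-in-the-vertex-shader route is a one-liner.  Here's a minimal sketch - the names mvp, position, colourIn and colourAPM are made up for this example:
#version 330

uniform mat4 mvp;      // assumed model-view-projection matrix
in  vec4 position;
in  vec4 colourIn;     // conventional RGBA from the application
out vec4 colourAPM;    // premultiplied RGBA, read by the fragment shader

void main() {
    colourAPM   = vec4( colourIn.rgb * colourIn.a, colourIn.a );
    gl_Position = mvp * position;
}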

And of course you gain some worthwhile features for your efforts.

First and foremost, transparency compositing works exactly like you'd expect with zero effort whatsoever.  You're finally free to composite your ass off.  Composite composites.  Go hog wild.

Filtering always works properly.  Transparent is always all-zero RGBA in APM colours; lerping to transparent is equivalent to lerping to all zeroes.  The original problem no longer exists.

Free additive blending.  And it's not some half-assed side-effect: it's better than the usual additive blending mode.  It's not merely on or off; when you lerp the APM colour's alpha toward zero you're lerping between normal and additive blending.  No blend mode switching required!  Even better, this can be controlled independently from the opacity level, which is determined by all four components.

Here's the code for independently controlling the additivity and opacity levels of an alpha-premultiplied colour.  These parameters are both ranged [0:1], just like the colour components.
col.a *= 1.0 - additivity;
col   *= opacity;
Easy peasy.  The order of application doesn't even matter because, of course, multiplication is commutative.  And they work together: you can specify a colour which is partially opaque and partially additive and it behaves intuitively.

This enables a few things which are not normally possible, like making a single primitive luminescent additive pink on one side and 50% opaque black on the other while smoothly interpolating between the two.  Sometimes this is very useful.  A typical example is making glowing hot embers which first dim into opaque dark ashes and then fade away.  With normal blending this is a bit of a pain: two draws and a blend mode change are required.  With the APM colour blending scheme it can be accomplished much more easily because the colour, opacity and additivity can essentially be treated as independent properties.  Here is an example:
colour orange; // APM orange
colour black;  // APM opaque black
colour out;    // APM output colour
float state;   // Born at 0.0, dies at 1.0


float stateLow   = boxStep( state, 0.0, 0.5 );
float stateHigh  = boxStep( state, 0.5, 1.0 );
float blackness  = stateLow;
float additivity = 1.0 - stateLow;
float opacity    = 1.0 - stateHigh;

out    = mix( orange, black, blackness );
out.a *= 1.0 - additivity;
out   *= opacity;
boxStep is a clamped range-normalising function:
float boxStep( float x, float low, float high ) {
    return clamp( (x-low)/(high-low), 0.0, 1.0 );
}


Here is the effect applied to a sprite, plotted on a circle with state going from 0.0 to 1.0 clockwise:


Things that are not obvious in the screenshot:
  • Everything in the foreground was drawn onto a transparent multisampled framebuffer texture in the APM blending mode.  Notice that multisampling works normally.
  • The navy background colour is the default framebuffer's clear colour - alpha compositing is working correctly.  There are no translucent holes on overlapping translucent areas (which there would be with lerp blending).
  • I didn't change blend modes during the drawing process.  The colour, additivity and opacity variations are purely a function of APM colour manipulation.  In fact, I only set the blend mode once at the start of the program.  Quite convenient.
Here's the same composite over a lighter background.  You can see the black part more easily:






Converting Textures to Premultiplied Alpha Format
Here's a ready-made function.  Summary: treat the bytes as normalised fixed-point numbers in the range [0:1], multiply them, then transform them back to bytes, rounding to the nearest representable value.  This is a whole lot simpler than it looks.  It's simply RGB *= A.
#include <cstdint>
#include <vector>

void alphaPremultiply( std::vector<uint8_t>& pixels )
{
    typedef uint_fast16_t fixed;
    typedef uint8_t       byte;

    // a*b/255 approximated as a*(b+1)/256: the integer part lands in
    // the high byte and the fraction in the low byte.  Round to nearest.
    auto mul = [](fixed a, fixed b) -> byte {
        const fixed f       = a * (b+1u);
        const fixed frac    = f & 0x00FFu;
        const fixed rounded = (f>>8u) + (0x007Fu < frac);
        return byte( rounded );
    };

    // Assumes tightly packed 8-bit RGBA.
    for (auto i=pixels.begin(); i<pixels.end();) {
        const fixed r = *(i   );
        const fixed g = *(i+1u);
        const fixed b = *(i+2u);
        const fixed a = *(i+3u);

        *(i++) = mul( r, a );
        *(i++) = mul( g, a );
        *(i++) = mul( b, a );
          i++;                 // alpha channel is left as-is
    }
}

Explaining how fixed-point arithmetic works is outside the scope of this post.  Long story short: it's faster than floating point and has a few interesting properties which will come in handy in a moment.


Disadvantages of Premultiplied Alpha


Blending with alpha premultiplied colours is nice but it's not perfect.  It has at least one [curable] theoretical disadvantage: when you're working with 8-bit colour channels there will be loss of precision in the RGB channels when the A channel has a value close to zero.  The loss can be quite severe; at alpha value 16, for instance, each premultiplied channel has only 17 possible values left, and at the lowest alphas most RGB values collapse to zero outright.

I only noticed this because I was reading the APM output values directly.  I can't see it at all in the actual image output (and I'm notoriously picky), presumably because there isn't much to see at such a low opacity level anyway.  In all likelihood you can skip over this entire section, stop caring entirely, and you'll never see a pixel out of place.  It simply doesn't matter that much.  But if you're an intolerable stickler for image quality like myself then carry on.  After all, OpenGL isn't just for games and sometimes accurate colour matters.

I said it's curable.  I'll discuss the most obvious cure first: higher precision colour channels.

OpenGL is fully capable of loading and operating on 16-bit-per-channel textures.  The format for this is GL_RGBA16.  But where do we get 16-bit colours from if we've only got 8-bit textures?  Thanks to a nice quirk inherent to fixed-point arithmetic, producing precise 16-bit output from the premultiplication of 8-bit channels is a trivial task.  16-bit results are actually easier to compute than 8-bit ones - the full-precision product of the fixed-point multiply already carries all 16 bits we need.

Completely untested code off the top of my head which implements this:
std::vector<uint16_t> alphaPremultiply16( const std::vector<uint8_t>& pixels )
{
    typedef uint_fast32_t fixed;   // wide enough to hold a*b*257

    // Exact normalised multiply into 16 bits:
    // (a/255)*(b/255) scaled to 65535 is a*b*257/255, rounded to nearest.
    auto mul = [](fixed a, fixed b) -> uint16_t {
        return uint16_t( (a * b * 257u + 127u) / 255u );
    };

    std::vector<uint16_t> outPixels( pixels.size() );
    auto out = outPixels.begin();

    for (auto i=pixels.begin(); i<pixels.end();) {
        const fixed r = *(i++);
        const fixed g = *(i++);
        const fixed b = *(i++);
        const fixed a = *(i++);

        *(out++) = mul( r, a );
        *(out++) = mul( g, a );
        *(out++) = mul( b, a );
        *(out++) = uint16_t( a * 257u );  // widen 8->16 bits: 255 maps to 65535
    }
    return outPixels;
}

This can't be done in-place so it returns a new vector instead.  Remember, I never tested this so it might not even work.  The principle is sound though.  I will check it properly later on.

Another option is to forego preprocessing entirely and simply do the premultiply in your vertex and fragment shaders.  This gives you minimal storage and very high precision but costs a little performance.  It's not particularly expensive - three MULs aren't going to tank your framerate - but it has the significant downside of adding complexity where there would have been none before.  You'd have to do alpha premultiplication every time you sampled a colour texture.  Nah.

Another option is to stick with 8-bit channels and simulate higher precision via dithering instead of just rounding to the nearest value.  Four-level ordered dithering simulates 10-bit precision and is simple to implement.  That should be plenty considering the subtlety of the gradations we're dealing with.  I haven't tried it yet but I will add the code to this post later.  It seems the best candidate out of all the possible solutions, since it has no performance or memory impact and should improve quality a great deal at low alpha values (it has quadruple the number of effective gradations).
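In the meantime, here's a rough, untested sketch of the idea: replace the fixed round-at-0.5 in mul() with a per-pixel threshold.  The threshold values are my own derivation from the standard 2x2 Bayer matrix [[0,2],[3,1]]:
#include <cstdint>

// Untested sketch: ordered-dither variant of mul() from alphaPremultiply().
// Instead of always rounding at 0.5, the rounding threshold varies per pixel
// according to a 2x2 Bayer matrix, simulating ~2 extra bits of precision.
// x and y are the pixel's coordinates in the image.
uint8_t mulDithered( uint_fast16_t c, uint_fast16_t a, unsigned x, unsigned y )
{
    // (bayer + 0.5) / 4, scaled to the 8-bit fraction
    static const uint_fast16_t threshold[4] = { 32u, 160u, 224u, 96u };
    const uint_fast16_t f    = c * (a + 1u);   // 16-bit fixed-point product
    const uint_fast16_t frac = f & 0x00FFu;    // fractional byte
    const uint_fast16_t t    = threshold[ (y & 1u) * 2u + (x & 1u) ];
    return uint8_t( (f >> 8u) + (frac >= t) );
}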


Summary

Let's go over how this is done once more.

Ensure your colour textures have premultiplied alpha.  Preferably in a preprocessing step before runtime.

Enable blending and set the APM blending parameters:
glEnable( GL_BLEND );
glBlendFunc( GL_ONE, GL_ONE_MINUS_SRC_ALPHA );
Clear your RGBA framebuffer to transparent:
glClearColor( 0,0,0,0 );
glClear( GL_COLOR_BUFFER_BIT );
That's it.  Now draw things.  If all you do is alpha blending and additive blending, you will never need to change the blend mode again.  Drawing the composited result to the default framebuffer doesn't require any mode changes either - you just draw it with the APM blend mode like usual.


That about wraps it up.  Go forth and multiply!