Blending with alpha-premultiplied colours solves a few annoying rendering problems, gives you some fancy rendering features for free and is very easy to use. More people should use it! So here's how to use it, how it works and the problems it solves.
There are a number of posts about this online already, but they tend to be overly technical and lacking in code examples. Hopefully my explanations will be clearer and more practical. I'll also provide all the code you need to get started - which isn't much.
Why you should care:
- Conventional alpha blending doesn't work for transparency compositing and can be affected by weird filtering artifacts.
- Alpha-premultiplied colour blending on the other hand is great: it's easy to implement, intuitive to use, optionally gives you free additive blending without the need for blend mode changes, provides a variable level of additivity and opacity at the same time, and is not affected by filtering problems.
Conventional Alpha Blending
In conventional alpha blending the alpha channel is considered the colour's opacity level. The following glBlendFunc setup is used:
Source: GL_SRC_ALPHA
Destination: GL_ONE_MINUS_SRC_ALPHA
Which performs this operation:
Src = [pixel we're blending into the buffer]
Dest = [pixel already in the buffer]
pixel = (src * src.a) + (dest * (1.0 - src.a));
This isn't quite opacity. It would be more accurate to say that the alpha value represents the proportion of the destination colour that will be replaced by the source colour. In other words it's a component-wise lerp between the destination and source colour parameterised by the source alpha.
And this works - to a point.
But what happens when we introduce translucent colours? Consider drawing 50% opacity pure red to a 100% opacity pure black RGBA buffer. If your brain works like mine then you expect the final colour to be fully opaque dark red. Here's what actually happens:
([1,0,0,0.5] * 0.5) + ([0,0,0,1] * (1-0.5));
== [0.5,0,0,0.25] + [0,0,0,0.5];
== [0.5,0,0,0.75];
Crap. The buffer is now translucent, not what we wanted at all. It gets worse: what happens when we draw more stuff onto this buffer? Let's blend in some more of that red:
([1,0,0,0.5 ] * 0.5) + ([0.5,0,0,0.75] * (1-0.5));
== [0.5, 0,0,0.25] + [0.25,0,0,0.375];
== [0.75,0,0,0.6 ];
Now we're really in trouble. The buffer is even more translucent than before. Every time we blend translucent objects into the buffer it gets worse! The end result: a goddamn mess.
This problem doesn't rear its ugly head until you begin to work with transparency compositing. Your typical default framebuffer doesn't have an alpha channel to make translucent. It effectively has 100% constant opacity so things more or less work correctly. But all hell breaks loose the moment you start drawing to an offscreen buffer with an alpha channel. Transparency compositing just doesn't work at all in this setup.
Another tangentially related problem is that some image programs set transparent pixels to all-zeroes on all channels. Some compressed texture formats require this as well. This causes ugly dark halos on partially transparent objects when they get linear filtered; the filter will lerp between transparent black pixels and the correct colour, making it get darker and more transparent at the same time. Not good.
There are ways around most of these problems. But we can cure these ills in one fell swoop by...
Blending With Alpha-Premultiplied Colour
Lerping between colours gave us nasty results. The solution to this problem is to change our concept of colour a little and use alpha premultiplied colours. Converting a 'normal' RGBA colour to an alpha premultiplied one is simple:
pixel.rgb *= pixel.a;
Note that this isn't something which is done when blending. It's a preprocessing step you perform on your textures long before rendering starts.
APM'd colours work a bit differently than normal ones:
- Each colour is defined by all four channels. Changing the alpha changes the colour, it's not independent anymore.
- There's now only one 0% opacity colour: zero on all channels.
- To modulate the opacity, all channels are multiplied by the desired opacity level.
- When the alpha is a lower value than a colour channel it adds colour in proportion with how much lower it is. When the alpha is zero it's equivalent to additive blending.
To blend these types of colours you need this glBlendFunc setup:
Source: GL_ONE
Destination: GL_ONE_MINUS_SRC_ALPHA
Which performs this blend op:
pixel = src + (dest * (1.0-src.a));
Okay, let's break this down. In this scheme the source is always purely additive and the destination contributes in proportion to the inverse of the source alpha. In plain English: the source A determines the proportion of light the source colour blocks out, while the source RGB determines how much light the source colour additively contributes.
Let's try drawing red to our black backbuffer again using this setup:
[0.5,0,0,0.5] + ([0,0,0,1] * (1-0.5));
== [0.5,0,0,0.5] + [0,0,0,0.5];
== [0.5,0,0,1];
Exactly what we wanted. Let's add more red:
[0.5, 0,0,0.5] + ([0.5,0,0,1] * (1-0.5));
== [0.5, 0,0,0.5] + [0.25,0,0,0.5];
== [0.75,0,0,1];
50% opacity red made the buffer 50% more red both times, without reducing the opacity level. Good.
You might be thinking that this all sounds a little complicated and you just want to think about colours in 'normal' terms. Don't worry, you can still do that. In practice you hardly have to care about these details at all. Alpha premultiply your textures and colours before blending, set the appropriate blending modes and you're done. You don't even have to interact with premultiplied colours manually in your program logic: you can use normal colours and APM them in the vertex shader, or abstract the operation away inside your colour class. No problem.
And of course you gain some worthwhile features for your efforts.
First and foremost, transparency compositing works exactly like you'd expect with zero effort whatsoever. You're finally free to composite your ass off. Composite composites. Go hog wild.
Filtering always works properly. Transparent is always all-zero RGBA in APM colours; lerping to transparent is equivalent to lerping to all zeroes. The original problem no longer exists.
Free additive blending. And it's not some half-assed side-effect: it's better than the usual additive blending mode. It's not merely on or off; when you lerp the APM colour's alpha toward zero you're lerping between normal and additive blending. No blend mode switching required! Even better, this can be controlled independently from the opacity level, which is determined by all four components.
Here's is the code for independently controlling the additivity and opacity levels of an alpha-premultiplied colour. These parameters are all ranged [0:1] just like the colour components.
col.a *= 1.0 - additivity;
col *= opacity;
Easy peasy. The order of application doesn't even matter because, of course, multiplication is commutative. And they work together: you can specify a colour which is partially opaque and partially additive and it behaves intuitively.
This enables a few things which are not normally possible, like making a single primitive luminescent additive pink on one side and 50% opaque black on the other while smoothly interpolatimg between the two. Sometimes this is very useful. A typical example is making glowing hot embers which first dim into opaque dark ashes and then fade away. With normal blending this is a bit of a pain: two draws and a blend mode change are required. With the APM colour blending scheme this can be accomplished much more easily because the colour, opacity and additivity can essentially be treated as independent properties. Here is an example:
colour orange; // APM orange
colour black; // APM opaque black
colour out; // APM output colour
float state; // Born at 0.0, dies at 1.0
float stateLow = boxStep( state, 0.0, 0.5 );
float stateHigh = boxStep( state, 0.5, 1.0 );
float blackness = stateLow;
float additivity = 1.0 - stateLow;
float opacity = 1.0 - stateHigh;
out = mix( orange, black, blackness );
out.a *= 1.0 - additivity;
out *= opacity;
BoxStep is a clamped range normalising function:
float boxStep( float x, float low, float high ) {
return clamp( (x-low)/(high-low), 0.0, 1.0 );
}
Here is the effect applied to a sprite, plotted on a circle with state going from 0.0 to 1.0 clockwise:
Things that are not obvious in the screenshot:
- Everything in the foreground was drawn onto a transparent multisampled framebuffer texture in the APM blending mode. Notice that multisampling works normally.
- The navy background colour is the default framebuffer's clear colour - alpha compositing is working correctly. There are no translucent holes on overlapping translucent areas (which there would be with lerp blending).
- I didn't change blend modes during the drawing process. The colour, additivity and opacity variations are purely a function of APM colour manipulation. In fact, I only set the blend mode once at the start of the program. Quite convenient.
Here's the same composite over a lighter background. You can see the black part more easily:
Converting Textures to Premultiplied Alpha Format
Here's a ready-made function. Summary: treating the bytes as normalised fixed-point numbers in the range [0:1], multiply them and then transform them back to bytes, rounding to the nearest representable value. This is a whole lot simpler than it looks. It's simply RGB *= A.
void alphaPremultiply( std::vector<uint8_t>& pixels )
{
typedef uint_fast16_t fixed;
typedef uint8_t byte;
auto mul = [](fixed a, fixed b) -> const byte {
const fixed f = a * (b+1u);
const fixed frac = f & 0x00FFu;
const fixed rounded = (f>>8u) + (0x007Fu < frac);
return byte( rounded );
};
for (auto i=pixels.begin(); i<pixels.end();) {
const fixed r = *(i );
const fixed g = *(i+1u);
const fixed b = *(i+2u);
const fixed a = *(i+3u);
*(i++) = mul( r, a );
*(i++) = mul( g, a );
*(i++) = mul( b, a );
i++;
}
}
Explaining how fixed-point arithmetic works is outside the scope of this post. Long stort short, it's faster than floating point and has a few interesting properties which will come in handy in a moment.
Disadvantages of Premultiplied Alpha
Blending with alpha premultiplied colours is nice but it's not perfect. It has at least one [curable] theoretical disadvantage: when you're working with 8-bit colour channels there will be loss of precision in the RGB channels when the A channel has a value close to zero. The loss could be quite severe; below alpha value 16 it will truncate all the RGB channels to zero.
I only noticed this because I was reading the APM output values directly. I can't see it at all in the actual image output (and I'm notoriously picky) presumably because there isn't much to see at such a low opacity level anyway. In all liklihood you can skip over this entire section, stop caring entirely, and you'll never see a pixel out of place. It simply doesn't matter that much. But if you're an intolerable stickler for image quality like myself then carry on. After all OpenGL isn't just for games and sometimes accurate colour matters.
I said it's curable. I'll discuss the most obvious cure first: higher precision colour channels.
OpenGL is fully capable of loading and operating on 16-bit-per-channel textures. The format for this is GL_RGBA16. But where do we get 16-bit colours from if we've only got 8-bit textures? Thanks to a nice quirk inherent to fixed-point arithmetic, producing precise 16-bit output from the premultiplication of 8-bit channels is a trivial task. 16-bit results are actually easier to compute than 8-bit ones - the initial result of the fixed-point multiply is already an unsigned 16-bit normalised fraction.
Completely untested code off the top of my head which implements this:
std::vector<uint16_t> alphaPremultiply16( std::vector<uint8_t>& pixels )
{
typedef uint_fast16_t fixed;
auto mul = [](fixed a, fixed b) -> const uint16_t {
const fixed f = a * (b+2u);
return uint16_t( f );
};
std::vector<uint16_t> outPixels( pixels.size() );
auto out = outPixels.begin();
for (auto i=pixels.begin(); i<pixels.end();) {
const fixed r = *(i++);
const fixed g = *(i++);
const fixed b = *(i++);
const fixed a = *(i++);
*(out++) = mul( r, a );
*(out++) = mul( g, a );
*(out++) = mul( b, a );
*(out++) = a << 8u;
}
}
This can't be done in-place so it returns a new vector instead. Remember, I never tested this so it might not even work. The principle is sound though. I will check it properly later on.
Another option is to forego preprocessing entirely and simply do the premultiply in your vertex and fragment shaders. This gives you minimal storage and very high precision but costs a little performance. It's not particularly expensive - three MULs aren't going to tank your framerate - but it has the significant downside of adding complexity where there would have been none before. You'd have to do alpha premultication every time you sampled a colour texture. Nah.
Another option is to stick with 8-bit channels and simulate higher precision via dithering instead of just rounding to the nearest value. Four-level ordered dithering simulates 10-bit precision and is simple to implement. That should be plenty considering the subtlety of the gradiations we're dealing with. I haven't tried it yet but I will add the code to this post later. It seems the best candidate out of all the possible solutions, since it has no performance or memory impact and should improve quality a great deal at low alpha values (it has quadruple the number of effective gradiations).
Summary
Let's go over how this is done once more.
Ensure your colour textures have premultiplied alpha. Preferably in a preprocessing step before runtime.
Enable blending and set the APM blending parameters:
glEnable( GL_BLEND );
glBlendFunc( GL_ONE, GL_ONE_MINUS_SOURCE_ALPHA );
Clear your RGBA framebuffer to transparent:
glClearColor( 0,0,0,0 );
glClear( COLOR_BUFFER_BIT );
That's it. Now draw things. If all you do is alpha blending and additive blending, you will never need change the blend mode again. Drawing the composited result to the default framebuffer doesn't require any mode changes. You just draw it with the APM blend mode like usual.
That about wraps it up. Go forth and multiply!