It's primarily about optimizing but because of the way it works you must design for it. If it serves the design, I'll use it. For example, my graphics library supports arbitrary binary footprints and representations for pixels. The only way to compute the shifts to get at the individual channels efficiently is to do so during compile time. One of the places I hated using it, but it was really the only way, is building execution chains, similar to a SQL execution plan except at compile time, to spit graphics at a display or another draw destination in the way that most efficient for it. At runtime, even computing that would have not been worth any savings you'd pick up downstream.
There's smoke in my iris But I painted a sunny day on the insides of my eyelids So I'm ready now (What you ready for?) I'm ready for life in this city And my wings have grown almost enough to lift me