x64 always has SSE2 available, but for 32-bit x86 builds the programmer has to enable the extended instruction set in the compiler. I think there are few enough machines older than the Pentium 4 and Athlon 64 still out there that requiring SSE2 won't be a problem for anyone. I vaguely recall that the Intel compiler could generate a single executable that would work on any x86 feature set, but it's been a while, and I have a foggy notion that some of the Wilbur code was doing things that caused that particular compiler to generate the wrong output.
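
To make the "enable it in the compiler" part concrete, here is a rough sketch of how code can pick up SSE2 at compile time and fall back to plain scalar code otherwise. This is only an illustration, not anything taken from Wilbur: the function name `scale_add` and the macro `HAVE_SSE2` are made up for the example. The compiler switches mentioned in the comments (`/arch:SSE2` for 32-bit MSVC builds, `-msse2` for GCC/Clang) are the usual ones, but check your own compiler's documentation.

```cpp
#include <cstddef>

// x64 targets always have SSE2. 32-bit builds only define these macros when
// the compiler was told to use SSE2 (e.g. /arch:SSE2 on MSVC, -msse2 on GCC).
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1     // hypothetical macro, used only in this sketch
#else
#define HAVE_SSE2 0
#endif

// Hypothetical helper: out[i] = a[i] * scale + b[i] for n doubles.
void scale_add(const double* a, const double* b, double scale,
               double* out, std::size_t n)
{
    std::size_t i = 0;
#if HAVE_SSE2
    // Process two doubles per iteration using unaligned loads and stores.
    const __m128d s = _mm_set1_pd(scale);
    for (; i + 2 <= n; i += 2) {
        const __m128d va = _mm_loadu_pd(a + i);
        const __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(_mm_mul_pd(va, s), vb));
    }
#endif
    // Scalar path: handles the leftover element, or everything without SSE2.
    for (; i < n; ++i) {
        out[i] = a[i] * scale + b[i];
    }
}
```

Note that this is a compile-time choice: a 32-bit executable built with SSE2 turned on simply won't run properly on a pre-SSE2 processor. Supporting both from one executable would need a runtime CPU check and separate code paths, which is the sort of thing I remember the Intel compiler doing automatically.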

The best overall option is to use OpenCL, but Intel's CPU-only and CPU/GPU implementations won't be out until next year at the earliest, according to some discussions at SIGGRAPH this year. Plus, it's a lot of work to convert the code base, and some of the algorithms (the precipiton erosion algorithm, for example) aren't particularly amenable to parallelization. I already have plenty to do in terms of fixing code and adding certain useful features without taking on performance work as well.