Minimal effort compiling

In general, the Sun compilers deliver good performance with just three options, given on both the compile and the link line:

-fast  -xarch=sparcvis2 -m32 (32-bit addressing) 
-fast  -xarch=sparcvis2 -m64 (64-bit addressing) 
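
As a minimal sketch of how these options appear on the compile and link lines, the following shell fragment assembles both commands. The file names prog.c and prog are hypothetical, and the commands are only printed, so the fragment can be tried on a machine without the Sun compilers.

```shell
# Hypothetical source and program names; adjust to your project.
SRC=prog.c
EXE=prog
FLAGS="-fast -xarch=sparcvis2 -m64"   # use -m32 for 32-bit addressing

# The same flags appear on both the compile and the link line.
COMPILE_CMD="cc $FLAGS -c $SRC"
LINK_CMD="cc $FLAGS -o $EXE prog.o"

# Print the commands instead of running them.
printf '%s\n' "$COMPILE_CMD" "$LINK_CMD"
```

On a machine with the Sun compilers installed, the two printed lines can be executed as they are.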

The -fast flag is a macro. In Studio 11 it expands to the following options.

Fortran (f90):

-xO5 -xcache=<machine_dependent> -xchip=<machine_dependent> -xdepend=yes
-xpad=local -xvector=lib -xprefetch=auto,explicit -dalign -fsimple=2 -fns=yes
-ftrap=common -xlibmil -xlibmopt -fround=nearest

C (cc):

-fns -fsimple=2 -fsingle -xalias_level=basic -xbuiltin=%all
-xcache=<machine_dependent> -xchip=<machine_dependent> -xdepend -xlibmil
-xlibmopt -xmemalign=8s -xO5 -xprefetch=auto,explicit

This works best if you compile and run on the same machine. If you know the processor of the machine the program will run on, specify it with the -xtarget option, e.g. -xtarget=ultra3 or -xtarget=ultra4plus. This adjusts the machine-dependent options above, such as the information about the cache layout.

For maximum benefit, you should also link with -fast. The Sun compilers all follow the "rightmost flag wins" rule: when two options conflict, the one further to the right on the command line takes effect. So if you want all the options in the -fast macro but a lower optimization level, compile with -fast -xO4.
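
The ordering rule can be illustrated with a small toy model in shell. It does not call any compiler, and the function name effective_opt is made up for this sketch. The only assumption is that -fast behaves as if -xO5 were written in its place, which matches the expansion listed above.

```shell
# Toy model of the ordering rule; no compiler is invoked.
# Assumption: -fast is treated as if it were -xO5 (the other options
# in the macro are ignored here).
effective_opt() {
    level=""
    for flag in "$@"; do
        case "$flag" in
            -fast)    level="-xO5" ;;    # the macro sets the level to -xO5
            -xO[1-5]) level="$flag" ;;   # a later -xO flag replaces it
        esac
    done
    printf '%s\n' "$level"
}

effective_opt -fast -xO4   # prints -xO4
effective_opt -xO4 -fast   # prints -xO5
```

The second call shows why the order matters: placing -xO4 before -fast leaves the optimization level at -xO5.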

The -flags option prints a quick reference of the available options. For a more detailed description of the individual flags and their effects, see man cc or man f90.

Useful options

Show, but do not compile

-dryrun
-### 

Allow loop interchange and loop optimizations

-xdepend 

True 64-bit load/store and alignment

-dalign 

Explicit function in-lining (of my_func)

-xinline=my_func 

Interprocedural optimizations (across source files)

-xipo=1 

Math libraries

The Sun Studio compilers supply both an in-lined and an optimized version of the libm library:

In-lined libm

-xlibmil 

Optimized libm

-xlibmopt 

Sun also supplies optimized numerical routines (such as BLAS and LAPACK) in the Sun Performance Library. To use the library, compile with -dalign -xlic_lib=sunperf and make sure to add an -xarch option to your link line!

Fortran 90 users should also include the module sunperf (USE SUNPERF).
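
A possible pair of compile and link lines for a Fortran 90 program that uses the library is sketched below. The file names solver.f90 and solver are hypothetical, the -xarch and -m64 values are taken from the first section, and the commands are only printed rather than executed.

```shell
# Hypothetical file names; -m64 may be replaced by -m32.
SRC=solver.f90
EXE=solver
ARCH="-xarch=sparcvis2 -m64"

# -dalign and -xlic_lib=sunperf as described above; the -xarch setting
# is repeated on the link line.
COMPILE_CMD="f90 -fast -dalign $ARCH -c $SRC"
LINK_CMD="f90 -fast -dalign $ARCH -o $EXE solver.o -xlic_lib=sunperf"

# Print the commands instead of running them.
printf '%s\n' "$COMPILE_CMD" "$LINK_CMD"
```

The USE SUNPERF statement itself goes into the Fortran source, not on the command line.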

The library automatically switches to a parallel version if the program being compiled is shared-memory parallelized. You can control the number of threads with the OMP_NUM_THREADS environment variable, e.g. OMP_NUM_THREADS=4.
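
In a Bourne-type shell the thread count could be set as sketched below. The variable must be exported so that programs started from the shell inherit it. The child sh process here only stands in for a real program, which would be started in its place.

```shell
# Set the thread count for every program started from this shell.
OMP_NUM_THREADS=4
export OMP_NUM_THREADS

# Child processes inherit the value; a real run would start the
# parallel program here instead of this stand-in.
sh -c 'printf "threads: %s\n" "$OMP_NUM_THREADS"'
```

In csh or tcsh the equivalent would be setenv OMP_NUM_THREADS 4.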


For more information consult the Sun Performance Library User's Guide (pdf) or the Sun Performance Library Readme file.