Compiling
Programming environment
To get NVTX working, the nvhpc/21.9 module must be available and
loaded. In the mkmf.template files, we have been using an additional
variable, ACCFLAGS, to set options for the nvfortran compiler.
Compiler flags
To run the DART get_close kernel, use these additional compiler flags:
ACCFLAGS = -acc -ta=tesla:cc70,deepcopy,pinned -Minfo=accel -Mnofma -r8
- deepcopy is of particular concern for us. DART has a lot of nested derived types: type%type%type. The compiler was not reliably able to determine that the nested types needed copying to the GPU. The deepcopy flag forces this, but ideally you would not force a deep copy on everything. Improvements to the compiler would be needed to fix this. There is a workaround for forcing the correct copy in the code, which is adding a loop around the OpenACC directives. However, this is not good for code readability as it looks like a pointless loop.
- Mnofma was to force less optimization while debugging.
- r8 was to force double precision type conversions. This was a sanity check while debugging memory problems. It was not needed in the end.
- Minfo=accel prints out at compile time what the compiler was able to parallelize. It is similar to the old Intel -vec-report flag.
- cc70 is the compute capability, so this depends on the graphics card. Ascent (Oak Ridge's machine) and Casper are V100 GPUs so you use cc70. Perlmutter is A100 (same as Derecho) so you use cc80. This is not intuitive at all for users.
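Since the compute capability has to match the card, one way to avoid editing ACCFLAGS by hand on each machine is to pick it from a single switch. The fragment below is only a sketch of that idea, not what our mkmf.template files currently contain: the GPU and GPU_CC variables are hypothetical names introduced for illustration, and the conditional assumes GNU make.

```make
# Sketch only: GPU and GPU_CC are hypothetical variables, not part of DART's mkmf.template.
# Override the default on the command line, for example: make GPU=A100
GPU ?= V100

ifeq ($(GPU),A100)
  # A100 cards (Perlmutter, Derecho) use compute capability cc80
  GPU_CC = cc80
else
  # V100 cards (Ascent, Casper) use compute capability cc70
  GPU_CC = cc70
endif

# Same options as listed above, minus the debugging-only flags (-Mnofma, -r8)
ACCFLAGS = -acc -ta=tesla:$(GPU_CC),deepcopy,pinned -Minfo=accel
```

With this in place the default build targets a V100, and an A100 build only needs GPU=A100 passed to make. Any value other than A100 falls back to cc70, so a different card would need its own branch.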
General performance results
ACCFLAGS = -acc -ta=tesla:cc80,deepcopy (3x)
ACCFLAGS = -acc -ta=tesla:cc80,deepcopy,pinned (15x)