It uses simple and fast memory management routines;
it allocates all the memory it needs at start-up
and borrows/returns a chunk of memory from/to the memory reservoir
during the simulation run.
Every memory request made by a client function is recorded as an entry in a stack-ordered list.
We have also reduced the overhead of memory movement/copying
that occurs when a chunk residing in the middle of the reservoir's memory stack
is reallocated or freed.
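The reservoir idea can be illustrated with a toy stack allocator. This is a minimal Python sketch under our own assumptions, not GOTPM's routines: the class `Reservoir` and its methods `borrow` and `give_back` are hypothetical names, and the way a middle chunk is handled (marked dead and reclaimed only once it reaches the top, so no bytes are moved) is just one possible strategy.

```python
class Reservoir:
    """Toy stack-ordered memory reservoir (hypothetical sketch, not GOTPM's code)."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)  # grab all memory once, at start-up
        self.stack = []                 # records: [name, offset, size, live]
        self.top = 0                    # offset of the first unused byte

    def borrow(self, name, size):
        """Hand out `size` bytes from the top of the reservoir."""
        if self.top + size > len(self.buf):
            raise MemoryError(f"reservoir exhausted while borrowing {name!r}")
        self.stack.append([name, self.top, size, True])
        chunk = memoryview(self.buf)[self.top:self.top + size]
        self.top += size
        return chunk

    def give_back(self, name):
        """Return the most recent live chunk called `name`.

        A chunk in the middle of the stack is only marked dead, so nothing
        above it is moved or copied; dead records are popped once they
        reach the top of the stack."""
        for rec in reversed(self.stack):
            if rec[0] == name and rec[3]:
                rec[3] = False
                break
        else:
            raise KeyError(name)
        while self.stack and not self.stack[-1][3]:
            self.top = self.stack.pop()[1]  # reclaim space down to that offset
```

Returning a chunk never moves data: a hole left by a middle chunk is reclaimed only when everything above it has been returned, trading some temporary fragmentation for zero copying.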
It uses a z-directional (slab) domain decomposition (DD);
we chose this DD scheme because the MPI version of FFTW expects the mesh to be divided into slabs along one axis.
This kind of DD is also very simple to implement.
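A z-slab decomposition reduces to giving each rank a contiguous range of mesh planes, so each domain has neighbours only in the ±z directions. The sketch below uses a hypothetical function `static_slabs` that splits the planes as evenly as possible, giving the remainder to the lowest ranks; it is not necessarily the exact distribution rule FFTW itself uses.

```python
def static_slabs(nz, nprocs):
    """Return (z_start, local_nz) for each rank, splitting nz mesh planes as
    evenly as possible (the first nz % nprocs ranks get one extra plane)."""
    base, extra = divmod(nz, nprocs)
    slabs, start = [], 0
    for rank in range(nprocs):
        n = base + (1 if rank < extra else 0)
        slabs.append((start, n))
        start += n  # next rank's slab begins right after this one
    return slabs
```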
It uses the MPI-parallel version of FFTW 2.
It adopts a dynamic DD with variable slab heights,
equalizing the number of particles in each domain
to within a sub-percent level.
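One way to obtain variable-height slabs that equalize particle counts is to place the slab boundaries at quantiles of the sorted z coordinates. The function `dynamic_slab_edges` is a hypothetical sketch of that idea, assuming the global list of z values is available and contains at least as many particles as ranks; a production code would more likely work from a histogram gathered across ranks and may snap boundaries to mesh planes, which the text does not specify.

```python
def dynamic_slab_edges(z, nprocs, box=1.0):
    """Return nprocs + 1 z-edges so slab i = [edges[i], edges[i+1]) holds
    about len(z) / nprocs particles (hypothetical sketch).

    Assumes 0 <= z < box, len(z) >= nprocs and distinct z values; the
    per-slab counts then differ by at most one particle."""
    if len(z) < nprocs:
        raise ValueError("need at least one particle per rank")
    zs = sorted(z)
    n = len(zs)
    edges = [0.0]
    for rank in range(1, nprocs):
        k = rank * n // nprocs                    # first particle of slab `rank`
        edges.append(0.5 * (zs[k - 1] + zs[k]))   # cut midway between neighbours
    edges.append(box)
    return edges
```

Clustered regions get thin slabs and sparse regions get thick ones, which is the variable-height behaviour described above.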
It uses Oct-sibling trees to correct the short-range gravitational force;
the Oct-sibling tree is a good choice for speeding up tree walks.
GOTPM-II adopts a more advanced Oct-sibling tree and tree walk,
which run about three times faster than the original ones.
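A sibling-linked ("threaded") tree is fast to walk because every node stores an index to its first daughter and a `next` index to the node that follows its whole subtree, so the walk is a single loop with no recursion or explicit stack. The Python sketch below illustrates this under our own assumptions: the `Node` layout, the opening test `size < theta * r`, and the cutoff test with `rcut` are illustrative choices, not GOTPM's actual criteria, and the walk only accumulates interaction counts and mass rather than forces.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    x: float            # centre of mass
    y: float
    z: float
    mass: float
    size: float         # cell side length (0 for a single particle)
    daughter: int       # index of the first daughter, -1 for a particle
    next: int           # sibling, or first node after this subtree; -1 ends the walk


def walk(tree, px, py, pz, theta, rcut):
    """Threaded (sibling-linked) tree walk starting at the root, node 0.

    Returns (number of interactions, total mass interacted with)."""
    n_int, m_tot = 0, 0.0
    i = 0
    while i != -1:
        nd = tree[i]
        r = math.dist((px, py, pz), (nd.x, nd.y, nd.z))
        if r - math.sqrt(3.0) * nd.size > rcut:
            i = nd.next                 # whole cell beyond the short-range cutoff
        elif nd.daughter == -1 or nd.size < theta * r:
            n_int += 1                  # particle, or cell far enough to use whole
            m_tot += nd.mass
            i = nd.next                 # skip the subtree
        else:
            i = nd.daughter             # open the cell
    return n_int, m_tot


# Four unit masses on the x axis, grouped into two cells under one root.
EXAMPLE_TREE = [
    Node(0.0, 0.0, 0.0, 4.0, 4.0, daughter=1, next=-1),   # 0: root
    Node(-1.0, 0.0, 0.0, 2.0, 2.0, daughter=3, next=2),   # 1: left cell
    Node(1.0, 0.0, 0.0, 2.0, 2.0, daughter=5, next=-1),   # 2: right cell
    Node(-1.5, 0.0, 0.0, 1.0, 0.0, daughter=-1, next=4),  # 3
    Node(-0.5, 0.0, 0.0, 1.0, 0.0, daughter=-1, next=2),  # 4: last child, jumps to 2
    Node(0.5, 0.0, 0.0, 1.0, 0.0, daughter=-1, next=6),   # 5
    Node(1.5, 0.0, 0.0, 1.0, 0.0, daughter=-1, next=-1),  # 6
]
```

Because the `next` index of a subtree's last node already points past that subtree, accepting or skipping a cell costs a single index assignment.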
It uses particle-structure exchanges (GOTPM-II only).
The PM particle structure has 7 members (x, y, z, vx, vy, vz, index) and
the tree particle structure has 8 members (x, y, z, vx, vy, vz, index, pointer).
The PM part also needs additional memory, roughly equal in size to two members per particle,
for the dynamic PM mesh and the static FFTW mesh. Without particle indexing,
GOTPM-II needs 2.2 terabytes for a 4096^3-particle simulation.
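The 2.2-terabyte figure can be reproduced with a simple per-particle budget, assuming (our assumption; the text does not state field types) 4-byte single-precision positions and velocities, an 8-byte pointer, and a 4-byte word for each of the two mesh-equivalent members. Without indexing, the PM phase then needs 6×4 + 2×4 = 32 bytes per particle and the tree phase 6×4 + 8 = 32 bytes. Because the structures are exchanged, only one phase is resident at a time, and 32 × 4096^3 ≈ 2.2 × 10^12 bytes. The `ctypes` layouts below are hypothetical illustrations of the two structures, using a 64-bit index since 4096^3 = 2^36 exceeds the 32-bit range.

```python
import ctypes


# Hypothetical layouts for illustration; GOTPM-II's real field types are not given.
class PMParticle(ctypes.Structure):
    _fields_ = [(name, ctypes.c_float) for name in ("x", "y", "z", "vx", "vy", "vz")] + [
        ("indx", ctypes.c_int64)]   # 64-bit: 4096**3 = 2**36 overflows 32 bits


class TreeParticle(ctypes.Structure):
    _fields_ = PMParticle._fields_ + [("next", ctypes.c_void_p)]  # link used by the tree


def peak_bytes_per_particle(with_index, word=4, ptr=8, idx=8):
    """Larger of the PM and tree footprints (assumed sizes in bytes); the
    structures are exchanged, so only one of them is resident at a time."""
    phase_space = 6 * word
    extra = idx if with_index else 0
    pm = phase_space + 2 * word + extra    # plus two mesh-equivalent words (PM + FFTW mesh)
    tree = phase_space + ptr + extra
    return max(pm, tree)


n = 4096 ** 3
total = n * peak_bytes_per_particle(with_index=False)
print(f"{total} bytes = {total / 1e12:.1f} TB")   # 2199023255552 bytes = 2.2 TB
```

Under the same assumptions, keeping a 64-bit index raises the budget to 40 bytes per particle in either phase.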
It does not use a force array (GOTPM-II only),
in order to reduce memory overhead. Temporary variables holding the forces computed in the PM/tree steps
are applied immediately to update the particle's velocity and are then reused
for the next particle.
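The saving from dropping the force array is easy to estimate: with three 4-byte force components per particle (an assumption), a force array would add 12 bytes per particle, roughly 0.82 TB at 4096^3 particles. The sketch below shows the alternative pattern, where each particle's force is held only in a temporary and applied to the velocity at once. Here `force_at` is a hypothetical callback standing in for the PM or tree force evaluation, and particles are plain dicts purely for illustration.

```python
def kick_without_force_array(particles, force_at, dt):
    """Update velocities in place without a per-particle force array.

    `force_at(p)` is a hypothetical callback returning the (fx, fy, fz) force
    on particle `p` from one step (PM or tree). The result lives only in the
    temporary `f`, which is overwritten for the next particle, so no
    N-element force array is ever allocated."""
    for p in particles:
        f = force_at(p)             # temporary for this particle only
        p["vx"] += f[0] * dt        # apply immediately (unit mass assumed)
        p["vy"] += f[1] * dt
        p["vz"] += f[2] * dt
```

Because the velocity update is linear in the force, the PM and tree contributions can be applied in separate passes with the same pattern.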