Consolidating OpenACC device-host memory transfers #1315
Open
abishekg7 wants to merge 8 commits into MPAS-Dev:develop from
Branch updated from ac98504 to 4845ce2
Branch updated from e8c9c64 to e4c2509
Comment (Collaborator, Author):
@mgduda I think it might be ready for a second look. I did try to move the
Branch updated from 31a1ccd to 4b7137d
Branch updated from 5b23998 to 35cc144
Branch updated from 35cc144 to 738138a
mgduda requested changes on Mar 4, 2026
mgduda requested changes on Mar 6, 2026
mgduda requested changes on Mar 6, 2026
mgduda requested changes on Mar 6, 2026
Comment (Contributor):
@abishekg7 This PR looks in good shape to me, and thanks for addressing all of the comments! I'll run a couple of final tests, but in the meanwhile, please feel free to rework the commit history.
…nfers This commit introduces a set of routines to mpas_atm_time_integration in order to begin consolidating OpenACC data transfers between host and device during the course of the dynamical core execution. As the atm_compute_solve_diagnostics subroutine is also called once before the time integration loop, we also introduce a separate pair of subroutines to handle data movements around the first call to atm_compute_solve_diagnostics. The mesh/time-invariant fields are still copied onto the device in the call to mpas_atm_dynamics_init and removed from the device during the call to mpas_atm_dynamics_finalize, with the exception of certain fields moved in mpas_atm_pre/post_compute_solve_diagnostics. This is a special case due to atm_compute_solve_diagnostics being called for the first time before the call to mpas_atm_dynamics_init.
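A minimal sketch of the init/finalize lifetime for time-invariant fields, written in C with OpenACC directives rather than the Fortran used in MPAS. The names `dynamics_init`, `dynamics_finalize`, `edge_flux` and `dv_edge` are hypothetical stand-ins, not MPAS routines. The mesh field enters the device once and is deleted at the end, while kernels in between only declare it `present`. Note that C array sections are written `[start:length]`, whereas Fortran uses `(lower:upper)`.

```c
#include <assert.h>

/* Hypothetical init routine: copy a time-invariant mesh field to the
   device once, for the lifetime of the run. */
void dynamics_init(int n, double *dv_edge)
{
    #pragma acc enter data copyin(dv_edge[0:n])
}

/* Hypothetical finalize routine: the field is read-only, so it is
   deleted rather than copied back. */
void dynamics_finalize(int n, double *dv_edge)
{
    #pragma acc exit data delete(dv_edge[0:n])
}

/* May be called any number of times between init and finalize. The mesh
   field is declared present, so it is never re-transferred; only u and
   flux move in this sketch. */
void edge_flux(int n, const double *dv_edge, const double *u, double *flux)
{
    #pragma acc parallel loop present(dv_edge[0:n]) copyin(u[0:n]) copyout(flux[0:n])
    for (int i = 0; i < n; i++)
        flux[i] = u[i] * dv_edge[i];
}
```

Declaring the mesh field `present` also means that a missing `enter data` shows up as a runtime error instead of a silent implicit copy.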
This commit introduces a set of routines to mpas_atm_iau, building on the previous commit, to begin consolidating OpenACC data transfers between host and device during the course of the dynamical core execution. As the IAU code is currently executed on CPUs, it is necessary to synchronize the fields needed for this computation with the host before the call to atm_add_tend_anal_incr and sync back to the device after this call.
…a transfers This commit introduces a set of routines to mpas_atmphys_interface, building on the last two commits, to begin consolidating OpenACC data transfers between host and device during the course of the dynamical core execution. As the microphysics is currently executed on CPUs, it is necessary to synchronize the fields needed for this computation with the host before the call to microphysics from the dycore and sync them back to the device after this call.
…ta transfers This commit introduces a set of routines to mpas_atmphys_todynamics, building on the last several commits, to begin consolidating OpenACC data transfers between host and device during the course of the dynamical core execution. As the computation of the physics tendencies is currently executed on CPUs, it is necessary to synchronize the fields needed for this computation with the host before the call to physics_get_tend and sync them back to the device after this call.
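The host synchronization shared by the IAU, microphysics and physics-tendency commits can be sketched as follows. This is C with OpenACC directives, not the MPAS Fortran, and `cpu_physics` and `call_cpu_physics` are hypothetical. It assumes the fields are already resident on the device. One possible refinement, shown here as an assumption rather than what the PR does, is to copy back to the device only the fields the CPU code actually modifies.

```c
#include <assert.h>

/* Hypothetical CPU-only routine standing in for IAU, microphysics, or
   physics-tendency code: reads rho and modifies theta in place. */
static void cpu_physics(int n, const double *rho, double *theta)
{
    for (int i = 0; i < n; i++)
        theta[i] += rho[i];
}

/* Assumes rho[0:n] and theta[0:n] are already device-resident. */
void call_cpu_physics(int n, double *rho, double *theta)
{
    /* Both fields are read on the CPU, so both need current host values. */
    #pragma acc update self(rho[0:n], theta[0:n])
    cpu_physics(n, rho, theta);
    /* Only theta was modified on the CPU, so only theta is pushed back. */
    #pragma acc update device(theta[0:n])
}
```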
…anfers This commit introduces a set of routines to mpas_vector_reconstruction, on top of the last several commits, to begin consolidating OpenACC data transfers between host and device during the course of the dynamical core execution. The call to mpas_reconstruct_2d is currently executed on the device (GPU), and there is no need for ACC data transfers around this call within the time integration loop. However, mpas_reconstruct_2d is also invoked once before the start of the time integration loop, so it becomes necessary to synchronize the fields needed for mpas_reconstruct_2d with the device before this call and sync them back to the host following this call.
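For a device routine that is called once before the persistent data region exists, such as the first `mpas_reconstruct_2d` call described above, one option is a short-lived data region around that single call. The sketch below is C with OpenACC; `reconstruct_kernel` and `reconstruct_before_loop` are hypothetical, and the arithmetic is a placeholder, not the actual vector reconstruction.

```c
#include <assert.h>

/* Hypothetical device kernel: written to run on the GPU and to expect its
   inputs and outputs to be present already. */
static void reconstruct_kernel(int n, const double *u, double *uzonal)
{
    #pragma acc parallel loop present(u[0:n], uzonal[0:n])
    for (int i = 0; i < n; i++)
        uzonal[i] = 0.5 * u[i];   /* placeholder arithmetic */
}

/* Used only for the one call made before the main data region exists:
   copy inputs in, run on the device, copy the result back, release. */
void reconstruct_before_loop(int n, const double *u, double *uzonal)
{
    #pragma acc enter data copyin(u[0:n]) create(uzonal[0:n])
    reconstruct_kernel(n, u, uzonal);
    #pragma acc exit data delete(u[0:n]) copyout(uzonal[0:n])
}
```

Inside the time integration loop the same kernel can be called directly, because the fields are then already present.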
This commit introduces changes to the MPAS Atmosphere core to consolidate OpenACC host and device data transfers during the course of the dynamical core execution. It adds calls to the OpenACC device-host memory transfer subroutines introduced in previous commits in order to eliminate extraneous data transfers in the dynamical core. Many of the previously distributed data movement statements have been consolidated in two subroutines, mpas_atm_pre_dynamics and mpas_atm_post_dynamics. This pair of subroutines is called once per timestep in the atmosphere core, right before and after the call to atm_srk3. The mesh/time-invariant fields are still copied onto the device in mpas_atm_dynamics_init and removed from the device in mpas_atm_dynamics_finalize, with the exception of select fields transferred in the subroutines mpas_atm_pre_compute_solve_diagnostics and mpas_atm_post_compute_solve_diagnostics. This is a special case due to atm_compute_solve_diagnostics being called for the first time before the call to mpas_atm_dynamics_init. This commit also invokes host-device data transfer routines in the mpas_atm_iau, mpas_atmphys_interface and mpas_atmphys_todynamics modules to ensure that the code regions performing computations related to IAU, microphysics and physics tendencies, all of which are currently executed on CPUs, use the most up-to-date field values from the dynamical core running on GPUs, and vice versa. In addition, this commit includes explicit data transfers around halo exchanges in the atm_srk3 subroutine.
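A per-timestep sketch of the pre/post pattern, including explicit transfers around a host-side halo exchange, in C with OpenACC rather than the MPAS Fortran. `pre_dynamics`, `post_dynamics`, `timestep` and `halo_exchange_host` are hypothetical names, and the halo exchange is reduced to filling a single ghost cell where real code would communicate with MPI.

```c
#include <assert.h>

/* Hypothetical host-side halo exchange: the last element is a ghost cell
   filled from its owned neighbour. Assumes n >= 2. */
static void halo_exchange_host(int n, double *f)
{
    f[n - 1] = f[n - 2];
}

/* State enters the device once per timestep... */
static void pre_dynamics(int n, double *f)
{
    #pragma acc enter data copyin(f[0:n])
}

/* ...and leaves once per timestep. */
static void post_dynamics(int n, double *f)
{
    #pragma acc exit data copyout(f[0:n])
}

/* Between pre and post, kernels only declare f present; the only other
   transfers are the explicit updates around the host-side halo exchange. */
void timestep(int n, double *f)
{
    pre_dynamics(n, f);

    #pragma acc parallel loop present(f[0:n])
    for (int i = 0; i < n - 1; i++)   /* owned cells only */
        f[i] += 1.0;

    #pragma acc update self(f[0:n])
    halo_exchange_host(n, f);
    #pragma acc update device(f[0:n])

    #pragma acc parallel loop present(f[0:n])
    for (int i = 0; i < n; i++)
        f[i] *= 10.0;

    post_dynamics(n, f);
}
```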
This commit modifies previously existing timers and adds new timers in order to measure the time taken for OpenACC host-device memory transfers in various code regions after the memory movement consolidation introduced in the previous commit.
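A sketch of timing a transfer in isolation, in C with OpenACC. `xfer_timer_t`, `timer_start`, `timer_stop` and `timed_update_device` are hypothetical stand-ins for the framework's own timer routines. Because an `update` directive without an `async` clause does not return until the copy completes, a host timer placed around it measures the whole transfer.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical accumulating timer. */
typedef struct {
    double total_s;          /* accumulated seconds */
    struct timespec t0;      /* start of the current interval */
} xfer_timer_t;

static void timer_start(xfer_timer_t *t)
{
    clock_gettime(CLOCK_MONOTONIC, &t->t0);
}

static void timer_stop(xfer_timer_t *t)
{
    struct timespec t1;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    t->total_s += (double)(t1.tv_sec - t->t0.tv_sec)
                + 1e-9 * (double)(t1.tv_nsec - t->t0.tv_nsec);
}

/* Times only the host-to-device copy; f[0:n] must already be present. */
void timed_update_device(xfer_timer_t *t, int n, double *f)
{
    timer_start(t);
    #pragma acc update device(f[0:n])
    timer_stop(t);
}
```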
Branch updated from e05d652 to 4d2592b
Comment (Contributor):
With an LES test case, I'm getting the following error when attempting to run on GPUs:
…nt errors This commit removes the slicing when copying in and deleting certain variables such as uReconstructMeridional, uReconstructZonal, uReconstructY, etc. The presence of the slicing seems to induce partially present errors when running with multiple GPUs.
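The OpenACC specification treats a data clause whose section overlaps an existing device mapping without being fully contained in it as a runtime error (the "partially present" case). The C sketch below is an illustrative model of that rule for a single mapping, not a real runtime API, followed by the whole-array enter/exit pattern the commit moves to. `presence_check`, `enter_whole` and `exit_whole` are hypothetical names.

```c
#include <assert.h>

/* Illustrative model (not a real runtime API) of the OpenACC presence rule
   for one existing device mapping [map_lo, map_lo + map_len) and a
   requested section [req_lo, req_lo + req_len), in elements. */
typedef enum { NOT_PRESENT, PRESENT, PARTIALLY_PRESENT } presence_t;

presence_t presence_check(long map_lo, long map_len, long req_lo, long req_len)
{
    long map_hi = map_lo + map_len;   /* exclusive upper bounds */
    long req_hi = req_lo + req_len;
    if (req_lo >= map_lo && req_hi <= map_hi)
        return PRESENT;               /* fully inside the mapping */
    if (req_hi <= map_lo || req_lo >= map_hi)
        return NOT_PRESENT;           /* no overlap: runtime may map it */
    return PARTIALLY_PRESENT;         /* overlap without containment: error */
}

/* Pattern after the fix: enter and exit with the whole array, so a later
   present() or update on the full array matches the existing mapping. */
void enter_whole(int n, double *ur)
{
    #pragma acc enter data copyin(ur[0:n])
}

void exit_whole(int n, double *ur)
{
    #pragma acc exit data delete(ur[0:n])
}
```

With a mapping created from only the first 6 of 10 elements, a later reference to the full array falls in the `PARTIALLY_PRESENT` case; mapping all 10 elements up front avoids it.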
This PR introduces changes to the MPAS Atmosphere core to consolidate OpenACC host and device data transfers during the course of the dynamical core execution. It adds calls to the OpenACC device-host memory transfer subroutines introduced in previous commits in order to eliminate extraneous data transfers in the dynamical core.

Many of the data movement statements previously distributed throughout `mpas_atm_time_integration` have been consolidated in two subroutines, `mpas_atm_pre_dynamics` and `mpas_atm_post_dynamics`. This pair of subroutines is called once per time step in the atmosphere core, right before and after the call to `atm_srk3`. Any fields copied onto the device in these subroutines are removed from explicit data movement statements in the dynamical core.

The mesh/time-invariant fields are still copied onto the device in `mpas_atm_dynamics_init` and removed from the device in `mpas_atm_dynamics_finalize`, with the exception of select fields transferred in the subroutines `mpas_atm_pre_compute_solve_diagnostics` and `mpas_atm_post_compute_solve_diagnostics`. This is a special case due to `atm_compute_solve_diagnostics` being called for the first time before the call to `mpas_atm_dynamics_init`.

This PR also invokes host-device data transfer routines in the `mpas_atm_iau`, `mpas_atmphys_interface` and `mpas_atmphys_todynamics` modules to ensure that the code regions performing computations related to IAU, microphysics and physics tendencies, all of which are currently executed on CPUs, use the most up-to-date field values from the dynamical core running on GPUs, and vice versa.

In addition, this PR includes explicit data transfers around halo exchanges in the `atm_srk3` subroutine.

These data transfer subroutines and the `acc update` statements are an interim solution until we have a book-keeping method in place.

This PR also introduces a couple of new timers to keep track of the cost of data transfers.