fixes an issue with macro directives for !$acc kernels #926
fixes an issue with macro directives for !$acc kernels #926sbryngelson merged 1 commit intoMFlowCode:masterfrom
!$acc kernels #926Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
There was a problem hiding this comment.
Pull Request Overview
This PR replaces the custom GPU_PARALLEL() macro with raw OpenACC kernels directives as a stopgap for NVHPC compatibility.
- Substituted
#:call GPU_PARALLEL()/#:endcall GPU_PARALLELwith!$acc kernels/!$acc end kernels - Applied the change in both the time-stepping and data-output modules
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/simulation/m_time_steppers.fpp | Replaced GPU_PARALLEL macro around minval(max_dt) with kernels |
| src/simulation/m_data_output.fpp | Replaced GPU_PARALLEL macro around maxval/minval calls |
Comments suppressed due to low confidence (3)
src/simulation/m_time_steppers.fpp:996
- Wrapping a scalar
minvalcall in akernelsregion may incur unnecessary kernel launch overhead and may not generate a reduction on the device. Consider using!$acc parallel loop reduction(min:dt_local)around the explicit loop overmax_dtwith a collapse if multiple dimensions are involved.
!$acc kernels
src/simulation/m_data_output.fpp:319
- Enclosing
maxval(icfl_sf)in akernelsregion may not produce an efficient reduction; consider converting this to a!$acc parallel loop reduction(max:icfl_max_loc)over the underlying array to leverage device-side reductions.
!$acc kernels
src/simulation/m_data_output.fpp:323
- This
kernelsregion wraps two scalar reductions (vcfl_max_loc,Rc_min_loc); you may get better performance by using a combined!$acc parallel loop reduction(max:vcfl_max_loc) reduction(min:Rc_min_loc)over the loop indices instead ofkernels.
!$acc kernels
PR Code Suggestions ✨Explore these optional code suggestions:
|
|||||||||
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #926 +/- ##
==========================================
+ Coverage 43.68% 43.71% +0.02%
==========================================
Files 68 68
Lines 18363 18360 -3
Branches 2295 2292 -3
==========================================
+ Hits 8022 8026 +4
+ Misses 8949 8945 -4
+ Partials 1392 1389 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
User description
The #883 created an issue that substituted
!$acc parallelfor!$acc kernelsin all of 3 whole places (!!). It turns out this does not work on NVHPC. This is a stopgap fix. @prathi-wind will fix it up more properly.PR Type
Bug fix
Description
Replace GPU_PARALLEL macro with
!$acc kernelsdirectivesFix NVHPC compiler compatibility issues
Update OpenACC directives in data output and time stepping modules
Changes diagram
Changes walkthrough 📝
m_data_output.fpp
Update OpenACC directives in data output modulesrc/simulation/m_data_output.fpp
#:call GPU_PARALLEL()with!$acc kernelsforicfl_max_loccalculation
!$acc kernelsdirectives forvcfl_max_locandRc_min_locm_time_steppers.fpp
Fix OpenACC directives in time stepping modulesrc/simulation/m_time_steppers.fpp
#:call GPU_PARALLEL()with!$acc kernelsfordt_localcalculation