Do not use __managed__ in GPU builds. Add utils. #606
Conversation
Logfiles from GitLab pipeline #12242 (:no_entry:) have been uploaded here! Status and direct links:
Logfiles from GitLab pipeline #12246 (:no_entry:) have been uploaded here! Status and direct links:
This is blocked by #607.
Force-pushed from ee3feb6 to f439456 (Compare)
This fixes CPU execution of GPU builds on machines that do not have GPUs, which were previously segfaulting due to the use of the __managed__ keyword. Now the handling of Random123 state is more explicit for host/device. Also add a pair of helper functions coreneuron::[de]allocate_unified() that wrap cudaMallocManaged in GPU builds if --gpu was passed at runtime and fall back to new/delete otherwise, and a method coreneuron::unified_memory_enabled() that queries whether this condition is met. Additionally add a C++ allocator template coreneuron::unified_allocator<T> that wraps these functions, a templated coreneuron::alloc_deleter<T> for use with std::unique_ptr<T, D>, and a helper coreneuron::allocate_unique(...). Clean up Random123 code by dropping an unused nrnran123_mutconstruct method. Tweak compilation/CMake scripts to remove libcudacoreneuron.a and instead build CUDA sources inside libcoreneuron.a; this sidesteps circular dependency issues that would otherwise be introduced by this commit. Modify CMake so the `clang-format` target formats CUDA (.cu) files too.
Force-pushed from f439456 to 43b595a (Compare)
Logfiles from GitLab pipeline #13543 (:white_check_mark:) have been uploaded here! Status and direct links:
ferdonline left a comment
LGTM but I'm not the best person to review this code. Maybe let's wait for a review from @iomaganaris
nrnran123_setseq(s, 0, 0);
{
    // TODO: can I assert something useful about the instance count going
    // back to zero anywhere? Or that it is zero when some operations happen?
Answered in previous comment.
@nrnhines: how should the ran123 streams that bbcore_read() creates with nrnran123_newstream3 be allocated? Any thoughts?
Let's keep this as a separate issue; BlueBrain/nmodl#383 would make this easy to implement.
Logfiles from GitLab pipeline #13648 (:white_check_mark:) have been uploaded here! Status and direct links:
Force-pushed from a421fa5 to 899fffa (Compare)
Logfiles from GitLab pipeline #13680 (:white_check_mark:) have been uploaded here! Status and direct links:
CoreNEURON Repo SHA: BlueBrain/CoreNeuron@ac2fa3b
Description
This fixes CPU execution of GPU builds on machines that do not have GPUs, which were previously segfaulting due to the use of the `__managed__` keyword. Now the handling of Random123 state is more explicit for host/device. This was only added in #595, but the issue wasn't noticed at the time because the GPU-enabled CI only executes tests on machines with GPUs.
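For orientation, here is a hedged sketch of the kind of change this implies (the names and the `CORENEURON_ENABLE_GPU` guard are illustrative assumptions, not the actual CoreNEURON code): a `__managed__` global is serviced by the CUDA runtime even in host-only runs, which can crash on machines without a working GPU driver, whereas an explicit host copy plus an optional device copy only touches CUDA when `--gpu` is active.

```cpp
// Hedged sketch only: illustrative names, not the real CoreNEURON code.
//
//   // before: storage managed by the CUDA runtime, even for host-only runs
//   __managed__ uint32_t global_index;
//
// after: explicit host copy, plus a device copy that is only touched when
// the user actually asked for GPU execution.
#include <cstdint>

#ifdef CORENEURON_ENABLE_GPU  // assumed build-time guard
#include <cuda_runtime_api.h>
__device__ std::uint32_t device_global_index;
#endif

static std::uint32_t host_global_index = 0;

void set_global_index(std::uint32_t value, bool gpu_runtime_enabled) {
    host_global_index = value;
#ifdef CORENEURON_ENABLE_GPU
    if (gpu_runtime_enabled) {
        // Mirror the host value into the device copy only when --gpu is active.
        cudaMemcpyToSymbol(device_global_index, &host_global_index,
                           sizeof(host_global_index));
    }
#endif
}
```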
Also add a pair of helper functions `coreneuron::[de]allocate_unified()` that wrap `cudaMallocManaged()` in GPU builds if `--gpu` was passed at runtime and fall back to `new`/`delete` otherwise, and a method `coreneuron::unified_memory_enabled()` that queries whether this condition is met.
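A minimal sketch of how such a runtime dispatch could look (the signatures, the `gpu_enabled` flag, and the `CORENEURON_ENABLE_GPU` guard are assumptions; the actual definitions live in the CoreNEURON sources):

```cpp
#include <cstddef>
#include <new>

#ifdef CORENEURON_ENABLE_GPU  // assumed build-time guard
#include <cuda_runtime_api.h>
#endif

namespace coreneuron {
// Assumed to be set from the --gpu command-line option during startup.
extern bool gpu_enabled;

inline bool unified_memory_enabled() {
#ifdef CORENEURON_ENABLE_GPU
    return gpu_enabled;
#else
    return false;
#endif
}

inline void* allocate_unified(std::size_t num_bytes) {
#ifdef CORENEURON_ENABLE_GPU
    if (unified_memory_enabled()) {
        void* ptr{nullptr};
        if (cudaMallocManaged(&ptr, num_bytes) != cudaSuccess) {
            throw std::bad_alloc{};
        }
        return ptr;
    }
#endif
    // CPU execution of a GPU build, or a CPU-only build: plain heap memory.
    return ::operator new(num_bytes);
}

inline void deallocate_unified(void* ptr, std::size_t /* num_bytes */) {
#ifdef CORENEURON_ENABLE_GPU
    if (unified_memory_enabled()) {
        cudaFree(ptr);
        return;
    }
#endif
    ::operator delete(ptr);
}
}  // namespace coreneuron
```

The key point is that `cudaMallocManaged()` is only reached when the build has CUDA support and the user passed `--gpu`, so a GPU build running on a GPU-less machine never touches the CUDA runtime.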
Additionally add a C++ allocator template `coreneuron::unified_allocator<T>` that wraps these functions, and a templated `coreneuron::alloc_deleter<T>` for use with `std::unique_ptr<T, D>`. Also add a `coreneuron::allocate_unique` helper adapted from Stack Overflow.
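For illustration, a hedged sketch of what such an allocator, deleter, and Stack Overflow-style `allocate_unique` typically look like (assumed shapes, not the exact CoreNEURON definitions; the plain-heap bodies of `allocate_unified`/`deallocate_unified` below are placeholders for the real helpers described above):

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

namespace coreneuron {
// Placeholder definitions: the real helpers switch to cudaMallocManaged
// when --gpu is active, as described above.
inline void* allocate_unified(std::size_t num_bytes) {
    return ::operator new(num_bytes);
}
inline void deallocate_unified(void* ptr, std::size_t /* num_bytes */) {
    ::operator delete(ptr);
}

// Minimal C++ allocator that routes through the unified-memory helpers.
template <typename T>
struct unified_allocator {
    using value_type = T;
    unified_allocator() = default;
    template <typename U>
    unified_allocator(const unified_allocator<U>&) noexcept {}
    T* allocate(std::size_t n) {
        return static_cast<T*>(allocate_unified(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        deallocate_unified(p, n * sizeof(T));
    }
};
template <typename T, typename U>
bool operator==(const unified_allocator<T>&, const unified_allocator<U>&) noexcept {
    return true;
}
template <typename T, typename U>
bool operator!=(const unified_allocator<T>&, const unified_allocator<U>&) noexcept {
    return false;
}

// Deleter so std::unique_ptr can destroy and deallocate through the allocator.
template <typename Alloc>
struct alloc_deleter {
    explicit alloc_deleter(const Alloc& a) : allocator_(a) {}
    using pointer = typename std::allocator_traits<Alloc>::pointer;
    void operator()(pointer p) const {
        Alloc a(allocator_);
        std::allocator_traits<Alloc>::destroy(a, std::addressof(*p));
        std::allocator_traits<Alloc>::deallocate(a, p, 1);
    }

  private:
    Alloc allocator_;
};

// Stack Overflow-style helper: allocate + construct, returning a unique_ptr
// whose deleter remembers the allocator it came from.
template <typename T, typename Alloc, typename... Args>
std::unique_ptr<T, alloc_deleter<Alloc>> allocate_unique(const Alloc& alloc, Args&&... args) {
    using traits = std::allocator_traits<Alloc>;
    Alloc a(alloc);
    T* p = traits::allocate(a, 1);
    try {
        traits::construct(a, p, std::forward<Args>(args)...);
    } catch (...) {
        traits::deallocate(a, p, 1);
        throw;
    }
    return std::unique_ptr<T, alloc_deleter<Alloc>>(p, alloc_deleter<Alloc>(a));
}
}  // namespace coreneuron

// Example use: a vector in unified memory and a single unified-memory object.
int main() {
    std::vector<double, coreneuron::unified_allocator<double>> voltages(64, 0.0);
    auto value = coreneuron::allocate_unique<double>(
        coreneuron::unified_allocator<double>{}, 42.0);
    return (voltages.size() == 64 && *value == 42.0) ? 0 : 1;
}
```

The deleter stores a copy of the allocator, so the `std::unique_ptr` releases the object through the same allocation path that created it.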
Clean up Random123 code by dropping an unused `nrnran123_mutconstruct()` method.

Tweak compilation scripts to allow for circular dependencies between `libcoreneuron.a` and `libcudacoreneuron.a`. In future these should just be merged into a single library.
This addresses #599 (comment). #599 should stay open because various OpenACC calls are still not conditional on `--gpu`.

How to test this?
Try running a GPU-built `special-core` without `--gpu` on a machine that does not have an NVIDIA GPU.

Test System
Use certain branches for the SimulationStack CI
CI_BRANCHES:NEURON_BRANCH=master,