Cythonize launch & LaunchConfig more #1390
Conversation
Force-pushed from 7642502 to 51635bf (Compare)
/ok to test 51635bf
For comparison, this is with the raw driver call:

```python
In [16]: %timeit driver.cuLaunchKernel(kernel._handle, *config.grid, *config.block, 0, s.handle, ((0, 0), (ctypes.c_void_p, ctypes.c_void_p)), 0)
3.98 μs ± 27.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```
rparolin left a comment:
generally lgtm, I left a few comments for you to review.
Thanks, Rob! cc @kkraus14 @cpcloud @emcastillo for vis
```cython
cdef:
    public tuple grid
    public tuple cluster
    public tuple block
    public int shmem_size
    public bint cooperative_launch
```
Leaving a quick note here in case I forget.
I wasn't super happy about this PR, and it's why I was on the fence about pushing it forward: our design of LaunchConfig allows reusing the Python object across multiple launches, but it is flexible enough that we pay the price of maintaining the public attributes (readable and writable from Python) and the Python overhead that comes with them. If we don't think this is reasonable, we should break it before GA and turn LaunchConfig into an immutable object.
In theory we could store the native types and have getter / setter properties that translate to/from tuples as needed? I agree we should avoid paying the Python overhead here if possible.
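A minimal sketch of that idea, assuming the attribute set in the diff above and 3-tuple dims (names and layout are illustrative, not the actual cuda.core internals):

```cython
# Illustrative sketch only -- not the actual cuda.core implementation.
# Dims are stored as native C ints; the tuple interface is kept via properties.
cdef class LaunchConfig:
    cdef:
        unsigned int _grid_x, _grid_y, _grid_z
        unsigned int _block_x, _block_y, _block_z
        int shmem_size
        bint cooperative_launch

    @property
    def grid(self):
        # Rebuild the tuple form users expect on access.
        return (self._grid_x, self._grid_y, self._grid_z)

    @grid.setter
    def grid(self, value):
        # Unpack once at assignment; launch code reads the native ints directly.
        self._grid_x, self._grid_y, self._grid_z = value

    @property
    def block(self):
        return (self._block_x, self._block_y, self._block_z)

    @block.setter
    def block(self, value):
        self._block_x, self._block_y, self._block_z = value
```

Launch-time code could then pass the native ints straight to the driver call without tuple unpacking or Python attribute lookups, while the Python-facing API stays unchanged.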
Yes. Though in the case of the launch config here, we are essentially wrapping a struct with a flexible array member (the attributes array can be arbitrarily long), which is annoying.
Last night I was thinking about thread safety, but in the present case we do not offer thread safety anyway (e.g., when two threads set the grid member at the same time). Something to think about in #1389.
Description
closes #1078

Timings compared across:
- this PR:
- main branch (commit 027ba10):
- cuda.core v0.4.2 (regression):
- cuda.core v0.3.2:
script:
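(The script itself is collapsed in the original; below is a hypothetical minimal reconstruction, assuming the public cuda.core.experimental API — Device, Program, ProgramOptions, LaunchConfig, launch — and a trivial two-pointer kernel to match the `launch(s, config, kernel, 0, 0)` call.)

```python
# launch_perf.py -- hypothetical reconstruction; the original script is not
# shown in the PR description. Assumes the cuda.core.experimental public API.
from cuda.core.experimental import (
    Device, LaunchConfig, Program, ProgramOptions, launch,
)

dev = Device()
dev.set_current()
s = dev.create_stream()

# Trivial kernel with two pointer parameters, matching the two trailing
# 0 arguments in launch(s, config, kernel, 0, 0).
code = 'extern "C" __global__ void empty_kernel(void* a, void* b) {}'
cc = dev.compute_capability
prog = Program(code, code_type="c++",
               options=ProgramOptions(arch=f"sm_{cc.major}{cc.minor}"))
kernel = prog.compile("cubin").get_kernel("empty_kernel")

config = LaunchConfig(grid=1, block=1)
launch(s, config, kernel, 0, 0)  # one warm-up launch
s.sync()
```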
Execute it via `ipython -i launch_perf.py` to warm up, and then repeat the launch via `%timeit launch(s, config, kernel, 0, 0)`.

Checklist