Cythonize launch & LaunchConfig more #1390
Conversation
Force-pushed from 7642502 to 51635bf (Compare)
/ok to test 51635bf
For comparison, this is with the raw driver call:

```python
In [16]: %timeit driver.cuLaunchKernel(kernel._handle, *config.grid, *config.block, 0, s.handle, ((0, 0), (ctypes.c_void_p, ctypes.c_void_p)), 0)
3.98 μs ± 27.6 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
```
rparolin left a comment:
generally lgtm, I left a few comments for you to review.
Thanks, Rob! cc @kkraus14 @cpcloud @emcastillo for vis
```cython
cdef:
    public tuple grid
    public tuple cluster
    public tuple block
    public int shmem_size
    public bint cooperative_launch
```
Leaving a quick note here in case I forget.
I wasn't super happy about this PR, and it's why I was on the fence about pushing it forward: our design of LaunchConfig allows reusing the Python object across multiple launches, but it is flexible enough that we pay the price of maintaining the public attributes (readable and writable from Python) and the Python overhead that comes with them. If we don't think this is reasonable, we should break it before GA and turn LaunchConfig into an immutable object.
In theory we could store the native types and have getter / setter properties that translate to/from tuples as needed? I agree we should avoid paying the Python overhead here if possible.
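A minimal sketch of that idea, assuming the attribute set in the diff above and 3-tuple dims (names and layout are illustrative, not the actual cuda.core internals):

```cython
# Illustrative sketch only -- not the actual cuda.core implementation.
# Dims are stored as native C ints; the tuple interface is kept via properties.
cdef class LaunchConfig:
    cdef:
        unsigned int _grid_x, _grid_y, _grid_z
        unsigned int _block_x, _block_y, _block_z
        int shmem_size
        bint cooperative_launch

    @property
    def grid(self):
        # Rebuild the tuple form users expect on access.
        return (self._grid_x, self._grid_y, self._grid_z)

    @grid.setter
    def grid(self, value):
        # Unpack once at assignment; launch code reads the native ints directly.
        self._grid_x, self._grid_y, self._grid_z = value

    @property
    def block(self):
        return (self._block_x, self._block_y, self._block_z)

    @block.setter
    def block(self, value):
        self._block_x, self._block_y, self._block_z = value
```

Launch-time code could then pass the native ints straight to the driver call without tuple unpacking or Python attribute lookups, while the Python-facing API stays unchanged.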
Yes. Though in the case of the launch config here, we are essentially wrapping a struct with a flexible array member (the attributes array can be arbitrarily long), which is annoying.
Last night I was thinking about thread safety, but in the present case we do not offer thread safety anyway (e.g., when two threads set the grid member at the same time). Something to think about in #1389.
Description
closes #1078

Timings compared across:
- this PR:
- main branch (commit 027ba10):
- cuda.core v0.4.2 (regression):
- cuda.core v0.3.2:
script:
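(The script itself is collapsed in the original; below is a hypothetical minimal reconstruction, assuming the public cuda.core.experimental API — Device, Program, ProgramOptions, LaunchConfig, launch — and a trivial two-pointer kernel to match the `launch(s, config, kernel, 0, 0)` call.)

```python
# launch_perf.py -- hypothetical reconstruction; the original script is not
# shown in the PR description. Assumes the cuda.core.experimental public API.
from cuda.core.experimental import (
    Device, LaunchConfig, Program, ProgramOptions, launch,
)

dev = Device()
dev.set_current()
s = dev.create_stream()

# Trivial kernel with two pointer parameters, matching the two trailing
# 0 arguments in launch(s, config, kernel, 0, 0).
code = 'extern "C" __global__ void empty_kernel(void* a, void* b) {}'
cc = dev.compute_capability
prog = Program(code, code_type="c++",
               options=ProgramOptions(arch=f"sm_{cc.major}{cc.minor}"))
kernel = prog.compile("cubin").get_kernel("empty_kernel")

config = LaunchConfig(grid=1, block=1)
launch(s, config, kernel, 0, 0)  # one warm-up launch
s.sync()
```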
Execute it via `ipython -i launch_perf.py` to warm up, and then repeat the launch via `%timeit launch(s, config, kernel, 0, 0)`.

Checklist