Fix device attribute handling for older drivers #1437
Conversation
/ok to test 3af415c

/ok to test 5f957b1
```diff
         err = cydriver.cuDeviceGetAttribute(&val, attr, self._handle)
-    if err == cydriver.CUresult.CUDA_ERROR_INVALID_VALUE:
-        return 0
+    if err == cydriver.CUresult.CUDA_ERROR_INVALID_VALUE and default is not None:
```
This tiny tweak will make it possible to specify None as the default.
```diff
@@ -32,6 +32,7 @@ if TYPE_CHECKING:
 # but it seems it is very convenient to expose them for testing purposes...
 _tls = threading.local()
 _lock = threading.Lock()
+_NO_DEFAULT = object()


 cdef bint _is_cuInit = False
@@ -61,7 +62,7 @@ cdef class DeviceProperties:
         cdef cydriver.CUresult err
         with nogil:
             err = cydriver.cuDeviceGetAttribute(&val, attr, self._handle)
-        if err == cydriver.CUresult.CUDA_ERROR_INVALID_VALUE and default is not None:
+        if err == cydriver.CUresult.CUDA_ERROR_INVALID_VALUE and default is not _NO_DEFAULT:
             return default
         HANDLE_RETURN(err)
         return val
```
After updating the type signatures to return int for slight efficiency, it was no longer possible to return None, so I updated the two related functions locally to convert 0 to None.
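For readers following along, the sentinel suggestion above can be illustrated with a small pure-Python sketch. Only `_NO_DEFAULT` comes from the suggested change; the fake driver table and helper names are made up for illustration and are not part of the PR:

```python
# Minimal sketch of the sentinel-default pattern (not the actual cuda.core code).
# A module-level sentinel lets callers pass default=None as a legitimate value,
# while the function can still detect "no default was given".
_NO_DEFAULT = object()

_FAKE_DRIVER = {"compute_capability_major": 9}  # pretend driver attribute table


def query_driver(attr):
    # Stand-in for cuDeviceGetAttribute: unknown attributes yield an error code.
    if attr in _FAKE_DRIVER:
        return "CUDA_SUCCESS", _FAKE_DRIVER[attr]
    return "CUDA_ERROR_INVALID_VALUE", 0


def get_attribute(attr, default=_NO_DEFAULT):
    err, value = query_driver(attr)
    # Only swallow the "unknown attribute" error when a default was supplied;
    # passing default=None is now distinguishable from passing no default at all.
    if err == "CUDA_ERROR_INVALID_VALUE" and default is not _NO_DEFAULT:
        return default
    if err != "CUDA_SUCCESS":
        raise RuntimeError(err)
    return value


print(get_attribute("compute_capability_major"))          # 9
print(get_attribute("mem_sync_domain_count", default=1))  # 1 on an old driver
print(get_attribute("host_numa_id", default=None))        # None is now a usable default
```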
force-pushed from 5f957b1 to d946b9a
Latest upload changes the PCI device IDs to
force-pushed from 2e4caeb to a7d4c22
Add explicit default value handling for device attributes that may not be supported by older CUDA drivers. When cuDeviceGetAttribute returns CUDA_ERROR_INVALID_VALUE, return a sensible default instead of raising an error.

- Add default parameter to _get_attribute() and _get_cached_attribute()
- Use default=0 for boolean/enablement attributes (returns False)
- Use default=1 for mem_sync_domain_count (single domain is traditional behavior)
- Use default=-1 for host_numa_id (indicates NUMA not supported)
- Document that gpu_pci_device_id/gpu_pci_subsystem_id return 0 if unsupported

Closes NVIDIA#1420
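To make the intent of these defaults concrete, here is an illustrative lookup sketch. It is not how the change is actually implemented (the real code passes `default=` at each call site in the Cython sources), and the attribute names are copied from the commit message above:

```python
# Illustrative mapping of the per-attribute defaults described above.
ATTRIBUTE_DEFAULTS = {
    "mem_sync_domain_count": 1,   # single domain is the traditional behavior
    "host_numa_id": -1,           # -1 signals that NUMA is not supported
    "gpu_pci_device_id": 0,       # documented to return 0 when unsupported
    "gpu_pci_subsystem_id": 0,    # documented to return 0 when unsupported
}


def default_for(attribute_name: str) -> int:
    # Boolean/enablement attributes fall back to 0, which reads as False.
    return ATTRIBUTE_DEFAULTS.get(attribute_name, 0)


print(default_for("mem_sync_domain_count"))  # 1
print(default_for("cooperative_launch"))     # 0 (treated as "not enabled")
```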
force-pushed from 0a08b87 to 9945810
I'm not 100% sold on using
/ok to test fdff347
I think that's fine (Pythonic as you said, i.e. what people expect), but I'd also be happy if we made it

To be explicit about the context I have in mind:

```python
def gpu_pci_device_id(self) -> int:
    """int: The combined 16-bit PCI device ID and 16-bit PCI vendor ID.

    Returns -1 if the driver does not support this query.
    """
```

```python
def gpu_pci_subsystem_id(self) -> int:
    """int: The combined 16-bit PCI subsystem ID and 16-bit PCI subsystem vendor ID.

    Returns -1 if the driver does not support this query.
    """
```

Why not

I'm assuming: Any actual IDs will be greater than zero. Is that a valid assumption?

But I'd avoid the
The format is

Typical NVIDIA PCI device IDs are non-zero values like:

The combined value is an unsigned int with several bits set (never zero). I think either

Aside from zero, users may see numbers such as the following in logs:
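To illustrate the point about the combined value, here is a small sketch of splitting such a 32-bit identifier into its two 16-bit halves. The exact bit layout assumed here (device ID in the upper half, vendor ID in the lower half) is an assumption made for this example, not something stated in the thread:

```python
# Sketch: splitting a combined 32-bit PCI identifier into its two 16-bit halves.
# The bit layout (device ID high, vendor ID low) is assumed for illustration only.
def split_pci_id(combined: int) -> tuple[int, int]:
    device_id = (combined >> 16) & 0xFFFF
    vendor_id = combined & 0xFFFF
    return device_id, vendor_id


# 0x10DE is NVIDIA's PCI vendor ID; 0x2684 is used here purely as an example device ID.
device_id, vendor_id = split_pci_id(0x268410DE)
print(hex(device_id), hex(vendor_id))  # 0x2684 0x10de
# Whatever the exact layout, a real combined ID always has the vendor bits set,
# so the value can never be zero, which is why 0 works as an "unsupported" marker.
```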
- Add except? -2 to _get_attribute and _get_cached_attribute for proper exception propagation (-2 never clashes with valid return values)
- Keep default parameter untyped to allow None, cast to int when used
- Simplify gpu_pci_device_id/gpu_pci_subsystem_id to return 0 when unsupported (0 is never a valid PCI ID)
force-pushed from fdff347 to 0ebcce9
/ok to test 0ebcce9
Summary

- When `cuDeviceGetAttribute` returns `CUDA_ERROR_INVALID_VALUE`, return a sensible default instead of raising an error

Changes

- Add `default` parameter to `_get_attribute()` and `_get_cached_attribute()` with default value `0`
- `default=1` for `mem_sync_domain_count` (single domain is traditional behavior)
- `default=-1` for `host_numa_id` (indicates NUMA not supported)
- `gpu_pci_device_id`/`gpu_pci_subsystem_id` return 0 if unsupported (added in CUDA 12.8)

Closes #1420
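A rough usage sketch of the end result, assuming a cuda.core-style Device API; the import path and property names are taken from the discussion above and may differ in detail from a given release:

```python
# Sketch of caller-visible behavior on a driver that predates these attributes
# (assumed cuda.core-style API; names follow the PR discussion).
from cuda.core.experimental import Device

props = Device(0).properties
# These queries no longer raise on older drivers; they fall back to the defaults:
print(props.mem_sync_domain_count)   # falls back to 1 (single domain)
print(props.host_numa_id)            # falls back to -1 (NUMA not supported)
print(props.gpu_pci_device_id)       # falls back to 0 (query unsupported)
print(props.gpu_pci_subsystem_id)    # falls back to 0 (query unsupported)
```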