
Conversation

@nicklamiller
Contributor

@nicklamiller nicklamiller commented Oct 24, 2025

Contributes to: #7031

Adds test for _cint64_array_to_numpy:

```python
def _cint64_array_to_numpy(*, cptr: "ctypes._Pointer", length: int) -> np.ndarray:
    """Convert a ctypes int pointer array to a numpy array."""
    if isinstance(cptr, ctypes.POINTER(ctypes.c_int64)):
        return np.ctypeslib.as_array(cptr, shape=(length,)).copy()
    else:
        raise RuntimeError("Expected int64 pointer")
```
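For illustration, the conversion this helper performs can be reproduced with plain ctypes and NumPy. This is a minimal sketch independent of LightGBM's internals; the buffer here is a stand-in for the pointer LightGBM's C API would return:

```python
import ctypes

import numpy as np

# Stand-in for the pointer returned by LightGBM's C API: a small int64 buffer.
buf = (ctypes.c_int64 * 4)(0, 2, 5, 7)
cptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_int64))

# The isinstance check in the helper passes for this pointer type...
assert isinstance(cptr, ctypes.POINTER(ctypes.c_int64))

# ...and the conversion views the C buffer, then copies it so the
# resulting NumPy array owns its own memory.
arr = np.ctypeslib.as_array(cptr, shape=(4,)).copy()
print(arr.tolist())  # [0, 2, 5, 7]
```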

@nicklamiller nicklamiller changed the title Add test for converting a ctypes int64 pointer array to a NumPy array [python-package] Add test for converting a ctypes int64 pointer array to a NumPy array Oct 24, 2025
Collaborator

@jameslamb jameslamb left a comment


Thanks for working on this @nicklamiller

Before I review this... did you try to do this through LightGBM's public API?

From #7031:

Tests should only use lightgbm's public API, unless that is very difficult or expensive. Any function whose name begins with a _ is considered private.

This particular internal function is so small and simple, I think it'd be preferable to have tests which cover it via the public API. That'd give us more coverage of lightgbm under whatever conditions lead to this function being invoked. I'm not exactly sure what code paths do that, probably passing int64 arrays for label or something similar... you'd have to do a bit of investigation.

@nicklamiller
Contributor Author

did you try to do this through LightGBM's public API?

@jameslamb thank you very much for the detailed instructions in #7031 and sorry for completely missing that point! Testing through the public API to mimic how functionality is called in the wild by users makes sense, and _cint64_array_to_numpy is now tested through Booster(...).predict(...).

Newly covered lines

Running:

```shell
pytest \
    --cov=lightgbm \
    --cov-report="term" \
    --cov-report="html:htmlcov" \
    tests/python_package_test/test_engine.py
```

and viewing the coverage report for `python-package/lightgbm/basic.py` (where `_cint64_array_to_numpy` is defined):

on `master`: (coverage screenshot: `coverage_master`)

on this PR's branch: (coverage screenshot: `coverage_pr`)

Some additional notes/thoughts:

  • `_cint64_array_to_numpy` only ever gets called if `pred_contrib=True` in `Booster(...).predict(...)`; see the call chain below.
  • Even though `_cint64_array_to_numpy` is defined in `basic.py`, I noticed some tests in `test_engine.py` were already directly testing `Booster(...).predict(...)` with `pred_contrib=True`, so I placed my test in `test_engine.py` instead of `test_basic.py`. One notable example of such a test is `test_contribs_sparse`:
```python
def test_contribs_sparse():
    n_features = 20
    n_samples = 100
    # generate CSR sparse dataset
    X, y = make_multilabel_classification(
        n_samples=n_samples, sparse=True, n_features=n_features, n_classes=1, n_labels=2
    )
    y = y.flatten()
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
    params = {
        "objective": "binary",
        "verbose": -1,
    }
    lgb_train = lgb.Dataset(X_train, y_train)
    gbm = lgb.train(params, lgb_train, num_boost_round=20)
    contribs_csr = gbm.predict(X_test, pred_contrib=True)
    assert isspmatrix_csr(contribs_csr)
    # convert data to dense and get back same contribs
    contribs_dense = gbm.predict(X_test.toarray(), pred_contrib=True)
    # validate the values are the same
    if platform.machine() == "aarch64":
        np.testing.assert_allclose(contribs_csr.toarray(), contribs_dense, rtol=1, atol=1e-12)
    else:
        np.testing.assert_allclose(contribs_csr.toarray(), contribs_dense)
    assert np.linalg.norm(gbm.predict(X_test, raw_score=True) - np.sum(contribs_dense, axis=1)) < 1e-4
    # validate using CSC matrix
    X_test_csc = X_test.tocsc()
    contribs_csc = gbm.predict(X_test_csc, pred_contrib=True)
    assert isspmatrix_csc(contribs_csc)
    # validate the values are the same
    if platform.machine() == "aarch64":
        np.testing.assert_allclose(contribs_csc.toarray(), contribs_dense, rtol=1, atol=1e-12)
    else:
        np.testing.assert_allclose(contribs_csc.toarray(), contribs_dense)
```
    • I debated appending to this test, but given that it was already testing quite a few things, I decided to add `test_predict_contrib_int64` as a separate one.
  • There are some failing CI jobs for older CUDA versions. It's not immediately obvious to me whether these are related to my changes; I'll dig a bit deeper.
`_cint64_array_to_numpy` call chain from Booster(...).predict(...)

```
Booster.predict(pred_contrib=True)
    ↓
_InnerPredictor.predict(pred_contrib=True)
    ↓
__pred_for_csr(csr, predict_type=_C_API_PREDICT_CONTRIB)
    ↓
__inner_predict_csr_sparse(csr, predict_type=_C_API_PREDICT_CONTRIB)
    ↓
_LIB.LGBM_BoosterPredictSparseOutput()  # C API call
    ↓
__create_sparse_native(csr, out_ptr_indptr, ...)
    ↓
_cint64_array_to_numpy(cptr=out_ptr_indptr, length=indptr_len)
```
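As a side note on the last step of the chain: the `.copy()` matters because `np.ctypeslib.as_array` on a pointer only creates a view over C-owned memory, which LightGBM later frees. A minimal sketch of the view-vs-copy difference, using plain ctypes and NumPy rather than LightGBM itself:

```python
import ctypes

import numpy as np

buf = (ctypes.c_int64 * 3)(1, 2, 3)
cptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_int64))

view = np.ctypeslib.as_array(cptr, shape=(3,))  # aliases the C buffer
copy = view.copy()                              # owns independent memory

buf[0] = 99  # mutate the underlying C buffer
print(view[0], copy[0])  # 99 1 -- the view reflects the change, the copy does not
```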

Collaborator

@jameslamb jameslamb left a comment


Thanks so much for your effort here and for figuring out a code path through the public API that relies on this function!!! I really appreciate your thorough explanation, as always.

I've merged in latest master to trigger a new CI run, let's see if that issue in the CUDA jobs persists.

I've put up a few suggestions for changes to the tests. In general, when testing LightGBM with very small datasets, it's important to ensure that a non-empty model (multiple trees and splits) is trained. There are code paths that are skipped when the model doesn't have any splits.

Comment on lines 2017 to 2022
```python
n_samples = 100
n_features = 5

X, y = make_multilabel_classification(
    n_samples=n_samples, sparse=True, n_features=n_features, n_classes=1, n_labels=2
)
```

Suggested change:

```diff
-n_samples = 100
-n_features = 5
-X, y = make_multilabel_classification(
-    n_samples=n_samples, sparse=True, n_features=n_features, n_classes=1, n_labels=2
-)
+X, y = make_multilabel_classification(
+    n_samples=100, sparse=True, n_features=5, n_classes=1, n_labels=2
+)
```

Since these values are only used one time in the code, let's just hard-code them here instead of introducing this indirection.

Comment on lines 2028 to 2032
```python
params = {"objective": "binary", "num_leaves": 7, "learning_rate": 0.1, "verbose": -1}
booster = lgb.Booster(params, train_data)

for _ in range(5):
    booster.update()
```

Suggested change:

```diff
-params = {"objective": "binary", "num_leaves": 7, "learning_rate": 0.1, "verbose": -1}
-booster = lgb.Booster(params, train_data)
-for _ in range(5):
-    booster.update()
+params = {
+    "objective": "binary",
+    "num_leaves": 7,
+    "min_data_in_bin": 1,
+    "min_data_in_leaf": 1,
+    "seed": 708,
+    "verbose": -1,
+}
+booster = lgb.train(params, train_set=train_data, num_boost_round=5)
+assert booster.num_trees() == 5
```

Instead of manually implementing the training loop, let's just use lgb.train().

And because this is such a small Dataset, let's set min_data_* parameters to ensure that the trained model actually has some trees and splits.


@nicklamiller
Contributor Author

when testing LightGBM with very small datasets, it's important to ensure that a non-empty model (multiple trees and splits) is trained. There are code paths that are skipped when the model doesn't have any splits.

@jameslamb Ah, I see; that makes sense. I've gone ahead and added these suggestions.

It looks like the CUDA jobs were still failing even after merging the latest master, specifically due to test_predict_contrib_int64 (error). So it appears there's an issue with 64-bit indices in sparse matrices, as used in this test, on the CUDA jobs. For that reason, I've simply skipped this test for CUDA jobs, but please let me know if we want to make sure this functionality works when CUDA is used.
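For context on what the test feeds LightGBM, here is one way to obtain a CSR matrix with 64-bit indices. This is an illustrative sketch assuming SciPy, not the exact code from the PR:

```python
import numpy as np
from scipy import sparse

# SciPy CSR matrices typically use int32 indices by default; widening
# them to int64 is what exercises the 64-bit sparse code path.
X = sparse.random(10, 5, density=0.5, format="csr", random_state=42)
X.indices = X.indices.astype(np.int64)
X.indptr = X.indptr.astype(np.int64)
print(X.indptr.dtype)  # int64
```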

Collaborator

@jameslamb jameslamb left a comment


Thanks for your work on this!

I think skipping this test is totally fine. Exciting that adding this testing actually found a bug! Even if it's intentional that the library doesn't currently support int64 sparse matrix indices for the CUDA variant, it's still a bug that it results in a segfault instead of a nice exception.

Could you add a short bug report issue documenting that? It'd be nice to eventually either support this or at least eliminate the segfault.

I left one other very small suggestion. Please address that and update to latest master... otherwise I think this is ready to merge 😁

@nicklamiller
Contributor Author

@jameslamb thanks very much for the review!

Could you add a short bug report issue documenting that? It'd be nice to eventually either support this or at least eliminate the segfault.

This is now being tracked in #7101.

@jameslamb
Collaborator

That issue looks perfect, thank you so much!!!

Don't worry about the failing R-package tests here, I bet it is still some fallout from recent R-devel changes (#7099). Hopefully will be resolved by a re-run in a few hours or tomorrow.

@jameslamb jameslamb merged commit d5e449d into microsoft:master Dec 10, 2025
100 of 110 checks passed
@nicklamiller nicklamiller deleted the cint64-arr-test branch December 10, 2025 05:40
