[tmva][sofie] Restructure emitted code to be differentiable with Clad #18332
guitargeek wants to merge 5 commits into root-project:master
Conversation
tmva/sofie/inc/TMVA/SOFIE_common.hxx
Outdated
    return out;
}

inline void Copy(float const *b, float const *e, float *o)
Does providing a pullback for std::copy not work?
No, I tried a bit but then gave up. This was my approach:
#include <Math/CladDerivator.h>
#include <iostream>
namespace std {
void copy_pullback(double const *first, double const *last, double *out_first, double *_d_out, double *_d_first,
double *_d_last, double *_d_out_first)
{
// Implementation doesn't matter yet, it doesn't compile anyway
}
} // namespace std
void fooImpl(double const *x, double *y)
{
std::copy(x, x + 1, y);
}
void foo(double const *x, double *y)
{
fooImpl(x, y);
}
double g(double *variables)
{
double out;
foo(variables, &out);
return out * variables[1];
}
void clademo()
{
// Call clad to generate the gradient of g.
auto g_grad = clad::gradient(g, "variables");
// Execute the generated gradient function.
double variables[]{3., 4.};
double grad_output[]{0., 0.};
g_grad.execute(variables, grad_output);
std::cout << "grad_output[0]: " << grad_output[0] << std::endl;
std::cout << "grad_output[1]: " << grad_output[1] << std::endl;
// Dump the generated gradient code to standard output.
g_grad.dump();
}It segfaults. I think Clad just doesn't play well with the STL algos that take iterators, it's better to avoid this, no?
In any case, supporting this is not crucial for this PR. I was refactoring things to avoid this copy call in the generated code anyway.
Once this PR is functional for our use case (it actually is now, but I also want to make the ROOT CI pass again), I'll write up what didn't work well in Clad and open issues.
Ah, I see. Probably worth opening an issue in clad…
});

TMVA_SOFIE_Equal::Session s("Equal_FromONNX.dat");
std::vector<bool> output = s.infer(input1.data(), input2.data());
Did that fail to differentiate?
No, I didn't even try to differentiate the models in the test. I'm solely focusing on the SBI use case that we implement with LHCb. The reason I changed this is that std::vector<bool> is not a good output type parameter. See:
Test Results: 22 files, 22 suites, 3d 3h 39m 5s ⏱️ Results for commit 728e5ad. ♻️ This comment has been updated with latest results.
Force-pushed from 6b90cb6 to 87597cd.
Proof of concept test for this PR

Take this ONNX file (remove the .txt suffix): VRlL_real_500k_evts_model.onnx.txt

Here are the scripts to convert the model to C++ and then to differentiate it with Clad:

// onnx_to_cpp.C
void onnx_to_cpp()
{
using namespace TMVA::Experimental;
SOFIE::RModelParser_ONNX parser;
SOFIE::RModel model = parser.Parse("./VRlL_real_500k_evts_model.onnx");
model.SetOptimizationLevel(SOFIE::OptimizationLevel::kBasic);
model.Generate();
model.PrintRequiredInputTensors();
model.OutputGenerated("./VRlL_real_500k_evts_model.hxx");
}// sofie_ad.C
#include "VRlL_real_500k_evts_model.hxx"
#include <Math/CladDerivator.h>
float my_func(TMVA_SOFIE_VRlL_real_500k_evts_model::Session const *session, float const *tensor_x,
float *tensor_theory_params)
{
float out = 0.;
TMVA_SOFIE_VRlL_real_500k_evts_model::doInfer(session, tensor_x, tensor_theory_params, &out);
return out;
}
void sofie_ad()
{
std::vector<float> input1{5.0, 2.0, 1.0, -1.0, 1.0};
std::vector<float> input2{0.0};
// Generated header file shall contain a Session class which requires
// initialization to load the corresponding weights.
TMVA_SOFIE_VRlL_real_500k_evts_model::Session s("VRlL_real_500k_evts_model.dat");
// Once instantiated the session object's infer method can be used
// std::vector<float> out = s.infer(input1.data(), input2.data());
auto func = [&](std::span<float> params) { return s.infer(input1.data(), params.data())[0]; };
auto numDiff = [&](int i) {
const float eps = 1e-4;
std::vector<float> p{input2};
p[i] = input2[i] - eps;
float funcValDown = func(p);
p[i] = input2[i] + eps;
float funcValUp = func(p);
return (funcValUp - funcValDown) / (2 * eps);
};
for (std::size_t i = 0; i < input2.size(); ++i) {
std::cout << i << ":" << std::endl;
std::cout << " numr : " << numDiff(i) << std::endl;
}
float grad_output[]{0., 0., 0., 0., 0.};
auto g_grad = clad::gradient<clad::opts::disable_tbr>(my_func, "tensor_theory_params");
g_grad.execute(&s, input1.data(), input2.data(), grad_output);
std::fill(std::begin(grad_output), std::end(grad_output), 0);
g_grad.execute(&s, input1.data(), input2.data(), grad_output);
std::cout << " clad : " << grad_output[0] << std::endl;
g_grad.dump();
}

Note that Usage with expected output (replace
Force-pushed from 89b638c to a3d545f.
Force-pushed from 3f40542 to 78fcc20.
Force-pushed from 4c9920f to 97903fa.
Why did we decide not to pursue this?
@vgvassilev, sorry, that was totally an accident. Maybe I confused it with another PR, or I wanted to close and re-open the PR to run the tests, but apparently I missed the "reopen" button.
Force-pushed from 9873d07 to 4f822ad.
Force-pushed from 7e5f4aa to 575f9e0.
It's possible to call `gemm` without a `C` parameter (constant offset), so the custom derivative can't assume that `C` and `_d_C` are always set. This avoids segfaults when `C` is `nullptr`.
It simplifies the code a bit if we don't have separate treatment of larger and smaller (n < 100) constant vectors. A magic per-vector size threshold is also not that meaningful: it doesn't avoid large stack allocations or the emitted-code footprint of the initializer lists, and the mechanism is moot when there are many small constant tensors.
The idea of this commit is to refactor the `doInfer()` function that implements the inference from a member function of the `Session` struct to a free function that takes the `Session` by `const`-reference.
This is covered by the tests that differentiate code emitted by SOFIE.
Restructure emitted code to be differentiable with Clad.