
Cortex-M: Fix pad op to support channels_last memory format #18429

Open

rascani wants to merge 3 commits into pytorch:main from rascani:cortex-m-pad-channels-last

Conversation

@rascani rascani (Contributor) commented Mar 23, 2026

Summary

- Fix pad_meta to propagate channels_last from input to output tensor.
- Fix pad_out (C++) to use dim_order() to permute logical dims and padding into physical memory order for arm_pad_s8.
- Add channels_last test cases to test_pad.

Test Plan

pytest backends/cortex_m/test/ops/test_pad.py
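The pad_meta fix in the summary can be sketched as follows. This is a hedged illustration, not the PR's code: the signature and the F.pad-style padding convention (pairs applied from the last dimension inward) are assumptions; only the core idea, forwarding the input's channels_last memory format to the meta output instead of defaulting to contiguous, comes from the summary.

```python
# Hedged sketch of a pad meta function that propagates channels_last.
# Function name matches the summary; signature and padding layout are assumed.
import torch


def pad_meta(x: torch.Tensor, padding: list[int]) -> torch.Tensor:
    # padding holds (before, after) pairs applied from the last dim inward,
    # as in torch.nn.functional.pad.
    sizes = list(x.shape)
    for i in range(len(padding) // 2):
        dim = x.dim() - 1 - i
        sizes[dim] += padding[2 * i] + padding[2 * i + 1]
    out = torch.empty(sizes, dtype=x.dtype, device=x.device)
    # The fix: carry the input's memory format to the output, so a
    # channels_last input yields a channels_last meta/fake output.
    if x.dim() == 4 and x.is_contiguous(memory_format=torch.channels_last):
        out = out.contiguous(memory_format=torch.channels_last)
    return out
```

Without the final block, the meta output silently falls back to contiguous even for channels_last inputs, which is the mismatch the PR addresses.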

@rascani rascani requested review from AdrianLundell and psiddh March 23, 2026 23:06
@pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18429

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 7 Unrelated Failures

As of commit 637e2d2 with merge base 8f1b5ee:


👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 23, 2026
@github-actions
This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Fix pad_meta to propagate channels_last from input to output tensor.
Fix pad_out (C++) to use dim_order() to permute logical dims and padding
into physical memory order for arm_pad_s8.
Add channels_last test cases to test_pad.
cmsis_nn_dims input_dims = {1, 1, 1, 1};
int32_t* d = &input_dims.n;
// Permute the real dims according to dim_order.
const auto dim_order = input.dim_order();
Collaborator

Are you sure this always works? I have had bad experiences with using dim_order generally; it seems very easy for it to default to channels-first when it cannot be parsed from the dimensions, or when the fake implementation doesn't propagate it correctly. That is why there is a helper function to check whether a tensor is channels last instead.

Contributor Author

Nope, not confident. I've switched to the helper function.
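For reference, the permutation the kernel applies to the logical sizes can be sketched in Python (the actual runtime code is C++; physical_dims is an illustrative name, not an ExecuTorch API, and the rank-4 restriction matches the {n, h, w, c} layout arm_pad_s8 expects):

```python
# Hedged Python sketch of the size permutation done on the C++ side using a
# channels-last check rather than raw dim_order().
def physical_dims(sizes, channels_last):
    """Return rank-4 logical sizes {n, c, h, w} in physical memory order.

    For a contiguous (channels_first) tensor, physical order equals logical
    order, so the innermost dim is w. For a channels_last tensor, c is the
    fastest-varying dim, giving {n, h, w, c} as arm_pad_s8 expects.
    """
    n, c, h, w = sizes
    return [n, h, w, c] if channels_last else [n, c, h, w]
```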

// arm_pad_s8 processes data in {n, h, w, c} order where c is the
// fastest-varying (innermost) dimension. Use dim_order to permute
// logical sizes and padding into physical memory order so this holds
// for both contiguous and channels_last tensors.
const size_t offset = kMaxSupportedDims - rank;
Collaborator

Did you consider changing this in the AOT flow instead? I generally like to keep the glue here as small as possible unless there is a good reason not to

Contributor Author

That's a great suggestion. Let me see if I can work that out.

Contributor Author

This added a good amount of complexity to the AOT flow, and we still have to do permutation of the input dims because the sizes are still in logical form. I'm a bit on the fence about it, so let me know what you think.

@AdrianLundell AdrianLundell (Collaborator) left a comment

Nice! We should probably have channels-first and channels-last tests for all operators.

The C++ pad kernel previously used dim_order() at runtime to permute
both logical sizes and padding into physical memory order. Move the
padding permutation to the AOT fusion pass so pre_pad/post_pad arrive
in physical order, reducing runtime work. The kernel still permutes
sizes via is_channels_last_tensor (unavoidable: the tensor API reports
logical sizes).

Co-authored-by: Claude <noreply@anthropic.com>
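The AOT-side padding permutation this commit describes can be sketched as follows. This is a hedged illustration: pre_pad/post_pad names come from the commit message, but the function shape and the per-dim list layout are assumptions about the fusion pass, not its actual code.

```python
# Hedged sketch: reorder per-dimension padding during the AOT fusion pass so
# the runtime kernel receives pre_pad/post_pad already in physical
# {n, h, w, c} order. Inputs hold one entry per logical dim in {n, c, h, w}.
def permute_padding(pre_pad, post_pad, channels_last):
    if not channels_last:
        # Contiguous tensors: physical order equals logical order.
        return pre_pad, post_pad
    # {n, c, h, w} -> {n, h, w, c}: move the channel entry innermost.
    order = (0, 2, 3, 1)
    return [pre_pad[i] for i in order], [post_pad[i] for i in order]
```

Doing this once at export time removes the padding permutation from the runtime kernel, leaving only the size permutation (which, as the commit notes, is unavoidable because the tensor API reports logical sizes).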
@rascani rascani force-pushed the cortex-m-pad-channels-last branch from 98c8c57 to 1e325c6 Compare March 25, 2026 00:14
@rascani rascani (Contributor Author) left a comment

Thanks for the review Adrian!


