
Cortex-M: Fix pad op to support channels_last memory format #18429

Open

rascani wants to merge 3 commits into pytorch:main from rascani:cortex-m-pad-channels-last

Conversation

@rascani rascani (Contributor) commented Mar 23, 2026

Summary

- Fix pad_meta to propagate channels_last from input to output tensor.
- Fix pad_out (C++) to use dim_order() to permute logical dims and padding into physical memory order for arm_pad_s8.
- Add channels_last test cases to test_pad.

Test Plan

pytest backends/cortex_m/test/ops/test_pad.py
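The pad_meta fix in the summary can be sketched as follows. This is a hedged illustration, not the PR's code: the signature and the F.pad-style padding convention (pairs applied from the last dimension inward) are assumptions; only the core idea, forwarding the input's channels_last memory format to the meta output instead of defaulting to contiguous, comes from the summary.

```python
# Hedged sketch of a pad meta function that propagates channels_last.
# Function name matches the summary; signature and padding layout are assumed.
import torch


def pad_meta(x: torch.Tensor, padding: list[int]) -> torch.Tensor:
    # padding holds (before, after) pairs applied from the last dim inward,
    # as in torch.nn.functional.pad.
    sizes = list(x.shape)
    for i in range(len(padding) // 2):
        dim = x.dim() - 1 - i
        sizes[dim] += padding[2 * i] + padding[2 * i + 1]
    out = torch.empty(sizes, dtype=x.dtype, device=x.device)
    # The fix: carry the input's memory format to the output, so a
    # channels_last input yields a channels_last meta/fake output.
    if x.dim() == 4 and x.is_contiguous(memory_format=torch.channels_last):
        out = out.contiguous(memory_format=torch.channels_last)
    return out
```

Without the final block, the meta output silently falls back to contiguous even for channels_last inputs, which is the mismatch the PR addresses.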

@rascani rascani requested review from AdrianLundell and psiddh March 23, 2026 23:06
@pytorch-bot bot commented Mar 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18429

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 7 Unrelated Failures

As of commit 637e2d2 with merge base 8f1b5ee:


👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Mar 23, 2026
@github-actions
This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Fix pad_meta to propagate channels_last from input to output tensor.
Fix pad_out (C++) to use dim_order() to permute logical dims and padding
into physical memory order for arm_pad_s8.
Add channels_last test cases to test_pad.
cmsis_nn_dims input_dims = {1, 1, 1, 1};
int32_t* d = &input_dims.n;
// Permute the real dims according to dim_order.
const auto dim_order = input.dim_order();
Collaborator

Are you sure this always works? I have had bad experiences with using dim_order generally; it seems very easy for it to default to channels-first when it cannot be parsed from the dimensions, or when the fake implementation doesn't propagate it correctly. That is why there is a helper function to check whether a tensor is channels last instead.

Contributor Author

Nope, not confident. I've switched to the helper function.
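For reference, the permutation the kernel applies to the logical sizes can be sketched in Python (the actual runtime code is C++; physical_dims is an illustrative name, not an ExecuTorch API, and the rank-4 restriction matches the {n, h, w, c} layout arm_pad_s8 expects):

```python
# Hedged Python sketch of the size permutation done on the C++ side using a
# channels-last check rather than raw dim_order().
def physical_dims(sizes, channels_last):
    """Return rank-4 logical sizes {n, c, h, w} in physical memory order.

    For a contiguous (channels_first) tensor, physical order equals logical
    order, so the innermost dim is w. For a channels_last tensor, c is the
    fastest-varying dim, giving {n, h, w, c} as arm_pad_s8 expects.
    """
    n, c, h, w = sizes
    return [n, h, w, c] if channels_last else [n, c, h, w]
```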

// arm_pad_s8 processes data in {n, h, w, c} order where c is the
// fastest-varying (innermost) dimension. Use dim_order to permute
// logical sizes and padding into physical memory order so this holds
// for both contiguous and channels_last tensors.
const size_t offset = kMaxSupportedDims - rank;
Collaborator

Did you consider changing this in the AOT flow instead? I generally like to keep the glue here as small as possible unless there is a good reason not to

Contributor Author

That's a great suggestion. Let me see if I can work that out.

Contributor Author

This added a good amount of complexity to the AOT flow, and we still have to do permutation of the input dims because the sizes are still in logical form. I'm a bit on the fence about it, so let me know what you think.

@AdrianLundell AdrianLundell (Collaborator) left a comment

Nice! We should probably have channels-first and channels-last tests for all operators.

The C++ pad kernel previously used dim_order() at runtime to permute
both logical sizes and padding into physical memory order. Move the
padding permutation to the AOT fusion pass so pre_pad/post_pad arrive
in physical order, reducing runtime work. The kernel still permutes
sizes via is_channels_last_tensor (unavoidable: the tensor API reports
logical sizes).

Co-authored-by: Claude <noreply@anthropic.com>
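The AOT-side padding permutation this commit describes can be sketched as follows. This is a hedged illustration: pre_pad/post_pad names come from the commit message, but the function shape and the per-dim list layout are assumptions about the fusion pass, not its actual code.

```python
# Hedged sketch: reorder per-dimension padding during the AOT fusion pass so
# the runtime kernel receives pre_pad/post_pad already in physical
# {n, h, w, c} order. Inputs hold one entry per logical dim in {n, c, h, w}.
def permute_padding(pre_pad, post_pad, channels_last):
    if not channels_last:
        # Contiguous tensors: physical order equals logical order.
        return pre_pad, post_pad
    # {n, c, h, w} -> {n, h, w, c}: move the channel entry innermost.
    order = (0, 2, 3, 1)
    return [pre_pad[i] for i in order], [post_pad[i] for i in order]
```

Doing this once at export time removes the padding permutation from the runtime kernel, leaving only the size permutation (which, as the commit notes, is unavoidable because the tensor API reports logical sizes).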
@rascani rascani force-pushed the cortex-m-pad-channels-last branch from 98c8c57 to 1e325c6 Compare March 25, 2026 00:14
@rascani rascani (Contributor Author) left a comment

Thanks for the review Adrian!


