Cortex-M: Fix pad op to support channels_last memory format #18429

rascani wants to merge 3 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18429
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 2 New Failures, 7 Unrelated Failures as of commit 637e2d2 with merge base 8f1b5ee:
NEW FAILURES: the following jobs have failed.
FLAKY: the following jobs failed but were likely due to flakiness present on trunk.
BROKEN TRUNK: the following jobs failed but were already failing on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Fix pad_meta to propagate channels_last from input to output tensor. Fix pad_out (C++) to use dim_order() to permute logical dims and padding into physical memory order for arm_pad_s8. Add channels_last test cases to test_pad.
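A minimal sketch of the permutation the description refers to: reordering logical per-dimension sizes and padding into physical memory order according to `dim_order`, where `(0, 2, 3, 1)` denotes channels_last for a 4-D NCHW tensor. All names here are illustrative, not the actual ExecuTorch kernel code.

```python
# Hedged sketch: permute logical sizes and per-dim padding into physical
# memory order using dim_order. (0, 2, 3, 1) = channels_last (NHWC in
# memory for logical NCHW). Names are illustrative.

def permute_to_physical(sizes, pad_per_dim, dim_order):
    """Reorder logical per-dimension values into physical memory order.

    sizes:       logical sizes, e.g. (N, C, H, W)
    pad_per_dim: (before, after) padding for each logical dim
    dim_order:   physical ordering of the logical dims
    """
    phys_sizes = tuple(sizes[d] for d in dim_order)
    phys_pad = tuple(pad_per_dim[d] for d in dim_order)
    return phys_sizes, phys_pad

# A channels_last NCHW tensor of shape (1, 3, 8, 8) is stored as NHWC:
sizes, pad = permute_to_physical(
    (1, 3, 8, 8),                      # logical N, C, H, W
    ((0, 0), (0, 0), (1, 1), (2, 2)),  # pad H by 1, W by 2 on each side
    (0, 2, 3, 1),                      # channels_last dim order
)
# sizes == (1, 8, 8, 3); pad == ((0, 0), (1, 1), (2, 2), (0, 0))
```

With the padding permuted alongside the sizes, the same downstream pad routine works for both contiguous and channels_last inputs.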
backends/cortex_m/ops/op_pad.cpp (outdated)

```cpp
cmsis_nn_dims input_dims = {1, 1, 1, 1};
int32_t* d = &input_dims.n;
// Permute the real dims according to dim_order.
const auto dim_order = input.dim_order();
```
Are you sure this always works? I have had bad experiences with using dim_order generally; it seems to be very easy for it to default to channels-first when it cannot be parsed from the dimensions, or when the fake implementation doesn't propagate it correctly. That is why there is a helper function to check if it is channels-last instead.
Nope, not confident. I've switched to the helper function.
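The helper-function approach discussed above can be sketched as follows: instead of trusting an arbitrary `dim_order`, test for the one specific layout the kernel supports. The function name and constant here are illustrative, not the actual helper from the codebase.

```python
# Hedged sketch: a strict channels-last check. Rather than interpreting
# any dim_order, accept only the exact 4-D channels_last ordering; every
# other layout falls back to the contiguous path.

CHANNELS_LAST_4D = (0, 2, 3, 1)  # logical NCHW dims stored as NHWC

def is_channels_last(dim_order, rank):
    """True only for the exact 4-D channels_last dim order."""
    return rank == 4 and tuple(dim_order) == CHANNELS_LAST_4D

print(is_channels_last((0, 2, 3, 1), 4))  # True: channels_last
print(is_channels_last((0, 1, 2, 3), 4))  # False: contiguous
```

Being strict here avoids the failure mode described above, where a default or badly propagated dim_order silently selects the wrong layout.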
```cpp
// arm_pad_s8 processes data in {n, h, w, c} order where c is the
// fastest-varying (innermost) dimension. Use dim_order to permute
// logical sizes and padding into physical memory order so this holds
// for both contiguous and channels_last tensors.
const size_t offset = kMaxSupportedDims - rank;
```
Did you consider changing this in the AOT flow instead? I generally like to keep the glue here as small as possible unless there is a good reason not to
That's a great suggestion. Let me see if I can work that out.
This added a good amount of complexity to the AOT flow, and we still have to do permutation of the input dims because the sizes are still in logical form. I'm a bit on the fence about it, so let me know what you think.
The C++ pad kernel previously used dim_order() at runtime to permute both logical sizes and padding into physical memory order. Move the padding permutation to the AOT fusion pass so pre_pad/post_pad arrive in physical order, reducing runtime work. The kernel still permutes sizes via is_channels_last_tensor (unavoidable: the tensor API reports logical sizes).

Co-authored-by: Claude <noreply@anthropic.com>
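The split described in the commit message can be sketched as two small steps: the AOT pass reorders the per-dimension padding once at export time, while the runtime kernel still has to reorder sizes because the tensor API reports them in logical order. Function names are illustrative, not the actual fusion-pass or kernel code.

```python
# Hedged sketch of the AOT/runtime split, assuming 4-D NCHW tensors and
# dim_order (0, 2, 3, 1) for channels_last. Illustrative names only.

def permute_pad_aot(pre_pad, post_pad, dim_order):
    """AOT fusion pass: reorder per-logical-dim padding into physical
    memory order once, so the runtime kernel receives it ready to use."""
    pre = [pre_pad[d] for d in dim_order]
    post = [post_pad[d] for d in dim_order]
    return pre, post

def permute_sizes_runtime(sizes, channels_last):
    """Runtime kernel: sizes still need permuting, because the tensor
    API reports logical (NCHW) sizes regardless of memory layout."""
    if channels_last:
        n, c, h, w = sizes
        return [n, h, w, c]  # physical NHWC order for arm_pad_s8
    return list(sizes)

pre, post = permute_pad_aot([0, 0, 1, 2], [0, 0, 1, 2], (0, 2, 3, 1))
# pre == post == [0, 1, 2, 0]  (channel padding now innermost slot)
```

This keeps the runtime glue minimal: only the unavoidable size permutation remains in the kernel, and the padding arrives pre-permuted.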
Force-pushed 98c8c57 to 1e325c6 (compare)
rascani left a comment: Thanks for the review, Adrian!
Summary

- Fix pad_meta to propagate channels_last from input to output tensor.
- Fix pad_out (C++) to use dim_order() to permute logical dims and padding into physical memory order for arm_pad_s8.
- Add channels_last test cases to test_pad.

Test Plan