fix: resolve 18 English ITN cased test failures for 100% pass rate by Alex-Wengg · Pull Request #10 · FluidInference/text-processing-rs

Alex-Wengg · 2026-03-13T02:32:39Z

Summary

cardinal: Preserve original casing for "Zero" and "Twelve" in cased mode (2 fixes)
electronic: Rewrite domain/email parsing to preserve original casing throughout, add stop-word detection and single-letter+word joining (11 fixes)
telephone: Add >11 digit formatting with space-separated groups (2 fixes)
time: Support prefix-preserving "X to Y" patterns with am/pm context, e.g. "set alarm at ten to eleven pm" → "set alarm at 10:50 p.m." (2 fixes)
Remove 4 dead functions from electronic.rs

Test plan

All 1,029 tests pass (0 failures)
cardinal_cased: 60/60 (was 58/60)
electronic_cased: 59/59 (was 48/59)
telephone + telephone_cased: 23/23 + 18/18 (was 21/23 + 17/18)
time + time_cased: 29/29 + 33/33 (was 28/29 + 32/33)
No regressions in any other language or module

- cardinal: preserve case for Zero/Twelve in cased mode - electronic: rewrite domain/email parsing to preserve original casing, add stop-word detection, single-letter+word joining, remove dead code - telephone: add >11 digit formatting (NNN NNNN [middle] NNNN) - time: support prefix-preserving "X to Y" with am/pm context

devin-ai-integration

Devin Review found 1 potential issue.

View 6 additional findings in Devin Review.

devin-ai-integration · 2026-03-13T02:45:11Z

src/asr/en/electronic.rs

+    let at_pos = input.find(" at ")?;
+    let orig_local = &original[..at_pos];
+    let orig_domain = &original[at_pos + 4..];


🟡 Byte-position mismatch when slicing original with offset from lowercased input in parse_email

The code finds at_pos in input (which is original.to_lowercase()) at src/asr/en/electronic.rs:49, then uses that byte offset to slice original at lines 50-51. For non-ASCII characters whose lowercase form has a different byte length (e.g., Turkish İ (2 bytes) → i̇ (3 bytes), or ẞ (3 bytes) → ß (2 bytes)), the byte position from the lowercased string does not correspond to the correct character boundary in original. This can cause either a panic (slicing at a non-UTF-8 boundary, e.g., input "testẞ at gmail dot com") or incorrect output (slicing at the wrong position, producing garbled local/domain parts).

Regression from safe primary path in old code

The old code had a safe primary path that split directly on original using original.splitn(2, " at "), only falling back to the byte-position approach when " at " wasn't literally lowercase in the original. The new code always uses the byte-position approach, and also newly slices orig_domain this way (line 51), which the old code didn't do.

Suggested change

let at_pos = input.find(" at ")?;

let orig_local = &original[..at_pos];

let orig_domain = &original[at_pos + 4..];

// Find " at " position in original (case-insensitive)

let at_pos = original.to_lowercase().find(" at ")?;

let orig_local = &original[..at_pos];

let orig_domain = &original[at_pos + 4..];

Was this helpful? React with 👍 or 👎 to provide feedback.

Alex-Wengg merged commit 98d0195 into main Mar 13, 2026
2 checks passed

devin-ai-integration bot reviewed Mar 13, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: resolve 18 English ITN cased test failures for 100% pass rate#10

fix: resolve 18 English ITN cased test failures for 100% pass rate#10
Alex-Wengg merged 1 commit intomainfrom
fix/en-itn-100

Alex-Wengg commented Mar 13, 2026 •

edited by devin-ai-integration bot

Loading

Uh oh!

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

devin-ai-integration bot Mar 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Alex-Wengg commented Mar 13, 2026 • edited by devin-ai-integration bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Alex-Wengg commented Mar 13, 2026 •

edited by devin-ai-integration bot

Loading