Skip to content

Audit norm_case usage and consider separating path normalization responsibilities #278

@karthiknadig

Description

@karthiknadig

Background

As part of fixing #186 (Windows junctions not properly resolved), we replaced fs::canonicalize() with GetLongPathNameW in norm_case() to avoid resolving junctions/symlinks.

During this fix, we discovered that norm_case() was doing more than just case normalization - it was also:

  1. Converting relative paths to absolute paths
  2. Stripping trailing path separators (to match canonicalize behavior)
  3. Normalizing case

Current State

The function is defined in crates/pet-fs/src/path.rs and is called from ~20+ locations across the codebase:

  • crates/pet-conda/src/environment_locations.rs
  • crates/pet-poetry/src/environment_locations.rs
  • crates/pet-windows-store/src/environments.rs
  • crates/pet-pipenv/src/lib.rs
  • crates/pet-global-virtualenvs/src/lib.rs
  • crates/pet-python-utils/src/fs_cache.rs
  • crates/pet-windows-registry/src/environments.rs
  • And more...

Concern

The Windows Registry stores paths with trailing backslashes (e.g., C:\...\x64\). The old fs::canonicalize() silently stripped these. When we switched to GetLongPathNameW, we had to explicitly add trailing slash stripping to maintain backward compatibility.

This raises questions:

  1. Should path sources (like Windows Registry) sanitize their own output?
  2. Should norm_case() be split into separate functions with clear responsibilities?
  3. Are there other implicit behaviors from canonicalize() that callers depend on?

Suggested Audit Tasks

  • Review all norm_case() call sites to understand what each caller actually needs
  • Consider splitting into focused functions:
    • normalize_case() - Just case normalization
    • normalize_path() - Full path normalization (relative→absolute, trailing slashes, case)
  • Evaluate if path sources (Registry, conda rc files, etc.) should sanitize their own data
  • Add documentation for expected behavior

Related

Metadata

Metadata

Assignees

Labels

debtCode quality issues

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions