
Fix end_line incorrectly assigned from start_line in CSV output #4785 #4786

Open
codewithfourtix wants to merge 1 commit into aboutcode-org:develop from codewithfourtix:fix-csv-end-line-assignment


@codewithfourtix
Fix end_line incorrectly assigned from start_line in CSV output

In flatten_scan() in src/formattedcode/output_csv.py, the end_line field for copyrights, holders, and authors was assigned copyr['start_line'] instead of copyr['end_line']. This caused the end_line column in CSV output to always equal start_line, losing the actual end line information for multi-line copyright notices.

The underlying data model (CopyrightDetection, HolderDetection, AuthorDetection in src/cluecode/copyrights.py) all define both start_line and end_line fields, and the JSON output correctly includes both values. Git blame traces this to a copy-paste error in commit ef8086d ("Update CSV output to latest copyright data format", ~2018).
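
A minimal sketch of the corrected pattern, for illustration only. It is not the actual flatten_scan() code: flatten_detections is a hypothetical helper, and the input shape (a list of dicts carrying a value field plus start_line and end_line) is assumed from the description above.

```python
def flatten_detections(path, scanned_file, key, value_field):
    """Return one flat row per detection found under `key`.

    Hypothetical helper: it mirrors the per-detection loop described
    in this PR, not the real flatten_scan() signature.
    """
    rows = []
    for detection in scanned_file.get(key, []):
        row = dict(path=path)
        row[value_field] = detection[value_field]
        row['start_line'] = detection['start_line']
        # The bug copied detection['start_line'] here as well.
        row['end_line'] = detection['end_line']
        rows.append(row)
    return rows


# Assumed sample data: a two-line copyright notice.
scanned_file = {
    'copyrights': [
        {
            'copyright': 'Copyright (c) Example Corp.',
            'start_line': 366,
            'end_line': 367,
        },
    ],
}

rows = flatten_detections('some/file.c', scanned_file, 'copyrights', 'copyright')
```

With the corrected assignment, a multi-line notice keeps distinct start_line and end_line values in its row instead of collapsing to a single line.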

Fixes #4785

Tasks

  • Reviewed contribution guidelines
  • PR is descriptively titled 📑 and links the original issue above 🔗
  • Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
    Run tests locally to check for errors.
  • Commits are in a uniquely-named feature branch and have no merge conflicts 📁
  • Updated documentation pages (if applicable)
  • Updated CHANGELOG.rst (if applicable)

Copilot AI review requested due to automatic review settings March 1, 2026 18:35
Copilot AI left a comment

Pull request overview

Fixes a long-standing CSV output bug where end_line for copyright-related detections was incorrectly set to start_line, causing CSV consumers to lose correct multi-line range information.

Changes:

  • Correct end_line assignment for copyrights rows in flatten_scan().
  • Correct end_line assignment for holders rows in flatten_scan().
  • Correct end_line assignment for authors rows in flatten_scan().

Comment on lines 184 to +188
  for copyr in scanned_file.get('copyrights', []):
      inf = dict(path=path)
      inf['copyright'] = copyr['copyright']
      inf['start_line'] = copyr['start_line']
-     inf['end_line'] = copyr['start_line']
+     inf['end_line'] = copyr['end_line']
Copilot AI Mar 1, 2026

This change fixes the CSV end_line values, but the existing flatten_scan fixture expectations still encode the buggy behavior for multi-line copyright detections (e.g. tests/formattedcode/data/csv/flatten_scan/full.json has a copyright with start_line 366 and end_line 367, while full.json-expected currently expects end_line 366). Please update/regenerate the affected *-expected fixtures (and ideally add a non-scanslow regression case where end_line > start_line) so tests validate the corrected behavior.
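
One possible shape for such a regression case, sketched under assumptions: rows_for_copyrights is a hypothetical stand-in for the copyright branch of flatten_scan() (a real test would call the function from src/formattedcode/output_csv.py, whose exact signature is not shown here), and the sample data only borrows the 366/367 line numbers mentioned above.

```python
import csv
import io


def rows_for_copyrights(path, scanned_file):
    # Hypothetical stand-in for the copyright branch of flatten_scan().
    return [
        {
            'path': path,
            'copyright': copyr['copyright'],
            'start_line': copyr['start_line'],
            'end_line': copyr['end_line'],
        }
        for copyr in scanned_file.get('copyrights', [])
    ]


def test_multiline_copyright_keeps_end_line():
    # Assumed input: one detection spanning two lines.
    scanned_file = {
        'copyrights': [
            {
                'copyright': 'Copyright (c) Example Corp.',
                'start_line': 366,
                'end_line': 367,
            },
        ],
    }
    rows = rows_for_copyrights('full/file.txt', scanned_file)

    # Round-trip through the csv module, as a CSV consumer would see it.
    buf = io.StringIO()
    fieldnames = ['path', 'copyright', 'start_line', 'end_line']
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    buf.seek(0)
    parsed = list(csv.DictReader(buf))

    # CSV values come back as strings.
    assert parsed[0]['start_line'] == '366'
    assert parsed[0]['end_line'] == '367'
    return parsed
```

Checking the parsed CSV rather than the in-memory rows keeps the assertion close to what downstream consumers read, which is where the original bug was visible.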

…code-org#4785

In flatten_scan(), the end_line field for copyrights, holders, and
authors was assigned copyr['start_line'] instead of copyr['end_line'].
This caused the end_line column in CSV output to always equal
start_line, losing the actual end line information.

Signed-off-by: codewithfourtix <codewithfourtix@gmail.com>
@codewithfourtix force-pushed the fix-csv-end-line-assignment branch from f05ff1d to 36c5d07 March 2, 2026 19:23

Development

Successfully merging this pull request may close these issues.

Bug: CSV output assigns start_line to end_line for copyrights, holders, and authors
