Fix end_line incorrectly assigned from start_line in CSV output #4785 by codewithfourtix · Pull Request #4786 · aboutcode-org/scancode-toolkit

codewithfourtix · 2026-03-01T18:35:15Z

Fix `end_line` incorrectly assigned from `start_line` in CSV output

In `flatten_scan()` in `src/formattedcode/output_csv.py`, the `end_line` field for copyrights, holders, and authors was assigned `copyr['start_line']` instead of `copyr['end_line']`. As a result, the `end_line` column in CSV output always equaled `start_line`, losing the actual end line for multi-line copyright notices.

The underlying data model classes (`CopyrightDetection`, `HolderDetection`, `AuthorDetection` in `src/cluecode/copyrights.py`) all define both `start_line` and `end_line` fields, and the JSON output correctly includes both values. Git blame traces this to a copy-paste error in commit ef8086d ("Update CSV output to latest copyright data format", ~2018).
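To illustrate the intended behavior, here is a minimal, self-contained sketch of the row flattening. It is not the actual `flatten_scan()` code: the helper name `flatten_detections` is hypothetical, the `holder` and `author` value keys are assumed by analogy with the `copyright` key shown in the diff, and the sample values are made up.

```python
def flatten_detections(scanned_file):
    """Yield one CSV-style row per copyright, holder, or author detection."""
    path = scanned_file["path"]
    # (section, value key) pairs; the 'holder' and 'author' keys are assumptions.
    sections = (
        ("copyrights", "copyright"),
        ("holders", "holder"),
        ("authors", "author"),
    )
    for section, key in sections:
        for det in scanned_file.get(section, []):
            yield {
                "path": path,
                key: det[key],
                "start_line": det["start_line"],
                # The bug read det["start_line"] here, so both columns matched.
                "end_line": det["end_line"],
            }


# Illustrative input: a notice spanning two lines.
sample = {
    "path": "NOTICE",
    "copyrights": [{"copyright": "Copyright (c) Example", "start_line": 366, "end_line": 367}],
    "holders": [{"holder": "Example", "start_line": 366, "end_line": 367}],
    "authors": [{"author": "Example Author", "start_line": 366, "end_line": 367}],
}

rows = list(flatten_detections(sample))
```

With the corrected assignment, every row keeps its own `end_line`, so a detection spanning lines 366 to 367 no longer collapses to 366 in both columns.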

Fixes #4785

Tasks

Reviewed contribution guidelines
PR is descriptively titled 📑 and links the original issue above 🔗
Tests pass -- look for a green checkbox ✔️ a few minutes after opening your PR
Run tests locally to check for errors.
Commits are in a uniquely-named feature branch and have no merge conflicts 📁
Updated documentation pages (if applicable)
Updated CHANGELOG.rst (if applicable)

Copilot

Pull request overview

Fixes a long-standing CSV output bug where end_line for copyright-related detections was incorrectly set to start_line, causing CSV consumers to lose correct multi-line range information.

Changes:

Correct end_line assignment for copyrights rows in flatten_scan().
Correct end_line assignment for holders rows in flatten_scan().
Correct end_line assignment for authors rows in flatten_scan().

Copilot · 2026-03-01T18:38:18Z

src/formattedcode/output_csv.py

        for copyr in scanned_file.get('copyrights', []):
            inf = dict(path=path)
            inf['copyright'] = copyr['copyright']
            inf['start_line'] = copyr['start_line']
-            inf['end_line'] = copyr['start_line']
+            inf['end_line'] = copyr['end_line']


This change fixes the CSV end_line values, but the existing flatten_scan fixture expectations still encode the buggy behavior for multi-line copyright detections (e.g. tests/formattedcode/data/csv/flatten_scan/full.json has a copyright with start_line 366 and end_line 367, while full.json-expected currently expects end_line 366). Please update/regenerate the affected *-expected fixtures (and ideally add a non-scanslow regression case where end_line > start_line) so tests validate the corrected behavior.
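A regression case of the kind suggested above could look roughly like the sketch below. It assumes a hypothetical stand-in, `flatten_copyrights`, rather than importing the real `flatten_scan()` (whose exact signature is not shown in this PR), and it uses the standard `csv` module to round-trip the rows. The line numbers 366 and 367 mirror the fixture mentioned in the review; the path and copyright text are made up.

```python
import csv
import io


def flatten_copyrights(scanned_file):
    # Hypothetical stand-in for the copyright branch of flatten_scan().
    path = scanned_file["path"]
    for copyr in scanned_file.get("copyrights", []):
        yield {
            "path": path,
            "copyright": copyr["copyright"],
            "start_line": copyr["start_line"],
            "end_line": copyr["end_line"],
        }


def rows_to_csv(rows):
    # Serialize the flattened rows the way a CSV consumer would see them.
    fieldnames = ["path", "copyright", "start_line", "end_line"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def test_multiline_copyright_keeps_end_line():
    scanned_file = {
        "path": "NOTICE",
        "copyrights": [
            {
                "copyright": "Copyright (c) Example Authors",
                "start_line": 366,
                "end_line": 367,
            },
        ],
    }
    text = rows_to_csv(flatten_copyrights(scanned_file))
    parsed = list(csv.DictReader(io.StringIO(text)))
    # CSV values come back as strings; end_line must differ from start_line.
    assert parsed[0]["start_line"] == "366"
    assert parsed[0]["end_line"] == "367"


test_multiline_copyright_keeps_end_line()
```

Because the input has `end_line > start_line`, this check fails against the old assignment and passes against the corrected one, which is the property the existing `*-expected` fixtures did not capture.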

…code-org#4785 In flatten_scan(), the end_line field for copyrights, holders, and authors was assigned copyr['start_line'] instead of copyr['end_line']. This caused the end_line column in CSV output to always equal start_line, losing the actual end line information. Signed-off-by: codewithfourtix <codewithfourtix@gmail.com>

Copilot AI review requested due to automatic review settings March 1, 2026 18:35

Copilot started reviewing on behalf of codewithfourtix March 1, 2026 18:35

codewithfourtix force-pushed the fix-csv-end-line-assignment branch from f05ff1d to 36c5d07 Compare March 2, 2026 19:23

codewithfourtix wants to merge 1 commit into aboutcode-org:develop from codewithfourtix:fix-csv-end-line-assignment
