Fix end_line incorrectly assigned from start_line in CSV output #4785 #4786
codewithfourtix wants to merge 1 commit into aboutcode-org:develop
Pull request overview
Fixes a long-standing CSV output bug where end_line for copyright-related detections was incorrectly set to start_line, causing CSV consumers to lose correct multi-line range information.
Changes:
- Correct end_line assignment for copyrights rows in flatten_scan().
- Correct end_line assignment for holders rows in flatten_scan().
- Correct end_line assignment for authors rows in flatten_scan().
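The three changes follow the same pattern. A minimal sketch, assuming a simplified stand-in for the relevant part of flatten_scan(): the helper name flatten_detections and the value keys 'holder' and 'author' are assumptions for illustration and are not taken from the diff.

```python
def flatten_detections(path, scanned_file):
    """Yield one flat row per copyright, holder, and author detection.

    Simplified, hypothetical stand-in for the relevant part of flatten_scan().
    """
    # (section key in the scan, value key in each detection): assumed names
    sections = [
        ('copyrights', 'copyright'),
        ('holders', 'holder'),
        ('authors', 'author'),
    ]
    for section, value_key in sections:
        for detection in scanned_file.get(section, []):
            row = dict(path=path)
            row[value_key] = detection[value_key]
            row['start_line'] = detection['start_line']
            # The fix: read end_line from the detection, not start_line.
            row['end_line'] = detection['end_line']
            yield row


# Illustrative data only: a two-line detection spanning lines 366 to 367.
scanned_file = {
    'copyrights': [
        {'copyright': 'Copyright (c) Example', 'start_line': 366, 'end_line': 367},
    ],
    'holders': [{'holder': 'Example', 'start_line': 366, 'end_line': 367}],
    'authors': [],
}
rows = list(flatten_detections('full.txt', scanned_file))
```

With the old assignment, every row here would have carried end_line 366 and the two-line range would have been lost.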
      for copyr in scanned_file.get('copyrights', []):
          inf = dict(path=path)
          inf['copyright'] = copyr['copyright']
          inf['start_line'] = copyr['start_line']
-         inf['end_line'] = copyr['start_line']
+         inf['end_line'] = copyr['end_line']
This change fixes the CSV end_line values, but the existing flatten_scan fixture expectations still encode the buggy behavior for multi-line copyright detections (e.g. tests/formattedcode/data/csv/flatten_scan/full.json has a copyright with start_line 366 and end_line 367, while full.json-expected currently expects end_line 366). Please update/regenerate the affected *-expected fixtures (and ideally add a non-scanslow regression case where end_line > start_line) so tests validate the corrected behavior.
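A regression case of the kind requested could look roughly like the sketch below. It does not call the real flatten_scan() or use the project's fixtures; copyright_rows is a hypothetical helper mirroring the corrected copyrights loop, and the CSV column list is an assumption. The 366/367 values are the ones cited from full.json.

```python
import csv
import io


def copyright_rows(path, scanned_file):
    # Hypothetical helper mirroring the corrected copyrights loop.
    for copyr in scanned_file.get('copyrights', []):
        yield {
            'path': path,
            'copyright': copyr['copyright'],
            'start_line': copyr['start_line'],
            'end_line': copyr['end_line'],
        }


def test_multiline_copyright_keeps_end_line():
    scanned_file = {
        'copyrights': [
            {'copyright': 'Copyright (c) Example', 'start_line': 366, 'end_line': 367},
        ],
    }
    # Write the rows to an in-memory CSV, then read them back.
    buffer = io.StringIO()
    fieldnames = ['path', 'copyright', 'start_line', 'end_line']
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(copyright_rows('full.txt', scanned_file))

    buffer.seek(0)
    (row,) = list(csv.DictReader(buffer))
    # CSV values are read back as strings.
    assert row['start_line'] == '366'
    assert row['end_line'] == '367'


test_multiline_copyright_keeps_end_line()
```

A case like this is cheap to run, so it would not need to sit behind a slow-test marker.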
…code-org#4785

In flatten_scan(), the end_line field for copyrights, holders, and authors was assigned copyr['start_line'] instead of copyr['end_line']. This caused the end_line column in CSV output to always equal start_line, losing the actual end line information.

Signed-off-by: codewithfourtix <codewithfourtix@gmail.com>
f05ff1d to 36c5d07
Fix end_line incorrectly assigned from start_line in CSV output

In flatten_scan() in src/formattedcode/output_csv.py, the end_line field for copyrights, holders, and authors was assigned copyr['start_line'] instead of copyr['end_line']. This caused the end_line column in CSV output to always equal start_line, losing the actual end line information for multi-line copyright notices.

The underlying data model classes (CopyrightDetection, HolderDetection, AuthorDetection in src/cluecode/copyrights.py) all define both start_line and end_line fields, and the JSON output correctly includes both values. Git blame traces this to a copy-paste error in commit ef8086d ("Update CSV output to latest copyright data format", ~2018).

Fixes #4785
Tasks
Run tests locally to check for errors.