FIX: VARCHAR fetch fails when data length equals column size with non-ASCII CP1252 characters by subrata-ms · Pull Request #444 · microsoft/mssql-python

subrata-ms · 2026-02-23T11:28:06Z

Work Item / Issue Reference

AB#42604

GitHub Issue: #435

Summary

This pull request improves the handling of character encoding and buffer sizing for SQL CHAR/VARCHAR data in the ODBC Python bindings, especially for cross-platform compatibility between Linux/macOS and Windows. The changes ensure that character data is decoded correctly and that buffer sizes are sufficient to prevent corruption or truncation when dealing with multi-byte UTF-8 data returned by the ODBC driver on non-Windows systems.

Character Encoding Handling:

Introduced the GetEffectiveCharDecoding function to determine the correct decoding to use for SQL CHAR data: always UTF-8 on Linux/macOS (since the ODBC driver returns UTF-8), and the user-specified encoding on Windows. This function is now used consistently throughout the codebase to select the decoding method. [1] [2] [3] [4] [5] [6]

Buffer Sizing for UTF-8:

Updated buffer allocation logic to use columnSize * 4 + 1 for SQL CHAR/VARCHAR columns on Linux/macOS, accounting for the worst-case UTF-8 expansion (up to 4 bytes per character), preventing data truncation when multi-byte characters are present at the column boundary. [1] [2] [3]

Decoding and Data Fetching:

Modified all data fetching and decoding paths (FetchLobColumnData, SQLGetData_wrap, ProcessChar, and batch fetch functions) to use the effective character encoding and the correct buffer sizes, ensuring consistent and correct decoding regardless of platform. [1] [2] [3] [4]

API Changes:

Updated the FetchBatchData function and its callers to accept and propagate the character encoding parameter, ensuring the encoding context is preserved throughout the data-fetching stack. [1] [2] [3]

Minor Fixes:

Minor formatting and logging improvements for error messages and function signatures.

These changes collectively improve correctness and reliability when handling string data from SQL databases, especially in multi-platform environments.

CP1252 VARCHAR Boundary Fix — Summary
Problem:: VARCHAR columns with CP1252 non-ASCII characters (e.g., é, ñ, ö) returned corrupted data when the string length exactly equaled the column size. Inserting "café René!" into VARCHAR(10) returned "©!".

Root Cause::
Three bugs in ddbc_bindings.cpp:

Undersized buffer — SQLGetData / SQLBindCol allocated columnSize + 1 bytes, but on Linux/macOS the ODBC driver converts server data to UTF-8 where CP1252 é (1 byte) becomes 0xC3 0xA9 (2 bytes). A 10-char string with 2 accented characters needs 12 bytes, exceeding the 11-byte buffer → truncation → LOB fallback re-reads consumed data → corruption.

Wrong decode encoding — After fixing the buffer, data arrived intact but was decoded with the user's charEncoding (CP1252) instead of UTF-8. Since ODBC on Linux/macOS already converts to UTF-8, double-interpreting as CP1252 produced mojibake (cafÃ© RenÃ©!).

ProcessChar assumed UTF-8 on all platforms — The batch/fetchall hot path used PyUnicode_FromStringAndSize which assumes UTF-8 input. Correct on Linux (ODBC returns UTF-8), but wrong on Windows (ODBC returns native encoding like CP1252).

github-actions · 2026-02-23T11:49:39Z

📊 Code Coverage Report

🔥 Diff Coverage 68%	🎯 Overall Coverage 76%	📈 Total Lines Covered: `5579` out of `7258` 📁 Project: `mssql-python`

Diff Coverage

Diff: main...HEAD, staged and unstaged changes

mssql_python/pybind/ddbc_bindings.cpp (89.5%): Missing lines 1170-1171,2904,3003
mssql_python/pybind/ddbc_bindings.h (26.3%): Missing lines 839-847,853-857

Summary

Total: 57 lines
Missing: 18 lines
Coverage: 68%

mssql_python/pybind/ddbc_bindings.cpp

Lines 1166-1175

  1166     if (_type != SQL_HANDLE_STMT) {
  1167         // Log error but don't throw - we're likely in cleanup/destructor path
  1168         LOG_ERROR("SAFETY VIOLATION: Attempted to mark non-STMT handle as implicitly freed. "
  1169                   "Handle type=%d. This will cause handle leak. Only STMT handles are "
! 1170                   "automatically freed by parent DBC handles.",
! 1171                   _type);
  1172         return;  // Refuse to mark - let normal free() handle it
  1173     }
  1174     _implicitly_freed = true;
  1175 }

Lines 2900-2908

  2900             effectiveCharEncoding.c_str(), buffer.size(), py::len(decoded), colIndex);
  2901         return decoded;
  2902     } catch (const py::error_already_set& e) {
  2903         LOG_ERROR("FetchLobColumnData: Failed to decode with '%s' for column %d: %s",
! 2904                   effectiveCharEncoding.c_str(), colIndex, e.what());
  2905         // Return raw bytes as fallback
  2906         return raw_bytes;
  2907     }
  2908 }

Lines 2999-3007

  2999                                         py::len(decoded));
  3000                                 } catch (const py::error_already_set& e) {
  3001                                     LOG_ERROR(
  3002                                         "SQLGetData: Failed to decode CHAR column %d with '%s': %s",
! 3003                                         i, decodeEncoding.c_str(), e.what());
  3004                                     // Return raw bytes as fallback
  3005                                     row.append(raw_bytes);
  3006                                 }
  3007                             } else {

mssql_python/pybind/ddbc_bindings.h

  835         if (!pyStr) {
  836             // Decode failed — fall back to returning raw bytes (consistent with
  837             // FetchLobColumnData and SQLGetData_wrap which also return raw bytes
  838             // on decode failure instead of silently converting to None).
! 839             PyErr_Clear();
! 840             PyObject* pyBytes = PyBytes_FromStringAndSize(dataPtr, numCharsInData);
! 841             if (pyBytes) {
! 842                 PyList_SET_ITEM(row, col - 1, pyBytes);
! 843             } else {
! 844                 PyErr_Clear();
! 845                 Py_INCREF(Py_None);
! 846                 PyList_SET_ITEM(row, col - 1, Py_None);
! 847             }
  848         } else {
  849             PyList_SET_ITEM(row, col - 1, pyStr);
  850         }
  851     } else {

  849             PyList_SET_ITEM(row, col - 1, pyStr);
  850         }
  851     } else {
  852         // Slow path: LOB data requires separate fetch call
! 853         PyList_SET_ITEM(
! 854             row, col - 1,
! 855             FetchLobColumnData(hStmt, col, SQL_C_CHAR, false, false, colInfo->charEncoding)
! 856                 .release()
! 857                 .ptr());
  858     }
  859 }
  860 
  861 // Process SQL NCHAR/NVARCHAR (wide/Unicode string) column into Python str

📋 Files Needing Attention

📉 Files with overall lowest coverage (click to expand)

mssql_python.pybind.logger_bridge.hpp: 58.8%
mssql_python.pybind.logger_bridge.cpp: 59.2%
mssql_python.row.py: 66.2%
mssql_python.pybind.ddbc_bindings.h: 67.8%
mssql_python.pybind.ddbc_bindings.cpp: 69.7%
mssql_python.pybind.connection.connection.cpp: 75.3%
mssql_python.ddbc_bindings.py: 79.6%
mssql_python.pybind.connection.connection_pool.cpp: 79.6%
mssql_python.cursor.py: 84.8%
mssql_python.__init__.py: 84.9%

🔗 Quick Links

⚙️ Build Summary	📋 Coverage Details
View Azure DevOps Build	Browse Full Coverage Report

Copilot

Pull request overview

This PR fixes a cross-platform fetch bug for CHAR/VARCHAR values at the exact column-size boundary when non-ASCII characters are involved (notably with CP1252), by adjusting buffer sizing and ensuring consistent, platform-appropriate decoding in the ODBC C++ bindings. It also adds regression coverage to prevent the issue from recurring.

Changes:

Add platform-aware “effective” decoding selection for SQL_C_CHAR (UTF-8 on Linux/macOS, user-configured encoding on Windows) and propagate it through fetch paths.
Increase SQL_C_CHAR fetch/bind buffer sizes on Linux/macOS to handle worst-case UTF-8 expansion at column boundaries.
Add a dedicated regression test suite for CP1252 VARCHAR boundary cases and a small decoding config update in the encoding test suite.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
tests/test_017_varchar_cp1252_boundary.py	New regression tests for CP1252 non-ASCII `VARCHAR` values at exact-length boundaries across multiple fetch paths.
tests/test_013_encoding_decoding.py	Adds `setdecoding(SQL_CHAR, "utf-8")` to align decoding configuration in an existing parameter encoding test.
mssql_python/pybind/ddbc_bindings.h	Updates `ProcessChar` fast/slow paths to decode correctly on Windows and to pass encoding through LOB fetch.
mssql_python/pybind/ddbc_bindings.cpp	Introduces `GetEffectiveCharDecoding`, adjusts buffer sizing and decoding in `SQLGetData_wrap`, and propagates encoding into batch fetch metadata.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

mssql_python/pybind/ddbc_bindings.cpp

mssql_python/pybind/ddbc_bindings.h

mssql_python/pybind/ddbc_bindings.cpp

tests/test_017_varchar_cp1252_boundary.py

mssql_python/pybind/ddbc_bindings.cpp

mssql_python/pybind/ddbc_bindings.h

mssql_python/pybind/ddbc_bindings.cpp

base fix for the issue

82c580f

github-actions bot added the pr-size: medium Moderate update size label Feb 23, 2026

subrata-ms and others added 2 commits February 23, 2026 17:00

Merge branch 'main' into subrata-ms/CP1252_Conversion

70c00bc

fix for linting and additional changes

016e549

subrata-ms added 2 commits February 23, 2026 11:50

fixing build failure

e324c29

test fix for windows

90d0919

subrata-ms marked this pull request as ready for review February 23, 2026 12:50

subrata-ms requested review from bewithgaurav, Copilot, gargsaumya, jahnvi480 and sumitmsft February 23, 2026 12:50

Copilot started reviewing on behalf of subrata-ms February 23, 2026 12:50 View session

Copilot AI reviewed Feb 23, 2026

View reviewed changes

subrata-ms and others added 3 commits February 24, 2026 10:10

Merge branch 'main' into subrata-ms/CP1252_Conversion

f89ad95

copilot review comments

ef8af68

improving test coverage

645ff6a

github-actions bot added pr-size: large Substantial code update and removed pr-size: medium Moderate update size labels Feb 24, 2026

sumitmsft reviewed Feb 24, 2026

View reviewed changes

mssql_python/pybind/ddbc_bindings.cpp Show resolved Hide resolved

sumitmsft reviewed Feb 24, 2026

View reviewed changes

mssql_python/pybind/ddbc_bindings.cpp Show resolved Hide resolved

sumitmsft reviewed Feb 24, 2026

View reviewed changes

mssql_python/pybind/ddbc_bindings.h Outdated Show resolved Hide resolved

gargsaumya requested changes Feb 25, 2026

View reviewed changes

mssql_python/pybind/ddbc_bindings.cpp Show resolved Hide resolved

mssql_python/pybind/ddbc_bindings.cpp Show resolved Hide resolved

subrata-ms and others added 2 commits February 26, 2026 14:30

Merge branch 'main' into subrata-ms/CP1252_Conversion

dcbe1a5

review comment

5f22a62

sumitmsft approved these changes Feb 27, 2026

View reviewed changes

gargsaumya approved these changes Feb 27, 2026

View reviewed changes

Merge branch 'main' into subrata-ms/CP1252_Conversion

d3763cd

gargsaumya merged commit 249cfdf into main Feb 27, 2026
30 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FIX: VARCHAR fetch fails when data length equals column size with non-ASCII CP1252 characters#444

FIX: VARCHAR fetch fails when data length equals column size with non-ASCII CP1252 characters#444
gargsaumya merged 11 commits intomainfrom
subrata-ms/CP1252_Conversion

subrata-ms commented Feb 23, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 23, 2026 •

edited

Loading

🔥 Diff Coverage

68%

🎯 Overall Coverage

76%

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

subrata-ms commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Work Item / Issue Reference

Summary

Uh oh!

github-actions bot commented Feb 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📊 Code Coverage Report

🔥 Diff Coverage

68%

🎯 Overall Coverage

76%

Diff Coverage

Diff: main...HEAD, staged and unstaged changes

Summary

mssql_python/pybind/ddbc_bindings.cpp

mssql_python/pybind/ddbc_bindings.h

📋 Files Needing Attention

🔗 Quick Links

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

subrata-ms commented Feb 23, 2026 •

edited

Loading

github-actions bot commented Feb 23, 2026 •

edited

Loading