CWE-135: Incorrect Calculation of Multi-Byte String Length
Learn about CWE-135 (Incorrect Calculation of Multi-Byte String Length), its security impact, exploitation methods, and prevention guidelines.
What is Incorrect Calculation of Multi-Byte String Length?
• Overview: Incorrect Calculation of Multi-Byte String Length occurs when a program miscalculates the length of a string that contains wide or multi-byte characters, for example by treating a byte count as a character count or a character count as a byte count. Buffers sized or indexed from the wrong value can then be overrun.
• Exploitation Methods:
- Attackers can supply input containing multi-byte or wide characters so that the string occupies more bytes than the length the program computed, potentially causing buffer overflows.
- Common attack patterns include writing oversized payloads into buffers sized from the wrong count and exploiting off-by-one errors.
• Security Impact:
- Direct consequences can include buffer overflows, which may result in arbitrary code execution or crashes.
- Potential cascading effects include denial of service or unauthorized access if memory corruption occurs.
- Business impact may involve data breaches, service downtime, or loss of customer trust.
• Prevention Guidelines:
- Specific code-level fixes include using functions that match the string's representation, such as wcslen in C for wide strings. Note that wcslen returns a count of wchar_t elements, not bytes, so a byte size is that count (plus one for the terminator) multiplied by sizeof(wchar_t).
- Security best practices involve validating string length calculations and sizing every buffer in the unit the allocation or copy function expects.
- Recommended tools and frameworks include static analysis tools that detect improper length calculations and libraries that provide safe string handling functions.
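The buffer-sizing guideline can be sketched as follows. This is a minimal illustration, not a definitive implementation; wide_dup is a hypothetical helper name introduced here, and the sketch assumes the input is a valid, terminated wide string.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: duplicates a wide string into a correctly sized
   buffer. The caller frees the result. Returns NULL if allocation fails. */
wchar_t *wide_dup(const wchar_t *src) {
    /* Element count, including the terminating L'\0'. */
    size_t count = wcslen(src) + 1;

    /* malloc() expects bytes, so scale the element count by sizeof(wchar_t).
       Passing wcslen(src) alone would under-allocate. */
    wchar_t *copy = malloc(count * sizeof(wchar_t));
    if (copy == NULL) {
        return NULL;
    }

    /* wmemcpy() expects a count of wide characters, not bytes. */
    wmemcpy(copy, src, count);
    return copy;
}
```

The design point is to keep the two units apart: element counts for the wide-character functions, byte counts for malloc().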
Technical Details
Likelihood of Exploit: Not specified
Affected Languages: C, C++
Affected Technologies: Not specified
Vulnerable Code Example
C Example for CWE-135
#include <stdio.h>
#include <string.h>

// Vulnerable function that incorrectly calculates the length of a multi-byte string
void printStringLength(const char *str) {
    // Using strlen() to calculate the length of a multi-byte string
    size_t length = strlen(str);
    printf("Length of the string: %zu\n", length);
}

int main(void) {
    // Example multi-byte string (UTF-8)
    const char *multiByteStr = "こんにちは"; // "Hello" in Japanese
    printStringLength(multiByteStr);
    return 0;
}
Explanation
- Vulnerability: The code uses strlen() to calculate the length of a multi-byte string. strlen() returns the number of bytes before the terminator, not the number of characters. For the UTF-8 string above it reports 15, although the string contains 5 characters. Code that treats one value as the other can size or index buffers incorrectly, leading to buffer overflows or logic errors.
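The gap between the two counts can be made concrete with a short sketch. It assumes the input is valid UTF-8; utf8_codepoints and copy_buffer_size are hypothetical helper names used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts UTF-8 code points by skipping continuation
   bytes (bit pattern 10xxxxxx). Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s) {
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

/* Hypothetical helper: bytes needed to hold a copy of s, terminator
   included. Sizing by utf8_codepoints(s) + 1 instead would be too small
   whenever s contains a multi-byte character. */
size_t copy_buffer_size(const char *s) {
    return strlen(s) + 1;
}
```

For the two-character string "こん", utf8_codepoints() gives 2 while copy_buffer_size() gives 7, so a 3-byte buffer derived from the character count would be overrun by a plain strcpy().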
How to fix Incorrect Calculation of Multi-Byte String Length?
To properly handle multi-byte strings, use functions that are designed to work with wide or multi-byte character encodings. In C, the mbstowcs() function can be used to convert a multi-byte string to a wide-character string, and then wcslen() can be used to calculate the number of wide characters (not bytes). The result matches the character count as long as every character fits in a single wchar_t, which may not hold on platforms where wchar_t is 16 bits. Both the conversion and the count depend on the current LC_CTYPE locale.
Key Fixes:
- Convert the multi-byte string to a wide-character string using mbstowcs().
- Calculate the length of the wide-character string using wcslen().
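The two steps above might look like the following sketch. It is one possible arrangement, not the only one; to_wide is a hypothetical helper name, and the conversion uses whatever LC_CTYPE locale is active when it is called.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts a multi-byte string to a newly allocated
   wide string. Returns NULL on an invalid sequence or allocation failure.
   The caller frees the result. */
wchar_t *to_wide(const char *str) {
    /* First pass: count the wide characters needed, terminator excluded. */
    size_t n = mbstowcs(NULL, str, 0);
    if (n == (size_t)-1) {
        return NULL;
    }

    /* Size the buffer in bytes: element count times sizeof(wchar_t). */
    wchar_t *wide = malloc((n + 1) * sizeof(wchar_t));
    if (wide == NULL) {
        return NULL;
    }

    /* Second pass: convert, leaving room for the terminating L'\0'. */
    if (mbstowcs(wide, str, n + 1) == (size_t)-1) {
        free(wide);
        return NULL;
    }
    return wide;
}
```

Calling wcslen() on the returned pointer then yields the number of wide characters, and the buffer is already large enough to hold them.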
Fixed Code Example
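#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

// Prints the number of characters (not bytes) in a multi-byte string
void printStringLength(const char *str) {
    // With a NULL destination, mbstowcs() stores nothing and returns the number
    // of wide characters the string would need, excluding the terminator
    size_t wide_len = mbstowcs(NULL, str, 0);
    if (wide_len == (size_t)-1) {
        // Invalid multi-byte sequence for the current locale
        perror("Conversion error");
        return;
    }
    printf("Number of characters in the string: %zu\n", wide_len);
}

int main(void) {
    // mbstowcs() decodes according to LC_CTYPE, so adopt the environment's
    // locale (assumed here to be a UTF-8 locale) instead of the default "C"
    if (setlocale(LC_CTYPE, "") == NULL) {
        fprintf(stderr, "Could not set locale\n");
        return 1;
    }
    // Example multi-byte string (UTF-8)
    const char *multiByteStr = "こんにちは"; // "Hello" in Japanese
    printStringLength(multiByteStr);
    return 0;
}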
Explanation
- Fix: The mbstowcs() function is called with a NULL destination, so it stores nothing and returns the number of wide characters the multi-byte string would convert to. This gives the character count rather than the byte count, which is essential for accurate string length determination in internationalized applications.
Additional Notes:
- The fixed code handles potential conversion errors by checking the return value of mbstowcs() against (size_t)-1.
- Set the locale before calling these functions, typically with setlocale(LC_CTYPE, ""). A C program starts in the "C" locale, where multi-byte UTF-8 input may be rejected or counted byte by byte.
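As an alternative to mbstowcs(), the characters can be counted one at a time with the standard mbrlen() function, which avoids relying on a global conversion state. The sketch below is illustrative only; mb_char_count and count_in_env_locale are hypothetical names, and the result depends on the locale in effect.

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts characters in a multi-byte string using the
   current LC_CTYPE locale. Returns (size_t)-1 on an invalid or truncated
   sequence. */
size_t mb_char_count(const char *s) {
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */

    size_t remaining = strlen(s);
    size_t count = 0;
    while (remaining > 0) {
        size_t len = mbrlen(s, remaining, &state);
        if (len == (size_t)-1 || len == (size_t)-2) {
            return (size_t)-1;         /* invalid or incomplete character */
        }
        if (len == 0) {
            break;                     /* defensive: reached a null character */
        }
        s += len;
        remaining -= len;
        ++count;
    }
    return count;
}

/* Hypothetical wrapper: adopts the environment's locale, then counts. */
size_t count_in_env_locale(const char *s) {
    if (setlocale(LC_CTYPE, "") == NULL) {
        return (size_t)-1;             /* requested locale is unavailable */
    }
    return mb_char_count(s);
}
```

Checking the return value of setlocale() matters because a missing or misnamed locale leaves the program in its previous locale, and the count would then silently reflect the wrong encoding.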