CWE-185: Incorrect Regular Expression
Learn about CWE-185 (Incorrect Regular Expression), its security impact, exploitation methods, and prevention guidelines.
What is Incorrect Regular Expression?
• Overview: CWE-185 refers to a vulnerability where a regular expression is incorrectly specified, leading to improper data matching or comparison. This can occur in any software system using regular expressions for data validation or filtering, potentially allowing attackers to bypass restrictions.
• Exploitation Methods:
- Attackers can exploit this vulnerability by crafting input data that the incorrect regular expression fails to match as intended, thereby bypassing validation checks.
- A bypassed filter commonly opens the door to SQL injection, cross-site scripting, or other injection attacks that rely on improperly validated input data.
• Security Impact:
- Direct consequences include unauthorized access or actions if input validation fails, leading to potential security breaches.
- Potential cascading effects include compromised data integrity, confidentiality breaches, and the ability to execute arbitrary code or commands.
- Business impact may involve financial loss, damage to reputation, legal liabilities, and loss of customer trust.
• Prevention Guidelines:
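- Specific code-level fixes involve reviewing and testing each regular expression so that it matches exactly the intended inputs and nothing else. Typical mistakes to look for are missing start and end anchors, unescaped metacharacters such as the dot, and character classes that are broader than intended.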
- Security best practices include validating regular expressions against a comprehensive set of test cases, including potential edge cases and attack patterns.
- Recommended tools and frameworks include static analysis tools to detect improper regular expressions and libraries that provide safer, well-tested alternatives for common validation tasks.
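The test-case guideline above can be sketched as a small table of inputs with expected outcomes, run against the validator. This is a minimal illustration, not a complete test suite: the username rule, the names USERNAME_RE, is_valid_username, buggy_is_valid_username, and failing_cases, and the sample inputs are all hypothetical.

```python
import re

# Hypothetical rule (an assumption for this sketch):
# a username is 3 to 16 lowercase ASCII letters or digits.
USERNAME_RE = re.compile(r'[a-z0-9]{3,16}')

# Each case pairs an input with the result the validator should return.
CASES = [
    ("alice", True),
    ("bob42", True),
    ("ab", False),             # too short
    ("alice<script>", False),  # valid prefix followed by markup
    ("' OR 1=1 --", False),    # injection-style input
    ("alice\n", False),        # trailing newline
]

def is_valid_username(value):
    # fullmatch requires the whole string to match the pattern.
    return USERNAME_RE.fullmatch(value) is not None

def buggy_is_valid_username(value):
    # Bug: re.match anchors only at the start, so trailing junk is accepted.
    return USERNAME_RE.match(value) is not None

def failing_cases(validator, cases):
    """Return the (input, expected) pairs the validator gets wrong."""
    return [(value, expected) for value, expected in cases
            if validator(value) != expected]

print(failing_cases(is_valid_username, CASES))
print(failing_cases(buggy_is_valid_username, CASES))
```

The second call reports the two inputs with a valid prefix, which is the kind of bypass a test table with attack-style cases is meant to surface before the pattern ships.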
Technical Details
Likelihood of Exploit: Not specified
Affected Languages: Not Language-Specific
Affected Technologies: Not specified
Vulnerable Code Example
Python Example
import re

def search_phone_number(text):
    # Vulnerable code: the regular expression is too permissive
    # The middle segment uses [a-zA-Z0-9], which mistakenly allows letters as well as digits
    phone_number_pattern = r'\d{3}[-.\s]?[a-zA-Z0-9]{3}[-.\s]?\d{4}'
    match = re.search(phone_number_pattern, text)  # re.search returns only the first match
    return match.group() if match else None

# The malformed number appears first, so it is the match that re.search returns
text = "Contact me at 123-abc-7890 or 123-456-7890"
print(search_phone_number(text))  # Incorrectly matches "123-abc-7890"
In this vulnerable code example, the regular expression used for matching phone numbers is incorrect. The character class [a-zA-Z0-9] in the second segment accepts letters as well as digits, although that segment should contain only digits. This permissive pattern can lead to incorrect matches and potential security issues if the pattern is used for validation purposes.
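Because re.search stops at the first match, it can hide how much the permissive pattern accepts. The sketch below lists every match instead. The pattern is copied from the vulnerable example; the name all_matches and the sample string are assumptions made for illustration.

```python
import re

# Pattern copied from the vulnerable example.
PERMISSIVE = re.compile(r'\d{3}[-.\s]?[a-zA-Z0-9]{3}[-.\s]?\d{4}')

def all_matches(text):
    """Return every non-overlapping match of the permissive pattern."""
    return PERMISSIVE.findall(text)

samples = "123-456-7890, 123-abc-7890, 555.XyZ.0000"
print(all_matches(samples))
# ['123-456-7890', '123-abc-7890', '555.XyZ.0000']
```

Two of the three matches contain letters in the middle segment, so any check built on this pattern would treat them as phone numbers.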
How to fix Incorrect Regular Expression?
Fixed Code Example
Python Example
import re

def search_phone_number(text):
    # Fixed code: every segment of the pattern accepts digits only
    # \b keeps the match from starting or ending inside a longer alphanumeric string
    phone_number_pattern = r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b'
    match = re.search(phone_number_pattern, text)  # re.search returns only the first match
    return match.group() if match else None

# The malformed number appears first and is skipped by the corrected pattern
text = "Contact me at 123-abc-7890 or 123-456-7890"
print(search_phone_number(text))  # Correctly matches "123-456-7890" only
In the fixed code example, the regular expression has been updated to match only valid phone numbers by:
- Using \b to ensure the pattern matches whole numbers and not parts of longer alphanumeric strings.
- Strictly defining each segment of the phone number to contain exactly three or four digits, separated by optional hyphens, dots, or spaces.
- Ensuring the pattern does not allow alphabetic characters or other unintended characters, so malformed input such as 123-abc-7890 is no longer matched.
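The fixed example searches for a phone number inside a longer text. When the same pattern is used to validate a single input field, the whole string should match, not just a part of it. A minimal sketch under that assumption follows. The names PHONE_RE and is_valid_phone are hypothetical, and the re.ASCII flag is an added choice that limits \d to the digits 0-9, since Python's \d otherwise also matches other Unicode decimal digits.

```python
import re

# Same digit-only segments as the fixed example, without \b because
# fullmatch already requires the entire string to match.
# re.ASCII restricts \d to 0-9 and \s to ASCII whitespace.
PHONE_RE = re.compile(r'\d{3}[-.\s]?\d{3}[-.\s]?\d{4}', re.ASCII)

def is_valid_phone(value):
    """Return True only if the whole input is a phone number."""
    return PHONE_RE.fullmatch(value) is not None

for candidate in ["123-456-7890", "123-abc-7890", "call 123-456-7890 now"]:
    print(repr(candidate), is_valid_phone(candidate))
```

Note that \s still accepts a tab or newline as a separator between segments. If only a literal space should be allowed, the character class could be narrowed to a hyphen, dot, and space.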