CWE-112: Missing XML Validation
Learn about CWE-112 (Missing XML Validation), its security impact, exploitation methods, and prevention guidelines.
What is Missing XML Validation?
• Overview: Missing XML Validation occurs when an application accepts XML data from an untrusted source without validating it against a defined schema or Document Type Definition (DTD). This lack of validation allows potentially harmful or malformed data to be processed by the application, which can lead to unexpected behavior or security vulnerabilities.
• Exploitation Methods:
- Attackers can exploit this vulnerability by providing XML documents with unexpected or malicious content that the application is not prepared to handle.
- Common attack patterns include XML External Entity (XXE) attacks, schema poisoning, and injecting XML data designed to trigger errors or manipulate application logic.
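A non-validating parser such as `xml.etree.ElementTree` accepts any well-formed input. As a result, a duplicated or missing element reaches application code unchecked, which is how the "manipulate application logic" pattern works. This is a minimal sketch that assumes a hypothetical order document meant to carry exactly one `<amount>` element; `read_amount` and the element names are illustrative only.

```python
import xml.etree.ElementTree as ET

def read_amount(xml_data: str):
    """Hypothetical handler that trusts the document's shape."""
    root = ET.fromstring(xml_data)  # parses any well-formed XML; no schema check
    node = root.find("amount")      # first <amount> child, or None if there is none
    return None if node is None else node.text

# Intended shape: exactly one <amount>.
print(read_amount("<order><amount>10</amount></order>"))

# Duplicate element: accepted silently. find() picks the first value, while code
# that iterates over findall() (or takes the last match) sees something else.
dup = "<order><amount>1</amount><amount>1000</amount></order>"
print(read_amount(dup))
print([e.text for e in ET.fromstring(dup).findall("amount")])

# Missing element: also accepted; every caller now has to cope with None.
print(read_amount("<order/>"))
```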
• Security Impact:
- Direct consequences of successful exploitation can include unauthorized data access, denial of service, and execution of malicious payloads.
- Potential cascading effects include data corruption, application crashes, or further intrusion into the system through compromised data flows.
- Business impact may involve data breaches, loss of customer trust, legal liabilities, and financial damages due to disrupted operations or compromised data integrity.
• Prevention Guidelines:
- Specific code-level fixes include implementing strict XML validation against a defined schema or DTD before processing input data.
- Security best practices involve configuring XML parsers to disable external entity processing and using secure libraries that enforce validation.
- Recommended tools and frameworks include XML schema validators such as Xerces for Java, and libraries or parser settings that handle untrusted XML safely, such as defusedxml for Python or .NET System.Xml.XmlReader configured through XmlReaderSettings (for example, DtdProcessing.Prohibit together with ValidationType.Schema).
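Parser hardening does not have to depend on library defaults. One standard-library option is a pre-pass with the low-level `xml.parsers.expat` parser that aborts as soon as a `<!DOCTYPE` declaration starts, before any entity declarations inside it are processed. The sketch below takes that approach. It mirrors the "forbid DTDs" behaviour that defusedxml offers, but `DTDForbidden` and `safe_fromstring` are hypothetical names, not part of any library.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat

class DTDForbidden(ValueError):
    """Raised when an untrusted document carries a DOCTYPE declaration."""

def _reject_doctype(name, system_id, public_id, has_internal_subset):
    # expat calls this at the start of <!DOCTYPE ...>; raising here stops
    # parsing before any <!ENTITY ...> declarations are handled.
    raise DTDForbidden(f"DOCTYPE {name!r} is not allowed")

def safe_fromstring(xml_data: str) -> ET.Element:
    """Parse untrusted XML only if it contains no DTD (hypothetical helper)."""
    pre = expat.ParserCreate()
    pre.StartDoctypeDeclHandler = _reject_doctype
    pre.Parse(xml_data, True)       # raises DTDForbidden, or ExpatError if malformed
    return ET.fromstring(xml_data)  # second pass builds the element tree
```

This rejects every DOCTYPE, including harmless ones. For data-exchange formats that never need a DTD, that is usually the intended behaviour. The cost is parsing each document twice.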
Technical Details
Likelihood of Exploit: Not specified
Affected Languages: Not Language-Specific
Affected Technologies: Not specified
Vulnerable Code Example
import xml.etree.ElementTree as ET

def parse_xml(xml_data):
    # Vulnerable code: XML from an untrusted source is parsed without validation
    root = ET.fromstring(xml_data)  # Any well-formed document is accepted; no schema check
    # Process the XML data, assuming a <data> child element exists
    return root.find('data').text

# Example usage
user_input = "<root><data>Example</data></root>"
parsed_data = parse_xml(user_input)
print(parsed_data)
Explanation:
- The code above accepts XML input from an untrusted source.
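- It parses the XML without validating it against a schema, so any well-formed document is accepted: missing, duplicated, or unexpected elements reach the processing logic unchecked. For example, a document with no <data> child makes root.find('data') return None, and the .text access then raises AttributeError. Whether entity-based attacks also apply depends on the parser: Python's documentation lists xml.etree.ElementTree as safe from external entity expansion (XXE), but vulnerable to entity-expansion attacks such as "billion laughs" when linked against Expat versions older than 2.4.1.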
- The use of xml.etree.ElementTree without validation allows potentially malicious XML content to be processed, which can be exploited by attackers.
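`xml.etree.ElementTree` does not fetch external entities, but it still honours a DOCTYPE that the sender supplies. This minimal sketch uses illustrative document contents. An internal entity declared by the sender is expanded by the parser, so the text the application reads from `<data>` never appears literally in the element body.

```python
import xml.etree.ElementTree as ET

# The sender ships its own DTD with an internal entity; nothing in the
# parsing code asked for a DTD or checks for one.
doc = (
    '<!DOCTYPE root [<!ENTITY role "admin">]>'
    "<root><data>&role;</data></root>"
)

root = ET.fromstring(doc)
print(root.find("data").text)  # the parser has replaced &role; with its value
```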
How to fix Missing XML Validation?
To fix this vulnerability, it is crucial to validate the XML against a predefined schema, such as an XML Schema Definition (XSD). This ensures that the XML structure and content adhere to expected formats, preventing malicious data from being processed.
Specific Fixes:
- Use XML Schema Validation: Validate the XML against an XSD schema to ensure it follows the expected structure.
- Sanitize Input: Consider additional sanitization of XML input to remove or encode any unexpected elements or entities.
- Disable DTD Processing: Ensure that DTD processing is disabled to prevent external entities from being processed.
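When adding lxml is not an option, a hand-written structural check can approximate the schema rule for a small, fixed document shape. The sketch below is a deliberately simple substitute for XSD validation, not an implementation of it. It assumes the hypothetical shape `<root><data>text</data></root>` and a hypothetical `MAX_TEXT` limit, and it covers structure only: DTD processing still needs a parser-level measure.

```python
import xml.etree.ElementTree as ET

MAX_TEXT = 256  # hypothetical upper bound on <data> length

def validate_shape(xml_data: str) -> str:
    """Accept only <root><data>text</data></root> and return the data text.

    A hand-written stand-in for a schema (hypothetical helper); it does not handle DTDs.
    """
    try:
        root = ET.fromstring(xml_data)
    except ET.ParseError as exc:
        raise ValueError(f"malformed XML: {exc}") from exc

    # Root element: exact name, no attributes, no stray text.
    if root.tag != "root" or root.attrib or (root.text or "").strip():
        raise ValueError("unexpected root element or content")

    # Children: exactly one element, and it must be <data>.
    children = list(root)
    if len(children) != 1 or children[0].tag != "data":
        raise ValueError("expected exactly one <data> child")

    # <data>: plain text only (no attributes, nested elements, or trailing text).
    data = children[0]
    if data.attrib or len(data) or (data.tail or "").strip():
        raise ValueError("<data> must be a plain text element")

    # Content: non-empty and bounded, similar to what an xs:maxLength facet enforces.
    text = (data.text or "").strip()
    if not text or len(text) > MAX_TEXT:
        raise ValueError("<data> text is empty or too long")
    return text
```

Every new element or attribute the format gains needs a matching code change here. That maintenance cost is the main reason a declarative schema is the preferred fix.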
Fixed Code Example
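from lxml import etree  # lxml provides XSD validation and configurable parsing

# Parser for untrusted input: no entity expansion, no external DTD loading, no network access
PARSER = etree.XMLParser(resolve_entities=False, load_dtd=False, no_network=True)

def parse_xml(xml_data):
    # Load the XSD schema (assumed to require a <root> element containing one <data> element)
    with open('schema.xsd', 'rb') as schema_file:
        schema_root = etree.parse(schema_file)
    schema = etree.XMLSchema(schema_root)
    # Parse with the hardened parser, then validate against the schema
    xml_doc = etree.fromstring(xml_data, PARSER)
    if not schema.validate(xml_doc):  # Validate XML against the loaded schema
        raise ValueError("Invalid XML data")
    # Process the validated XML data
    return xml_doc.find('data').text

# Example usage
user_input = "<root><data>Example</data></root>"
try:
    parsed_data = parse_xml(user_input)
    print(parsed_data)
except (ValueError, etree.XMLSyntaxError) as e:  # schema violation or malformed XML
    print(f"Error: {e}")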
Explanation:
- XML input is now validated against a schema loaded from 'schema.xsd' before any of its content is used.
- The lxml library is used for its XSD validation support and for parsing that is more configurable than xml.etree.ElementTree.
- The code raises an error if the XML does not conform to the schema, ensuring only valid XML data is processed.
- This approach mitigates the risk of processing malicious XML content by enforcing strict adherence to the defined schema.
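The fixed example loads `schema.xsd` but does not show its contents. The sketch below assumes a `<root>` element that wraps exactly one `<data>` string of at most 256 characters. It compiles the schema once and reuses it, instead of reopening the file on every call. The schema text, the length limit, and the `SCHEMA`/`PARSER` names are assumptions. The parser options are the lxml `XMLParser` keywords `resolve_entities`, `load_dtd`, and `no_network`. The sketch requires lxml to be installed.

```python
from lxml import etree

# Hypothetical contents of schema.xsd: <root> must contain exactly one <data>
# string of at most 256 characters, and nothing else.
XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="data">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:maxLength value="256"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

SCHEMA = etree.XMLSchema(etree.fromstring(XSD))  # compile once, reuse per request
# Do not expand entities, load external DTDs, or fetch anything over the network.
PARSER = etree.XMLParser(resolve_entities=False, load_dtd=False, no_network=True)

def parse_xml(xml_data: bytes) -> str:
    doc = etree.fromstring(xml_data, PARSER)
    if not SCHEMA.validate(doc):
        raise ValueError(f"Invalid XML data: {SCHEMA.error_log.last_error}")
    return doc.find("data").text or ""
```

Because `minOccurs` and `maxOccurs` default to 1, the schema alone rejects a missing `<data>`, a duplicated `<data>`, and any extra sibling element, which are exactly the cases the vulnerable version let through.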