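Overall flow diagram

PerFolder (per directory)

For example, given the URL: http://xxxxxxxx/imcloud/static/seat/build/images/pic.jpg

It is split into the following directories (the host appears here as 10.125.20.39):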

http://10.125.20.39/
http://10.125.20.39/imcloud/
http://10.125.20.39/imcloud/static/
http://10.125.20.39/imcloud/static/seat/
http://10.125.20.39/imcloud/static/seat/build/
http://10.125.20.39/imcloud/static/seat/build/images/

Each of these directory URLs is then used as a base: the scanner appends paths to it and scans the resulting URLs.
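
The splitting step can be sketched as follows. This is a minimal illustration, not the scanner's actual code; the function name directory_urls is hypothetical.

```python
from urllib.parse import urlsplit

def directory_urls(url):
    """Return every parent directory URL of `url`, from the site root down."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}/"
    # Drop the last path segment (the file name); keep only directory names.
    segments = [s for s in parts.path.split("/")[:-1] if s]
    urls = [base]
    for segment in segments:
        # Each directory URL extends the previous one by one segment.
        urls.append(urls[-1] + segment + "/")
    return urls
```

Calling it on the pic.jpg URL above yields the six directory URLs listed, starting at the site root.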

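Backup file scanning:

Principle:

Read the raw socket response from the server and identify the file type from its file header (the leading magic bytes).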

>>> r = requests.get('https://api.github.com/events', stream=True)
>>> r.raw
<requests.packages.urllib3.response.HTTPResponse object at 0x101194810>
>>> r.raw.read(10)
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\x03'

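I downloaded an archive labelled rar to try this, but the first 10 bytes came back as b'PK\x03\x04\x14\x00\x00\x00\x08\x00'. That is the zip signature (PK\x03\x04), so the file was really a zip; a genuine rar starts with b'Rar!\x1a\x07\x00', which is the rar entry below.

Comments in the plugin:

* rar: 526172211a0700cf9073
* zip: 504b0304140000000800
* gz: 1f8b080000000000000b
* tar.gz: 1f8b0800

Files with different extensions have different file header signatures. Only the leading bytes are fixed by the format (526172211a0700 for rar, 504b0304 for zip, 1f8b08 for gzip); the bytes after them depend on the individual file.

There is a simple dictionary of backup file names: 'bak.rar', 'bak.zip', 'backup.rar', 'backup.zip', 'www.zip', 'www.rar', 'web.rar', 'web.zip', 'wwwroot.rar', 'wwwroot.zip', 'log.zip', 'log.rar'

The dictionary can be as small or as large as you like.

Each name is appended to the directory URL. If status_code is 200 and the file header matches, a backup file is reported.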
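
The header check can be sketched as below. It is an assumption-laden illustration: the names SIGNATURES and is_backup_file are hypothetical, and it compares only the fixed leading bytes of each format rather than the full 10-byte strings quoted from the plugin comments.

```python
# Fixed leading bytes of each archive format; the bytes after them vary per file.
SIGNATURES = {
    "rar": bytes.fromhex("526172211a0700"),  # b"Rar!\x1a\x07\x00"
    "zip": bytes.fromhex("504b0304"),        # b"PK\x03\x04"
    "gz": bytes.fromhex("1f8b08"),
}

def is_backup_file(filename, status_code, head):
    """Return True if a 200 response starts with the signature for the file's extension."""
    ext = filename.rsplit(".", 1)[-1].lower()
    signature = SIGNATURES.get(ext)
    return status_code == 200 and signature is not None and head.startswith(signature)
```

With requests, head would be the result of r.raw.read(10) on a response fetched with stream=True, as in the transcript above.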

Directory listing:

Common directory listing pages contain the following signatures in their HTML source:

"directory listing for"

"<title>directory"

"<head><title>index of"

'<table summary="directory listing"'

'last modified</a>'

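If any of these signatures is found in the response body, the page is reported as having a directory listing (directory traversal) vulnerability.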
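
A minimal sketch of this check follows. The names are hypothetical, and lowercasing the body before comparison is an assumption based on the signatures all being lowercase.

```python
LISTING_SIGNATURES = [
    "directory listing for",
    "<title>directory",
    "<head><title>index of",
    '<table summary="directory listing"',
    "last modified</a>",
]

def is_directory_listing(html):
    """Return True if the page source contains any known listing signature."""
    lowered = html.lower()  # signatures are lowercase, so compare case-insensitively
    return any(sig in lowered for sig in LISTING_SIGNATURES)
```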

Sensitive file scanning:

The sensitive file dictionary is taken from bbscan.

The full list of sensitive files is:

/config.inc
/config.php.bak
/db.php.bak
/conf/config.ini
/config.ini
/config/config.ini
/configuration.ini
/configs/application.ini
/settings.ini
/application.ini
/conf.ini
/app.ini
/config.json
/a.out
/key
/keys
/key.txt
/temp.txt
/tmp.txt
/php.ini
/sftp-config.json
/index.php.bak
/.index.php.swp
/index.cgi.bak
/config.inc.php.bak
/.config.inc.php.swp
/config/.config.php.swp
/.config.php.swp
/.settings.php.swp
/.database.php.swp
/.db.php.swp
/.mysql.php.swp
/readme
/README
/readme.md
/readme.html
/changelog.txt
/%e6%9b%b4%e6%96%b0%e6%97%a5%e5%bf%97.txt
/www.log
/error.log
/log.log
/sql.log
/errors.log
/db.log
/data.log
/app.log
/ntunnel_mysql.php

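Each dictionary entry actually has the format {'path': '/config.inc', 'tag': '', 'content-type': '', 'content-type_no': 'html'}

tag: signature expected in the HTML source
content-type: expected file extension (judging by the key name, presumably checked against the response Content-Type)
content-type_no: blacklisted file extension

Principle:

Append the path to the URL and send the request. A hit requires status code 200 and all 3 of the following conditions:

the HTML source signature matches
the file extension matches
the file extension is not on the blacklist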
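
The three conditions can be sketched as below. Several points are assumptions rather than confirmed plugin behaviour: that an empty field means "no constraint", that content-type and content-type_no are compared as substrings of the response's Content-Type header, and the function name rule_matches itself.

```python
def rule_matches(rule, status_code, content_type, body):
    """Check one dictionary entry against a response; empty fields are skipped."""
    if status_code != 200:
        return False
    # Condition 1: the HTML source signature, if any, must be present.
    if rule["tag"] and rule["tag"] not in body:
        return False
    # Condition 2: the expected type, if any, must appear in Content-Type.
    if rule["content-type"] and rule["content-type"] not in content_type:
        return False
    # Condition 3: the blacklisted type, if any, must not appear in Content-Type.
    if rule["content-type_no"] and rule["content-type_no"] in content_type:
        return False
    return True
```

Under these assumptions, the /config.inc entry above accepts a 200 response served as text/plain and rejects one served as text/html, which filters out custom 404 pages that return status 200.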

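.idea workspace directory detection:

Principle:

Append /.idea/workspace.xml to the URL.

If the response body matches the regex <project version="\w+"> (the \w+ part is the version number), a JetBrains .idea leak is reported.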
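
A minimal sketch of the match, assuming the body has already been fetched. The function name is hypothetical, and the capture group is added here only to extract the version number.

```python
import re

# Capture group added here to pull out the version; the check itself only needs a match.
IDEA_PATTERN = re.compile(r'<project version="(\w+)">')

def idea_leak_version(body):
    """Return the project version if the body looks like workspace.xml, else None."""
    match = IDEA_PATTERN.search(body)
    return match.group(1) if match else None
```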

phpinfo detection:

The dictionary is:

"phpinfo.php",
"pi.php",
"php.php",
"i.php",
"test.php",
"temp.php",
"info.php",

Principle:

Append each file name to the URL and send the request. If the response contains <title>phpinfo()</title>, a phpinfo file is reported.
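
As a rough sketch, building the candidate URLs and checking a response body could look like this. The function names are hypothetical, and the directory URL is assumed to end with a slash.

```python
PHPINFO_FILES = ["phpinfo.php", "pi.php", "php.php", "i.php", "test.php", "temp.php", "info.php"]

def phpinfo_candidates(directory_url):
    """Build the URLs to probe under one directory URL (which ends with '/')."""
    return [directory_url + name for name in PHPINFO_FILES]

def is_phpinfo(body):
    """Return True if the response body carries the phpinfo() page title."""
    return "<title>phpinfo()</title>" in body
```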

git, svn, bzr, hg (and CVS) leaks:

The dictionary is:

# Raw strings keep regex escapes such as \s from being read as string escapes.
flag = {
    "/.svn/all-wcprops": r"svn:wc:ra_dav:version-url",
    "/.git/config": r'repositoryformatversion[\s\S]*',
    "/.bzr/README": r'This\sis\sa\sBazaar[\s\S]',
    '/CVS/Root': r':pserver:[\s\S]*?:[\s\S]*',
    '/.hg/requires': r'^revlogv1.*'
}

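Each key is a file path and each value is the regex rule for it.

Append the path to the URL and match the regex against the response body; a match means the directory is reported as leaking repository source code.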
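
The lookup and match can be sketched as below. The function name repo_leak is hypothetical, and using re.search without extra flags is an assumption; the notes do not say which re call the plugin uses.

```python
import re

# Same rules as the flag dictionary, written as raw strings.
FLAG = {
    "/.svn/all-wcprops": r"svn:wc:ra_dav:version-url",
    "/.git/config": r"repositoryformatversion[\s\S]*",
    "/.bzr/README": r"This\sis\sa\sBazaar[\s\S]",
    "/CVS/Root": r":pserver:[\s\S]*?:[\s\S]*",
    "/.hg/requires": r"^revlogv1.*",
}

def repo_leak(path, body):
    """Return True if the body fetched from `path` matches that path's rule."""
    pattern = FLAG.get(path)
    return pattern is not None and re.search(pattern, body) is not None
```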

SFTP detection:

Dictionary:

/sftp-config.json
/recentservers.xml

Regex rules:

("type":[\s\S]*?"host":[\s\S]*?"user":[\s\S]*?"password":[\s\S]*")
(<Pass>[\s\S]*?<\/Pass>)

If either regex matches, an SFTP configuration is reported as found.
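
A minimal sketch of the matching step follows. It tries both patterns against whatever body was fetched, which is an assumption: the notes do not say whether each pattern is tied to one of the two paths. The function name is hypothetical.

```python
import re

SFTP_PATTERNS = [
    r'("type":[\s\S]*?"host":[\s\S]*?"user":[\s\S]*?"password":[\s\S]*")',
    r"(<Pass>[\s\S]*?<\/Pass>)",
]

def find_sftp_credentials(body):
    """Return the first matched credential block in the body, or None."""
    for pattern in SFTP_PATTERNS:
        match = re.search(pattern, body)
        if match:
            return match.group(1)
    return None
```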

Web editor detection:

The rule format and the matching logic are the same as for sensitive file scanning.

Rule format: {'path': '/fckeditor/_samples/default.html', 'tag': '<title>FCKeditor', 'content-type': 'html', 'content-type_no': ''}

Principle: