Parsing nginx conf files with Python

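Suppose you have just taken over an nginx reverse proxy. How do you get an overview of which sites it currently proxies, which backends they point to, and which URLs are handled in nginx?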

The goal is a table like this:
server_names    backend     location
sitea.com       backenda    /auth
siteb.com       backendb    /rpc
sitec.com       backendc    /x/y/z

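The simple approach is to grep for a few keywords in the shell and get some initial information from that.
But formatting the results and stripping the leftover characters in shell is tedious work.
So I wrote a simple Python script to do it.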

import re


class Ngx_Conf_Summary(object):
    """Collect server names, locations and upstream hosts from one nginx conf file."""

    def __init__(self, conf_file_path):
        self.r = {}
        self.backend = []
        self.conf_path = conf_file_path
        with open(self.conf_path, 'r') as f:
            self.conf_text = f.read().strip()
        # "upstream NAME { ... }": group 1 is the name, group 2 the block body
        self.backend_pattern = r'upstream\s+(\S+)\s*\{([^}]*)\}'
        # "server HOST[:PORT] ...;" inside an upstream block: group 1 is HOST[:PORT]
        self.backend_host_pattern = r'\bserver\s+([^\s;]+)'
        self.location_pattern = r'location (.+){'
        self.server_name_pattern = r'server_name (.+);'
        print("-----parse conf file: %s------" % self.conf_path)

    def get_server_names(self):
        # Join with a space so names from separate server_name directives stay apart
        return " ".join(re.findall(self.server_name_pattern, self.conf_text)).split()

    def get_backend_hosts(self):
        self.backend = []
        # finditer scans the whole file, so every upstream block is picked up,
        # and a file without any upstream block simply yields an empty list
        for block in re.finditer(self.backend_pattern, self.conf_text):
            # drop comments so commented-out servers are ignored
            body = re.sub(r'#.*', '', block.group(2))
            self.backend.extend(re.findall(self.backend_host_pattern, body))
        return self.backend

    def get_location(self):
        return re.findall(self.location_pattern, self.conf_text)

    def summary(self):
        self.r['file'] = self.conf_path
        self.r['server_names'] = self.get_server_names()
        self.r['location'] = self.get_location()
        self.r['backends'] = self.get_backend_hosts()
        return self.r


if __name__ == '__main__':
    ngx_conf = Ngx_Conf_Summary('/tmp/2.vhost')
    print(ngx_conf.summary())

The output looks like this:
MacBook-Pro:~ min$ python ~/pycode/github/gangster/ngx_conf_parse.py
-----parse conf file: /tmp/2.vhost------
{'server_names': ['2012.site.com', '2015.site.com'], 'backends': ['10.0.7.10', 'upstreamhost1', 'upstreamvhost.v2.com', '10.0.7.5:80', '10.0.7.7:8081', '10.0.7.8'], 'location': ['= /50x.html ', '~ /\\.ht ', '~* ^/(busi|Business)/.*\\.(js|css|png|jpg|gif|ico|zip|rar|flv)$ ', '~* ^/.*\\.(js|css|png|jpg|gif)$ ', '/ ', '^~ /do_not_delete/ ', '~ /purge(/.*) ', '/xy '], 'file': '/tmp/2.vhost'}
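
To see what each regular expression picks up, here is a small sketch that runs the same kind of patterns against an inline config string instead of a file. The sample config and the `extract` helper are made up for illustration and are not part of the script above; locations are stripped of trailing whitespace here for readability.

```python
import re

# Hypothetical sample config, invented for this illustration only.
SAMPLE_CONF = """
upstream backenda {
    server 10.0.7.5:80 weight=5;
    # server 10.0.7.6:80;
    server upstreamhost1;
}
server {
    listen 80;
    server_name sitea.com www.sitea.com;
    location /auth {
        proxy_pass http://backenda;
    }
    location = /50x.html {
        root html;
    }
}
"""


def extract(conf_text):
    """Return (server_names, locations, backends) found in conf_text."""
    server_names = " ".join(re.findall(r'server_name (.+);', conf_text)).split()
    locations = [loc.strip() for loc in re.findall(r'location (.+){', conf_text)]
    backends = []
    for block in re.finditer(r'upstream\s+(\S+)\s*\{([^}]*)\}', conf_text):
        body = re.sub(r'#.*', '', block.group(2))  # ignore commented-out servers
        backends.extend(re.findall(r'\bserver\s+([^\s;]+)', body))
    return server_names, locations, backends


names, locations, backends = extract(SAMPLE_CONF)
print(names)      # ['sitea.com', 'www.sitea.com']
print(locations)  # ['/auth', '= /50x.html']
print(backends)   # ['10.0.7.5:80', 'upstreamhost1']
```

This is regex matching, not a real nginx parser, so it will likely miss unusual layouts such as nested braces inside an upstream block.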

This parses a single file, but handling several files is not hard either:

import glob
from ngx_conf_parse import Ngx_Conf_Summary

conf_file_list = glob.glob(r"/usr/local/nginx/conf/*/*.vhost")
for conf in conf_file_list:
    # build one summary object per conf file
    print(Ngx_Conf_Summary(conf).summary())
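
To get from these dicts to the table shown at the top, one possible sketch is below. `format_rows` is a hypothetical helper, not part of the original script, and it assumes dicts with the `server_names`, `backends` and `location` keys that `summary()` returns. Since the script collects backends and locations per file rather than per location, each row simply lists everything found in that file.

```python
def format_rows(summaries):
    """Build tab-separated table lines from a list of summary() dicts."""
    lines = ["\t".join(["server_names", "backends", "location"])]
    for s in summaries:
        lines.append("\t".join([
            ",".join(s["server_names"]),
            ",".join(s["backends"]),
            # the location pattern keeps trailing spaces, so strip them here
            ",".join(loc.strip() for loc in s["location"]),
        ]))
    return lines


# Hypothetical summary dict, shaped like the return value of summary()
example = {
    "file": "/tmp/example.vhost",
    "server_names": ["sitea.com"],
    "backends": ["backenda"],
    "location": ["/auth "],
}

rows = format_rows([example])
for row in rows:
    print(row)
```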