Python decorator - file-based cache

from hashlib import sha1
import os
import pickle
import sys
import time


def cache_disk(seconds=900, cache_folder="/tmp"):
    def doCache(f):
        def inner_function(*args, **kwargs):
            # calculate a cache key from the decorated function's module, name and arguments
            raw_key = str(f.__module__) + str(f.__name__) + str(args) + str(kwargs)
            # encode first: sha1() requires bytes on Python 3
            key = sha1(raw_key.encode("utf-8")).hexdigest()
            filepath = os.path.join(cache_folder, key)

            # verify that the cached object exists and is less than `seconds` old
            if os.path.exists(filepath):
                modified = os.path.getmtime(filepath)
                age_seconds = time.time() - modified
                if age_seconds < seconds:
                    with open(filepath, "rb") as cache_file:
                        return pickle.load(cache_file)

            # call the decorated function...
            result = f(*args, **kwargs)

            # ... and save the cached object for next time
            with open(filepath, "wb") as cache_file:
                pickle.dump(result, cache_file)

            return result
        return inner_function
    return doCache


@cache_disk(seconds=900, cache_folder="/tmp")
def do_something_time_consuming(n, a):
    d = {}
    time.sleep(10)  # simulate slow work
    d['name'] = n
    d['age'] = int(a)
    return d


print(do_something_time_consuming(sys.argv[1], sys.argv[2]))

Let's test it:

MacBook-Pro:~ min$ time python decorator_cache.py wu 21
{'age': 21, 'name': 'wu'}

real 0m10.032s
user 0m0.017s
sys 0m0.009s

MacBook-Pro:~ min$ time python decorator_cache.py wu 21
{'age': 21, 'name': 'wu'}

real 0m0.028s
user 0m0.017s
sys 0m0.009s
MacBook-Pro:~ min$ time python decorator_cache.py wu 21
{'age': 21, 'name': 'wu'}

real 0m0.028s
user 0m0.017s
sys 0m0.009s

MacBook-Pro:~ min$ ls /tmp
33e9d0f6e649a4c1229580fca2b2d9ca3d770731 e92558a7cd5dd5c082f7299486df5516f615c0f0
3412471a14fb9287155358053089a7d66b8d9fa

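As you can see, the second call with the same arguments no longer waits 10 seconds: the result comes straight from the file cache.
The /tmp directory also contains the cache files, each named after the SHA-1 hash of its cache key.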
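To make the mapping from a call to a cache file name concrete, here is a minimal sketch of the key computation on its own. The helper name `cache_key` is hypothetical and only for illustration. It follows the same hashing as the decorator, with an explicit encode so that it runs on Python 3.

```python
from hashlib import sha1


def cache_key(func, args, kwargs):
    """Return the SHA-1 hex digest used as the cache file name (hypothetical helper)."""
    raw = str(func.__module__) + str(func.__name__) + str(args) + str(kwargs)
    # sha1() needs bytes on Python 3, so encode the text first
    return sha1(raw.encode("utf-8")).hexdigest()


def do_something_time_consuming(n, a):
    return {"name": n, "age": int(a)}


key = cache_key(do_something_time_consuming, ("wu", "21"), {})
print(key)       # a string of hexadecimal characters
print(len(key))  # SHA-1 hex digests are 40 characters long
```

Because the key is built from str(args) and str(kwargs), the same logical call made once with positional arguments and once with keyword arguments produces two different keys, so it would be cached twice.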
This decorator is best suited to caching small result sets. With a large result, serializing and deserializing the object with Python's pickle module is itself slow.
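One way to check whether a result is small enough to be worth caching this way is to time a pickle round trip. The sketch below is only a rough in-memory measurement and does not include disk I/O. The helper `time_pickle_roundtrip` and the sample objects are made up for illustration.

```python
import pickle
import time


def time_pickle_roundtrip(obj):
    """Return (size_in_bytes, seconds) for one dumps/loads round trip (hypothetical helper)."""
    start = time.time()
    data = pickle.dumps(obj)
    restored = pickle.loads(data)
    elapsed = time.time() - start
    assert restored == obj  # the round trip must preserve the value
    return len(data), elapsed


# made-up sample data: one small dict and a list of many small dicts
small = {"name": "wu", "age": 21}
large = [{"name": "wu", "age": i} for i in range(200000)]

for label, obj in (("small", small), ("large", large)):
    size, seconds = time_pickle_roundtrip(obj)
    print("%s: %d bytes, %.4f s" % (label, size, seconds))
```

If the round trip takes about as long as the function being cached, the cache is not buying much.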
Also, for a file-based cache you can put the cache directory on an in-memory filesystem such as /dev/shm, which speeds up I/O.
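Since /dev/shm is typically available on Linux but not on every system, a small helper can choose it when present and fall back to the regular temp directory otherwise. This is a sketch under that assumption. `pick_cache_folder` is a hypothetical name, not part of the decorator above.

```python
import os
import tempfile


def pick_cache_folder(preferred="/dev/shm"):
    """Return `preferred` if it is a writable directory, else the system temp dir (hypothetical helper)."""
    if os.path.isdir(preferred) and os.access(preferred, os.W_OK):
        return preferred
    return tempfile.gettempdir()


cache_folder = pick_cache_folder()
print(cache_folder)
```

The result can then be passed to the decorator, for example @cache_disk(seconds=900, cache_folder=pick_cache_folder()). Keep in mind that files in an in-memory filesystem use RAM and do not survive a reboot.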