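delicious is a very good social bookmarking site for saving and sharing links, and its tag-based organization is excellent. I often use it to bookmark articles I like.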
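At school, however, I am on CERNET (China's education network), so reaching delicious requires a proxy, and free proxies are never very stable; sometimes I cannot log in at all. Yet most of the articles I bookmark on delicious are on Chinese sites that I can open without any proxy, so going through delicious is rather inconvenient. On top of that, delicious shows only 10 links per page, which means reading through 100 links takes 10 page clicks, and that gets tiresome.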
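So I wrote a Python script that downloads every link a given user has shared and writes them all into one HTML file. That way I can click the links later without logging in to delicious, and the file also serves as a kind of backup.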
Enough talk; here is the Python code.
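```python
# -*- coding: utf-8 -*-
# Python 3 version; the original script used Python 2 (urllib2, print statements).
import re
import sys
import urllib.request

user = 'laphy'            # delicious user name (not used by the active code below)
file_name = "rewen.html"  # output file; links are appended to it


def tag2html(name, url, description):
    """Append one link (and an optional description) as a <DT>/<DD> entry.

    name, url and description come straight from HTML markup, so they are
    written as-is rather than escaped a second time.
    """
    with open(file_name, "a", encoding="utf-8") as f:
        f.write('\t<DT><A HREF="' + url + '">' + name + '</A>\n')
        if description:
            f.write("\t<DD>" + description + "</DD>\n")


def main(argv=None):
    # Everything after '#' is a URL fragment, which is never sent to the
    # server, so every request below asks for the same indexFrame.php page.
    base_url = "http://www.zhuaxia.com/indexFrame.php#showPopular(4,25,%d)"

    # Commented-out variant for delicious pages: fetch the first page into
    # `data`, read the total number of links, derive the page count
    # (10 links per page), then loop over the remaining pages.
    # nums_re = r'<span id="tagScopeCount">(\d+)</span>'
    # link_count = int(re.search(nums_re, data).group(1))
    # page_count = link_count // 10 + 1
    # for page_num in range(2, page_count + 1):
    # url_re = r'class="taggedlink " href="([^"]*)" >(.*?)</a>'

    # The literal parentheses in "__305(500);" must be escaped, and the groups
    # are restricted/non-greedy so a match cannot run past the end of one link.
    url_re = r'href="([^"]*)" target="_blank" onclick="__305\(500\);">(.*?)</a>'

    for page_num in range(101):  # pages 0..100, as in the original while loop
        print(page_num)
        url = base_url % page_num
        # Assumes the page is UTF-8 encoded; undecodable bytes are replaced.
        data = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")
        for link, title in re.findall(url_re, data):
            tag2html(title, link, None)
            print(title)


if __name__ == "__main__":
    sys.exit(main())
```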
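The link pattern in the script contains `onclick="__305(500);"` with unescaped parentheses, so Python's `re` reads `(500)` as a capture group that matches the text `500`, and the pattern can never match the literal `__305(500);` in the page. The greedy `(.*)` groups can also swallow several links on one line. A minimal sketch of the corrected extraction, run against a made-up snippet of markup (the sample HTML and the helper name `extract_links` are hypothetical, shaped only to fit the pattern):

```python
import re

# Same pattern as in the script, with the literal parentheses escaped and
# groups that cannot run past the end of a single link.
URL_RE = re.compile(
    r'href="([^"]*)" target="_blank" onclick="__305\(500\);">(.*?)</a>'
)


def extract_links(page_html):
    """Return a list of (url, title) pairs found in page_html."""
    return URL_RE.findall(page_html)


# Hypothetical sample markup with two links on one line.
sample = (
    '<a href="http://example.com/a" target="_blank" onclick="__305(500);">First</a> | '
    '<a href="http://example.com/b" target="_blank" onclick="__305(500);">Second</a>'
)

print(extract_links(sample))
# [('http://example.com/a', 'First'), ('http://example.com/b', 'Second')]

# The unescaped pattern treats "(500)" as a group matching "500",
# so it finds nothing in the same markup.
original = r'href="(.*)" target="_blank" onclick="__305(500);">(.*)</a>'
print(re.findall(original, sample))
# []
```

Precompiling the pattern once is optional here; `re.findall` with a string pattern behaves the same, it just recompiles (or reuses a cached compile) on each call.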
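The commented-out delicious code in `main()` gets the page count by dividing the link count by 10 and adding 1. That overshoots by one page whenever the count is an exact multiple of 10: the 100 links mentioned at the start of the post would give 11 pages instead of 10, and under Python 3 a plain `/` would also return a float that `range()` rejects. A minimal sketch of the ceiling division, assuming 10 links per page as delicious showed (`page_count` is a hypothetical helper name):

```python
def page_count(link_count, per_page=10):
    """Number of pages needed to list link_count links, per_page per page."""
    # Ceiling division with integers only: (n + k - 1) // k.
    return (link_count + per_page - 1) // per_page


for n in (0, 9, 10, 100, 101):
    print(n, "links ->", page_count(n), "pages")
# 0 links -> 0 pages
# 9 links -> 1 pages
# 10 links -> 1 pages
# 100 links -> 10 pages
# 101 links -> 11 pages
```

With this, the loop over the remaining pages would be `range(2, page_count(link_count) + 1)`, which fetches nothing extra when all links fit on the first page.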
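The script appends bare `<DT>`/`<DD>` lines to `rewen.html`, reopening the file for every link and never writing an enclosing document, so running it twice duplicates every entry. As a hedged sketch, the helper below (`write_bookmark_file` is a hypothetical name, not part of the original script) takes links that have already been collected and writes one complete file in a single pass. The header lines follow the Netscape bookmark-file layout that browser bookmark importers commonly read; treat that choice, and the assumption that titles are plain text rather than already-escaped HTML, as assumptions.

```python
import html


def write_bookmark_file(links, file_name="rewen.html"):
    """Write (title, url, description) triples as one complete HTML file.

    Assumes the strings are plain text, so they are HTML-escaped here; text
    copied straight out of a scraped page is already HTML and should not be
    escaped a second time.
    """
    lines = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for title, url, description in links:
        lines.append('\t<DT><A HREF="%s">%s</A>'
                     % (html.escape(url, quote=True), html.escape(title)))
        if description:
            lines.append("\t<DD>%s" % html.escape(description))
    lines.append("</DL><p>")
    # Open the file once and overwrite it, instead of appending per link.
    with open(file_name, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return len(links)


# Usage with hypothetical links:
count = write_bookmark_file(
    [("Example & Co", "http://example.com/?a=1&b=2", None),
     ("Python", "https://www.python.org/", "Official site")],
    "bookmarks_demo.html",
)
print(count, "links written")
```

Overwriting instead of appending makes the script safe to rerun; the trade-off is that all links must be held in memory until the end, which is no problem for a few hundred bookmarks.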