Python Crawler Study Notes: Using the urllib2 Library

Source: reposted

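The simplest crawler code

    # Python 2 only: in Python 3, urllib2 was split into urllib.request and urllib.error
    import urllib2

    response = urllib2.urlopen("http://www.baidu.com")
    print response.read()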

Equivalent code to the above

    # encoding=utf-8
    import urllib2

    # Build a Request object and pass it to urlopen
    request = urllib2.Request("http://www.baidu.com")
    response = urllib2.urlopen(request)
    print response.read()

Using POST and GET

First, submitting data with POST:

    # encoding=utf-8
    # Send data with POST
    import urllib
    import urllib2

    values = {'username': '[email protected]', 'password': 'xxxx'}
    data = urllib.urlencode(values)  # URL-encode the dict to be submitted
    url = "https://passport.csdn.net/account/login?from=http://my.csdn.net/my/mycsdn"
    request = urllib2.Request(url, data)  # passing data makes this a POST
    response = urllib2.urlopen(request)
    print response.read()

Sending data with GET

    # encoding=utf-8
    import urllib
    import urllib2

    # Another way to build a dict in Python
    values = {}
    values["username"] = "[email protected]"
    values["password"] = "XXXX"
    data = urllib.urlencode(values)
    url = "https://passport.csdn.net/account/login"
    geturl = url + "?" + data  # GET: the data is appended to the URL
    print geturl
    request = urllib2.Request(geturl)
    response = urllib2.urlopen(request)
    print response.read()
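The difference between the two approaches can be checked without touching the network: a Request that carries a data payload is sent as POST, while one whose parameters are appended to the URL stays a GET. The sketch below is only an illustration. The endpoint http://example.com/login and the credentials are placeholders, not taken from these notes, and the try/except import is there so the same code also runs on Python 3, where urllib2 was split into urllib.request and urllib.parse.

```python
try:
    # Python 3: urllib2 was split into urllib.request / urllib.parse
    from urllib.request import Request
    from urllib.parse import urlencode
except ImportError:
    # Python 2: the modules used throughout these notes
    from urllib2 import Request
    from urllib import urlencode

# Hypothetical endpoint and credentials, for illustration only
url = "http://example.com/login"
values = {"username": "alice", "password": "xxxx"}

# urlencode returns text; the request body must be bytes on Python 3
body = urlencode(values).encode("ascii")

post_request = Request(url, body)                     # data supplied -> POST
get_request = Request(url + "?" + urlencode(values))  # data in the URL -> GET

# No request is sent here; we only inspect the Request objects
print(post_request.get_method())
print(get_request.get_method())
print(get_request.get_full_url())
```

Nothing is transmitted until the Request is handed to urlopen, so this is a cheap way to confirm which HTTP method a request will use.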

Setting request headers

    # encoding=utf-8
    # Set request headers
    import urllib
    import urllib2

    user_agent = "Mozilla/5.0 (Windows NT 6.1)"
    referer = "http://www.zhihu.com/"
    header = {"User-Agent": user_agent, "Referer": referer}
    url = "http://www.zhihu.com/"
    values = {"username": "wuranghao", "password": "xxxx"}
    data = urllib.urlencode(values)
    # Third argument: a dict of headers sent with the request
    request = urllib2.Request(url, data, header)
    response = urllib2.urlopen(request)
    print response.read()
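Headers can also be inspected, or added after the Request has been built, without sending anything. The sketch below reuses the User-Agent and Referer values from the notes. The Accept-Language header is an extra added purely for illustration, and the import fallback is an assumption so the code also runs on Python 3.

```python
try:
    from urllib.request import Request   # Python 3
except ImportError:
    from urllib2 import Request          # Python 2

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1)",
    "Referer": "http://www.zhihu.com/",
}
request = Request("http://www.zhihu.com/", headers=headers)

# More headers can be attached after construction
# (Accept-Language is an illustrative extra, not from the notes)
request.add_header("Accept-Language", "en-US")

# Request normalizes header names with str.capitalize(), so
# "User-Agent" is stored under the key "User-agent"
print(request.get_header("User-agent"))
print(request.has_header("Referer"))
print(sorted(request.header_items()))
```

The capitalization detail matters when reading headers back: looking up "User-Agent" with get_header returns None, because the stored key is "User-agent".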

URLError and HTTPError

    # encoding=utf-8
    # Handling the exception raised when a bad URL is requested.
    # Possible causes of a URLError:
    #   - no network connection (the local machine is offline)
    #   - the specific server cannot be reached
    #   - the server does not exist
    import urllib2

    url = "http://wuranghao.com"
    request = urllib2.Request(url)
    try:
        response = urllib2.urlopen(request)
    except urllib2.URLError as e:
        print e.reason  # prints [Errno 11004] getaddrinfo failed

    # encoding=utf-8
    # HTTPError is a subclass of URLError. When you send a request with
    # urlopen, the server returns a response object that includes a
    # numeric status code.
    import urllib2

    url = "http://www.xingjiakmite.com"
    request = urllib2.Request(url)
    try:
        response = urllib2.urlopen(request)
    except urllib2.HTTPError as e:
        # Catch the subclass first, otherwise the URLError clause would take it
        if hasattr(e, "code"):
            print e.code
        if hasattr(e, "reason"):
            print e.reason
    except urllib2.URLError as e:
        if hasattr(e, "reason"):
            print e.reason
    else:
        print "OK"
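Because HTTPError is a subclass of URLError, the order of the except clauses decides which one runs. The sketch below shows this offline by constructing both errors by hand instead of calling urlopen. The describe helper, the example.com URL and the error messages are all hypothetical, and the import fallback is an assumption so the code also runs on Python 3.

```python
try:
    from urllib.error import HTTPError, URLError   # Python 3
except ImportError:
    from urllib2 import HTTPError, URLError        # Python 2


def describe(error):
    """Raise the given error and report which except clause caught it."""
    try:
        raise error
    except HTTPError as e:
        # The server answered, but with an error status code
        return "HTTPError %d" % e.code
    except URLError as e:
        # No HTTP response at all (DNS failure, refused connection, ...)
        return "URLError %s" % e.reason


# Errors built by hand for illustration; normally urlopen raises them
not_found = HTTPError("http://example.com/missing", 404, "Not Found", {}, None)
dns_failure = URLError("getaddrinfo failed")

print(issubclass(HTTPError, URLError))
print(describe(not_found))
print(describe(dns_failure))
```

If the two except clauses were swapped, the URLError clause would catch the 404 as well and the status code would never be reported.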
