authentication - scraping website on web proxy using python

Question:

I am working on scraping databases that I have access to through the Duke library web proxy. Because the database is accessed through a proxy server, I can't scrape it directly as I would if it did not require proxy authentication.

I tried several things:

I wrote a script that logs into the Duke network (https://shib.oit.duke.edu/idp/AuthnEngine).

I then hardcode in my login data:

login_data = urllib.urlencode({'j_username': 'userxx',
                               'j_password': 'passwordxx',
                               'Submit': 'Enter'})

I then log in:

resp = opener.open('https://shib.oit.duke.edu/idp/AuthnEngine', login_data)

I then create a cookie jar object to hold the cookies from the proxy website.

Then I try to access the database with my script, and it still tells me that authentication is required. How can I get past the authentication required by the proxy server?

If you have any suggestions please let me know.

Thank you,

Jan

Answer:

A proxy login does not store cookies; instead it uses the Proxy-Authorization header. Like cookies, this header has to be sent with every request. It has the same format as regular Basic authentication (the word Basic followed by the base64-encoded username:password pair), although other schemes such as Digest and NTLM are possible. I suggest you inspect the headers of a normal login in your browser and copy the Proxy-Authorization header that was sent.
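As a rough illustration of the suggestion above, here is a minimal sketch that builds a Basic Proxy-Authorization value and attaches it to a request. It assumes the proxy accepts the Basic scheme (it may use Digest or NTLM instead, in which case this will not work), it uses Python 3's urllib.request rather than the Python 2 urllib calls in the question, and the helper names and the database URL are hypothetical placeholders.

```python
import base64
import urllib.request


def basic_proxy_auth_value(username, password):
    """Return a Basic-scheme value for the Proxy-Authorization header."""
    # Basic credentials are "username:password", base64-encoded.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url, username, password):
    """Create a Request object that carries the proxy credentials."""
    request = urllib.request.Request(url)
    request.add_header("Proxy-Authorization",
                       basic_proxy_auth_value(username, password))
    return request


# Placeholder credentials from the question; the URL is hypothetical.
req = build_request("http://database.example.com/search",
                    "userxx", "passwordxx")
```

Because the header is not remembered the way cookies are, every request would need to be built this way (or the header copied from a browser session added instead) before being passed to opener.open(req).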
