
php - best value for curl timeout and connection timeout

Question:

Greetings, everyone.

I am working on a small crawling engine and am using cURL to request pages from various websites. What would you suggest I set the connection timeout and timeout values to? The pages I would normally be crawling contain lots of images and text.

Answer:

cURL has two different timeouts.
For CURLOPT_CONNECTTIMEOUT, it doesn't matter how much text the site contains or how many other resources, such as images, it references. This is a connection timeout, and even the server cannot know the size of the requested page until the connection is established.
For CURLOPT_TIMEOUT, it does matter, because this limit covers the whole transfer. The HTML of even a large page usually needs relatively few packets on the wire, but the server may need more time to assemble the output. The number of redirects and other factors (e.g. proxies) can also significantly increase the response time.
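To make the difference between the two options concrete, here is a minimal sketch of one request that sets both. The helper names `timeout_options` and `fetch_page` are hypothetical, and the default values of 5 and 20 seconds are placeholders for illustration, not recommendations.

```php
<?php
// Hypothetical helper: builds the cURL options for one request.
function timeout_options(int $connectSeconds, int $totalSeconds): array
{
    return [
        CURLOPT_CONNECTTIMEOUT => $connectSeconds, // limit for the connection phase only
        CURLOPT_TIMEOUT        => $totalSeconds,   // limit for the whole transfer, connection included
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // redirects count against CURLOPT_TIMEOUT
        CURLOPT_MAXREDIRS      => 5,
    ];
}

// Hypothetical helper: fetches one URL and reports whether a timeout was hit.
function fetch_page(string $url, int $connectSeconds = 5, int $totalSeconds = 20): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, timeout_options($connectSeconds, $totalSeconds));

    $body  = curl_exec($ch);   // false on failure
    $errno = curl_errno($ch);  // 0 on success

    return [
        'body'      => $body,
        'timed_out' => $errno === CURLE_OPERATION_TIMEDOUT,
        'error'     => $errno === 0 ? null : curl_error($ch),
    ];
}
```

A call such as `fetch_page('https://example.com/')` would then return the body, or `false` together with an error message if either limit was exceeded.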

Generally speaking, the "best value" for timeouts depends on your requirements and on the conditions of the networks and servers involved. Those conditions are subject to change, so there is no "one best value".
I recommend using rather short timeouts and retrying failed downloads later.
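The "short timeouts, retry later" approach could look roughly like the sketch below. Everything here is an assumption made for illustration: the function names, the limit of three passes, and the 2 and 10 second timeouts. The fetcher is passed in as a callable so the retry logic can be tried out without touching the network.

```php
<?php
// Hypothetical fetcher with short timeouts; returns the body or false.
function fetch_or_false(string $url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => 2,  // placeholder value
        CURLOPT_TIMEOUT        => 10, // placeholder value
    ]);
    return curl_exec($ch);
}

// Tries every URL once per pass; URLs that failed are retried in the next pass.
function crawl_with_retries(array $urls, callable $fetch, int $maxPasses = 3): array
{
    $ok      = [];
    $pending = $urls;

    for ($pass = 1; $pass <= $maxPasses && $pending !== []; $pass++) {
        $failed = [];
        foreach ($pending as $url) {
            $body = $fetch($url);
            if ($body === false) {
                $failed[] = $url;   // keep it for a later pass
            } else {
                $ok[$url] = $body;
            }
        }
        $pending = $failed;
    }

    // 'failed' holds the URLs that never succeeded within $maxPasses passes.
    return ['ok' => $ok, 'failed' => $pending];
}
```

In a real crawler the later passes would usually be delayed, for example by running them from a separate job, rather than following immediately as in this sketch.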

By the way, cURL does not automatically download resources referenced in the response. You have to do this yourself with further calls to curl_exec (each with fresh timeouts).
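As a rough illustration of doing this by hand, the sketch below pulls the `src` values of `<img>` tags out of a downloaded page so that each one can be requested separately. The function name `extract_image_urls` is hypothetical, only images are handled, and relative URLs are returned as found, so resolving them against the page URL is left out.

```php
<?php
// Hypothetical helper: returns the distinct src values of all <img> tags.
function extract_image_urls(string $html): array
{
    if ($html === '') {
        return []; // DOMDocument::loadHTML rejects an empty string
    }

    $doc      = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $urls[] = $src;
        }
    }

    // Drop duplicates so each image is requested only once.
    return array_values(array_unique($urls));
}
```

Each returned URL would then get its own cURL handle, or its own curl_exec call on a reused handle, with its own timeouts.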

Answer:

The best answer is rik's.

I have a proxy checker, and in my benchmarks I saw that most working proxies take less than 10 seconds to connect.

So I use 10 seconds for both CURLOPT_CONNECTTIMEOUT and CURLOPT_TIMEOUT, but that is specific to my case. You have to decide how much time you are willing to wait: start with large values, use curl_getinfo to look at the measured times, and then decrease the values.
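One way to follow this advice is sketched below: measure a sample of requests with generous limits, then derive a tighter value from the recorded times. The function names are hypothetical, the 30 and 60 second limits are placeholders, and choosing a quantile of the samples is just one possible heuristic that the answer itself does not prescribe.

```php
<?php
// Hypothetical helper: fetches a URL with generous limits and returns its timings.
function measure(string $url, int $connectSeconds = 30, int $totalSeconds = 60): ?array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CONNECTTIMEOUT => $connectSeconds,
        CURLOPT_TIMEOUT        => $totalSeconds,
    ]);

    if (curl_exec($ch) === false) {
        return null; // failed requests give no useful timing sample
    }

    return [
        'connect' => curl_getinfo($ch, CURLINFO_CONNECT_TIME), // seconds until connected
        'total'   => curl_getinfo($ch, CURLINFO_TOTAL_TIME),   // seconds for the whole transfer
    ];
}

// Picks the smallest whole number of seconds that covers the given share of samples.
function suggest_timeout(array $seconds, float $quantile = 0.95): int
{
    if ($seconds === []) {
        throw new InvalidArgumentException('At least one sample is required.');
    }

    sort($seconds);
    $index = (int) ceil($quantile * count($seconds)) - 1;
    $index = max(0, min($index, count($seconds) - 1));

    return (int) ceil($seconds[$index]);
}
```

Collecting the `connect` values from many `measure()` calls and passing them to `suggest_timeout()` gives a candidate for CURLOPT_CONNECTTIMEOUT. The `total` values do the same for CURLOPT_TIMEOUT.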

Note: A proxy that takes more than 5 or 10 seconds to connect is useless to me, which is why I use those values.

Answer:

Yes. If your target is a proxy that queries another site, such a cascading connection will require fairly long timeouts like these for the curl calls to complete.

Especially if you encounter intermittent curl problems, check these values first.

Answer:

If you set it too high, your script will be slow, because a single URL that is down will take the full time you set in CURLOPT_TIMEOUT before processing finishes. If you are not using proxies, you can simply set the following values:

CURLOPT_TIMEOUT = 3
CURLOPT_CONNECTTIMEOUT = 1

Then you can go through the failed URLs at a later time to double-check them.

Answer:

I use

curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);