├── README
├── README.md
├── parallelcurl.php
└── test.php

/README:
--------------------------------------------------------------------------------
ParallelCurl
~~~~~~~~~~~~~~~

This module provides an easy-to-use interface to allow you to run multiple CURL URL fetches in parallel in PHP.

********************************************************************************
* I've had reports of problems that appear to be related to changes in         *
* curl_multi's behavior.                                                       *
* I'm no longer using PHP so I can't verify what's going wrong, but @marcushat *
* has kindly provided a port with fixes:                                       *
* https://github.com/marcushat/rollingcurlx                                    *
* If you are hitting issues, please give it a try!                             *
* Pete Warden - Dec 16th 2014                                                  *
********************************************************************************

To test it, go to the command line, cd to this folder and run

    ./test.php

This should run 100 searches through Google's API, printing the results. To see what sort of
performance difference running parallel requests gets you, try altering the default of 10 requests
running in parallel using the optional script argument, and timing how long each takes:

    time ./test.php 1
    time ./test.php 20

The first only allows one request to run at once, serializing the calls. I see this taking around
100 seconds. The second run has 20 in flight at a time, and takes 11 seconds! Be warned though,
it's possible to overwhelm your target if you fire too many requests at once. You may end up
with your IP banned from accessing that server, or hit other API limits.

The class is designed to make it easy to run multiple curl requests in parallel, rather than
waiting for each one to finish before starting the next.
Under the hood it uses curl_multi_exec,
but since I find that interface painfully confusing, I wanted one that corresponded to the tasks
that I wanted to run.

To use it, first copy parallelcurl.php and include it, then create the ParallelCurl object:

    $parallelcurl = new ParallelCurl(10);

The first argument to the constructor is the maximum number of outstanding fetches to allow
before blocking to wait for one to finish. You can change this later using setMaxRequests().
The second optional argument is an array of curl options in the format used by curl_setopt_array().

Next, start a URL fetch:

    $parallelcurl->startRequest('http://example.com', 'on_request_done', array('something'));

The first argument is the address that should be fetched.
The second is the callback function that will be run once the request is done.
The third is a 'cookie' that can contain arbitrary data to be passed to the callback.

This startRequest call will return immediately, as long as fewer than the maximum number of
requests are outstanding. Once the request is done, the callback function will be called, e.g.:

    on_request_done($content, 'http://example.com', $ch, array('something'));

The callback should take four arguments. The first is a string containing the content found at
the URL. The second is the original URL requested, the third is the curl handle of the request that
can be queried to get the results, and the fourth is the arbitrary 'cookie' value that you
associated with this request. This cookie contains user-defined data.

There's an optional fourth parameter to startRequest. If you pass in an array at that position in
the arguments, the POST method will be used instead, with the contents of the array controlling the
contents of the POST parameters.
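
A minimal sketch of the two forms that fourth argument can take. startRequest() passes it
unchanged to CURLOPT_POSTFIELDS, and PHP's curl sends an array as multipart/form-data, while a
string built with http_build_query() goes out as application/x-www-form-urlencoded. The URL and
field names are placeholders, and the commented-out call assumes a $parallelcurl object created
as above.

```php
<?php
// Minimal sketch: two ways to build the optional fourth ($post_fields) argument.
// startRequest() hands it unchanged to CURLOPT_POSTFIELDS.

// An array: curl sends the fields as multipart/form-data.
$post_as_array = array('name' => 'John Smith', 'page' => 2);

// A string from http_build_query(): curl sends application/x-www-form-urlencoded.
$post_as_string = http_build_query($post_as_array);

print "$post_as_string\n"; // prints "name=John+Smith&page=2"

// Either value can then be passed as the fourth argument (placeholder URL,
// assumes $parallelcurl was created as shown above):
// $parallelcurl->startRequest('http://example.com/form', 'on_request_done',
//     array('something'), $post_as_string);
```

The string form matches what a plain HTML form submits; the array form is the one to use when
the server expects multipart data.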

Since you may have requests outstanding at the end of your script, you *MUST* call

    $parallelcurl->finishAllRequests();

before you exit. If you don't, the final requests may be left unprocessed!

By Pete Warden, freely reusable, see http://petewarden.typepad.com for more
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# ParallelCurl
----

This module provides an easy-to-use interface to allow you to run multiple CURL URL fetches in parallel in PHP.

## Disclaimer
I've had reports of problems that appear to be related to changes in curl_multi's behavior. I'm no longer using PHP so I can't verify what's going wrong, but @marcushat has kindly provided a port with fixes: https://github.com/marcushat/rollingcurlx.

If you are hitting issues, please give it a try!
Pete Warden - Dec 16th 2014

## Testing
To test it, go to the command line, cd to this folder and run

`./test.php`

This should run 100 searches through Google's API, printing the results. To see what sort of
performance difference running parallel requests gets you, try altering the default of 10 requests
running in parallel using the optional script argument, and timing how long each takes:

`time ./test.php 1`

`time ./test.php 20`

* The first only allows one request to run at once, serializing the calls. I see this taking around 100 seconds.

* The second run has 20 in flight at a time, and takes 11 seconds! Be warned though, it's possible to overwhelm your target if you fire too many requests at once. You may end up with your IP banned from accessing that server, or hit other API limits.

The class is designed to make it easy to run multiple curl requests in parallel, rather than waiting for each one to finish before starting the next. Under the hood it uses `curl_multi_exec`, but since I find that interface painfully confusing, I wanted one that corresponded to the tasks that I wanted to run.

## Usage

To use it, first copy `parallelcurl.php` and include it, then create the `ParallelCurl` object:

```php
$parallelcurl = new ParallelCurl(10);
```

The first argument to the constructor is the maximum number of outstanding fetches to allow
before blocking to wait for one to finish. You can change this later using `setMaxRequests()`.

The second optional argument is an array of curl options in the format used by `curl_setopt_array()`.

Next, start a URL fetch:

```php
$parallelcurl->startRequest('http://example.com', 'on_request_done', array('something'));
```

* The first argument is the address that should be fetched.
* The second is the callback function that will be run once the request is done.
* The third is a 'cookie' that can contain arbitrary data to be passed to the callback.

This `startRequest` call will return immediately, as long as fewer than the maximum number of
requests are outstanding. Once the request is done, the callback function will be called, e.g.:

```php
on_request_done($content, 'http://example.com', $ch, array('something'));
```

The callback should take four arguments. The first is a string containing the content found at the URL. The second is the original URL requested, the third is the curl handle of the request that
can be queried to get the results, and the fourth is the arbitrary 'cookie' value that you
associated with this request. This cookie contains user-defined data.

There's an optional fourth parameter to `startRequest`.
If you pass in an array at that position in
the arguments, the POST method will be used instead, with the contents of the array controlling the
contents of the POST parameters.

Since you may have requests outstanding at the end of your script, you *MUST* call

```php
$parallelcurl->finishAllRequests();
```

before you exit. If you don't, the final requests may be left unprocessed!

## Credits

By Pete Warden, freely reusable, see http://petewarden.typepad.com for more
--------------------------------------------------------------------------------
/parallelcurl.php:
--------------------------------------------------------------------------------
<?php
// $parallelcurl->startRequest('http://example.com', 'on_request_done', array('something'));
//
// The first argument is the address that should be fetched
// The second is the callback function that will be run once the request is done
// The third is a 'cookie' that can contain arbitrary data to be passed to the callback
//
// This startRequest call will return immediately, as long as fewer than the maximum number of
// requests are outstanding. Once the request is done, the callback function will be called, e.g.:
//
// on_request_done($content, 'http://example.com', $ch, array('something'));
//
// The callback should take four arguments. The first is a string containing the content found at
// the URL. The second is the original URL requested, the third is the curl handle of the request that
// can be queried to get the results, and the fourth is the arbitrary 'cookie' value that you
// associated with this request. This cookie contains user-defined data.
//
// By Pete Warden, freely reusable, see http://petewarden.typepad.com for more

class ParallelCurl {

    public $max_requests;
    public $options;

    public $outstanding_requests;
    public $multi_handle;

    public function __construct($in_max_requests = 10, $in_options = array()) {
        $this->max_requests = $in_max_requests;
        $this->options = $in_options;

        $this->outstanding_requests = array();
        $this->multi_handle = curl_multi_init();
    }

    // Ensure all the requests finish nicely
    public function __destruct() {
        $this->finishAllRequests();
    }

    // Sets how many requests can be outstanding at once before we block and wait for one to
    // finish before starting the next one
    public function setMaxRequests($in_max_requests) {
        $this->max_requests = $in_max_requests;
    }

    // Sets the options to pass to curl, using the format of curl_setopt_array()
    public function setOptions($in_options) {
        $this->options = $in_options;
    }

    // Start a fetch from the $url address, calling the $callback function passing the optional
    // $user_data value.
    // The callback should accept four arguments: the content, the URL, the curl handle and
    // the user data, e.g. on_request_done($content, $url, $ch, $user_data);
    // If $post_fields is not null, the request is sent as a POST with those fields.
    public function startRequest($url, $callback, $user_data = array(), $post_fields = null) {

        if ($this->max_requests > 0)
            $this->waitForOutstandingRequestsToDropBelow($this->max_requests);

        $ch = curl_init();
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt_array($ch, $this->options);
        curl_setopt($ch, CURLOPT_URL, $url);

        if (isset($post_fields)) {
            curl_setopt($ch, CURLOPT_POST, TRUE);
            curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
        }

        curl_multi_add_handle($this->multi_handle, $ch);

        // Curl handles are resources before PHP 8 and CurlHandle objects from PHP 8 on;
        // casting a CurlHandle object to int does not give a per-handle key, so use its object id
        $ch_array_key = is_object($ch) ? spl_object_id($ch) : (int)$ch;

        $this->outstanding_requests[$ch_array_key] = array(
            'url' => $url,
            'callback' => $callback,
            'user_data' => $user_data,
        );

        $this->checkForCompletedRequests();
    }

    // You *MUST* call this function at the end of your script.
    // It waits for any running requests to complete, and calls their callback functions
    public function finishAllRequests() {
        $this->waitForOutstandingRequestsToDropBelow(1);
    }

    // Checks to see if any of the outstanding requests have finished
    private function checkForCompletedRequests() {
        /*
        // Call select to see if anything is waiting for us
        if (curl_multi_select($this->multi_handle, 0.0) === -1)
            return;

        // Since something's waiting, give curl a chance to process it
        do {
            $mrc = curl_multi_exec($this->multi_handle, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);
        */
        // fix for https://bugs.php.net/bug.php?id=63411: drive the transfers with
        // curl_multi_exec() without relying on curl_multi_select(). Don't loop until every
        // transfer has finished, since that would make each startRequest() call block on all
        // outstanding requests; waitForOutstandingRequestsToDropBelow() keeps polling instead.
        do {
            $mrc = curl_multi_exec($this->multi_handle, $active);
        } while ($mrc == CURLM_CALL_MULTI_PERFORM);

        // Now grab the information about the completed requests
        while ($info = curl_multi_info_read($this->multi_handle)) {

            $ch = $info['handle'];
            // Same key scheme as startRequest(): object id on PHP 8+, resource id before that
            $ch_array_key = is_object($ch) ? spl_object_id($ch) : (int)$ch;

            if (!isset($this->outstanding_requests[$ch_array_key])) {
                die("Error - handle wasn't found in requests: '$ch_array_key' in ".
                    print_r($this->outstanding_requests, true));
            }

            $request = $this->outstanding_requests[$ch_array_key];

            $url = $request['url'];
            $content = curl_multi_getcontent($ch);
            $callback = $request['callback'];
            $user_data = $request['user_data'];

            call_user_func($callback, $content, $url, $ch, $user_data);

            unset($this->outstanding_requests[$ch_array_key]);

            curl_multi_remove_handle($this->multi_handle, $ch);
        }

    }

    // Blocks until there are fewer than the specified number of requests outstanding
    private function waitForOutstandingRequestsToDropBelow($max)
    {
        while (1) {
            $this->checkForCompletedRequests();
            if (count($this->outstanding_requests) < $max)
                break;

            usleep(10000);
        }
    }

}


?>
--------------------------------------------------------------------------------
/test.php:
--------------------------------------------------------------------------------
#!/usr/bin/php
<?php
// By Pete Warden, freely reusable, see http://petewarden.typepad.com for more

require_once('parallelcurl.php');

define('SEARCH_URL_PREFIX', 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=large&filter=0');

// This function gets called back for each request that completes
function on_request_done($content, $url, $ch, $search) {

    $httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($httpcode !== 200) {
        print "Fetch error $httpcode for '$url'\n";
        return;
    }

    $responseobject = json_decode($content, true);
    if (empty($responseobject['responseData']['results'])) {
        print "No results found for '$search'\n";
        return;
    }

    print "********\n";
    print "$search:\n";
    print "********\n";

    $allresponseresults = $responseobject['responseData']['results'];
    foreach ($allresponseresults as $responseresult)
    {
        $title = $responseresult['title'];
        print "$title\n";
    }
}

// The terms to search for on Google
$terms_list = array(
    "John", "Mary",
    "William", "Anna",
    "James", "Emma",
    "George", "Elizabeth",
    "Charles", "Margaret",
    "Frank", "Minnie",
    "Joseph", "Ida",
    "Henry", "Bertha",
    "Robert", "Clara",
    "Thomas", "Alice",
    "Edward", "Annie",
    "Harry", "Florence",
    "Walter", "Bessie",
    "Arthur", "Grace",
    "Fred", "Ethel",
    "Albert", "Sarah",
    "Samuel", "Ella",
    "Clarence", "Martha",
    "Louis", "Nellie",
    "David", "Mabel",
    "Joe", "Laura",
    "Charlie", "Carrie",
    "Richard", "Cora",
    "Ernest", "Helen",
    "Roy", "Maude",
    "Will", "Lillian",
    "Andrew", "Gertrude",
    "Jesse", "Rose",
    "Oscar", "Edna",
    "Willie", "Pearl",
    "Daniel", "Edith",
    "Benjamin", "Jennie",
    "Carl", "Hattie",
    "Sam", "Mattie",
    "Alfred", "Eva",
    "Earl", "Julia",
    "Peter", "Myrtle",
    "Elmer", "Louise",
    "Frederick", "Lillie",
    "Howard", "Jessie",
    "Lewis", "Frances",
    "Ralph", "Catherine",
    "Herbert", "Lula",
    "Paul", "Lena",
    "Lee", "Marie",
    "Tom", "Ada",
    "Herman", "Josephine",
    "Martin", "Fanny",
    "Jacob", "Lucy",
    "Michael", "Dora",
);

// The optional first script argument sets how many requests run in parallel
if (isset($argv[1])) {
    $max_requests = (int)$argv[1];
} else {
    $max_requests = 10;
}

$curl_options = array(
    CURLOPT_SSL_VERIFYPEER => FALSE,
    CURLOPT_SSL_VERIFYHOST => FALSE,
    CURLOPT_USERAGENT => 'Parallel Curl test script',
);

$parallel_curl = new ParallelCurl($max_requests, $curl_options);

foreach ($terms_list as $terms) {
    $search = '"'.$terms.' is a"';
    $search_url = SEARCH_URL_PREFIX.'&q='.urlencode($terms);
    $parallel_curl->startRequest($search_url, 'on_request_done', $search);
}

// This should be called when you need to wait for the requests to finish.
// This will automatically run on destruct of the ParallelCurl object, so the next line is optional.
$parallel_curl->finishAllRequests();

?>
--------------------------------------------------------------------------------