Section 20.4.  Curl

20.4. Curl

The cURL extension to PHP is designed to allow you to use a variety of web resources from within your PHP script. The name cURL (called Curl from now on, for ease of reading) stands either for "Client for URLs" or "Client URL Request Library," but the function is the same: it lets you use several Internet protocols using one uniform interface, most notably FTP, FTPS, HTTP, HTTPS, and LDAP.

The basic premise of using Curl is that there are four steps: initialize Curl, set your options, execute your query, and close Curl. Steps 1, 3, and 4 are easy, with the majority of the work taking place in step 2. Curl is highly configurable, and there are dozens of options you can set to make it do all sorts of weird and wonderful things. While this is undoubtedly a great advantage, it does make the learning curve a little steep.

20.4.1. Installing Curl

If you're using Windows, you can enable Curl support by copying the files libeay32.dll and ssleay32.dll into your c:\windows\system32 folder, then enabling the extension in your php.ini file. Look for the line ";extension=php_curl.dll" and take the semicolon off from the beginning.

If you're using Unix, you either have to install Curl support through your package manager, or you need to compile it from source. Compiling Curl support into your PHP takes two steps: installing the Curl development libraries on your machine (do this through your package manager), then recompiling PHP with the --with-curl switch in your configure line. As long as you have the development version of Curl installed, this should work fine.
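
Whichever route you take, it can help to confirm that the extension actually loaded before going further. The following is a minimal sketch rather than part of the install steps themselves; it relies on the standard extension_loaded( ) and curl_version( ) functions, and the version string and protocol list it prints will depend on your build.

```php
<?php
// Check whether the Curl extension is available in this PHP build.
$hasCurl = extension_loaded('curl');

if ($hasCurl) {
    // curl_version() returns an array describing the linked Curl library.
    $info = curl_version();
    print "Curl " . $info['version'] . " is available\n";
    print "Protocols: " . implode(", ", $info['protocols']) . "\n";
} else {
    print "The Curl extension is not loaded; check php.ini or your configure line\n";
}
```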

20.4.2. Your First Curl Script

The first Curl script we are going to look at is the simplest Curl script that is actually useful: it will load a web page, retrieve the contents, then print it out. So, keeping the four-step Curl process in mind, this equates to:

  1. Initialize Curl

  2. Set URL we want to load

  3. Retrieve and print the URL

  4. Close Curl

Here is how that looks in PHP code:

    $curl = curl_init( );
    curl_setopt($curl, CURLOPT_URL, "http://www.php.net");
    curl_exec($curl);
    curl_close($curl);

There is a one-to-one mapping of steps to lines of code there: step 1, "Initialize Curl," is done by line one, $curl = curl_init( );, and so on. There are four functions in that simple script: curl_init( ) for initializing the Curl library, curl_setopt( ) for setting Curl options, curl_exec( ) for executing the Curl query, and curl_close( ) for shutting down the Curl system. As mentioned already, of these four, only the second is complicated; the rest stay as you see them. Curl's functionality is, for the most part, manipulated through repeated calls to curl_setopt( ), and it is this that distinguishes how Curl operates.

The curl_init( ) function returns a Curl instance for us to use in later functions, and you should always store it in a variable. It has just one optional parameter: if you pass a string into curl_init( ), it will automatically use that string as the URL to work with. In the script above, we use curl_setopt( ) to do that for clarity, but it is all the same.
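
As a brief illustration of that optional parameter, the sketch below passes the URL straight to curl_init( ). The call to curl_getinfo( ) with CURLINFO_EFFECTIVE_URL is not covered in this section; it is included only as a way to read back the URL the handle is holding. No request is sent, because curl_exec( ) is never called.

```php
<?php
// Passing the URL to curl_init() replaces the separate CURLOPT_URL call.
$curl = curl_init("http://www.php.net");

// Read back the URL stored on the handle; no network request has been made.
$url = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
print $url . "\n";

curl_close($curl);
```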

You need to provide three parameters to the curl_setopt( ) function: the Curl instance to use, a constant value for the setting you want to change, and the value you want to use for that setting. There are a huge number of constants you can use for settings, and many of these are listed shortly. In the example we use CURLOPT_URL, which is used to set the URL for Curl to work with, and so the working URL is set to the third parameter.

Calling curl_exec( ) means, "We're finished setting our options, go ahead and do it," and you need to pass precisely one parameter: the Curl resource to use. The return value of curl_exec( ) is true/false by default, although we will be changing that soon.

The final function, curl_close( ), takes a Curl resource as its only parameter, closes the Curl session, then frees up the associated memory.

20.4.3. Trapping Return Values

To improve on the previous script, it would be good if we actually had some control over the output of our retrieved HTML page. As it is, calling curl_exec( ) retrieves and outputs the page, but it would be nice to have the retrieved content stored in a variable somewhere for use when we please. There are two ways of doing this. We already looked at how output buffering (and more specifically, the ob_get_contents( ) function) allows you to catch output before it gets to your visitor and manipulate it as you want. While this might seem like a good way to solve the problem, the second way is even better: Curl has an option specifically for it.

Passing CURLOPT_RETURNTRANSFER to curl_setopt( ) as parameter two and 1 as parameter three will force Curl to not print out the results of its query. Instead, it will return the results as a string return value from curl_exec( ) in place of the usual true/false. If there is an error, false will still be the return value from curl_exec( ).

Capturing the return value from curl_exec( ) looks like this in code:

    $curl = curl_init( );
    curl_setopt($curl, CURLOPT_URL, "http://www.php.net");
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

    $result = curl_exec($curl);
    curl_close($curl);
    print $result;

That script will output the same as the previous script, but having the web page stored in a variable before printing gives us more flexibility: we could have manipulated the data in any number of ways before printing.
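
One detail is worth making explicit: with CURLOPT_RETURNTRANSFER set, curl_exec( ) returns either a string or false, so the result is best compared using ===, otherwise an empty page would look like a failure. The following is a sketch under that assumption; fetch_page( ) is a hypothetical helper name, not a Curl function.

```php
<?php
// Hypothetical helper: returns the page body as a string, or null on failure.
function fetch_page(string $url): ?string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

    $result = curl_exec($curl);
    curl_close($curl);

    // Use === so that an empty page ("") is not mistaken for an error (false).
    return $result === false ? null : $result;
}
```

Calling fetch_page("http://www.php.net") would then hand back the page for further processing, for example with strlen( ) or str_replace( ), before anything is printed.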

Alternatively, you can have Curl save its output to a file using CURLOPT_FILE, which takes a file handle as its third parameter. This time the script looks like this:

    $curl = curl_init( );
    $fp = fopen("somefile.txt", "w");
    curl_setopt($curl, CURLOPT_URL, "http://www.php.net");
    curl_setopt($curl, CURLOPT_FILE, $fp);

    curl_exec($curl);
    curl_close($curl);
    fclose($fp); // close the file once the transfer has finished

20.4.4. Using FTP to Send Data

Our next basic script is going to switch from HTTP to FTP so you can see how little difference there is. This next script connects to the GNU FTP server and gets a listing of the root directory there:

    $curl = curl_init( );
    curl_setopt($curl, CURLOPT_URL,"ftp://ftp.gnu.org");
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

    $result = curl_exec ($curl);
    curl_close ($curl);
    print $result;

We could have made that script more FTP-specific by providing some FTP options to the script. For example, the CURLOPT_FTPLISTONLY option will make PHP return much less information. If you tried the script without this, you would have received read/write information for each of the files and directories, when they were last changed, and so on. CURLOPT_FTPLISTONLY changes this so that you only get the file/directory names.

The second FTP option of interest is CURLOPT_USERPWD, which makes PHP use the third parameter to curl_setopt( ) as the username and password used for logging in. As the third parameter contains both the username and the password, you need to split them using a colon, like this: username:password. When logging onto the GNU FTP server, we want to use the anonymous FTP account reserved for guests. In this situation, you generally provide your email address as the password.

With both of these changes implemented, the new script looks like this:

    $curl = curl_init( );
    curl_setopt($curl, CURLOPT_URL,"ftp://ftp.gnu.org");
    curl_setopt($curl, CURLOPT_FTPLISTONLY, 1);
    curl_setopt($curl, CURLOPT_USERPWD, "anonymous:your@email.com");
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

    $result = curl_exec ($curl);
    curl_close ($curl);
    print $result;

Try changing the username and password to random values, as this will cause the login to fail. If you run the script again, you will see nothing is printed out: no errors, no warnings, nothing. This is because Curl fails silently, and you need to request Curl's error message explicitly using curl_error( ). As with the other basic functions, this takes just a Curl session handle as its only parameter, and returns the error message from Curl. So, with this in mind, here is our final FTP script:

    $curl = curl_init( );
    curl_setopt($curl, CURLOPT_URL,"ftp://ftp.gnu.org");
    curl_setopt($curl, CURLOPT_FTPLISTONLY, 1);
    curl_setopt($curl, CURLOPT_USERPWD, "foo:barbaz");
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

    $result = curl_exec ($curl);
    echo curl_error($curl);
    curl_close ($curl);
    print $result;

Note the bad username and password and the extra call to curl_error( ) after curl_exec( ). As long as the GNU team don't change their FTP permissions before you read this, running that script should output "Access denied: This FTP server is anonymous only."
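
The pattern generalizes: check whether curl_exec( ) returned false and, if so, read curl_error( ) (and, optionally, the numeric code from curl_errno( )) before closing the handle. The helper below is a hypothetical sketch of that idea; the name run_transfer( ) and the shape of the array it returns are illustrative choices, not part of Curl.

```php
<?php
// Hypothetical helper: run a transfer and report success or failure explicitly.
function run_transfer(string $url): array
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

    $result = curl_exec($curl);

    // Collect the error details while the handle is still open.
    $report = array(
        'ok'    => $result !== false,
        'errno' => curl_errno($curl),   // 0 when no error occurred
        'error' => curl_error($curl),   // empty string when no error occurred
        'body'  => $result === false ? '' : $result,
    );

    curl_close($curl);
    return $report;
}
```

A caller can then test $report['ok'] and print $report['error'] only when something went wrong, rather than echoing curl_error( ) unconditionally.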

20.4.5. Sending Data Over HTTP

The last Curl script we are going to look at, before we go over a list of the most popular options for curl_setopt( ), shows how to send data out to the Web as opposed to just retrieving it.

First, create the file posttest.php in your web server's public directory. Type into the file this code:

    var_dump($_POST);

That simply takes the HTTP POST data that has come in and spits it back out again. Now, create this new script:

    $curl = curl_init( );
    curl_setopt($curl, CURLOPT_URL,"http://localhost/posttest.php");
    curl_setopt($curl, CURLOPT_POST, 1);
    curl_setopt($curl, CURLOPT_POSTFIELDS, "Hello=World&Foo=Bar&Baz=Wombat");

    curl_exec ($curl);
    curl_close ($curl);

If you are running your posttest.php file on a remote server, change "localhost" to the server URL. There are two new values for curl_setopt( ) in there, but otherwise, the script should be clear.

The two new values, CURLOPT_POST and CURLOPT_POSTFIELDS, make our session prepare to send data over HTTP POST and assign the data to send, respectively. CURLOPT_POST just takes a 1 to enable POST usage, but CURLOPT_POSTFIELDS needs a properly formatted data string to send. The string you use for the third parameter with CURLOPT_POSTFIELDS should be a list of the variables you want to send in the format Variable=Value, with each variable separated by an ampersand, &. Thus, the above script sends three variables over: Hello, Foo, and Baz, with values World, Bar, and Wombat, respectively.

Once the values are sent, Curl captures the response from the server and prints it out directly. Our posttest.php script dumps what it got through HTTP POST, so your output should be this:

    array(3) {
            ["Hello"]=>
            string(5) "World"
            ["Foo"]=>
            string(3) "Bar"
            ["Baz"]=>
            string(6) "Wombat"
    }

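The field data you pass in as the third parameter with CURLOPT_POSTFIELDS should not contain any raw spaces or special characters. Spaces should be replaced with %20 or +; you can have this and other special characters replaced automatically by passing each variable name and value through urlencode( ) before joining them with = and &. Encoding the whole string at once would also encode the = and & separators, which is not what you want.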
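
If you would rather not assemble the string by hand, PHP's http_build_query( ) function can build it from an array and encode each name and value as it goes. This is a sketch of an alternative, not the approach used in the script above, and the field names are made up for illustration.

```php
<?php
// Hypothetical form fields, including a space and characters that must be escaped.
$fields = array(
    "Hello" => "World",
    "Name"  => "Jane Doe",
    "Query" => "a&b=c",
);

// Build the encoded string; '&' is passed explicitly as the separator.
$post = http_build_query($fields, '', '&');
print $post . "\n";   // Hello=World&Name=Jane+Doe&Query=a%26b%3Dc
```

The resulting string can be used as the third parameter with CURLOPT_POSTFIELDS.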


20.4.6. The Abridged List of Curl Options

There are a large number of options available for curl_setopt( ), far too many to cover here. However, of the full list, about half or so are used regularly and, therefore, deserve printing here. They are shown in Table 20-2.

Table 20-2. Curl options

If the 2nd parameter is...  3rd parameter should be...
--------------------------  --------------------------
CURLOPT_COOKIE              A string containing the contents of the cookie data to be set in the HTTP header.
CURLOPT_COOKIEFILE          A string containing the name of the file containing cookie data to be sent.
CURLOPT_CRLF                1 if you want Curl to convert Unix new lines to CR/LF new lines.
CURLOPT_FAILONERROR         1 if you want Curl to fail silently if the HTTP code returned is equal to or larger than 400.
CURLOPT_FILE                A file handle (as returned by fopen( )) where the output of your transfer should be placed. Default is straight to output (STDOUT).
CURLOPT_FOLLOWLOCATION      1 if you want Curl to follow all "Location: " headers that the server sends as part of the HTTP header. You can limit the number of "Location" headers to follow using CURLOPT_MAXREDIRS.
CURLOPT_FTPAPPEND           1 to have Curl append to the remote file instead of overwriting it.
CURLOPT_FTPLISTONLY         1 to list just the names of an FTP directory as opposed to more detailed information.
CURLOPT_HEADER              1 if you want the header to be included in the output. Usually for HTTP only.
CURLOPT_HTTPHEADER          An array of HTTP header fields to be set.
CURLOPT_INFILE              A file handle that the input of your transfer is read from.
CURLOPT_INFILESIZE          The size of the file being uploaded to a remote site.
CURLOPT_MAXREDIRS           The number of "Location:" headers Curl should follow before erroring out. This option is only appropriate if CURLOPT_FOLLOWLOCATION is used also.
CURLOPT_NOBODY              1 to tell Curl not to include the body part in the output. For HTTP(S) servers, this is equivalent to a HEAD request: only the headers will be returned.
CURLOPT_POST                1 if you want Curl to do a regular HTTP POST.
CURLOPT_POSTFIELDS          A string containing the data to post in the HTTP "POST" operation.
CURLOPT_REFERER             A string containing the "referer" header to be used in an HTTP request. This is only necessary if the remote server relies on this value.
CURLOPT_RESUME_FROM         A number equal to the offset, in bytes, that you want your transfer to start from.
CURLOPT_RETURNTRANSFER      1 if you want Curl to return the transfer data instead of printing it out directly.
CURLOPT_STDERR              A file handle to write errors to instead of standard error.
CURLOPT_TIMEOUT             A number equal to the maximum time in seconds that Curl functions can take.
CURLOPT_UPLOAD              1 if you want PHP to prepare for a file upload.
CURLOPT_URL                 A string containing the URL you want Curl to fetch.
CURLOPT_USERPWD             A string formatted in the username:password manner, for Curl to give to the remote server if requested.
CURLOPT_USERAGENT           A string containing the "user-agent" header to be used in an HTTP request.
CURLOPT_VERBOSE             1 if you want Curl to give detailed reports about everything that is happening.
CURLOPT_WRITEHEADER         A file handle that the header part of the output is written to.


A fuller list of options is available online at http://curl.haxx.se/libcurl/c/curl_easy_setopt.html.
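
Several of these options are often set together. As a sketch, the curl_setopt_array( ) function (not otherwise used in this section) accepts an array of option => value pairs and returns true only if every option was accepted. The particular values below, including the user-agent string, are arbitrary examples.

```php
<?php
$curl = curl_init();

// Set several options from Table 20-2 in one call instead of repeated curl_setopt() calls.
$ok = curl_setopt_array($curl, array(
    CURLOPT_URL            => "http://www.php.net",
    CURLOPT_RETURNTRANSFER => 1,
    CURLOPT_HEADER         => 1,               // include response headers in the result
    CURLOPT_TIMEOUT        => 10,              // give up after 10 seconds
    CURLOPT_USERAGENT      => "MyScript/1.0",  // hypothetical user-agent string
));

if (!$ok) {
    print "At least one option could not be set\n";
}

// No request has been sent at this point; curl_exec($curl) would do that.
curl_close($curl);
```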

20.4.7. Debugging Curl

Because it works with so many different network protocols, it is very easy to make mistakes when using Curl. You can speed up your debugging efforts by using CURLOPT_VERBOSE to have Curl output detailed information about its actions.

To give you an idea of how CURLOPT_VERBOSE affects the output of your script, here is a script we used earlier, rewritten to add CURLOPT_VERBOSE:

    $curl = curl_init( );
    curl_setopt ($curl, CURLOPT_URL, "http://www.php.net");
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_VERBOSE, 1);

    curl_exec ($curl);
    curl_close ($curl);

Note that CURLOPT_RETURNTRANSFER was used but the output from curl_exec( ) was ignored. This is because the extra data provided by CURLOPT_VERBOSE is written to standard error (or to the file handle given with CURLOPT_STDERR), irrespective of CURLOPT_RETURNTRANSFER. By ignoring the output of curl_exec( ), the script will only print out the debugging information when run from the command line. Here is what you should get:

    * About to connect( ) to www.php.net:80
    * Connected to php.net (64.246.30.37) port 80
    > GET / HTTP/1.1 Host: www.php.net Pragma: no-cache Accept: image/gif,
            image/x-xbitmap, image/jpeg, image/pjpeg, */*
    < HTTP/1.1 200 OK < Date: Fri, 06 Feb 2004 22:13:29 GMT
    < Server: Apache/1.3.26 (Unix) mod_gzip/1.3.26.1a PHP/4.3.3-dev
    < X-Powered-By: PHP/4.3.3-dev
    < Last-Modified: Fri, 06 Feb 2004 22:14:38 GMT
    < Content-language: en
    < Set-Cookie: COUNTRY=GBR%2C213.152.58.41; expires=Fri,
            13-Feb-04 22:13:29 GMT; path=/; domain=.php.net
    < Connection: close
    < Transfer-Encoding: chunked
    < Content-Type: text/html;charset=ISO-8859-1
    * Closing connection #0

Note that lines that start with > are headers sent by Curl, lines that start with < are headers sent by the responding server, and lines that start with * are Curl informational messages.
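
If you would rather inspect the debugging output inside your script, one common approach is to point CURLOPT_STDERR (from Table 20-2) at a temporary stream and read it back afterwards. The sketch below assumes a php://temp stream is acceptable for this purpose; capture_verbose( ) is a hypothetical helper name.

```php
<?php
// Hypothetical helper: run a request and return Curl's verbose log as a string.
function capture_verbose(string $url): string
{
    // Temporary read/write stream that will receive the debugging output.
    $log = fopen("php://temp", "w+");

    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($curl, CURLOPT_VERBOSE, 1);
    curl_setopt($curl, CURLOPT_STDERR, $log);

    curl_exec($curl);   // the page content itself is discarded here
    curl_close($curl);

    // Go back to the start of the stream before reading what Curl wrote.
    rewind($log);
    $text = stream_get_contents($log);
    fclose($log);

    return $text;
}
```

The returned string can then be logged, searched, or printed inside <pre> tags, depending on where the script is running.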

