Page tree

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Live information

There's a Mailinglist put up for everything concerning netcurl. That's also where you can find release information (for now). You can subscribe to the list here.


Warning
titleDEPRECATED
This document is deprecated. Take a look at v6.1 instead.

 This page

This documentation is about to get outdated. Things here is under maintenance only. 





Going further down will give you deprecated documentation.


Table of Contents

Join the project

...

Inline docs for the develop-version is also located at http://phpdoc.tornevall.net/TorneLIBv5/class-TorneLIB.Tornevall_cURL.html

Why rebuilding the wheel? There are other libraries that do the same job!

Well. Yes. Almost. The other libraries out there, are probably doing the exact same job as we do here. The problem with other libraries (that I've found) is amongst others that they are way too big. Taking for example GuzzleHTTP, is for example a huge project if you're aiming to use a smaller project that not requires tons of files to run. They probably covers a bit more solutions than this project, however, we are aiming to make curl usable on as many places as possible in a smaller format. What we are doing here is turning the curl PHP libraries into a very verbose state and with that, returning completed and parsed data to your PHP applications in a way where you don't have to think of this yourself.

So, this library is built for doing one thing only: Communicate. So instead of including one big package of library-files, this library shoud probably be considered a light weight curl-handler. You could of course use other libraries aswell, but our goal is to keep this one as small as possible, just to be able to fit anywhere you'd like. The whole idea with this curl bridge is to avoid fiddling with all curl settings by yourself.

History

People sometimes wonder why we are (probably) rebuilding the wheel as others have done this too before, but this library are born from a very old project, that was built for testing proxies scraped from different kind of sites and lists. The first step that was made then, was to build a client that actually could scrape sites without having to set up a whole bunch of scripts and libraries (this happened around 2000-2006 so, in that moment, we did know very little about which libraries that existed). As the source code probably reveals it has a quite big section, just to configure tunnels and proxies in different methods. And this is also why this library exists.

As we was scraping proxies, we also needed to find out how the proxy acted, so there is plenty of time wasted on this. Sometimes for example, when connecting to a socks proxy, we not only flagged the proxy as working. We continued to compare the connection to the outbound connection and if the ip address on the outgoing interfaces was matching the "inbound". If we found a mismatch, this proxy became flagged as an "elite proxy" (which this behaviour is called). We also wanted to scan for proxies which revealed the origin ip by HTTP_VIA and similar headers. In short, there's a whole bunch of scenarios, that we could not get out from a library unless we built it ourselves. So, here we are!

Dependencies

To get the most out of this library, there is currently one dependency to php-xml which makes the SOAP client work. Normally, when starting on an empty machine (Ubuntu), very little things needs to be done to run:

Code Block
languagebash
titleDependencies
apt-get install php-curl php-xml


PHP-cURL simplifier library

The network- and cURL class is an independent class library that handles network related things. The cURL library especially, has special features, like parsing data received from the body. For example, calling an URL that returns json strings, you'll get an object back that you could handle immediately, instead of parsing the data yourself.

Instantiation is being done by running a simple $CURL = new \TorneLIB\Tornevall_cURL();

If the curl library is missing (we are checking that curl_init() exists) an exception will be thrown as the instantiation is in progress. In early versions of the library an exception are thrown as soon as the library file gets loaded, which might stop a site completely.

The cURL class

We are with the cURL class in a quite verbose mode, to get the most out of the web request you are doing. The response are separated in an array as follows

array keyarray content value
code

The current HTTP Status code returned from the request (See https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)

This response status code can be used to detect, for example 404 not found errors, or similar, of a page. By means, you may throw optional exceptions by checking this code state as you need.

header

The full response header returned from the request as ...

Code Block
languagephp
titleHeader array
array(
    'info' => array(),
    'full' => 'Full header as string'
);

... where info is an array with keys and values. Example:

Code Block
languagephp
titleInfoArray
array(
    "Server" => "Apache",
    "Strict-Transport-Security" => "max-age=63072000; includeSubDomains",
    "X-Frame-Options" => "SAMEORIGIN",
    "X-Powered-By" => "PHP/5.6.27"
);


bodyThe full returned body of the request
parsed

If the body are recognized as specially formatted (json, xml, etc), the parsed array will transform into an object or an array, depending on the content.

Currently, the parser supports simpler modes like XML (SimpleXMLElement) and by this also RSS-feeds (unconfirmed), JSON and serialized data. The curl library also supports usage of the PEAR-package XML_Serializer.

Posting parameters

As of the release of 20161213 (which can be found in the repo), all functions that allows postData to be sent in a web call may be posted quite freely. You can either put postData in the standard format like:

&param1=value1&param2=value2

You may also sent send your requested data through the function as an array() or even a JSON-object. Such call may look like this:

Code Block
languagephp
doPost("http://test.com/", $postDataObject, \TorneLIB\CURL_POST_AS::POST_AS_JSON)
// Alternative without the POST_AS-constant
// doPost("http://test.com/", $postDataObject, 1)

The function itself tries to detect if the input $postDataObject is encoded or decoded (keep in mind that problems with NAN-values or similar is not supported).

...

$Response = $CURL->doGet("https://test.com/requestService?wsdl", \TorneLIB\CURL_POST_AS::POST_AS_SOAP);

Alternative, without the POST_AS-constant:

$Response = $CURL->doGet("https://test.com/requestService?wsdl", 2);

SSL Certificates and verification

...

  • ByNodes
    Extracted in the simplest way, by nodes, meaning every each of the elements are sorted out with information about the elements tagnames and attributes (name and id)
  • ByClosestTag
    Extracts the DOMDocument and sets its closest element identification to each of the array variables, beginning with the tagname→element name → element id.
  • ById
    Extracts the DOMDocument and sets the element identification by its id (getElementById equivalent)  


Code Block
languagephp
titleHTML Content Parsing
require_once("tornevall_network.php");
$CURL = new \TorneLIB\Tornevall_cURL();
$CURL->setParseHtml(true);
$output = $CURL->doGet("https://my-test-url.com");
var_dump($output['parsed']['ById']);

...