|This document is deprecated. Take a look at v6.1 instead.|
This documentation is becoming outdated and is under maintenance only. Everything below this point describes deprecated functionality.
Join the project
Source code can be viewed at https://bitbucket.tornevall.net/projects/LIB/repos/tornelib-php/browse/classes/tornevall_network.php
Inline docs for the develop version are also located at http://phpdoc.tornevall.net/TorneLIBv5/class-TorneLIB.Tornevall_cURL.html
Why reinvent the wheel? There are other libraries that do the same job!
Well, yes. Almost. The other libraries out there probably do the exact same job as this one. The problem with the other libraries I have found is, among other things, that they are far too big. GuzzleHTTP, for example, is a huge project if you are looking for something small that does not require tons of files to run. Those projects probably cover more use cases than this one, but our aim is to make curl usable in as many places as possible, in a smaller format. What this library does is put the PHP curl functions into a very verbose state and, with that, return completed and parsed data to your PHP application, so you do not have to handle this yourself.
So, this library is built for doing one thing only: communicate. Instead of including one big package of library files, this library should be considered a lightweight curl handler. You could of course use other libraries as well, but our goal is to keep this one as small as possible, so it fits anywhere you like. The whole idea of this curl bridge is to save you from fiddling with all the curl settings yourself.
People sometimes wonder why we are (probably) reinventing the wheel when others have done this before, but this library was born from a very old project built for testing proxies scraped from different kinds of sites and lists. The first step back then was to build a client that could actually scrape sites without setting up a whole bunch of scripts and libraries (this happened around 2000-2006, and at that time we knew very little about which libraries existed). As the source code probably reveals, there is a quite large section dedicated to configuring tunnels and proxies in different ways. That is also why this library exists.
As we were scraping proxies, we also needed to find out how each proxy behaved, so plenty of time went into this. For example, when connecting to a SOCKS proxy we did not just flag the proxy as working; we also compared the connection to the outbound connection, checking whether the IP address on the outgoing interface matched the inbound one. If we found a mismatch, the proxy was flagged as an "elite proxy" (which is what this behaviour is called). We also wanted to scan for proxies that revealed the origin IP through HTTP_VIA and similar headers. In short, there was a whole bunch of scenarios we could not get out of an existing library unless we built it ourselves. So, here we are!
To get the most out of this library, there is currently one dependency: php-xml, which makes the SOAP client work. Normally, when starting on an empty machine (Ubuntu), very little needs to be done to get running:
apt-get install php-curl php-xml
PHP-cURL simplifier library
The network and cURL classes form an independent class library that handles network-related tasks. The cURL library in particular has special features, such as parsing the data received in the response body. For example, when calling a URL that returns JSON strings, you get back an object that you can handle immediately, instead of parsing the data yourself.
Instantiation is done by running a simple $CURL = new \TorneLIB\Tornevall_cURL();
If the curl library is missing (we check that curl_init() exists), an exception is thrown while the instantiation is in progress. In early versions of the library, an exception was thrown as soon as the library file was loaded, which could stop a site completely.
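As a minimal sketch (assuming only the class and require line shown above), instantiation with a guard against the missing-curl exception could look like this:

```php
<?php
// Assumes tornevall_network.php provides \TorneLIB\Tornevall_cURL as shown above.
require_once("tornevall_network.php");

try {
    // The constructor throws if curl_init() does not exist.
    $CURL = new \TorneLIB\Tornevall_cURL();
} catch (\Exception $e) {
    // Handle the missing curl extension gracefully instead of stopping the site.
    error_log("cURL library unavailable: " . $e->getMessage());
    exit(1);
}
```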
The cURL class
With the cURL class we run in a quite verbose mode, to get the most out of the web request you are doing. The response is separated into an array as follows:
|array key||array content value|
The current HTTP status code returned from the request (see https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
This response status code can be used to detect, for example, a 404 Not Found error on a page. That means you may throw your own exceptions by checking this code as needed.
The full response header returned from the request as ...
... where info is an array with keys and values. Example:
|body||The full returned body of the request|
If the body is recognized as specially formatted (JSON, XML, etc.), the parsed content is transformed into an object or an array, depending on the content.
Currently, the parser supports simpler formats such as XML (SimpleXMLElement), and by extension RSS feeds (unconfirmed), JSON and serialized data. The curl library also supports the PEAR package XML_Serializer.
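A sketch of how the parsed content could be used, assuming a hypothetical endpoint that returns JSON and the 'parsed' response key used in the HTML example later in this document:

```php
<?php
require_once("tornevall_network.php");

$CURL = new \TorneLIB\Tornevall_cURL();

// Hypothetical endpoint returning e.g. {"status": "ok"}.
$Response = $CURL->doGet("https://api.example.com/status.json");

// The body is recognized as JSON, so the parsed content is already
// an object/array and no manual json_decode() is needed.
$parsed = $Response['parsed'];
echo $parsed->status;
```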
As of the 20161213 release (which can be found in the repo), all functions that allow postData to be sent in a web call accept it quite freely. You can either put postData in the standard format like:
You may also send your request data through the function as an array() or even a JSON object. Such a call may look like this:
doPost("http://test.com/", $postDataObject, \TorneLIB\CURL_POST_AS::POST_AS_JSON);
Alternative, without the POST_AS-constant:
doPost("http://test.com/", $postDataObject, 1);
The function itself tries to detect whether the input $postDataObject is encoded or decoded (keep in mind that problems with NaN values and similar are not handled).
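A sketch of a JSON post, assuming the doPost() signature shown above and a hypothetical endpoint:

```php
<?php
require_once("tornevall_network.php");

$CURL = new \TorneLIB\Tornevall_cURL();

// A plain associative array; the library handles the JSON encoding.
$postDataObject = array("user" => "demo", "active" => true);

// Hypothetical endpoint; POST_AS_JSON tells the library to send JSON.
$Response = $CURL->doPost(
    "https://api.example.com/users",
    $postDataObject,
    \TorneLIB\CURL_POST_AS::POST_AS_JSON
);
```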
$Response = $CURL->doGet("https://test.com/requestService?wsdl", \TorneLIB\CURL_POST_AS::POST_AS_SOAP);
Alternative, without the POST_AS-constant:
$Response = $CURL->doGet("https://test.com/requestService?wsdl", 2);
SSL Certificates and verification
The cURL library has its own method for finding out whether SSL certificates are missing during https calls. Normally this is not a big issue, since standard Linux installations are smart enough to store bundles, pem and crt files in standard locations (/etc/ssl/certs, for example). This is however not always the case. If the certificates are stored elsewhere, https calls may sometimes fail. By using $CURL->sslPemLocations, you can replace the standard paths and let the library look for the certificates where you believe they are, since this cannot be set on the fly through ini_set. The whole operation may be tested through the call TestCerts(), which runs an internal function, openssl_guess(). This function is always called from the primary constructor, to make sure the certificates can be located as early as possible; if a certificate file is found in the initial run, the check does not have to run again. Later on, when using the primary GET/POST/etc calls, the certificate bundle is used in the stream_context handler to make sure https is handled properly.
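A sketch of overriding the certificate search paths, assuming $CURL->sslPemLocations is a plain array property as described above (the path used is an example, not a library default):

```php
<?php
require_once("tornevall_network.php");

$CURL = new \TorneLIB\Tornevall_cURL();

// Point the library at a non-standard certificate location.
$CURL->sslPemLocations = array("/opt/certs/ca-bundle.crt");

// Verify that a usable certificate bundle can now be found.
$CURL->TestCerts();
```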
Since version 5.0.0-20170210, strict SSL verification can also be turned off entirely:
require_once("tornevall_network.php");
$CURL = new \TorneLIB\Tornevall_cURL();
$CURL->setSslUnverified(true);
$output = $CURL->doGet("https://my-test-url.com");
SSL Capability checking
In some release packages, where curl is not compiled with SSL support, the library might throw a lot of exceptions. With 5.0.0-20170425, the library comes with an SSL capability check. With this, a developer can discover before any problem occurs whether it is caused by missing SSL libraries in curl. The function call is very easy to use: $LIB->hasSsl() returns a boolean from the first SSL capability check, which is made when the library is instantiated.
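A sketch of guarding https calls with the capability check, assuming only the hasSsl() call described above (the URLs are placeholders):

```php
<?php
require_once("tornevall_network.php");

$LIB = new \TorneLIB\Tornevall_cURL();

// hasSsl() reflects the SSL capability check made at instantiation.
if ($LIB->hasSsl()) {
    $output = $LIB->doGet("https://my-test-url.com");
} else {
    // Fall back, or abort, when curl lacks SSL support.
    $output = $LIB->doGet("http://my-test-url.com");
}
```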
Parsing regular HTML
As of 5.0.0-20120211, the library has the ability to convert regular incoming HTML documents into a parsed array, if there is a need to read the content as an array. Since this ability may consume memory and/or CPU, it has to be enabled first. It has so far been tested with HTML documents, and it extracts a document in the following ways:
Extracted in the simplest way, by nodes, meaning each element is sorted out with information about the element's tag name and attributes (name and id)
Extracts the DOMDocument and sets the closest element identification on each array variable, starting with tag name → element name → element id
Extracts the DOMDocument and sets the element identification by its id (a getElementById equivalent)
require_once("tornevall_network.php");
$CURL = new \TorneLIB\Tornevall_cURL();
$CURL->setParseHtml(true);
$output = $CURL->doGet("https://my-test-url.com");
var_dump($output['parsed']['ById']);