A proxy in a network context is a sort of middle man, a server in between you as a client and the remote server you want to communicate with. The client contacts the middle man which then goes on to contact the remote server for you.
Proxies of this kind are often used by companies and organizations, in which case you are usually required to go through them to reach the target server.
There are several different kinds of proxies and different protocols to use when communicating with a proxy, and libcurl supports a few of the most common proxy protocols. It is important to realize that the protocol used to the proxy is not necessarily the same protocol used to the remote server.
When setting up a transfer with libcurl you need to point out the server name and port number of the proxy. You may find that your favorite browsers can do this in slightly more advanced ways than libcurl can, and we will get into such details in later sections.
libcurl supports the two major proxy types: SOCKS and HTTP proxies. More specifically, it supports both SOCKS4 and SOCKS5 with or without remote name lookup, as well as both HTTP and HTTPS to the local proxy.
The easiest way to specify which kind of proxy you are talking to is to set the scheme part of the proxy host name string (CURLOPT_PROXY) to match it:
socks4 - means SOCKS4 with local name resolving
socks4a - means SOCKS4 with proxy's name resolving
socks5 - means SOCKS5 with local name resolving
socks5h - means SOCKS5 with proxy's name resolving
http - means HTTP, which always lets the proxy resolve names
https - means HTTPS to the proxy, which always lets the proxy resolve names (Note that HTTPS proxy support was added recently, in curl 7.52.0, and it still only works with a subset of the TLS libraries: OpenSSL, GnuTLS and NSS.)
You can also opt to set the type of the proxy with a separate option if you prefer to only set the host name, using CURLOPT_PROXYTYPE. Similarly, you can set the proxy port number to use with CURLOPT_PROXYPORT.
In a section above you can see that different proxy setups allow the name resolving to be done by different parties involved in the transfer. In several cases you can either have the client resolve the server host name and pass the IP address on to the proxy to connect to, which of course assumes that the name lookup works accurately on the client system, or you can hand the name over to the proxy and have the proxy resolve it to an IP address to connect to.
When you are using an HTTP or HTTPS proxy, you always give the name to the proxy to resolve.
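To summarize who resolves the host name for each proxy scheme, here is a small lookup sketch. It is plain C that only restates the scheme list from this chapter; the enum and function names are invented for the example and are not part of libcurl.

```c
#include <string.h>

/* Which party turns the remote host name into an IP address. */
enum resolver { RESOLVE_UNKNOWN, RESOLVE_LOCAL, RESOLVE_PROXY };

/* Hypothetical helper: map a proxy scheme to the resolving party. */
enum resolver who_resolves(const char *scheme)
{
  /* plain SOCKS4 and SOCKS5: the client resolves the name */
  if(!strcmp(scheme, "socks4") || !strcmp(scheme, "socks5"))
    return RESOLVE_LOCAL;

  /* socks4a, socks5h, HTTP and HTTPS proxies: the proxy resolves it */
  if(!strcmp(scheme, "socks4a") || !strcmp(scheme, "socks5h") ||
     !strcmp(scheme, "http") || !strcmp(scheme, "https"))
    return RESOLVE_PROXY;

  return RESOLVE_UNKNOWN;
}
```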
If your network connection requires the use of a proxy to reach the destination, you must figure this out and tell libcurl to use the correct proxy. There is no support in libcurl to make it automatically figure out or detect a proxy.
When using a browser, it is popular to provide the proxy with a PAC script or other means, but libcurl recognizes none of those.
If no proxy option has been set, libcurl will check for the existence of specially named environment variables before it performs its transfer to see if a proxy is requested to get used.
You can specify the proxy by setting a variable named [scheme]_proxy to hold the proxy host name (the same way you would specify the host with -x). So if you want to tell curl to use a proxy when accessing an HTTP server, you set the 'http_proxy' environment variable.
The http_proxy variable covers HTTP, but you can of course also set https_proxy, and so on, for the specific protocols you want to proxy. All these proxy environment variable names except http_proxy can also be specified in uppercase, like HTTPS_PROXY.
To set a single variable that controls all protocols, use ALL_PROXY. If a variable for the specific protocol also exists, that one takes precedence.
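The precedence between these variables can be sketched in plain C. This is an illustration of the rules described here, not libcurl's actual code: the function names are invented, the environment is passed in as a NULL-terminated array of "NAME=value" strings so the logic can be tried without touching the real environment, the scheme is assumed to be lowercase, and checking the lowercase name before the uppercase one is an assumption of the sketch.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return the value of 'name' in a NULL-terminated "NAME=value" array,
   or NULL when the variable is absent. */
static const char *env_get(const char *const env[], const char *name)
{
  size_t len = strlen(name);
  size_t i;
  for(i = 0; env[i]; i++) {
    if(!strncmp(env[i], name, len) && env[i][len] == '=')
      return env[i] + len + 1;
  }
  return NULL;
}

/* Hypothetical helper: pick the proxy for 'scheme' from 'env'. */
const char *proxy_from_env(const char *const env[], const char *scheme)
{
  char name[64];
  const char *value;
  char *p;
  int n = snprintf(name, sizeof(name), "%s_proxy", scheme);
  if(n < 0 || (size_t)n >= sizeof(name))
    return NULL; /* scheme too long for this sketch */

  /* 1. the protocol-specific variable in lowercase */
  value = env_get(env, name);
  if(value)
    return value;

  /* 2. the uppercase spelling, for everything except http_proxy */
  if(strcmp(scheme, "http") != 0) {
    for(p = name; *p; p++)
      *p = (char)toupper((unsigned char)*p);
    value = env_get(env, name);
    if(value)
      return value;
  }

  /* 3. fall back to the variable that covers all protocols */
  return env_get(env, "ALL_PROXY");
}
```

With an environment holding both https_proxy and ALL_PROXY, an "https" lookup returns the https_proxy value, while an "ftp" lookup falls through to ALL_PROXY.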
When using environment variables to set a proxy, you could easily end up in a situation where one or a few host names should be excluded from going through the proxy. This can be done with the NO_PROXY variable, or the corresponding CURLOPT_NOPROXY libcurl option. Set that to a comma-separated list of host names that should not use a proxy when being accessed. You can set NO_PROXY to be a single asterisk ('*') to match all hosts.
The HTTP protocol details exactly how an HTTP proxy should be used. Instead of sending the request to the actual remote server, the client (libcurl) asks the proxy for the specific resource. The connection to the HTTP proxy is made using plain unencrypted HTTP.
If an HTTPS resource is requested, libcurl will instead issue a CONNECT request to the proxy. Such a request opens a tunnel through the proxy, where it basically just passes data through without understanding it. This way, libcurl can establish a secure end-to-end TLS connection even when an HTTP proxy is present.
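To make the tunnel request concrete, the sketch below formats a bare-bones CONNECT request for a given host and port. It only shows the shape of the request on the wire; the function name is invented, and the request libcurl really sends carries more headers than this.

```c
#include <stdio.h>

/* Hypothetical helper: write a minimal CONNECT request for host:port
   into 'buf'. Returns the number of bytes written (not counting the
   terminating zero), or -1 if 'buf' is too small. */
int format_connect(char *buf, size_t size, const char *host, int port)
{
  int n = snprintf(buf, size,
                   "CONNECT %s:%d HTTP/1.1\r\n" /* target of the tunnel */
                   "Host: %s:%d\r\n"
                   "\r\n",                      /* end of the headers */
                   host, port, host, port);
  return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Once the proxy answers with a success status, everything sent on the connection is passed along to the target, so the TLS handshake that follows is between the client and the remote server.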
You can proxy non-HTTP protocols over an HTTP proxy, but since this is mostly done with the CONNECT method to tunnel data through, it requires that the proxy is configured to allow the client to connect to those other remote port numbers. Many HTTP proxies are set up to inhibit connections to port numbers other than 80 and 443.
An HTTPS proxy is similar to an HTTP proxy but allows the client to connect to it using a secure HTTPS connection. The proxy connection is separate from the connection to the remote site even in this situation, since HTTPS to the remote site is tunnelled through the HTTPS connection to the proxy. libcurl therefore provides a whole set of TLS options for the proxy connection that are separate from those for the connection to the remote host.
CURLOPT_PROXY_CAINFO is basically the same functionality for the HTTPS proxy as CURLOPT_CAINFO is for the remote host. CURLOPT_PROXY_SSL_VERIFYPEER is the proxy version of CURLOPT_SSL_VERIFYPEER and so on.
HTTPS proxies are still fairly unusual in organizations and companies today.