Complete API Reference (Auto-Generated)

This is the complete technical reference auto-generated from docstrings. For a user-friendly guide, see User Guide - Essential API.

Core Classes

Broker

class proxybroker.Broker(queue=None, timeout=8, max_conn=200, max_tries=3, judges=None, providers=None, verify_ssl=False, loop=None, stop_broker_on_sigint=True, **kwargs)

Bases: object

The Broker.

One broker to rule them all, one broker to find them,
One broker to bring them all and in the darkness bind them.
Parameters:
  • queue (asyncio.Queue) – (optional) Queue of found/checked proxies

  • timeout (int) – (optional) Timeout of a request in seconds

  • max_conn (int) – (optional) The maximum number of concurrent checks of proxies

  • max_tries (int) – (optional) The maximum number of attempts to check a proxy

  • judges (list) – (optional) URLs of pages that show HTTP headers and the IP address, or Judge objects

  • providers (list) – (optional) URLs of pages on which to find proxies, or Provider objects

  • verify_ssl (bool) – (optional) Whether to verify SSL certificates. Set to True to enable certificate verification

  • loop – (optional) asyncio-compatible event loop

  • stop_broker_on_sigint – (optional) Whether to set a SIGINT handler on the broker object. Setting it to False is useful when the broker runs in a thread other than the main thread.

Deprecated since version 0.2.0: Use max_conn and max_tries instead of max_concurrent_conn and attempts_conn.
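A minimal sketch of the usual pattern: the broker pushes checked proxies into the queue passed as queue, and a consumer reads them until it receives None, which we assume marks the end of the search (the convention used in the project's examples). collect is a hypothetical helper name, and the parameter values in main are arbitrary.

```python
import asyncio


async def collect(queue):
    """Drain Proxy objects from the queue until the None end-of-search marker."""
    found = []
    while True:
        proxy = await queue.get()
        if proxy is None:  # assumed: the broker signals completion with None
            break
        found.append(proxy)
    return found


async def main():
    # Imported here so collect() can be reused without proxybroker installed.
    from proxybroker import Broker

    queue = asyncio.Queue()
    broker = Broker(queue, timeout=8, max_conn=200, max_tries=3)
    # find() fills the queue while collect() consumes it concurrently.
    _, proxies = await asyncio.gather(
        broker.find(types=['HTTP', 'HTTPS'], limit=10),
        collect(queue),
    )
    for proxy in proxies:
        print(proxy)
```

Run it with asyncio.run(main()); this needs network access and the proxybroker package.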

async grab(*, countries=None, limit=0)

Gather proxies from the providers without checking.

Parameters:
  • countries (list) – (optional) List of ISO country codes where the proxies should be located

  • limit (int) – (optional) The maximum number of proxies

Example of usage.
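Because grab() skips checking, its output is a list of unverified candidates. A minimal sketch, assuming the same None end-of-search marker as find(): save_candidates (a hypothetical helper) writes each proxy as a host:port line using as_text(), a plain format that could plausibly be passed back later through the data parameter of find(). The country codes and file name are arbitrary.

```python
import asyncio


async def save_candidates(queue, stream):
    """Write each grabbed proxy as one host:port line; return how many were written."""
    count = 0
    while True:
        proxy = await queue.get()
        if proxy is None:  # end-of-search marker (assumed, as with find())
            break
        stream.write(proxy.as_text() + '\n')
        count += 1
    return count


async def main(path='candidates.txt'):
    from proxybroker import Broker  # requires the proxybroker package

    queue = asyncio.Queue()
    broker = Broker(queue)
    with open(path, 'w') as f:
        await asyncio.gather(
            broker.grab(countries=['US', 'GB'], limit=10),
            save_candidates(queue, f),
        )
```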

async find(*, types=None, data=None, countries=None, post=False, strict=False, dnsbl=None, limit=0, **kwargs)

Gather and check proxies from providers or from passed data.

Example of usage.

Parameters:
  • types (list) – Types (protocols) for which the proxy’s support should be checked. Supported: HTTP, HTTPS, SOCKS4, SOCKS5, CONNECT:80, CONNECT:25, and levels of anonymity (HTTP only): Transparent, Anonymous, High

  • data – (optional) A string or list of proxies. Can also be a file-like object that supports the read() method. Used instead of providers

  • countries (list) – (optional) List of ISO country codes where the proxies should be located

  • post (bool) – (optional) Whether to use POST instead of GET for requests when checking proxies

  • strict (bool) – (optional) If True, the types (protocols) and anonymity levels supported by a proxy must match the requested types and levels of anonymity. By default, strict mode is off, and satisfying any one of the requested types is enough for a successful check

  • dnsbl (list) – (optional) Spam databases (DNSBL) used for proxy checking

  • limit (int) – (optional) The maximum number of proxies

Raises:

ValueError – If types is not given.

Changed in version 0.2.0: Added: post, strict, dnsbl. Changed: types is required.
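The types argument mixes protocol names with optional HTTP anonymity levels. As a sketch of how such a request can be represented, normalize_types (hypothetical, not part of proxybroker) turns the list into a {protocol: levels} mapping and rejects an empty list, mirroring the ValueError above. Accepting (protocol, levels) tuples alongside plain strings is an assumption about the argument's shape, not something this reference confirms.

```python
# Supported values as listed in the find() parameters above.
PROTOCOLS = {'HTTP', 'HTTPS', 'SOCKS4', 'SOCKS5', 'CONNECT:80', 'CONNECT:25'}
LEVELS = {'Transparent', 'Anonymous', 'High'}


def normalize_types(types):
    """Map each requested protocol to a tuple of accepted levels, or None for any."""
    if not types:
        raise ValueError('types is required')  # mirrors the documented ValueError
    result = {}
    for item in types:
        if isinstance(item, str):
            proto, levels = item, None
        else:  # assumed form: (protocol, level) or (protocol, (level, ...))
            proto, raw = item
            levels = (raw,) if isinstance(raw, str) else tuple(raw)
        if proto not in PROTOCOLS:
            raise ValueError(f'unsupported type: {proto}')
        if levels is not None:
            if proto != 'HTTP':
                raise ValueError('anonymity levels apply to HTTP only')
            unknown = set(levels) - LEVELS
            if unknown:
                raise ValueError(f'unknown levels: {sorted(unknown)}')
        result[proto] = levels
    return result
```

For example, normalize_types(['HTTPS', ('HTTP', ('Anonymous', 'High'))]) accepts HTTPS at any level and HTTP only when it is Anonymous or High.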

serve(host='127.0.0.1', port=8888, limit=100, **kwargs)

Start a local proxy server.

The server distributes incoming requests to a pool of found proxies.

When the server receives an incoming request, it chooses the optimal proxy (based on the percentage of errors and average response time) and passes the incoming request to it.

In addition to the parameters listed below, serve() also accepts all the parameters of the find() method and passes them on to gather proxies for the pool.

Example of usage.
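Once serve() is running, an ordinary HTTP client can use the local server as its proxy. A minimal standard-library sketch, assuming the default host and port above; make_opener and fetch are hypothetical helper names, and the same proxy URL is used for both http and https targets.

```python
import urllib.request


def make_opener(host='127.0.0.1', port=8888):
    """Build a urllib opener that routes HTTP and HTTPS requests via the local server."""
    proxy_url = f'http://{host}:{port}'
    handler = urllib.request.ProxyHandler({'http': proxy_url, 'https': proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, opener=None, timeout=10):
    """Fetch url through the broker's local server (serve() must already be running)."""
    opener = opener or make_opener()
    with opener.open(url, timeout=timeout) as resp:
        return resp.status, resp.read()
```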

Parameters:
  • host (str) – (optional) Host of local proxy server

  • port (int) – (optional) Port of local proxy server

  • limit (int) – (optional) Once the requested number of working proxies has been found, checking of new proxies is lazily paused. Checking resumes if all the found proxies are discarded while being used (see max_error_rate, max_resp_time), continues until one working proxy is found, and then pauses again. The default value is 100

  • max_tries (int) – (optional) The maximum number of attempts to handle an incoming request. If not specified, the value given when the Broker object was created is used. Attempts can be made with different proxies. The default value is 3

  • strategy (str) – (optional) The strategy used for picking a proxy from the pool. The default value is ‘best’

  • min_queue (int) – (optional) The minimum number of proxies to choose from before deciding which is the most suitable to use. The default value is 5

  • min_req_proxy (int) – (optional) The minimum number of processed requests needed to estimate the quality of a proxy (in accordance with max_error_rate and max_resp_time). The default value is 5

  • max_error_rate (float) – (optional) The maximum fraction of requests that ended with an error. For example: 0.5 = 50%. If proxy.error_rate exceeds this value, the proxy will be removed from the pool. The default value is 0.5

  • max_resp_time (int) – (optional) The maximum response time in seconds. If proxy.avg_resp_time exceeds this value, proxy will be removed from the pool. The default value is 8

  • prefer_connect (bool) – (optional) Whether to use the CONNECT method when possible. For example, if set to True and a proxy supports both the HTTP protocol (GET or POST requests) and the CONNECT method, the server will try the CONNECT method first and only after that send the original request. The default value is False

  • http_allowed_codes (list) – (optional) Acceptable HTTP status codes returned by a proxy. If a proxy returns a code not included in this list, it is considered a proxy error rather than a wrong or unavailable address. For example, a 404 Not Found response from a proxy is considered an error of the proxy. Only the HTTP protocol is checked; HTTPS is not supported at the moment. By default, the list is empty and the response code is not verified

  • backlog (int) – (optional) The maximum number of queued connections passed to listen. The default value is 100

Raises:

ValueError – If limit is less than or equal to zero, because parsing of providers would then never end

Added in version 0.2.0.
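The eviction rules above (min_req_proxy, max_error_rate, max_resp_time) and the choice of an optimal proxy can be modeled in a few lines. This is a simplified sketch of the documented behavior, not proxybroker's implementation: ProxyStats, is_discarded and pick_best are hypothetical, ordering by (error_rate, avg_resp_time) is our assumption about what the ‘best’ strategy optimizes, and min_queue is not modeled.

```python
from dataclasses import dataclass


@dataclass
class ProxyStats:
    addr: str
    requests: int          # requests handled so far
    error_rate: float      # 0..1, like Proxy.error_rate
    avg_resp_time: float   # seconds, like Proxy.avg_resp_time


def is_discarded(p, min_req_proxy=5, max_error_rate=0.5, max_resp_time=8):
    """True if enough requests were seen and a quality threshold is exceeded."""
    if p.requests < min_req_proxy:
        return False  # too few requests to judge quality yet
    return p.error_rate > max_error_rate or p.avg_resp_time > max_resp_time


def pick_best(pool, **limits):
    """Drop discarded proxies, then prefer fewer errors, then faster responses."""
    candidates = [p for p in pool if not is_discarded(p, **limits)]
    if not candidates:
        raise LookupError('no usable proxies left in the pool')
    return min(candidates, key=lambda p: (p.error_rate, p.avg_resp_time))
```

The min_req_proxy guard is why, in this model, a new proxy whose first request failed is not evicted immediately.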

stop()

Stop all tasks, and the local proxy server if it’s running.

show_stats(verbose=False, **kwargs)

Show statistics on the found proxies.

Useful for debugging, but you can also use it simply out of interest.

Parameters:

verbose – Flag indicating whether to print verbose stats

Deprecated since version 0.2.0: Use verbose instead of full.

Proxy

class proxybroker.Proxy(host=None, port=None, types=(), timeout=8, verify_ssl=False)

Bases: object

Proxy.

Parameters:
  • host (str) – IP address of the proxy

  • port (int) – Port of the proxy

  • types (tuple) – (optional) Types (protocols) that the proxy may support and that can be checked against it

  • timeout (int) – (optional) Timeout, in seconds, for connecting and receiving a response

  • verify_ssl (bool) – (optional) Whether to verify SSL certificates. Set to True to enable certificate verification

Raises:

ValueError – If the host is not an IP address, or if the port > 65535
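The two conditions behind this ValueError can be checked before constructing a Proxy. A minimal sketch using the standard ipaddress module; validate_address is a hypothetical helper that only mirrors the checks described above. It accepts IPv6 as well as IPv4, which may be more permissive than Proxy itself. For a domain name, Proxy.create() (below) resolves the host first.

```python
import ipaddress


def validate_address(host, port):
    """Apply the two documented checks: host must be an IP address, port <= 65535."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        raise ValueError(f'host {host!r} is not an IP address') from None
    port = int(port)
    if port > 65535:
        raise ValueError(f'port {port} is greater than 65535')
    return host, port
```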

async classmethod create(host, *args, **kwargs)

Asynchronously create a Proxy object.

Parameters:
  • host (str) – The host can be a domain name or an IP address. If it is a domain name, it is resolved

  • args – (optional) Positional arguments that Proxy takes

  • kwargs – (optional) Keyword arguments that Proxy takes

Returns:

Proxy object

Return type:

proxybroker.Proxy

Raises:
  • ResolveError – If the host could not be resolved

  • ValueError – If the port > 65535
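create() is asynchronous because resolving a domain requires I/O; with the library the call is proxy = await Proxy.create('example.com', 8080), where the host and port are placeholders. A sketch of the resolution step with asyncio's resolver: resolve_host and the local ResolveError are hypothetical stand-ins (the real exception is proxybroker.errors.ResolveError), and picking the first IPv4 result is our assumption.

```python
import asyncio
import ipaddress
import socket


class ResolveError(Exception):
    """Local stand-in for proxybroker.errors.ResolveError."""


async def resolve_host(host):
    """Return host unchanged if it is an IP address, else its first IPv4 address."""
    try:
        ipaddress.ip_address(host)
        return host  # already an IP address, nothing to resolve
    except ValueError:
        pass
    loop = asyncio.get_running_loop()
    try:
        infos = await loop.getaddrinfo(host, None, family=socket.AF_INET)
    except socket.gaierror as exc:
        raise ResolveError(f'could not resolve {host!r}') from exc
    if not infos:
        raise ResolveError(f'no addresses for {host!r}')
    return infos[0][4][0]  # sockaddr is (ip, port) for AF_INET
```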

__repr__()

String representation, e.g. <Proxy US 1.12 [HTTP: Anonymous, HTTPS] 10.0.0.1:8080>

property types

Types (protocols) supported by the proxy.

Keys are types; values are levels of anonymity (HTTP only; for other types the level is always None).
Available types: HTTP, HTTPS, SOCKS4, SOCKS5, CONNECT:80, CONNECT:25
Available levels: Transparent, Anonymous, High.
Return type:

dict
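Because types is a plain dict, the bracketed part of the __repr__ example above can be rebuilt from it, and proxies can be filtered by level. format_types and is_high_anonymity are hypothetical helpers written against the documented shape of the dict.

```python
def format_types(types):
    """Render {'HTTP': 'Anonymous', 'HTTPS': None} as 'HTTP: Anonymous, HTTPS'."""
    return ', '.join(f'{proto}: {level}' if level else proto
                     for proto, level in types.items())


def is_high_anonymity(types):
    """True if the proxy supports HTTP at the High anonymity level."""
    return types.get('HTTP') == 'High'
```

With real proxies, a filter would look like [p for p in proxies if is_high_anonymity(p.types)].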

property is_working

True if the proxy is working, False otherwise.

Return type:

bool

property writer
property reader
property priority
property error_rate

Error rate, from 0 to 1.

For example: 0.7 means that 70% of requests end with an error.

Return type:

float

Added in version 0.2.0.

property schemes

Return supported schemes.

property avg_resp_time

The average connection/response time.

Return type:

float
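Both statistics are simple ratios over the requests a proxy has handled. A worked sketch: summarize is hypothetical, takes (elapsed_seconds, failed) pairs, and assumes that every attempt, failed or not, counts toward the average response time, which this reference does not specify.

```python
def summarize(outcomes):
    """Return (error_rate, avg_resp_time) for a list of (seconds, failed) pairs."""
    if not outcomes:
        return 0.0, 0.0
    errors = sum(1 for _, failed in outcomes if failed)
    total_time = sum(seconds for seconds, _ in outcomes)
    return errors / len(outcomes), total_time / len(outcomes)
```

Seven failures out of ten requests reproduce the error_rate example above: an error rate of 0.7.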

property avgRespTime

Deprecated property, use avg_resp_time instead.

Deprecated since version 2.0: Use avg_resp_time instead.

property geo

Geo information about IP address of the proxy.

Returns:

Named tuple with fields:
  • code - ISO country code

  • name - Full name of country

  • region_code - ISO region code

  • region_name - Full name of region

  • city_name - Full name of city

Return type:

collections.namedtuple

Changed in version 0.2.0: In previous versions a dictionary was returned; now it is a named tuple.
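Since geo is a named tuple, its fields are read as attributes, e.g. proxy.geo.code. The Geo type below is a hypothetical mirror with the documented field names, used to show a common aggregation: counting proxies per country.

```python
from collections import Counter, namedtuple

# Hypothetical stand-in with the field names listed above.
Geo = namedtuple('Geo', ['code', 'name', 'region_code', 'region_name', 'city_name'])


def count_by_country(geos):
    """Count ISO country codes; with real proxies, pass (p.geo for p in proxies)."""
    return Counter(geo.code for geo in geos)
```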

property ngtr
as_json()

Return the proxy’s properties in JSON-compatible form (as a dictionary).

Return type:

dict

as_text()

Return the proxy as a host:port string.

Return type:

str
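Note that as_json() returns a dict rather than a JSON string, so it still has to be serialized before being written out. A minimal sketch; export_proxies is a hypothetical helper that accepts any objects providing as_text() and as_json().

```python
import json


def export_proxies(proxies, fmt='text'):
    """Serialize proxies as host:port lines ('text') or as a JSON array ('json')."""
    proxies = list(proxies)
    if fmt == 'text':
        return '\n'.join(p.as_text() for p in proxies)
    if fmt == 'json':
        # as_json() gives a dict; json.dumps turns the list of dicts into a string.
        return json.dumps([p.as_json() for p in proxies], indent=2)
    raise ValueError(f'unknown format: {fmt!r}')
```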

log(msg, stime=0, err=None)
get_log()

Proxy log.

Returns:

The proxy log in the format: (negotiator, msg, runtime)

Return type:

tuple

Added in version 0.2.0.

async connect(ssl=False)
close()
async send(req)
async recv(length=0, head_only=False)

Provider

class proxybroker.Provider(url=None, proto=(), max_conn=4, max_tries=3, timeout=20, loop=None)

Bases: object

Proxy provider.

Provider: a website that publishes free public proxy lists.

Parameters:
  • url (str) – URL of the page on which to find proxies

  • proto (tuple) – (optional) Types (protocols) that may be supported by the proxies returned by the provider. Later used as Proxy.types

  • max_conn (int) – (optional) The maximum number of concurrent connections to the provider

  • max_tries (int) – (optional) The maximum number of attempts to receive a response

  • timeout (int) – (optional) Timeout of a request in seconds

property proxies

Return all found proxies.

Returns:

Set of tuples with proxy hosts, ports and types (protocols) that may be supported (from proto).

For example:

{('192.168.0.1', '80', ('HTTP', 'HTTPS'), …)}

Return type:

set

async get_proxies()

Receive proxies from the provider and return them.

Returns:

proxies

async get(url, data=None, headers=None, method='GET')
find_proxies(page)
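A custom provider can be passed to Broker through its providers parameter, e.g. Broker(queue, providers=[Provider(url='https://example.com/proxies.txt', proto=('HTTP',))]), where the URL is a placeholder. How find_proxies() parses a page is not documented here; the sketch below is a hypothetical regex-based extractor that produces entries in the shape shown for Provider.proxies (host and port as strings, plus the proto tuple).

```python
import re

# An IPv4 address, optional whitespace, a colon, then a 1-5 digit port.
IP_PORT = re.compile(r'(\d{1,3}(?:\.\d{1,3}){3})\s*:\s*(\d{1,5})')


def extract_proxies(page, proto=('HTTP',)):
    """Return {(host, port, proto), ...} for every ip:port pair found in page."""
    return {(host, port, tuple(proto)) for host, port in IP_PORT.findall(page)}
```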

Errors

Errors.

exception proxybroker.errors.ProxyError

Bases: Exception

exception proxybroker.errors.NoProxyError

Bases: Exception

exception proxybroker.errors.ResolveError

Bases: Exception

exception proxybroker.errors.ProxyConnError

Bases: ProxyError

errmsg = 'connection_failed'
exception proxybroker.errors.ProxyRecvError

Bases: ProxyError

errmsg = 'connection_is_reset'
exception proxybroker.errors.ProxySendError

Bases: ProxyError

errmsg = 'connection_is_reset'
exception proxybroker.errors.ProxyTimeoutError

Bases: ProxyError

errmsg = 'connection_timeout'
exception proxybroker.errors.ProxyEmptyRecvError

Bases: ProxyError

errmsg = 'empty_response'
exception proxybroker.errors.BadStatusError

Bases: Exception

errmsg = 'bad_status'
exception proxybroker.errors.BadResponseError

Bases: Exception

errmsg = 'bad_response'
exception proxybroker.errors.BadStatusLine

Bases: Exception

errmsg = 'bad_status_line'
exception proxybroker.errors.ErrorOnStream

Bases: Exception

errmsg = 'error_on_stream'
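The hierarchy matters when catching errors: the connection-level exceptions share ProxyError as a base, while BadStatusError, BadResponseError, BadStatusLine and ErrorOnStream derive directly from Exception, so an except ProxyError clause does not catch them. A sketch using hypothetical local mirrors of three documented classes (with the package installed, import them from proxybroker.errors instead); error_key and classify are hypothetical helpers that turn any exception into a short tag.

```python
# Local mirrors of a few documented classes, for illustration only.
class ProxyError(Exception):
    pass


class ProxyTimeoutError(ProxyError):
    errmsg = 'connection_timeout'


class BadStatusError(Exception):
    errmsg = 'bad_status'


def error_key(exc):
    """Return the errmsg tag if the exception class defines one, else its class name."""
    return getattr(exc, 'errmsg', type(exc).__name__)


def classify(exc):
    """Label an exception as proxy-level (ProxyError subclass) or other."""
    kind = 'proxy' if isinstance(exc, ProxyError) else 'other'
    return kind, error_key(exc)
```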

Utilities

Utils.

proxybroker.utils.get_headers(rv=False)
proxybroker.utils.get_all_ip(page)
proxybroker.utils.get_status_code(resp, start=9, stop=12)
proxybroker.utils.parse_status_line(line)
proxybroker.utils.parse_headers(headers)
proxybroker.utils.update_geoip_db()
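Only signatures are listed for these helpers. The defaults of get_status_code, start=9 and stop=12, line up with the three-digit code in a status line such as HTTP/1.1 200 OK, since "HTTP/1.1 " is nine characters long. The sketch below is our reading of that slice, not the library's code; status_code_from is a hypothetical name, and returning None for a malformed line is our choice.

```python
def status_code_from(resp, start=9, stop=12):
    """Slice the three-digit status code out of a raw HTTP response (str or bytes)."""
    code = resp[start:stop]
    return int(code) if code.isdigit() else None  # None for malformed status lines
```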