Complete API Reference (Auto-Generated)
This is the complete technical reference auto-generated from docstrings. For a user-friendly guide, see User Guide - Essential API.
Core Classes
Broker
- class proxybroker.Broker(queue=None, timeout=8, max_conn=200, max_tries=3, judges=None, providers=None, verify_ssl=False, loop=None, stop_broker_on_sigint=True, **kwargs)
Bases: object
The Broker.
One broker to rule them all, one broker to find them,
One broker to bring them all and in the darkness bind them.
- Parameters:
queue (asyncio.Queue) – (optional) Queue of found/checked proxies
timeout (int) – (optional) Timeout of a request in seconds
max_conn (int) – (optional) The maximum number of concurrent checks of proxies
max_tries (int) – (optional) The maximum number of attempts to check a proxy
judges (list) – (optional) URLs of pages that show HTTP headers and IP address, or Judge objects
providers (list) – (optional) URLs of pages where proxies can be found, or Provider objects
verify_ssl (bool) – (optional) Flag indicating whether to verify SSL certificates. Set to True to enable verification
loop – (optional) asyncio-compatible event loop
stop_broker_on_sigint – (optional) Whether to set a SIGINT signal handler on the broker object. Relevant when the broker runs in a thread other than the main thread.
Deprecated since version 0.2.0: Use max_conn and max_tries instead of max_concurrent_conn and attempts_conn.
- async grab(*, countries=None, limit=0)
Gather proxies from the providers without checking.
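- Parameters:
countries (list) – (optional) List of ISO country codes where proxies should be located
limit (int) – (optional) The maximum number of proxies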
- async find(*, types=None, data=None, countries=None, post=False, strict=False, dnsbl=None, limit=0, **kwargs)
Gather and check proxies from providers or from passed data.
- Parameters:
types (list) – Types (protocols) that each proxy is checked for. Supported: HTTP, HTTPS, SOCKS4, SOCKS5, CONNECT:80, CONNECT:25. Levels of anonymity (HTTP only): Transparent, Anonymous, High
data – (optional) String or list of proxies, or a file-like object that supports the read() method. Used instead of providers
countries (list) – (optional) List of ISO country codes where proxies should be located
post (bool) – (optional) Flag indicating whether to use POST instead of GET for requests when checking proxies
strict (bool) – (optional) Flag indicating that the types (protocols) and anonymity levels supported by a proxy must match the requested types and levels. By default, strict mode is off and satisfying any one of the requested types is enough for a successful check
dnsbl (list) – (optional) Spam databases (DNSBLs) to check proxies against
limit (int) – (optional) The maximum number of proxies
- Raises:
ValueError – If types is not given.
Changed in version 0.2.0: Added: post, strict, dnsbl. Changed: types is required.
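A minimal usage sketch for find(), assuming the common pattern of passing an asyncio.Queue to Broker and draining it from a second coroutine. The helper name collect_proxies is illustrative, and the None end-of-search sentinel is an assumption that this reference does not state. proxybroker is imported inside the coroutine so that the definition loads even where the package is not installed.

```python
import asyncio


async def collect_proxies(types=("HTTP", "HTTPS"), limit=5):
    """Run Broker.find() and return whatever it puts on the queue."""
    # Imported lazily so this module loads without proxybroker installed.
    from proxybroker import Broker

    queue = asyncio.Queue()
    broker = Broker(queue, timeout=8, max_conn=200, max_tries=3)

    async def drain():
        found = []
        while True:
            proxy = await queue.get()
            # Assumption: the broker signals completion with a None sentinel.
            if proxy is None:
                break
            found.append(proxy)
        return found

    # find() requires `types`; leaving it out raises ValueError.
    _, found = await asyncio.gather(
        broker.find(types=list(types), limit=limit),
        drain(),
    )
    return found
```

Running asyncio.run(collect_proxies(limit=5)) would start the search; it needs network access and a working proxybroker installation.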
- serve(host='127.0.0.1', port=8888, limit=100, **kwargs)
Start a local proxy server.
The server distributes incoming requests to a pool of found proxies.
When the server receives an incoming request, it chooses the optimal proxy (based on the percentage of errors and average response time) and passes the incoming request to it.
In addition to the parameters listed below, serve() accepts all the parameters of the find() method and passes them on to gather proxies for the pool.
- Parameters:
host (str) – (optional) Host of the local proxy server
port (int) – (optional) Port of the local proxy server
limit (int) – (optional) Once the requested number of working proxies has been found, checking of new proxies is lazily paused. Checking resumes if all the found proxies are discarded while in use (see max_error_rate, max_resp_time), continues until one working proxy is found, and then pauses again. The default value is 100
max_tries (int) – (optional) The maximum number of attempts to handle an incoming request. If not specified, the value given when the Broker object was created is used. Attempts can be made with different proxies. The default value is 3
strategy (str) – (optional) The strategy used for picking a proxy from the pool. The default value is 'best'
min_queue (int) – (optional) The minimum number of proxies to choose from before deciding which is the most suitable to use. The default value is 5
min_req_proxy (int) – (optional) The minimum number of processed requests needed to estimate the quality of a proxy (in accordance with max_error_rate and max_resp_time). The default value is 5
max_error_rate (float) – (optional) The maximum fraction of requests that ended with an error. For example: 0.5 = 50%. If proxy.error_rate exceeds this value, the proxy is removed from the pool. The default value is 0.5
max_resp_time (int) – (optional) The maximum response time in seconds. If proxy.avg_resp_time exceeds this value, the proxy is removed from the pool. The default value is 8
prefer_connect (bool) – (optional) Flag that indicates whether to use the CONNECT method if possible. For example: if set to True and a proxy supports both the HTTP protocol (GET or POST requests) and the CONNECT method, the server tries the CONNECT method first and only then sends the original request. The default value is False
http_allowed_codes (list) – (optional) Acceptable HTTP codes returned by a proxy on requests. If a proxy returns a code not included in this list, it is treated as a proxy error, not as a wrong/unavailable address. For example, if a proxy returns a 404 Not Found response, this is considered a proxy error. Checked only for the HTTP protocol; HTTPS is not supported at the moment. By default the list is empty and the response code is not verified
backlog (int) – (optional) The maximum number of queued connections passed to listen. The default value is 100
- Raises:
ValueError – If limit is less than or equal to zero, because parsing of providers would then be endless
Added in version 0.2.0.
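The pool rules described for serve() can be illustrated with a small stand-alone sketch. It is not the library's implementation: ProxyStats, keep_in_pool and pick_best are hypothetical names, and ranking by the pair (error_rate, avg_resp_time) is only one plausible reading of "optimal". The thresholds reuse the documented defaults for min_req_proxy, max_error_rate and max_resp_time.

```python
from dataclasses import dataclass


@dataclass
class ProxyStats:
    """Hypothetical stand-in for the counters a pool might keep per proxy."""
    name: str
    requests: int
    errors: int
    total_resp_time: float  # seconds, summed over all requests

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

    @property
    def avg_resp_time(self) -> float:
        return self.total_resp_time / self.requests if self.requests else 0.0


def keep_in_pool(p, min_req_proxy=5, max_error_rate=0.5, max_resp_time=8):
    """Judge a proxy only after it has handled enough requests."""
    if p.requests < min_req_proxy:
        return True
    return p.error_rate <= max_error_rate and p.avg_resp_time <= max_resp_time


def pick_best(pool, **limits):
    """Assumed ranking: lowest error rate first, then lowest response time."""
    candidates = [p for p in pool if keep_in_pool(p, **limits)]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.error_rate, p.avg_resp_time))


pool = [
    ProxyStats("a", requests=10, errors=6, total_resp_time=20.0),  # 60% errors
    ProxyStats("b", requests=10, errors=1, total_resp_time=30.0),
    ProxyStats("c", requests=10, errors=1, total_resp_time=15.0),
]
best = pick_best(pool)
print(best.name)
```

Proxy "a" is dropped because its error rate exceeds 0.5; "b" and "c" tie on error rate, so the faster one is chosen.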
- stop()
Stop all tasks, and the local proxy server if it’s running.
- show_stats(verbose=False, **kwargs)
Show statistics on the found proxies.
Useful for debugging, but also handy if you are simply curious about the numbers.
- Parameters:
verbose – Flag indicating whether to print verbose stats
Deprecated since version 0.2.0: Use verbose instead of full.
Proxy
- class proxybroker.Proxy(host=None, port=None, types=(), timeout=8, verify_ssl=False)
Bases: object
Proxy.
- Parameters:
host (str) – IP address of the proxy
port (int) – Port of the proxy
types (tuple) – (optional) Types (protocols) that the proxy may support and that can be checked against it
timeout (int) – (optional) Timeout, in seconds, for connecting and receiving a response
verify_ssl (bool) – (optional) Flag indicating whether to verify SSL certificates. Set to True to enable verification
- Raises:
ValueError – If the host is not an IP address, or if the port > 65535
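The two ValueError conditions above can be pictured with the standard ipaddress module. This is a sketch under assumptions, not the code Proxy actually runs; validate_endpoint is a hypothetical helper name.

```python
import ipaddress


def validate_endpoint(host, port):
    """Raise ValueError unless host is an IP literal and port <= 65535."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        raise ValueError(f"host must be an IP address, got {host!r}") from None
    if int(port) > 65535:
        raise ValueError(f"port must not exceed 65535, got {port}")
    return host, int(port)


print(validate_endpoint("10.0.0.1", 8080))
```

A domain name such as "example.com" fails this check.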
- async classmethod create(host, *args, **kwargs)
Asynchronously create a Proxy object.
- Parameters:
host (str) – A domain or IP address. If the host is a domain, an attempt is made to resolve it
args – (optional) Positional arguments that Proxy takes
kwargs – (optional) Keyword arguments that Proxy takes
- Returns:
Proxy object
- Return type:
proxybroker.Proxy
- Raises:
ResolveError – If the host could not be resolved
ValueError – If the port > 65535
- __repr__()
Class representation, e.g. <Proxy US 1.12 [HTTP: Anonymous, HTTPS] 10.0.0.1:8080>
- property types
Types (protocols) supported by the proxy.
A mapping where each key is a type and each value is a level of anonymity (only for HTTP; for other types the level is always None).
Available types: HTTP, HTTPS, SOCKS4, SOCKS5, CONNECT:80, CONNECT:25
Available levels: Transparent, Anonymous, High.
- Return type:
dict
- property is_working
True if the proxy is working, False otherwise.
- Return type:
bool
- property writer
- property reader
- property priority
- property error_rate
Error rate: from 0 to 1.
For example: 0.7 = 70% of requests end with an error.
- Return type:
float
Added in version 0.2.0.
- property schemes
Return supported schemes.
- property avg_resp_time
The average connection/response time.
- Return type:
float
- property avgRespTime
Deprecated property, use avg_resp_time instead.
Deprecated since version 2.0: Use avg_resp_time instead.
- property geo
Geo information about IP address of the proxy.
- Returns:
Named tuple with fields:
code - ISO country code
name - Full name of country
region_code - ISO region code
region_name - Full name of region
city_name - Full name of city
- Return type:
collections.namedtuple
Changed in version 0.2.0: Previous versions returned a dictionary; now a named tuple is returned.
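Because geo is a named tuple, its fields can be read by name, unpacked by position, or converted to a dict. The sketch below builds a stand-in tuple with the five documented field names; the type name GeoData and the sample values are invented for illustration.

```python
from collections import namedtuple

# Hypothetical type name; only the five field names come from the reference.
GeoData = namedtuple(
    "GeoData", ["code", "name", "region_code", "region_name", "city_name"]
)

# Made-up sample values.
geo = GeoData("US", "United States", "CA", "California", "San Francisco")

print(geo.code, geo.city_name)  # access by field name
code, name, *_ = geo            # positional unpacking still works
as_dict = geo._asdict()         # dict view, close to the pre-0.2.0 shape
```

Code written against the old dictionary return value can be adapted with _asdict() or by switching from geo['code'] to geo.code.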
- property ngtr
- log(msg, stime=0, err=None)
- get_log()
Proxy log.
- Returns:
The proxy log in format: (negotiator, msg, runtime)
Added in version 0.2.0.
- async connect(ssl=False)
- close()
- async send(req)
- async recv(length=0, head_only=False)
Provider
- class proxybroker.Provider(url=None, proto=(), max_conn=4, max_tries=3, timeout=20, loop=None)
Bases: object
Proxy provider.
Provider - a website that publishes free public proxy lists.
- Parameters:
url (str) – URL of the page where proxies can be found
proto (tuple) – (optional) Types (protocols) that may be supported by proxies returned by the provider. Then used as Proxy.types
max_conn (int) – (optional) The maximum number of concurrent connections to the provider
max_tries (int) – (optional) The maximum number of attempts to receive a response
timeout (int) – (optional) Timeout of a request in seconds
- property proxies
Return all found proxies.
- Returns:
Set of tuples with proxy hosts, ports and types (protocols) that may be supported (from proto).
For example: {('192.168.0.1', '80', ('HTTP', 'HTTPS')), …}
- Return type:
set
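The documented shape of proxies, a set of (host, port, protocols) tuples with the port given as a string, can be consumed as sketched below. The sample data is made up, and grouping by protocol is only an illustration of working with the structure.

```python
from collections import defaultdict

# Made-up sample in the documented shape: (host, port, protocols).
found = {
    ("192.168.0.1", "80", ("HTTP", "HTTPS")),
    ("192.168.0.2", "1080", ("SOCKS5",)),
}

by_proto = defaultdict(list)
for host, port, protos in found:
    for proto in protos:
        # The port arrives as a string in the documented example.
        by_proto[proto].append(f"{host}:{int(port)}")

for proto in sorted(by_proto):
    print(proto, sorted(by_proto[proto]))
```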
- async get(url, data=None, headers=None, method='GET')
- find_proxies(page)
Errors
Errors.
- exception proxybroker.errors.ProxyConnError
Bases: ProxyError
errmsg = 'connection_failed'
- exception proxybroker.errors.ProxyRecvError
Bases: ProxyError
errmsg = 'connection_is_reset'
- exception proxybroker.errors.ProxySendError
Bases: ProxyError
errmsg = 'connection_is_reset'
- exception proxybroker.errors.ProxyTimeoutError
Bases: ProxyError
errmsg = 'connection_timeout'
- exception proxybroker.errors.ProxyEmptyRecvError
Bases: ProxyError
errmsg = 'empty_response'
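Since every listed exception derives from ProxyError and carries an errmsg string, failures can be caught through the base class and grouped by that string. The sketch below uses local stand-in classes that mirror the documented names and values so that it runs without proxybroker; the base-class errmsg value and the helpers tally and fail_with are assumptions made for illustration.

```python
from collections import Counter


# Stand-ins mirroring the documented names and errmsg values; real code would
# import these classes from proxybroker.errors instead.
class ProxyError(Exception):
    errmsg = "unknown"  # assumption: the base errmsg is not documented here


class ProxyConnError(ProxyError):
    errmsg = "connection_failed"


class ProxyTimeoutError(ProxyError):
    errmsg = "connection_timeout"


class ProxyEmptyRecvError(ProxyError):
    errmsg = "empty_response"


def tally(attempts):
    """Run each callable and count failures by their errmsg."""
    counts = Counter()
    for attempt in attempts:
        try:
            attempt()
        except ProxyError as exc:
            counts[exc.errmsg] += 1
    return counts


def fail_with(exc_type):
    """Build a callable that raises the given exception type."""
    def attempt():
        raise exc_type()
    return attempt


counts = tally([
    fail_with(ProxyConnError),
    fail_with(ProxyTimeoutError),
    fail_with(ProxyTimeoutError),
    lambda: None,  # a successful attempt is not counted
])
print(dict(counts))
```

Note that ProxyRecvError and ProxySendError share the errmsg 'connection_is_reset', so grouping by errmsg alone cannot tell them apart; grouping by type(exc).__name__ would.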
Utilities
Utils.
- proxybroker.utils.get_headers(rv=False)
- proxybroker.utils.get_all_ip(page)
- proxybroker.utils.get_status_code(resp, start=9, stop=12)
- proxybroker.utils.parse_status_line(line)
- proxybroker.utils.parse_headers(headers)
- proxybroker.utils.update_geoip_db()
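The utilities are listed without descriptions. One observation that may help when reading get_status_code(resp, start=9, stop=12): in an HTTP/1.x status line such as "HTTP/1.1 200 OK", the three-digit status code occupies characters 9 to 11, which matches those defaults. The sketch below shows that slicing with a hypothetical helper, status_code_from. Whether the library function works exactly this way, and how it treats malformed input, is not stated in this reference.

```python
def status_code_from(resp, start=9, stop=12):
    """Slice a three-digit status code out of an HTTP/1.x status line."""
    if isinstance(resp, bytes):
        resp = resp.decode("latin-1")
    # "HTTP/1.1 " is nine characters long, so the code starts at index 9.
    return int(resp[start:stop])


print(status_code_from("HTTP/1.1 200 OK\r\n"))
print(status_code_from(b"HTTP/1.0 404 Not Found\r\n"))
```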