sopel.tools.web
The tools.web package contains utility functions for interaction with web
applications, APIs, or websites in your plugins.
New in version 7.0.
- sopel.tools.web.r_entity = re.compile('&([^;\\s]+);')
 Regular expression to match HTML entities.
Deprecated since version 8.0: Will be removed in Sopel 9, along with
entity().
- sopel.tools.web.DEFAULT_HEADERS = {'User-Agent': 'Sopel/8.0.0.dev0 (https://sopel.chat)'}
 Default header dict for use with requests methods. Use it like this:

```python
import requests
from sopel.tools import web

result = requests.get(
    'https://some.site/api/endpoint',
    headers=web.DEFAULT_HEADERS,
)
```
Important
You should never modify this directly in your plugin code. Make a copy and use update() if you need to add or change headers:

```python
from sopel.tools import web

default_headers = web.DEFAULT_HEADERS.copy()
custom_headers = {'Accept': 'text/*'}
default_headers.update(custom_headers)
```
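The copy-then-update pattern can also be wrapped in a small helper so it is not repeated at every call site. This is a minimal sketch: `build_headers` is a hypothetical name, not part of Sopel, and `DEFAULT_HEADERS` is a local stand-in holding the documented value so the snippet runs without Sopel installed (in a plugin, use `web.DEFAULT_HEADERS`).

```python
# Local stand-in for sopel.tools.web.DEFAULT_HEADERS (value as documented above);
# in a real plugin, use `from sopel.tools import web` and web.DEFAULT_HEADERS instead.
DEFAULT_HEADERS = {'User-Agent': 'Sopel/8.0.0.dev0 (https://sopel.chat)'}


def build_headers(extra=None):
    """Return a new dict holding the default headers plus any extra headers.

    build_headers is a hypothetical helper, not part of Sopel's API.
    """
    headers = dict(DEFAULT_HEADERS)  # shallow copy; the shared dict is never mutated
    if extra:
        headers.update(extra)  # extra entries add to or override the defaults
    return headers


headers = build_headers({'Accept': 'text/*'})
```

Because the defaults are copied on every call, headers added for one request never leak into the shared dict or into other plugins' requests.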
- sopel.tools.web.USER_AGENT = 'Sopel/8.0.0.dev0 (https://sopel.chat)'
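 User agent string to be sent with HTTP requests.
Meant to be passed as the value of the User-Agent header, like so:

```python
import requests
from sopel.tools import web

# requests has no user_agent keyword; the string goes in the headers dict.
result = requests.get(
    'https://some.site/api/endpoint',
    headers={'User-Agent': web.USER_AGENT},
)
```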
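The same string can be attached when using the standard library's `urllib.request` instead of requests. A sketch under these assumptions: `make_request` is a hypothetical helper, `USER_AGENT` is a local stand-in for `web.USER_AGENT`, and the request is only built here, not sent.

```python
import urllib.request

# Local stand-in for sopel.tools.web.USER_AGENT (value as documented above).
USER_AGENT = 'Sopel/8.0.0.dev0 (https://sopel.chat)'


def make_request(url):
    """Build, but do not send, a GET request carrying the Sopel user agent.

    make_request is a hypothetical helper, not part of Sopel's API.
    """
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})


req = make_request('https://some.site/api/endpoint')
# urllib stores header names via str.capitalize(), so the lookup key is 'User-agent'.
agent = req.get_header('User-agent')
```

Passing `req` to `urllib.request.urlopen()` would send the request with that header.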
- sopel.tools.web.decode(text)
 Decode HTML entities into Unicode text.
- Parameters:
 text (str) – the HTML page or snippet to process
- Return str:
 text with all entity references replaced
Changed in version 8.0: Renamed html parameter to text, so the name no longer clashes with Python’s standard-library html module (whose html.unescape() function was added in Python 3.4).
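decode() does roughly the same job as html.unescape(), which the entity() deprecation note below names as the standard-library equivalent. To illustrate what unescaping an HTML snippet produces, the example calls the standard-library function, not Sopel's decode():

```python
import html

snippet = 'Caf&eacute; &amp; bar: &lt;b&gt;open&lt;/b&gt; &#8364;5'
# Named (&eacute;), decimal (&#8364;) and hex (&#x41;) references are all resolved.
text = html.unescape(snippet)
```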
- sopel.tools.web.entity(match)
 Convert an entity reference to the appropriate character.
- Parameters:
 match (str) – the entity name or code, as matched by r_entity
- Return str:
 the Unicode character corresponding to the given match string, or a fallback representation if the reference cannot be resolved to a character
Deprecated since version 8.0: Will be removed in Sopel 9. Use decode() directly or migrate to Python’s standard-library equivalent, html.unescape().
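Code that still runs entity() as a re.sub() callback over r_entity matches can usually be collapsed into a single html.unescape() call. The sketch below checks that both styles agree on a sample string. `replace_entity` is a hypothetical stand-in for the deprecated callback, and r_entity is re-declared locally with the pattern documented above.

```python
import html
import re

# Same pattern as the deprecated sopel.tools.web.r_entity.
r_entity = re.compile(r'&([^;\s]+);')


def replace_entity(match):
    """Hypothetical stand-in for entity(): resolve one matched reference."""
    return html.unescape(match.group(0))


text = 'Caf&#233; &amp; bar &lt;3'
old_style = r_entity.sub(replace_entity, text)  # deprecated per-match approach
new_style = html.unescape(text)                 # recommended replacement
```

One difference to keep in mind: html.unescape() also resolves some legacy references that lack the trailing semicolon (for example `&amp` on its own), which r_entity never matches.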
- sopel.tools.web.iri_to_uri(iri)
 Converts an IRI (internationalized resource identifier) to a URI, encoding any internationalized domain name (IDN) in the host.
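For orientation, here is a rough standard-library sketch of what turning an IRI into a URI involves: the host name is IDNA-encoded, and non-ASCII characters elsewhere are percent-encoded as UTF-8. `iri_to_uri_sketch` is hypothetical and is not Sopel's implementation. For brevity it drops any `user:password@` part of the authority and lowercases the host.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# URL delimiters (and '%', so existing escapes survive) that are left untouched.
_KEEP = "/#%[]=:;$&()+,!?*@'~"


def iri_to_uri_sketch(iri):
    """Convert an IRI to an ASCII-only URI (illustrative; not Sopel's code)."""
    parts = urlsplit(iri)
    # IDNA-encode the host name, e.g. 'bücher.example' -> 'xn--bcher-kva.example'.
    host = parts.hostname.encode('idna').decode('ascii') if parts.hostname else ''
    netloc = host if parts.port is None else f'{host}:{parts.port}'
    # Percent-encode everything else that is not ASCII-safe, as UTF-8 bytes.
    path = quote(parts.path, safe=_KEEP)
    query = quote(parts.query, safe=_KEEP)
    fragment = quote(parts.fragment, safe=_KEEP)
    return urlunsplit((parts.scheme, netloc, path, query, fragment))
```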
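- sopel.tools.web.quote(string, safe='/')
 Safely encodes a string for use in a URL.
- Parameters:
 string (str) – the string to encode
 safe (str) – characters that should not be encoded; defaults to '/'
- Return str:
 the string with special characters URL-encoded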
Note
This is a shim to make writing cross-compatible plugins for both Python 2 and Python 3 easier.
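Since quote() is described as a shim, it presumably mirrors urllib.parse.quote() from the standard library; that is an assumption, so check Sopel's source if the distinction matters. The example uses the standard-library function to show how `safe` controls which characters survive unencoded:

```python
from urllib.parse import quote

# With the default safe='/', slashes stay as path separators.
path = quote('docs/café menu')
# With safe='', '/' is encoded too, which suits a value placed in one path segment.
segment = quote('docs/café menu', safe='')
```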
- sopel.tools.web.quote_query(string)
 Safely encodes a URL’s query parameters.
- Parameters:
 string (str) – a URL containing query parameters
- Return str:
 the input URL with query parameter values URL-encoded
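A rough sketch of the idea, assuming the goal is to percent-encode the query string while keeping its `=` and `&` separators intact. `quote_query_sketch` is hypothetical and may differ from Sopel's implementation in edge cases.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def quote_query_sketch(url):
    """Percent-encode a URL's query string, keeping '=' and '&' separators."""
    parts = urlsplit(url)
    # '%' is kept safe so values that are already encoded are not encoded twice.
    query = quote(parts.query, safe='/=&%')
    # Only the query component is replaced; scheme, host and path are untouched.
    return urlunsplit(parts._replace(query=query))
```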
- sopel.tools.web.search_urls(text, exclusion_char=None, clean=False, schemes=None)
 Extracts all URLs in text.
- Parameters:
 text (str) – the text to search for URLs
 exclusion_char (str) – optional character that, if placed before a URL in the text, will exclude it from being extracted
 clean (bool) – if True, all found URLs are passed through trim_url() before being returned; default False
 schemes (list) – optional list of URL schemes to look for; defaults to ['http', 'https', 'ftp']
- Returns:
 generator iterator of all URLs found in
text
To get the URLs as a plain list, use e.g.:
list(search_urls(text))
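A simplified, hypothetical sketch of how `exclusion_char` and `schemes` can work together in a regex-based generator. It leaves out `clean` (which would pass each URL through trim_url()) and the finer points of URL matching, so treat it as an illustration rather than Sopel's actual pattern.

```python
import re


def search_urls_sketch(text, exclusion_char=None, schemes=None):
    """Yield URLs found in text (simplified illustration, not Sopel's code)."""
    if schemes is None:
        schemes = ['http', 'https', 'ftp']
    scheme_group = '|'.join(re.escape(scheme) for scheme in schemes)
    # Group 1: the single character right before the URL, if any;
    # group 2: the URL itself (scheme, '://', then a run of non-whitespace).
    pattern = re.compile(r'(\S?)((?:%s)://\S+)' % scheme_group, re.IGNORECASE)
    for match in pattern.finditer(text):
        prefix, url = match.groups()
        if exclusion_char is not None and prefix == exclusion_char:
            continue  # the URL was explicitly marked "do not extract"
        yield url
```

Being a generator, it mirrors the documented return type; wrap it in list() to get a plain list.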
- sopel.tools.web.trim_url(url)
 Removes extra punctuation from URLs found in text.
- Parameters:
 url (str) – the raw URL match
- Return str:
 the cleaned URL
This function removes trailing punctuation that looks like it was not intended to be part of the URL:
 - trailing sentence- or clause-ending marks, such as a period (.) or semicolon (;)
 - unmatched trailing closing brackets or braces, such as } or )
It is intended for use with the output of
search_urls(), which may include trailing punctuation when used on input from chat.
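A minimal sketch of those two rules. `trim_url_sketch` and its exact character sets are assumptions, not Sopel's. A closing bracket is removed only when the URL contains more closers than openers of that kind, so links such as Wikipedia's `Foo_(bar)` pages keep their final parenthesis.

```python
# Closing bracket -> matching opening bracket.
_PAIRS = {')': '(', ']': '[', '}': '{', '>': '<'}
# Sentence or clause punctuation that rarely ends a real URL.
_TRAILING = '.,;:!?\'"'


def trim_url_sketch(url):
    """Strip trailing punctuation and unmatched closing brackets (illustrative)."""
    while url:
        last = url[-1]
        if last in _TRAILING:
            url = url[:-1]
        elif last in _PAIRS and url.count(last) > url.count(_PAIRS[last]):
            url = url[:-1]  # closing bracket with no partner inside the URL
        else:
            break
    return url
```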
- sopel.tools.web.unquote(string)
 Decodes a URL-encoded string.
- Parameters:
 string (str) – the string to decode
- Return str:
 the decoded
string
Note
This is a convenient shortcut for
urllib.parse.unquote.
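Because unquote() is a shortcut for urllib.parse.unquote(), a round trip through the standard library's quote() and unquote() shows what it does:

```python
from urllib.parse import quote, unquote

encoded = quote('café au lait/menu')  # '/' is left alone by default
decoded = unquote(encoded)            # percent-escapes are turned back into text
```

Note that unquote() leaves `+` as is; form-encoded query strings that use `+` for spaces need urllib.parse.unquote_plus() instead.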
- sopel.tools.web.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=<function quote_plus>)
 Encode a dict or sequence of two-element tuples into a URL query string.
If any values in the query arg are sequences and doseq is true, each sequence element is converted to a separate parameter.
If the query arg is a sequence of two-element tuples, the order of the parameters in the output will match the order of parameters in the input.
The components of a query arg may each be either a string or a bytes type.
The safe, encoding, and errors parameters are passed down to the function specified by quote_via (encoding and errors only if a component is a str).
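The signature and description match urllib.parse.urlencode() from the standard library, so this function appears to be that function re-exported (an assumption based on the matching signature). The example uses the standard-library function to show the `doseq`, ordering and `quote_via` behaviour described above:

```python
from urllib.parse import quote, urlencode

params = {'q': 'sopel bot', 'tags': ['irc', 'python']}

# doseq=True expands each list item into its own key=value pair.
expanded = urlencode(params, doseq=True)
# A sequence of pairs keeps its order; quote_via=quote encodes spaces as %20, not +.
ordered = urlencode([('b', 'x y'), ('a', '1')], quote_via=quote)
```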
- sopel.tools.web.urlencode_non_ascii(b)
 Safely encodes non-ASCII characters in a URL.
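One plausible reading of this function, given that it takes bytes, is to keep ASCII bytes as they are and replace every byte at or above 0x80 with a `%XX` escape. The sketch below is a hypothetical re-implementation under that assumption; the real function may differ, for example in the case of its hex digits.

```python
def urlencode_non_ascii_sketch(b):
    """Percent-encode non-ASCII bytes; ASCII bytes pass through unchanged."""
    return ''.join(
        chr(byte) if byte < 0x80 else '%{:02X}'.format(byte)
        for byte in b  # iterating over bytes yields ints in Python 3
    )
```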