gh-67041: Allow to distinguish between empty and not defined URI components #123305
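The short sketch below illustrates the ambiguity this change targets. It uses only the existing urllib.parse API, with no *missing_as_none* or *keep_empty*. An empty component and an absent one both come back as an empty string, so the original delimiter cannot be restored when the URL is rebuilt.

```python
from urllib.parse import urldefrag, urlsplit, urlunsplit

# Two URLs that differ only in whether the query is empty or absent.
empty_query = urlsplit("http://docs.python.org?")
no_query = urlsplit("http://docs.python.org")

# Both report the query as an empty string...
print(repr(empty_query.query), repr(no_query.query))  # '' ''

# ...so rebuilding the first URL silently drops its "?" delimiter.
rebuilt = urlunsplit(empty_query)
print(rebuilt)  # http://docs.python.org

# urldefrag() conflates an empty fragment with a missing one in the same way.
empty_fragment = urldefrag("http://docs.python.org/#")
no_fragment = urldefrag("http://docs.python.org/")
print(empty_fragment == no_fragment)  # True
```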
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -50,12 +50,16 @@ | |
| The URL parsing functions focus on splitting a URL string into its components, | ||
| or on combining URL components into a URL string. | ||
|
|
||
| .. function:: urlparse(urlstring, scheme='', allow_fragments=True) | ||
| .. function:: urlparse(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False) | ||
|
|
||
| Parse a URL into six components, returning a 6-item :term:`named tuple`. This | ||
| corresponds to the general structure of a URL: | ||
| ``scheme://netloc/path;parameters?query#fragment``. | ||
| Each tuple item is a string, possibly empty. The components are not broken up | ||
| Each tuple item is a string, possibly empty, or ``None`` if | ||
| *missing_as_none* is true. | ||
| Undefined components are represented by an empty string (by default) or | ||
| ``None`` if *missing_as_none* is true. | ||
| The components are not broken up | ||
| into smaller parts (for example, the network location is a single string), and % | ||
| escapes are not expanded. The delimiters as shown above are not part of the | ||
| result, except for a leading slash in the *path* component, which is retained if | ||
|
|
@@ -84,6 +88,12 @@ | |
| 80 | ||
| >>> o._replace(fragment="").geturl() | ||
| 'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params' | ||
| >>> urlparse("http://docs.python.org?") | ||
| ParseResult(scheme='http', netloc='docs.python.org', | ||
| path='', params='', query='', fragment='') | ||
| >>> urlparse("http://docs.python.org?", missing_as_none=True) | ||
| ParseResult(scheme='http', netloc='docs.python.org', | ||
| path='', params=None, query='', fragment=None) | ||
|
|
||
| Following the syntax specifications in :rfc:`1808`, urlparse recognizes | ||
| a netloc only if it is properly introduced by '//'. Otherwise the | ||
|
|
@@ -101,47 +111,53 @@ | |
| ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html', | ||
| params='', query='', fragment='') | ||
| >>> urlparse('help/Python.html') | ||
| ParseResult(scheme='', netloc='', path='help/Python.html', params='', | ||
| query='', fragment='') | ||
| ParseResult(scheme='', netloc='', path='help/Python.html', | ||
| params='', query='', fragment='') | ||
| >>> urlparse('help/Python.html', missing_as_none=True) | ||
| ParseResult(scheme=None, netloc=None, path='help/Python.html', | ||
| params=None, query=None, fragment=None) | ||
|
|
||
| The *scheme* argument gives the default addressing scheme, to be | ||
| used only if the URL does not specify one. It should be the same type | ||
| (text or bytes) as *urlstring*, except that the default value ``''`` is | ||
| (text or bytes) as *urlstring*, or ``None``, except that ``''`` is | ||
| always allowed, and is automatically converted to ``b''`` if appropriate. | ||
|
|
||
| If the *allow_fragments* argument is false, fragment identifiers are not | ||
| recognized. Instead, they are parsed as part of the path, parameters | ||
| or query component, and :attr:`fragment` is set to the empty string in | ||
| the return value. | ||
| or query component, and :attr:`fragment` is set to ``None`` or the empty | ||
| string (depending on the value of *missing_as_none*) in the return value. | ||
|
|
||
| The return value is a :term:`named tuple`, which means that its items can | ||
| be accessed by index or as named attributes, which are: | ||
|
|
||
| +------------------+-------+-------------------------+------------------------+ | ||
| | Attribute | Index | Value | Value if not present | | ||
| +==================+=======+=========================+========================+ | ||
| | :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter | | ||
| +------------------+-------+-------------------------+------------------------+ | ||
| | :attr:`netloc` | 1 | Network location part | empty string | | ||
| +------------------+-------+-------------------------+------------------------+ | ||
| | :attr:`path` | 2 | Hierarchical path | empty string | | ||
| +------------------+-------+-------------------------+------------------------+ | ||
| | :attr:`params` | 3 | Parameters for last | empty string | | ||
| | | | path element | | | ||
| +------------------+-------+-------------------------+------------------------+ | ||
| | :attr:`query` | 4 | Query component | empty string | | ||
| +------------------+-------+-------------------------+------------------------+ | ||
| | :attr:`fragment` | 5 | Fragment identifier | empty string | | ||
| +------------------+-------+-------------------------+------------------------+ | ||
| | :attr:`username` | | User name | :const:`None` | | ||
| +------------------+-------+-------------------------+------------------------+ | ||
| | :attr:`password` | | Password | :const:`None` | | ||
| +------------------+-------+-------------------------+------------------------+ | ||
| | :attr:`hostname` | | Host name (lower case) | :const:`None` | | ||
| +------------------+-------+-------------------------+------------------------+ | ||
| | :attr:`port` | | Port number as integer, | :const:`None` | | ||
| | | | if present | | | ||
| +------------------+-------+-------------------------+------------------------+ | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | Attribute | Index | Value | Value if not present | | ||
| +==================+=======+=========================+===============================+ | ||
| | :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter or | | ||
| | | | | empty string [1]_ | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`netloc` | 1 | Network location part | ``None`` or empty string [1]_ | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`path` | 2 | Hierarchical path | empty string | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`params` | 3 | Parameters for last | ``None`` or empty string [1]_ | | ||
| | | | path element | | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`query` | 4 | Query component | ``None`` or empty string [1]_ | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`fragment` | 5 | Fragment identifier | ``None`` or empty string [1]_ | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`username` | | User name | ``None`` | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`password` | | Password | ``None`` | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`hostname` | | Host name (lower case) | ``None`` | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`port` | | Port number as integer, | ``None`` | | ||
| | | | if present | | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
|
|
||
| .. [1] Depending on the value of the *missing_as_none* argument. | ||
|
|
||
| Reading the :attr:`port` attribute will raise a :exc:`ValueError` if | ||
| an invalid port is specified in the URL. See section | ||
|
|
@@ -187,12 +203,15 @@ | |
|
|
||
| .. versionchanged:: 3.6 | ||
| Out-of-range port numbers now raise :exc:`ValueError`, instead of | ||
| returning :const:`None`. | ||
| returning ``None``. | ||
|
|
||
| .. versionchanged:: 3.8 | ||
| Characters that affect netloc parsing under NFKC normalization will | ||
| now raise :exc:`ValueError`. | ||
|
|
||
| .. versionchanged:: next | ||
| Added the *missing_as_none* parameter. | ||
|
|
||
|
|
||
| .. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&') | ||
|
|
||
|
|
@@ -288,15 +307,27 @@ | |
|
|
||
|
|
||
| .. function:: urlunparse(parts) | ||
| urlunparse(parts, *, keep_empty) | ||
|
|
||
| Construct a URL from a tuple as returned by ``urlparse()``. The *parts* | ||
| argument can be any six-item iterable. This may result in a slightly | ||
| different, but equivalent URL, if the URL that was parsed originally had | ||
| unnecessary delimiters (for example, a ``?`` with an empty query; the RFC | ||
| states that these are equivalent). | ||
| argument can be any six-item iterable. | ||
|
|
||
| This may result in a slightly different, but equivalent URL, if the | ||
| URL that was parsed originally had unnecessary delimiters (for example, | ||
| a ``?`` with an empty query; the RFC states that these are equivalent). | ||
|
|
||
| If *keep_empty* is true, empty strings are kept in the result (for example, | ||
| a ``?`` for an empty query); only ``None`` components are omitted. | ||
| This allows rebuilding a URL that was parsed with the | ||
| ``missing_as_none=True`` option. | ||
| By default, *keep_empty* is true if *parts* is the result of the | ||
|
Member
It is easy to miss this "footnote" about keep_empty not being a simple False default. I think the function signature above should be more clear that it has a non-trivial default value when not explicitly specified. something awkward with a fake descriptive name but indicative that people should read further for details keep_empty=_FALSE_UNLESS_PARTS_IS_A_URLSPLIT_RESULT or similar perhaps?
Member
Author
Pydoc will output
Member
I think gps is requesting the signature in this doc file to be more explicit, not pydoc output.
Member
Author
I changed also the signature in this doc file. See above. It no longer contains
||
| :func:`urlparse` call with ``missing_as_none=True``. | ||
|
|
||
| .. function:: urlsplit(urlstring, scheme='', allow_fragments=True) | ||
| .. versionchanged:: next | ||
| Added the *keep_empty* parameter. | ||
|
|
||
|
|
||
| .. function:: urlsplit(urlstring, scheme=None, allow_fragments=True, *, missing_as_none=False) | ||
|
|
||
| This is similar to :func:`urlparse`, but does not split the params from the URL. | ||
| This should generally be used instead of :func:`urlparse` if the more recent URL | ||
|
|
@@ -310,28 +341,31 @@ | |
| The return value is a :term:`named tuple`, its items can be accessed by index | ||
| or as named attributes: | ||
|
|
||
| +------------------+-------+-------------------------+----------------------+ | ||
| | Attribute | Index | Value | Value if not present | | ||
| +==================+=======+=========================+======================+ | ||
| | :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| | :attr:`netloc` | 1 | Network location part | empty string | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| | :attr:`path` | 2 | Hierarchical path | empty string | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| | :attr:`query` | 3 | Query component | empty string | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| | :attr:`fragment` | 4 | Fragment identifier | empty string | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| | :attr:`username` | | User name | :const:`None` | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| | :attr:`password` | | Password | :const:`None` | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| | :attr:`hostname` | | Host name (lower case) | :const:`None` | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| | :attr:`port` | | Port number as integer, | :const:`None` | | ||
| | | | if present | | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | Attribute | Index | Value | Value if not present | | ||
| +==================+=======+=========================+===============================+ | ||
| | :attr:`scheme` | 0 | URL scheme specifier | *scheme* parameter or | | ||
| | | | | empty string [2]_ | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`netloc` | 1 | Network location part | ``None`` or empty string [2]_ | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`path` | 2 | Hierarchical path | empty string | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`query` | 3 | Query component | ``None`` or empty string [2]_ | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`fragment` | 4 | Fragment identifier | ``None`` or empty string [2]_ | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`username` | | User name | ``None`` | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`password` | | Password | ``None`` | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`hostname` | | Host name (lower case) | ``None`` | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`port` | | Port number as integer, | ``None`` | | ||
| | | | if present | | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
|
|
||
| .. [2] Depending on the value of the *missing_as_none* argument. | ||
|
|
||
| Reading the :attr:`port` attribute will raise a :exc:`ValueError` if | ||
| an invalid port is specified in the URL. See section | ||
|
|
@@ -356,7 +390,7 @@ | |
|
|
||
| .. versionchanged:: 3.6 | ||
| Out-of-range port numbers now raise :exc:`ValueError`, instead of | ||
| returning :const:`None`. | ||
| returning ``None``. | ||
|
|
||
| .. versionchanged:: 3.8 | ||
| Characters that affect netloc parsing under NFKC normalization will | ||
|
|
@@ -368,15 +402,31 @@ | |
| .. versionchanged:: 3.12 | ||
| Leading WHATWG C0 control and space characters are stripped from the URL. | ||
|
|
||
| .. versionchanged:: next | ||
| Added the *missing_as_none* parameter. | ||
|
|
||
| .. _WHATWG spec: https://url.spec.whatwg.org/#concept-basic-url-parser | ||
|
|
||
| .. function:: urlunsplit(parts) | ||
| urlunsplit(parts, *, keep_empty) | ||
|
|
||
| Combine the elements of a tuple as returned by :func:`urlsplit` into a | ||
| complete URL as a string. The *parts* argument can be any five-item | ||
| iterable. This may result in a slightly different, but equivalent URL, if the | ||
| URL that was parsed originally had unnecessary delimiters (for example, a ? | ||
| with an empty query; the RFC states that these are equivalent). | ||
| iterable. | ||
|
|
||
| This may result in a slightly different, but equivalent URL, if the | ||
| URL that was parsed originally had unnecessary delimiters (for example, | ||
| a ``?`` with an empty query; the RFC states that these are equivalent). | ||
|
|
||
| If *keep_empty* is true, empty strings are kept in the result (for example, | ||
| a ``?`` for an empty query); only ``None`` components are omitted. | ||
| This allows rebuilding a URL that was parsed with the | ||
| ``missing_as_none=True`` option. | ||
| By default, *keep_empty* is true if *parts* is the result of the | ||
|
Member
same comment as above. make it obvious from the function signature that the default depends on the type of parts.
||
| :func:`urlsplit` call with ``missing_as_none=True``. | ||
|
|
||
| .. versionchanged:: next | ||
| Added the *keep_empty* parameter. | ||
|
|
||
|
|
||
| .. function:: urljoin(base, url, allow_fragments=True) | ||
|
|
@@ -422,30 +472,35 @@ | |
| Behavior updated to match the semantics defined in :rfc:`3986`. | ||
|
|
||
|
|
||
| .. function:: urldefrag(url) | ||
| .. function:: urldefrag(url, *, missing_as_none=False) | ||
|
|
||
| If *url* contains a fragment identifier, return a modified version of *url* | ||
| with no fragment identifier, and the fragment identifier as a separate | ||
| string. If there is no fragment identifier in *url*, return *url* unmodified | ||
| and an empty string. | ||
| and an empty string (by default) or ``None`` if *missing_as_none* is true. | ||
|
|
||
| The return value is a :term:`named tuple`, its items can be accessed by index | ||
| or as named attributes: | ||
|
|
||
| +------------------+-------+-------------------------+----------------------+ | ||
| | Attribute | Index | Value | Value if not present | | ||
| +==================+=======+=========================+======================+ | ||
| | :attr:`url` | 0 | URL with no fragment | empty string | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| | :attr:`fragment` | 1 | Fragment identifier | empty string | | ||
| +------------------+-------+-------------------------+----------------------+ | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | Attribute | Index | Value | Value if not present | | ||
| +==================+=======+=========================+===============================+ | ||
| | :attr:`url` | 0 | URL with no fragment | empty string | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
| | :attr:`fragment` | 1 | Fragment identifier | ``None`` or empty string [3]_ | | ||
| +------------------+-------+-------------------------+-------------------------------+ | ||
|
|
||
| .. [3] Depending on the value of the *missing_as_none* argument. | ||
|
|
||
| See section :ref:`urlparse-result-object` for more information on the result | ||
| object. | ||
|
|
||
| .. versionchanged:: 3.2 | ||
| Result is a structured object rather than a simple 2-tuple. | ||
|
|
||
| .. versionchanged:: next | ||
| Added the *missing_as_none* parameter. | ||
|
|
||
| .. function:: unwrap(url) | ||
|
|
||
| Extract the url from a wrapped URL (that is, a string formatted as | ||
|
|
@@ -465,8 +520,9 @@ | |
| purity. | ||
|
|
||
| Instead of raising an exception on unusual input, they may instead return some | ||
| component parts as empty strings. Or components may contain more than perhaps | ||
| they should. | ||
| component parts as empty strings or ``None`` (depending on the value of the | ||
| *missing_as_none* argument). | ||
| Components may also contain more than they arguably should. | ||
|
|
||
| We recommend that users of these APIs where the values may be used anywhere | ||
| with security implications code defensively. Do some verification within your | ||
|
|
@@ -542,7 +598,8 @@ | |
| Return the re-combined version of the original URL as a string. This may | ||
| differ from the original URL in that the scheme may be normalized to lower | ||
| case and empty components may be dropped. Specifically, empty parameters, | ||
| queries, and fragment identifiers will be removed. | ||
| queries, and fragment identifiers will be removed unless the URL was parsed | ||
| with ``missing_as_none=True``. | ||
|
|
||
| For :func:`urldefrag` results, only empty fragment identifiers will be removed. | ||
| For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be | ||
|
|
@@ -559,6 +616,9 @@ | |
| >>> r2 = urlsplit(r1.geturl()) | ||
| >>> r2.geturl() | ||
| 'http://www.Python.org/doc/' | ||
| >>> r3 = urlsplit(url, missing_as_none=True) | ||
| >>> r3.geturl() | ||
| 'http://www.Python.org/doc/#' | ||
|
|
||
|
|
||
| The following classes provide the implementations of the structured parse | ||
|
|
||
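The proposed semantics can be roughly approximated on Python versions that lack these parameters. This sketch is not the implementation from this PR. `split_missing_as_none()` and `unsplit_keep_empty()` are hypothetical helper names. Only the split (not parse) variant is covered, and fragments are always recognized. The input is assumed to be a `str` without leading whitespace, control characters, or embedded tabs and newlines, which urlsplit() would strip before parsing.

```python
from urllib.parse import urlsplit


def split_missing_as_none(url):
    """Hypothetical helper: like urlsplit(), but None marks an absent component.

    Assumes a str URL without leading whitespace, control characters,
    or embedded tabs/newlines, since urlsplit() strips those.
    """
    parts = urlsplit(url)
    # Everything before the first "#" holds scheme, netloc, path and query.
    before_fragment, hash_sign, _ = url.partition("#")
    rest = before_fragment
    if parts.scheme:
        # Skip "scheme:" so we can check for the "//" that introduces a netloc.
        rest = rest[len(parts.scheme) + 1:]
    return (
        parts.scheme or None,  # a scheme is never empty when present
        parts.netloc if rest.startswith("//") else None,
        parts.path,  # the path is always present, possibly empty
        parts.query if "?" in before_fragment else None,
        parts.fragment if hash_sign else None,
    )


def unsplit_keep_empty(parts):
    """Hypothetical helper: rebuild a URL, omitting only None components."""
    scheme, netloc, path, query, fragment = parts
    url = ""
    if scheme is not None:
        url += scheme + ":"
    if netloc is not None:
        # A non-empty path that follows a netloc must start with "/".
        if path and not path.startswith("/"):
            path = "/" + path
        url += "//" + netloc
    url += path
    if query is not None:
        url += "?" + query
    if fragment is not None:
        url += "#" + fragment
    return url


for example in ["http://docs.python.org?", "help/Python.html",
                "http://www.Python.org/doc/#"]:
    parts = split_missing_as_none(example)
    print(parts, unsplit_keep_empty(parts))
```

On the URLs used in this diff's examples, the helpers keep the `?` in `http://docs.python.org?` and the `#` in `http://www.Python.org/doc/#`. They return `None` for every component of `help/Python.html` except the path.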