4. Resolving Relative URLs

Connected: An Internet Encyclopedia
RFC 1808

This section describes an example algorithm for resolving URLs within a context in which the URLs may be relative, such that the result is always a URL in absolute form. Although this algorithm cannot guarantee that the resulting URL will equal that intended by the original author, it does guarantee that any valid URL (relative or absolute) can be consistently transformed to an absolute form given a valid base URL.

The following steps are performed in order:

Step 1: The base URL is established according to the rules of Section 3. If the base URL is the empty string (unknown), the embedded URL is interpreted as an absolute URL and we are done.

Step 2: Both the base and embedded URLs are parsed into their component parts as described in Section 2.4.
    a) If the embedded URL is entirely empty, it inherits the entire base URL (i.e., is set equal to the base URL) and we are done.
    b) If the embedded URL starts with a scheme name, it is interpreted as an absolute URL and we are done.
    c) Otherwise, the embedded URL inherits the scheme of the base URL.

Step 3: If the embedded URL's <net_loc> is non-empty, we skip to Step 7. Otherwise, the embedded URL inherits the <net_loc> (if any) of the base URL.

Step 4: If the embedded URL path is preceded by a slash "/", the path is not relative and we skip to Step 7.

Step 5: If the embedded URL path is empty (and not preceded by a slash), then the embedded URL inherits the base URL path, and
    a) if the embedded URL's <params> is non-empty, we skip to Step 7; otherwise, it inherits the <params> of the base URL (if any) and
    b) if the embedded URL's <query> is non-empty, we skip to Step 7; otherwise, it inherits the <query> of the base URL (if any) and we skip to Step 7.

Step 6: The last segment of the base URL's path (anything following the rightmost slash "/", or the entire path if no slash is present) is removed and the embedded URL's path is appended in its place. The following operations are then applied, in order, to the new path:
    a) All occurrences of "./", where "." is a complete path segment, are removed.
    b) If the path ends with "." as a complete path segment, that "." is removed.
    c) All occurrences of "<segment>/../", where <segment> is a complete path segment not equal to "..", are removed. Removal of these path segments is performed iteratively, removing the leftmost matching pattern on each iteration, until no matching pattern remains.
    d) If the path ends with "<segment>/..", where <segment> is a complete path segment not equal to "..", that "<segment>/.." is removed.

Step 7: The resulting URL components, including any inherited from the base URL, are recombined to give the absolute form of the embedded URL.
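
The steps above can be sketched in Python roughly as follows. This is an illustrative sketch, not a reference implementation: the names parse, normalize and resolve are hypothetical, the parser is a simplified reading of Section 2.4 that represents absent components as empty strings and keeps the leading slash as part of the path, and the sample base URL is only an example value. Step numbers in the comments follow the order of the steps above.

```python
import re

SCHEME = re.compile(r"([A-Za-z0-9+.-]+):")

def parse(url):
    """Split url into (scheme, net_loc, path, params, query, fragment).

    Simplified assumption: an absent component is returned as "".
    """
    url, _, fragment = url.partition("#")
    scheme = ""
    m = SCHEME.match(url)
    if m:
        scheme, url = m.group(1), url[m.end():]
    net_loc = ""
    if url.startswith("//"):
        end = url.find("/", 2)
        end = len(url) if end == -1 else end
        net_loc, url = url[2:end], url[end:]
    url, _, query = url.partition("?")
    path, _, params = url.partition(";")
    return scheme, net_loc, path, params, query, fragment

def normalize(path):
    """Apply the four path operations of Step 6 as string pattern matches."""
    # (?<![^/]) asserts that the match begins a complete path segment.
    path = re.sub(r"(?<![^/])\./", "", path)       # remove "./"
    path = re.sub(r"(?<![^/])\.\Z", "", path)      # remove a trailing "."
    pair = r"(?<![^/])(?!\.\./)[^/]+/\.\."         # "<segment>/.." with segment != ".."
    while True:                                    # leftmost "<segment>/../" first
        shorter = re.sub(pair + "/", "", path, count=1)
        if shorter == path:
            break
        path = shorter
    return re.sub(pair + r"\Z", "", path)          # remove a trailing "<segment>/.."

def resolve(base, embedded):
    if not base:                                   # Step 1
        return embedded
    b_scheme, b_loc, b_path, b_params, b_query, _ = parse(base)
    scheme, loc, path, params, query, fragment = parse(embedded)  # Step 2
    if not embedded:                               # entirely empty: use the base URL
        return base
    if scheme:                                     # already absolute
        return embedded
    scheme = b_scheme
    if not loc:                                    # Step 3
        loc = b_loc
        if not path.startswith("/"):               # Step 4
            if not path:                           # Step 5
                path = b_path
                if not params:
                    params = b_params
                    if not query:
                        query = b_query
            else:                                  # Step 6
                path = normalize(b_path[: b_path.rfind("/") + 1] + path)
    # Step 7: recombine the components
    url = (scheme + ":" if scheme else "") + ("//" + loc if loc else "") + path
    url += (";" + params if params else "") + ("?" + query if query else "")
    return url + ("#" + fragment if fragment else "")

base = "http://a/b/c/d;p?q#f"
print(resolve(base, "../g"))   # http://a/b/g
print(resolve(base, "?y"))     # http://a/b/c/d;p?y
```

Treating "absent" and "empty" as the same value keeps the sketch short; an implementation that must preserve an empty query such as a bare "?" would need to track the two cases separately.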

Parameters, regardless of their purpose, do not form a part of the URL path and thus do not affect the resolving of relative paths. In particular, the presence or absence of the ";type=d" parameter on an ftp URL does not affect the interpretation of paths relative to that URL. Fragment identifiers are only inherited from the base URL when the entire embedded URL is empty.
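
As a small illustration of the point about parameters, the sketch below splits the parameters off before the last path segment is replaced, so the result is the same whether or not ";type=d" is present. The helper names and the paths are hypothetical and chosen only for the example.

```python
def split_params(path_with_params):
    """Separate "<path>;<params>" at the first ";" (params may be empty)."""
    path, _, params = path_with_params.partition(";")
    return path, params

def replace_last_segment(base_path, relative_path):
    """Drop everything after the rightmost "/" and append the relative path."""
    return base_path[: base_path.rfind("/") + 1] + relative_path

for base in ("/pub/docs;type=d", "/pub/docs"):
    path, params = split_params(base)
    # The parameters are set aside and never take part in the merge.
    print(replace_last_segment(path, "notes.txt"))   # /pub/notes.txt both times
```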

The above algorithm is intended to provide an example by which the output of implementations can be tested -- implementation of the algorithm itself is not required. For example, some systems may find it more efficient to implement Step 6 as a pair of segment stacks being merged, rather than as a series of string pattern matches.
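
One possible stack-based reading of Step 6 is sketched below. It is an assumption about how such an implementation might look, and it uses a single output stack of segments rather than the pair of stacks mentioned above; the function name merge_and_normalize is hypothetical. It is meant to produce the same paths as the string pattern matches for the inputs shown.

```python
def merge_and_normalize(base_path, rel_path):
    """Replace the last base segment with rel_path, then resolve "." and ".."."""
    merged = base_path[: base_path.rfind("/") + 1] + rel_path
    segments = merged.split("/")
    last = len(segments) - 1
    out = []                                  # stack of kept segments
    for i, seg in enumerate(segments):
        if seg == ".":
            if i == last:
                out.append("")                # keep the trailing slash
            continue
        if seg == ".." and out and out[-1] not in ("", ".."):
            out.pop()                         # cancel the preceding segment
            if i == last:
                out.append("")                # keep the trailing slash
            continue
        out.append(seg)                       # ordinary segment, or an unmatched ".."
    return "/".join(out)

print(merge_and_normalize("/b/c/d", "../g"))        # /b/g
print(merge_and_normalize("/b/c/d", "../../../g"))  # /../g
```

An unmatched ".." is pushed onto the stack rather than discarded, which mirrors the rule that only "<segment>/.." pairs are removed.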
