Connected: An Internet Encyclopedia
4. Resolving Relative URLs
RFC 1808
This section describes an example algorithm for resolving URLs within
a context in which the URLs may be relative, such that the result is
always a URL in absolute form. Although this algorithm cannot
guarantee that the resulting URL will equal that intended by the
original author, it does guarantee that any valid URL (relative or
absolute) can be consistently transformed to an absolute form given a
valid base URL.
The following steps are performed in order:
Step 1: The base URL is established according to the rules of
        Section 3. If the base URL is the empty string (unknown),
        the embedded URL is interpreted as an absolute URL and
        we are done.

Step 2: Both the base and embedded URLs are parsed into their
        component parts as described in Section 2.4.

        a) If the embedded URL is entirely empty, it inherits the
           entire base URL (i.e., is set equal to the base URL)
           and we are done.

        b) If the embedded URL starts with a scheme name, it is
           interpreted as an absolute URL and we are done.

        c) Otherwise, the embedded URL inherits the scheme of
           the base URL.

Step 3: If the embedded URL's <net_loc> is non-empty, we skip to
        Step 7. Otherwise, the embedded URL inherits the <net_loc>
        (if any) of the base URL.

Step 4: If the embedded URL path is preceded by a slash "/", the
        path is not relative and we skip to Step 7.

Step 5: If the embedded URL path is empty (and not preceded by a
        slash), then the embedded URL inherits the base URL path,
        and

        a) if the embedded URL's <params> is non-empty, we skip to
           Step 7; otherwise, it inherits the <params> of the base
           URL (if any) and

        b) if the embedded URL's <query> is non-empty, we skip to
           Step 7; otherwise, it inherits the <query> of the base
           URL (if any) and we skip to Step 7.

Step 6: The last segment of the base URL's path (anything
        following the rightmost slash "/", or the entire path if no
        slash is present) is removed and the embedded URL's path is
        appended in its place. The following operations are
        then applied, in order, to the new path:

        a) All occurrences of "./", where "." is a complete path
           segment, are removed.

        b) If the path ends with "." as a complete path segment,
           that "." is removed.

        c) All occurrences of "<segment>/../", where <segment> is a
           complete path segment not equal to "..", are removed.
           Removal of these path segments is performed iteratively,
           removing the leftmost matching pattern on each iteration,
           until no matching pattern remains.

        d) If the path ends with "<segment>/..", where <segment> is a
           complete path segment not equal to "..", that
           "<segment>/.." is removed.

Step 7: The resulting URL components, including any inherited from
        the base URL, are recombined to give the absolute form of
        the embedded URL.
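
The steps above can be sketched in Python as follows. This is a minimal
illustration, not a reference implementation: the names URL, parse,
normalize_path, unparse and resolve are hypothetical, the parser is a
simplified reading of Section 2.4, the leading slash is kept as part of
the path string, and empty path segments are assumed never to be removed
by Step 6c or 6d.

```python
import re
from typing import NamedTuple, Optional


class URL(NamedTuple):
    scheme: Optional[str]
    net_loc: Optional[str]
    path: str  # keeps its leading "/" when one was present
    params: Optional[str]
    query: Optional[str]
    fragment: Optional[str]


def _cut(text: str, sep: str):
    """Split at the first sep; the tail is None when sep is absent."""
    head, found, tail = text.partition(sep)
    return head, (tail if found else None)


def parse(url: str) -> URL:
    """Split in the Section 2.4 order: fragment, scheme, net_loc, query, params."""
    rest, fragment = _cut(url, "#")
    scheme = None
    m = re.match(r"[A-Za-z0-9+.-]+:", rest)
    if m:
        scheme, rest = m.group()[:-1], rest[m.end():]
    net_loc = None
    if rest.startswith("//"):
        end = rest.find("/", 2)
        end = len(rest) if end == -1 else end
        net_loc, rest = rest[2:end], rest[end:]
    rest, query = _cut(rest, "?")
    path, params = _cut(rest, ";")
    return URL(scheme, net_loc, path, params, query, fragment)


# A complete, non-empty path segment other than "..".
_SEG = r"(?<![^/])(?!\.\./)[^/]+"


def normalize_path(path: str) -> str:
    path = re.sub(r"(?<![^/])\./", "", path)        # Step 6a
    if path == "." or path.endswith("/."):          # Step 6b
        path = path[:-1]
    removed = 1
    while removed:                                  # Step 6c, leftmost first
        path, removed = re.subn(_SEG + r"/\.\./", "", path, count=1)
    return re.sub(_SEG + r"/\.\.\Z", "", path)      # Step 6d


def unparse(u: URL) -> str:
    out = "" if u.scheme is None else u.scheme + ":"
    if u.net_loc is not None:
        out += "//" + u.net_loc
        if u.path and not u.path.startswith("/"):
            out += "/"  # separate <net_loc> from a path merged onto an empty base path
    out += u.path
    if u.params is not None:
        out += ";" + u.params
    if u.query is not None:
        out += "?" + u.query
    if u.fragment is not None:
        out += "#" + u.fragment
    return out


def resolve(base: str, embedded: str) -> str:
    if not base:                                    # Step 1
        return embedded
    b, e = parse(base), parse(embedded)             # Step 2
    if not embedded:                                # Step 2a
        return base
    if e.scheme is not None:                        # Step 2b
        return embedded
    net_loc, path, params, query = e.net_loc, e.path, e.params, e.query
    if not net_loc:                                 # Step 3
        net_loc = b.net_loc
        if not path.startswith("/"):                # Step 4
            if not path:                            # Step 5
                path = b.path
                if not params:                      # Step 5a
                    params = b.params
                    if not query:                   # Step 5b
                        query = b.query
            else:                                   # Step 6
                merged = b.path[: b.path.rfind("/") + 1] + path
                path = normalize_path(merged)
    # Step 7 (the scheme is inherited from the base, per Step 2c)
    return unparse(URL(b.scheme, net_loc, path, params, query, e.fragment))
```

With the base URL "http://a/b/c/d;p?q#f", resolve(base, "../g") gives
"http://a/b/g", resolve(base, "?y") gives "http://a/b/c/d;p?y", and
resolve(base, "#s") gives "http://a/b/c/d;p?q#s".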
Parameters, regardless of their purpose, do not form a part of the
URL path and thus do not affect the resolving of relative paths. In
particular, the presence or absence of the ";type=d" parameter on an
ftp URL does not affect the interpretation of paths relative to that
URL. Fragment identifiers are only inherited from the base URL when
the entire embedded URL is empty.
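
The ftp case can be illustrated with a small sketch that resolves only
the path portion. The helper name replace_last_segment and the sample
paths are hypothetical, and dot-segment handling is deliberately left
out.

```python
def replace_last_segment(base_path_and_params: str, rel_path: str) -> str:
    """Drop the base's params, then swap its last path segment for rel_path."""
    # Everything after the first ";" is <params>, not part of the path.
    base_path = base_path_and_params.partition(";")[0]
    return base_path[: base_path.rfind("/") + 1] + rel_path


# The ";type=d" parameter does not turn "docs" into a directory to resolve under.
with_param = replace_last_segment("/pub/docs;type=d", "readme.txt")
without_param = replace_last_segment("/pub/docs", "readme.txt")
assert with_param == without_param == "/pub/readme.txt"
```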
The above algorithm is intended to provide an example by which the
output of implementations can be tested -- implementation of the
algorithm itself is not required. For example, some systems may find
it more efficient to implement Step 6 as a pair of segment stacks
being merged, rather than as a series of string pattern matches.
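
One possible reading of the segment-stack approach is sketched below. It
is an assumption about how such an implementation might look, not code
taken from any particular system; the function name merge_paths is
hypothetical, and the leading slash is treated as an empty first segment
that is never popped.

```python
def merge_paths(base_path: str, rel_path: str) -> str:
    """Step 6 on lists of segments instead of string pattern matches."""
    # Drop the base's last segment, then queue the embedded segments after it.
    segments = base_path.split("/")[:-1] + rel_path.split("/")
    out = []
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg == ".":
            if is_last:
                out.append("")  # a trailing "." leaves a trailing slash
        elif seg == ".." and out and out[-1] not in ("", ".."):
            out.pop()           # ".." cancels the preceding real segment
            if is_last:
                out.append("")  # a trailing ".." leaves a trailing slash
        else:
            out.append(seg)     # ordinary segment, or a ".." with nothing to cancel
    return "/".join(out)
```

For the base path "/b/c/d", merge_paths yields "/b/g" for "../g" and
"/../g" for "../../../g", so a ".." that has no segment left to cancel
is kept rather than silently dropped.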