The URL at which your application appears is arguably the first part of the application's user interface that any user will see. Remember that a user of your application does not have to be a real person; it may equally be another program which has been given the URL.
Some application developers have a fairly rigid view of what kind of information a URL should contain and how it should be structured. In this guide, we shall look at a number of different approaches.
The purpose of a URL is to say where (on the Internet or on an intranet) your application resides and which resource or service is being accessed. A typical URL looks like this:
http://www.boddie.org.uk/python/WebStack.html
In an application the full URL, containing the address of the machine on which it is running, is not always interesting. In the WebStack API (and in other Web programming frameworks), we also talk about "paths" - a path is just the part of the URL which refers to the resource or service, ignoring the actual Internet address, and so the above example would have a path which looks like this:
/python/WebStack.html
When writing a Web application, most of the time you just need to concentrate on the path because the address doesn't usually tell you anything you don't already know. What you need to do is to interpret the path specified in the request in order to work out which resource or service the user is trying to access.
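The relationship between a full URL and its path can be illustrated outside WebStack with Python's standard library. The following is only a sketch, assuming Python 3 and the `urllib.parse` module; it is not part of the WebStack API, and the variable names are chosen purely for illustration.

```python
from urllib.parse import urlsplit

url = "http://www.boddie.org.uk/python/WebStack.html"
parts = urlsplit(url)

address = parts.netloc  # the machine's address, rarely interesting to the application
path = parts.path       # the part which identifies the resource or service

# Break the path into its components, ready for interpretation.
components = [component for component in path.split("/") if component]

print(address)
print(path)
print(components)
```

An application would typically inspect such components one at a time in order to decide which resource should handle the request.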
WebStack provides the following transaction methods for inspecting path information:
get_path
get_path_without_query
get_path_without_info
Each of these methods accepts an encoding parameter, which may be used to assist the process of converting the path to a Unicode object - see below.
To obtain the above path using the WebStack API, we can write the following code:
path = trans.get_path()
Really, however, we should explicitly state the character encoding of the path. Unfortunately, as noted in "Character Encodings", some guesswork is required, but if we have decided to use UTF-8 as the encoding of our output, it is reasonable to specify UTF-8 here as well:
path = trans.get_path("utf-8")
path = trans.get_path(self.encoding) # assuming a class/instance attribute defining such things centrally
In many applications such nuances are not particularly important, but consider the following URL:
http://www.boddie.org.uk/python/WebStack-%E6%F8%E5.html
Here, the URL includes non-ASCII characters which must be interpreted somehow. In this case, the "URL encoded" character values refer to ISO-8859-1 values and can be safely inspected as follows:
path = trans.get_path("iso-8859-1")
The above usage of UTF-8 will also work in this case, but only because WebStack will use ISO-8859-1 as a "safe" default for character values it does not understand.
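The effect of choosing an encoding, and of falling back to ISO-8859-1, can be sketched with the standard library alone. This is an illustration under stated assumptions, not WebStack's implementation: `decode_path` is a hypothetical helper, it assumes Python 3, and it falls back for the whole path at once, which is a simplification.

```python
from urllib.parse import unquote_to_bytes


def decode_path(encoded_path, encoding, fallback="iso-8859-1"):
    """Decode a "URL encoded" path, falling back if the bytes are invalid."""
    raw = unquote_to_bytes(encoded_path)  # %E6 becomes the byte 0xE6, and so on
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        # Every byte value is valid ISO-8859-1, so this decoding cannot fail.
        return raw.decode(fallback)


encoded = "/python/WebStack-%E6%F8%E5.html"
as_latin1 = decode_path(encoded, "iso-8859-1")
as_utf8 = decode_path(encoded, "utf-8")  # the bytes are not valid UTF-8, so the fallback applies

print(as_latin1 == as_utf8)
```

Both calls produce the same text here only because the fallback is taken; a path whose bytes happened to be valid UTF-8 would be decoded differently by the two calls.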
Sometimes, a "query string" will be provided as part of a URL; for example:
http://www.boddie.org.uk/application?param1=value1
The question mark character marks the beginning of the query string which contains encoded parameter information; such information and its inspection is discussed in "Request Parameters and Uploads".
WebStack provides a method to get only the query string from the URL:
get_query_string
The query string is returned in its "URL encoded" form, in which certain characters appear as %xx, where xx is a two-digit hexadecimal number referring to the byte value of the unencoded character - see below for discussion of this. Note that unlike the path access methods, get_query_string does not accept an encoding as a parameter. Moreover, when retrieving a path including a query string, the encoding is not used to interpret "URL encoded" character values in the query string itself. Consider this example URL:
http://www.boddie.org.uk/application-%E6?var%F8=value%E5
Upon requesting the path and the query string, certain differences should be noticeable:
trans.get_path("iso-8859-1") # returns /application-æ?var%F8=value%E5
trans.get_path_without_query("iso-8859-1") # returns /application-æ
trans.get_query_string() # returns var%F8=value%E5
One reason for this seemingly arbitrary distinction in treatment is the way certain servers present path information to WebStack - often the "URL encoded" information has been replaced by raw character values which must then be converted to Unicode characters. In contrast, most servers do not perform the same automatic conversion on the query string.
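The same distinction can be reproduced with the standard library, which may help when reasoning about what each method returns. This sketch assumes Python 3 and `urllib.parse`; it imitates the behaviour described above rather than calling WebStack, and the variable names are illustrative.

```python
from urllib.parse import urlsplit, unquote

request_target = "/application-%E6?var%F8=value%E5"
parts = urlsplit(request_target)

# Only the path is converted to text using the chosen encoding.
decoded_path = unquote(parts.path, encoding="iso-8859-1")

# The query string is deliberately left in its "URL encoded" form.
query_string = parts.query

full_path = decoded_path + "?" + query_string

print(decoded_path)
print(query_string)
print(full_path)
```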
In fact, it may become impossible to properly interpret the query string if it is decoded prematurely; consider this example URL:
http://www.boddie.org.uk/application?a=%26b
If we were to just decode the query string and then extract the parameters/fields, the result would be two empty parameters with the names a and b, as opposed to the correct interpretation of the query string as describing a single parameter a with the value &b.
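The difference between the two orders of processing can be demonstrated with the standard library. This is a minimal sketch assuming Python 3 and `urllib.parse`, independent of WebStack's own parameter handling; `keep_blank_values` is set so that empty parameters remain visible.

```python
from urllib.parse import parse_qsl, unquote

query_string = "a=%26b"

# Correct: split into fields first, then decode each name and value.
correct = parse_qsl(query_string, keep_blank_values=True)

# Premature: decode the whole string first, so the "&" becomes a separator.
premature = parse_qsl(unquote(query_string), keep_blank_values=True)

print(correct)
print(premature)
```

Here `correct` holds a single parameter a with the value &b, whereas `premature` holds two empty parameters named a and b.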
Regardless of all this, all inspection of path parameters should be done using the appropriate methods (see "Request Parameters and Uploads"), and direct access to the query string should only occur in situations of a specialised nature such as the building of URLs for output.