Computer people tend to lack imagination, especially when it comes to acronyms, and this can cause a lot of confusion. Take the following acronyms, for instance: URI, URN, URL, and IRI.
In interviews I like to ask candidates to explain the difference between them, and only once has someone given an almost 100% correct answer. I was not particularly surprised, as even widely adopted specifications contain subtle mistakes.
What Is URI?
URI stands for Uniform Resource Identifier. The interesting (and confusing) thing is that a URI can be classified as a locator (URL – Uniform Resource Locator), a name (URN – Uniform Resource Name), or both. Wikipedia gives a nice example where the URN is compared to a person’s name and the URL to that person’s street address.
Mathematically speaking, the previous definition fails to specify whether every URL is a URI and whether every URN is a URI.
The answer in both cases is yes. Basically, a URL is a type of URI (a subset) that identifies a resource by describing how to access it, typically its network location.
On the other hand, a URN is just a name. For example, book ISBNs can be written as URNs (e.g. urn:isbn:978-0143035008). So, an ISBN is an identifier designed to name a book uniquely, worldwide.
Strictly speaking, the URL/URN distinction is considered dated: RFC 3986 recommends using the general term URI instead.
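To make the locator/name distinction concrete, here is a minimal Python sketch that splits two identifiers with the standard library. The `describe` helper and the `urn:isbn:` spelling of the ISBN are illustrative assumptions, not something the specifications prescribe in this form. The check "has an authority part" is only a rough heuristic for "looks like a locator".

```python
from urllib.parse import urlparse


def describe(uri: str) -> dict:
    """Split a URI and report whether it carries a network authority (hypothetical helper)."""
    parts = urlparse(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        # Heuristic only: a host part suggests the URI says where to look.
        "looks_like_locator": bool(parts.netloc),
    }


url_parts = describe("http://ns.4psa.com/person")
urn_parts = describe("urn:isbn:978-0143035008")

print(url_parts)
print(urn_parts)
```

Both strings are parsed as URIs with a scheme, but only the first one has an authority (a host). The second is just a scheme plus a name.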
Namespaces
When URIs appear in XML namespaces, it gets very confusing. In that context (for example, in an xmlns attribute), they are used for naming purposes, not for location. But they look like URLs…
For example, a namespace like http://ns.4psa.com/person is very common. Someone who reads a document that declares such a namespace without properly understanding the namespacing concept might assume it is a location. Then the “URL” turns out to be inaccessible in the browser. But this is perfectly normal!
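A short Python sketch shows the namespace being used purely as a name. The XML document and the element names (`person`, `name`) are made up for illustration. Only the namespace string comes from the example above. The parser never tries to fetch that address. It simply treats it as part of each element's name.

```python
import xml.etree.ElementTree as ET

# The namespace from the article; used only as an opaque string.
PERSON_NS = "http://ns.4psa.com/person"

# Hypothetical document that declares the namespace as its default.
doc = f'<person xmlns="{PERSON_NS}"><name>Ada</name></person>'

root = ET.fromstring(doc)

# ElementTree stores qualified names as "{namespace}localname".
name = root.find(f"{{{PERSON_NS}}}name")

print(root.tag)
print(name.text)
```

Changing a single character in the namespace string would make the lookup fail, exactly as it would for any other name mismatch.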
In A Nutshell
In short, URI is the recommended term to use. A URI can indicate a name, a location, or both. What it does depends on the context. Anyone who sees an ISBN can easily understand that it provides a name identifier, but it does not give any indication of how to locate that book.
Unfortunately, it’s not the same thing when people see something that looks like a hyperlink (as in the above namespace example). Many developers assume (at least) that it is also a location.
What about IRI?
IRI (Internationalized Resource Identifier, as defined in RFC 3987) is a generalization of the URI. While a URI is limited to a subset of ASCII characters, an IRI can also contain international (Unicode) characters. In practice, UTF-8 is the most popular encoding used for IRIs.
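As a rough illustration of how an IRI relates to a URI, the sketch below maps one onto the other in Python: the host goes through the standard library's `idna` codec and the remaining parts are percent-encoded as UTF-8. The `iri_to_uri` function is a hypothetical helper, not a complete implementation of the RFC 3987 mapping. It ignores user info, for example, and the built-in `idna` codec follows the older IDNA 2003 rules.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Simplified IRI-to-URI mapping (sketch; user info is not handled)."""
    parts = urlsplit(iri)

    # Host labels with non-ASCII characters become "xn--..." (Punycode) labels.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    netloc = host if parts.port is None else f"{host}:{parts.port}"

    # Everything else is percent-encoded from its UTF-8 bytes;
    # "%" is kept safe so existing escapes are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://example.com/café"))
print(iri_to_uri("http://www.päypal.com/"))
```

The result contains only ASCII characters, which is what a component expecting a plain URI needs.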
Understanding the difference between URI and IRI is important for validation and in choosing the correct identifier for the application.
While the temptation to use IRI is hard to resist, this is not always beneficial. By accepting international characters, IRI opens a Pandora’s box of social engineering exploits, because most of the time users don’t pay attention to characters that look almost identical (for example ä vs. a). For instance, they can mistakenly assume that http://www.päypal.com is actually http://www.paypal.com. This kind of homograph spoofing attack is widely discussed.
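One simple way to flag the ä vs. a case is to strip accents from a host name and compare the result with a list of trusted hosts. The Python sketch below does this with Unicode decomposition. Both function names and the trusted set are assumptions made up for illustration. It is a narrow heuristic: it does not catch lookalike letters from other scripts (a Cyrillic “а”, for instance), which need a proper confusables table.

```python
import unicodedata


def strip_accents(host: str) -> str:
    """Decompose characters and drop combining marks, so 'ä' becomes 'a'."""
    decomposed = unicodedata.normalize("NFKD", host)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_suspicious(host: str, trusted: set) -> bool:
    """True when the host is not trusted but looks like a trusted one once accents are removed."""
    return host not in trusted and strip_accents(host) in trusted


TRUSTED_HOSTS = {"www.paypal.com"}  # hypothetical allow-list

print(is_suspicious("www.päypal.com", TRUSTED_HOSTS))
print(is_suspicious("www.paypal.com", TRUSTED_HOSTS))
```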
I hope that this article brings more light to the subject.
P.S. And it’s “Uniform”, not “Universal”. You will find this “variation” on the Internet as well 🙂
4 Comments
Excellent explanation, thx!
Eric Polin 10 years ago
Thanks for the post.
It was concise and easy to understand.
Oscar Palencia 9 years ago
Simply explained and to the point. Thanks a bunch.
David 9 years ago
The RFC 3987 names IRI “Internationalized Resource Identifiers”
Geraldo Xexéo 2 years ago