Thursday, 31 July 2008

Bug trackers and version control

Written by Martin Kleppmann on Thursday, 31 July 2008, 09:14 GMT.
Filed under: software, techie notes.

I believe strongly that teams of software engineers should have good tools which help to manage the development process (as a minimum, bug tracking and version control of source code). We use Subversion and FogBugz, but there are lots of other good tools too. These tools get particularly useful when connected, so that it’s possible to see e.g. which changes were made to the code in order to fix a particular bug.

The usual way of connecting version control and issue tracking is that developers must enter a bug number every time they make a commit to the source code repository. And because most version control systems don’t foresee integration with a bug tracker, this is usually just a special string in the commit message (“BUG:12345”). A bit ugly, but it works.

A neater way of doing this, for users of Subclipse or TortoiseSVN, is to set a few magic properties in the base directory of your project. These graphical front-ends detect the presence of those properties and add a separate input box for the bug number to the commit dialog. The number entered there just gets translated into a line in the commit message, so it’s nothing magic, but it helps as a reminder to put in the bug number, and avoids having to remember the syntax for the bug reference (was it “Bug:12345” or “Bug#12345”?).
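On the bug tracker side, something has to pull those references back out of the commit messages. A minimal sketch in Python, assuming the reference looks like one of the spellings mentioned above; the pattern and the function name `bug_ids` are illustrative, not part of Subversion, FogBugz or the bugtraq convention:

```python
import re

# Assumed convention: "BUG:12345", "Bug:12345" or "Bug#12345" anywhere in the message.
BUG_REF = re.compile(r"\bbug\s*[:#]\s*(\d+)", re.IGNORECASE)

def bug_ids(commit_message):
    """Return the bug numbers referenced in a commit message, in order of appearance."""
    return [int(number) for number in BUG_REF.findall(commit_message)]

print(bug_ids("Fix null check in parser. BUG:12345"))
print(bug_ids("Tidy up, see Bug#77 and bug: 78"))
```

Accepting several spellings on the reading side is one way to cope with people forgetting the exact syntax; the separate input box in the commit dialog attacks the same problem from the writing side.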

Full details of this ‘bugtraq’ convention are described on the “svn commit ./me” blog.

Here’s what it looks like:

Screenshot of bug tracker, subversion and eclipse integration


Saturday, 26 July 2008

Ubuntu upgrade woes

Written by Martin Kleppmann on Saturday, 26 July 2008, 15:19 GMT.
Filed under: software, techie notes.

This page has been offline for the last 24 hours because I messed up the server on which it was running (now I’ve moved it to another one). I officially hate system maintenance…

The server was running Ubuntu Feisty (7.04) which is now getting a bit old and gradually coming towards the end of its support lifetime. An upgrade would be necessary before too long anyway, so I decided to try updating it using the do-release-upgrade tool. My setup is pretty sane, using only standard packages, so I thought the tool would be able to handle the upgrade smoothly.

Unfortunately, no. At some point during the configuration of the new packages, dpkg started dying by segmentation fault, and from there on everything just went downhill. In the end I had a system with dozens of broken packages (and no way of fixing them with dpkg segfaulting), and quite a few daemons dying similarly. Probably some important shared library got replaced with an incompatible version — rubbish! Apache, MySQL and OpenVPN were dead, and I set them up on a new server; fortunately, Bind, Postfix and Dovecot somehow survived, so at least our emails are still getting through, and I could point the domain names to the new server’s IP address. Still, bloody annoying.

Why is server configuration and maintenance such a crap activity? What I really want is a sort of version control system for installed packages and configuration file changes. My wish list:

  • The version control should save only modifications to config files (differences from the package maintainers’ versions) so that the same modifications can be merged into the configuration files for a new version of the packages. This also makes the setup more maintainable, because it is easier for one admin to see the configuration changes made by another admin. Elementary software engineering stuff really.
  • I want to be able to generate a functionally identical system by taking a barebones install, applying the package installations and config file changes from version control, and copying in the contents of /home and /var/lib. Then I would never ever do an upgrade to a new release of the distribution; instead I would do a fresh install of the new release, configure it like the old one, test it thoroughly to make sure that it works, synchronise the contents, and switch them over in an instant. MUCH less stressful and error-prone.
  • It should still be possible to edit the server’s config files like on any other, since all sorts of changes need to be made in day-to-day operation; such changes should be checked into the version control so that they are recorded and documented.
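The first wish, storing only the differences from the package maintainers’ versions, can be sketched with Python’s standard `difflib` module. This is only an illustration under assumptions: the file name and the two config snippets are made up, and a real tool would read both versions from disk.

```python
import difflib

def config_delta(maintainer_text, local_text, name="example.conf"):
    """Return the local modifications as a unified diff (a list of lines)."""
    return list(difflib.unified_diff(
        maintainer_text.splitlines(keepends=True),
        local_text.splitlines(keepends=True),
        fromfile=name + ".maintainer",
        tofile=name + ".local",
    ))

# Made-up example contents: the packaged default and an admin's edited copy.
maintainer = "Port 22\nPermitRootLogin yes\n"
local = "Port 22\nPermitRootLogin no\n"

delta = config_delta(maintainer, local)
print("".join(delta))
```

A diff like this is small enough to review by eye, and it is the piece that would have to be merged into the config file shipped with the next release.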

I have already built myself a tool for generating Amazon EC2 images — effectively a few Python and Shell scripts which take a barebones install and configure it completely for a particular application; I keep these scripts in version control so the build process is completely transparent and reproducible. However, if I make any modification to the image by hand, I need to remember to enter the same changes into the scripts, so that they still correctly reflect the build process. Really what I want is that manual step to go away.

I’m thinking that it oughtn’t be too hard to build such a system configuration management framework by putting /etc in a standard version control system, recording changes elsewhere in the filesystem, and making clever wrappers for a few standard maintenance tools such as apt-get (which just remembers somewhere which packages have been installed). Has nobody built something like that yet?
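As a rough idea of what such an apt-get wrapper might do, here is a sketch in Python that only keeps the bookkeeping: it records requested package names in a manifest file that could live in version control, and builds the command line without running it. The manifest format and both function names are assumptions made for this example.

```python
import os
import tempfile

def record_install(manifest_path, packages):
    """Add package names to a sorted, de-duplicated manifest file and return its contents."""
    known = set()
    if os.path.exists(manifest_path):
        with open(manifest_path, encoding="utf-8") as f:
            known = {line.strip() for line in f if line.strip()}
    known.update(packages)
    with open(manifest_path, "w", encoding="utf-8") as f:
        f.write("".join(name + "\n" for name in sorted(known)))
    return sorted(known)

def install_command(packages):
    """Build the apt-get command line; the caller decides whether to run it."""
    return ["apt-get", "install", "-y"] + list(packages)

# Demonstration in a temporary directory, so nothing on the system is touched.
manifest = os.path.join(tempfile.mkdtemp(), "packages.txt")
record_install(manifest, ["postfix", "apache2"])
print(record_install(manifest, ["dovecot-imapd", "apache2"]))
print(install_command(["apache2"]))
```

Replaying the manifest on a barebones install would then be a single `apt-get install` with every recorded name.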

Sunday, 11 May 2008

Ruby on Rails vs. Java Enterprise

Written by Martin Kleppmann on Sunday, 11 May 2008, 01:51 GMT.
Filed under: software, techie notes.

Choosing a web application framework. Aaaaargh, the pain.

Ok, so we’re actually in a pretty lucky situation right now: We have a new, substantial web development project, which we’re designing from scratch. It’s for a feature-rich, complex web application, it has got to scale well, and it has got to be maintainable. Those are the core requirements, and I thought they didn’t sound too demanding really. Between us in the team we have experience in most of today’s common programming languages, so that wasn’t an important point; what we wanted was the web framework which was objectively the best given our requirements.

Most frameworks, it seemed, were essentially suitable only for toy projects. There is a particularly ridiculous number of open-source web frameworks, and most of them don’t appear to be widely used in real-world situations. When I say real world, I don’t mean your average blog about kittens; I mean a site with at least 100 million page views per month or so. And I wanted the framework to be reasonably widely used, so that other people will have found the fatal bugs and already fixed them before we come along.

This left me with three options which seemed to me to be worth taking seriously:

  1. Java Platform Enterprise Edition (JEE)
  2. Ruby on Rails
  3. Microsoft ASP.NET

Of course there are plenty more (Django came into consideration, for example) but judging by the contents of the computer section of my local book shop, those others must be pretty niche. Although ASP.NET looks like a reasonably well-designed platform I had to unfortunately exclude it straight away, because I don’t want the risk of being locked into a Microsoft platform (and I’m not yet sure how reliable Mono is).

So the showdown is between Java Enterprise and Ruby on Rails, and the contenders could hardly differ more in terms of their culture. In a nutshell, Ruby on Rails focuses on fast and agile development, while Java Enterprise focuses on flexibility and integration with enterprise IT. In Ruby on Rails, the common tasks are made very very simple, and if you stick within those common tasks, life as a developer is bliss; in Java Enterprise, even the simplest jobs require you to write ugly XML configuration or auto-generate boilerplate code. Ruby on Rails is what you’ll find on the servers of a cool young Web 2.0 start-up; Java Enterprise is found in the IT nerve-centres of investment banks. In a completely clichéd world view, Ruby on Rails hackers wear jeans and T-shirts, have a MacBook and a Linux server, and have a lot of fun; while Java Enterprise software engineers wear suits, have a Windows laptop and a Solaris server, and act very seriously. (I exaggerate.)

The instinctive tendency here is obviously to go with the fun, informal start-up types rather than with the boring corporate types. But wait, I was trying to make an objective decision, and the discussions about these platforms are already quasi-religious and emotional enough.

I have tried to come up with a list of criteria by which I want to judge these two frameworks. One by one, in no particular order:

  • Developer productivity and learning curve for new developers. Ruby on Rails will get you started quicker, there’s not much doubt about that. In Java, I think the worst problem for a newcomer is actually the massive choice of different libraries to use and different ways to do the same thing — there doesn’t appear to be any combination of libraries which is particularly widely used; instead every tutorial, every manual and every book you find will use a different combination of technologies and libraries, which makes it particularly hard to learn the first steps. Because there isn’t just one obviously right way of doing something in Java EE, developers can spend a lot of time sorting out niggling little details, which slows them down and can be demotivating. I believe that if there was a full standard distribution for Java EE (e.g. Icefaces+JSF+Seam+EJB3+JPA+Hibernate+Glassfish, to name just one possible API stack out of thousands of different combinations), people would write a lot more good documentation for that particular stack, and it would become a lot easier for more developers to start using it effectively. With the right tools and good documentation, productivity could potentially be about the same for Rails and Java, but at the moment Java is shooting itself in the foot in this regard.
  • Ease of recruiting good and motivated developers. I’ve not yet worked out how the two frameworks compare on this point. There are not many people who know Ruby, but those who do tend to be pretty passionate about it, so there’s an increased chance they will do a good job. Just about every computer science student learns Java these days (except at a few misguided institutions where they still teach C++), so there are plenty of Java developers around, although it’s not clear how many of them are actually really good.
  • Standardisation of platform. This point is a combination of the last two points. If a lot of people use a platform, it is likely to be more stable, less buggy, better documented, better designed, faster, more supported in the long term, more interoperable with other systems, and so on. However, you can’t just judge by number of users — so many developers worldwide use plain PHP, the poor souls, and probably don’t even realise that although their LAMP platform is a quasi-standard, it’s complete rubbish for developing non-trivial applications.
  • Manageability of a complex code base. This is the point where I personally really appreciate statically typed programming languages (call me old-fashioned). With Java in Eclipse, you can immediately search for the place where something is defined, you can refactor class and method names without too much fear (except when you use Java names in XML files), you get decent code completion, you know at compile-time which methods an object has (rather than calling something speculatively which may or may not be there), and so on. Dynamically typed scripting languages are great for prototyping and writing things quickly, but I find they start getting a bit scary and tricky to handle once you go past a few thousand lines of code.
  • Testability and robustness — i.e. “if I change something here, how likely is it that I will break something at the other end of the application?”. Fortunately, both frameworks offer reasonable support for automated regression testing; Ruby on Rails probably a bit more so, because it relies primarily on automated tests (rather than a type system) to ensure things don’t fall apart horribly.
  • Scalability, stability and reliability. This is a pretty important one, since any outage immediately gives a very bad impression to your users, and can potentially cost a lot of money. However, it’s hard to get hold of accurate reports on how well the frameworks behave in a harsh production environment, with a large number of concurrent users (hundreds of page views per second) and a large database. I’m inclined to attribute better stability to Java because it’s almost certainly used in corporates for more critical applications than Ruby is. Ruby on Rails, on the other hand, has had some pretty bad press concerning its scalability. The biggest RoR deployment at the moment is apparently Yellow Pages; rather embarrassingly for both them and for Rails, this site has actually been down for at least the last 12 hours as I write this post.
  • Internationalisation and Unicode. There are translation features in both frameworks, and they are both ok, although neither really strikes me as brilliant. The bigger issue is with Unicode support. I agree fully with Joel Spolsky’s opinions on this matter: there is no such thing as 8-bit plain text. And yet, I could hardly believe my eyes, Ruby treats a string as just an array of bytes, 8 bits per character, unspecified encoding. You can put UTF-8 in them, but then the standard string editing methods will rip them up and make them invalid. For heaven’s sake, guys, we live in the 21st century! Do you not want to sell your software worldwide or what? Ok, there are some UTF-8-safe methods, but you’ve got to remember to call them explicitly. At least Java encourages you to specify an encoding when you convert between byte streams and strings (although in my opinion it still isn’t radical enough: all conversion methods without an explicit charset parameter ought to be marked as deprecated, to make it really clear to the developer what they are doing).
  • Toolchain support. Java has been around for a long time, and some pretty powerful tools to support it have evolved in that time (we are using the commercial MyEclipse and it’s doing a good job for us). Ruby on Rails is younger, but is actually quite well supported too due to its active developer community — Aptana is pretty neat, for example.
  • Libraries for additional functionality. Since Java is so common, libraries for it are also very common — pretty much whatever you want to do, you can download a jar file to do it for you. With Ruby we’re not that far yet, but more and more libraries are being ported, so this is not likely to be a very limiting factor.
  • Ease of integration with back end. We’ve got some algorithmic, number-crunching applications running in the background handling all the really clever technology. These are written in Java, and the web framework has got to be able to communicate with them. If the web tier is also in Java, that is easy; if it’s not, we have to write an explicit interface, but it’s not the end of the world either.
  • Integrating with and impression on customers. Who’s your customer? If your application is targeted at general consumers who will use it over the web, they don’t care what framework you use as long as it works well. However, if the software is actually going to be licensed to an enterprise where the IT department have to integrate it with their own systems, then the technology matters. Not so much because of real difficulties (most integrations probably happen on the database level anyway, so the application server is pretty irrelevant) but because they are going to ask you what technology you use; if it’s different from what they use themselves, they will suspect that it’s going to be more effort to integrate, which might become their excuse not to buy from you.
  • Long-term support. In 3–5 years’ time, what will have happened to the software? It may be that our application is still running, but will we still get security updates and bugfixes for the framework? Will somebody come along and completely re-write the API so that we in turn have to invest a lot in porting the application to the new API, or else risk running on an unsupported platform? This is crystal ball gazing of course, but to be honest: Java Enterprise is backed by Sun, and if there is one thing which big corporations tend to be good at, it’s keeping things the way they are. With Rails I fear more about the changeability of the framework over time. That said, the move from J2EE 1.4 to JEE 5 (two years ago today, incidentally) was pretty major, so maybe they are both equally changeable.
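The Unicode point in the list above is easy to demonstrate. The sketch below is in Python rather than Ruby or Java, so it shows the general problem and not either framework’s API: cutting a UTF-8 string at a byte offset can split a multi-byte character, while cutting at a character offset cannot. The helper names are made up for the example.

```python
text = "frißt"                   # 5 characters
data = text.encode("utf-8")      # 6 bytes, because "ß" needs two bytes in UTF-8

def truncate_bytes(raw, n):
    """Cut at a byte offset, the way a byte-oriented string method would."""
    return raw[:n]

def is_valid_utf8(raw):
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(len(text), len(data))
print(is_valid_utf8(truncate_bytes(data, 4)))    # the cut lands in the middle of "ß"
print(is_valid_utf8(text[:4].encode("utf-8")))   # a character-aware cut stays valid
```

Note that every conversion between text and bytes here names its encoding explicitly, which is the habit argued for above.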

After all those points I’d like to finish off with a few things which I would rather not have:

  • Too much flexibility in combining technologies. Abstract APIs create an illusion of portability — ok, so you think you can replace your PostgreSQL with Oracle by changing one line in an XML configuration file. Really? Not even subtle differences in query semantics? And how often are you going to make such a major change to the system like changing your database server? I don’t think it’s exactly such a common case that it has to be optimised for. It’s not bad to have those APIs, but they don’t add as much value as you might think at first. And as I mentioned above, having too much choice about the components from which you can assemble your system implies a lack of useful documentation and a slower learning curve for new developers.
  • Boilerplate and duplicate code. You can create it with a code generation tool, with a bit of luck you can even keep it up-to-date with that tool, but it’s still extremely ugly and makes things harder to maintain. The main advantage of scripting languages in my mind is that things like database model objects are defined at runtime rather than compile-time, eliminating all that generated code. However, you then lose the static type system, so you can’t win on this one.

So where does that leave us? Do you have any further aspects or information which we should consider, or any specific experience with these technologies? Let us know in the comments below.

Wednesday, 16 April 2008

One day without computers and digital stuff, is it possible? (Part 3)

Written by Johannes Hauser on Wednesday, 16 April 2008, 22:36 GMT.
Filed under: business, electronic devices, power-off day, techie notes, user experience.

Lunchtime! Vegetables au gratin, not bad after all. In our factory canteen you can only pay cash. We’re nearly the last one to do so. I know of a couple of lunchrooms which accept only chip cards that you have to top up at a machine. This may have some advantages (you don’t have to mess with loose change), but after all, it’s just one more step between me and my food, isn’t it?

Spending the afternoon might become a challenge. My boss takes care of the first hour with an unexpected meeting. Meeting is just another word for the collective comparison of PDAs and laptops among my troglodyte colleagues (Me have bigger club. Me leader!). I earn some disbelieving looks and return them with a Yes-I-am-using-paper-and-a-pencil-because-I-have-everything-under-control-anyway expression. Surprisingly, this works. So well, in fact, that my boss assigns a task to me which was originally scheduled for someone else. I would never have believed that one day ragged paper and an IKEA pencil could become insignia of superiority. Question is: What do they think I want to show that way? That I care about the really important things? That I keep everything in my head?

I spend the rest of the day setting up an experiment which is mainly manual work and taking notes. My colleagues are wondering why I’m always coming around instead of using the telephone. This makes me wonder which one is more disturbing: The phone ringing or someone knocking on the door? As for me, the phone causes more stress because it gives you the impression of total urgency: If you don’t pick up the receiver immediately, it will stop ringing and you will miss something important. But once you have picked up, you must start the conversation. If someone comes around, I can tell him to wait for some thirty seconds without him running away again. What do you think?

Sunday, 30 March 2008

Do-it-yourself 3G iPhone

Written by Martin Kleppmann on Sunday, 30 March 2008, 16:46 GMT.
Filed under: mobile, mobile web, techie notes.

I’ve just worked out how you could make a 3G iPhone yourself, even adding GPS support, and still get away with a lower cost than buying a regular iPhone. The solution:

  1. Get an iPod Touch (from £199).
  2. Get a Symbian smartphone, such as the N95, with an internet plan (on 3 you’d pay about £34 per month over 18 months for an N95 and a tariff roughly equivalent to O2’s iPhone tariff).
  3. Download JoikuSpot and install it on the N95. Use it to create an ad-hoc wireless network, and connect the iPod Touch to that network.
  4. Voilà. Total cost is about £811 over 18 months (compared to the iPhone total cost of £899), you get 3G or even HSDPA, and you get a whole additional handset with Nokia’s awesome features.
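The total in the last step follows from the figures in the list above: the one-off price of the iPod Touch plus 18 monthly payments. As a quick check in Python (the constant names are just for this example, and the monthly figure is the approximate one quoted above):

```python
IPOD_TOUCH = 199       # one-off price in GBP
MONTHLY_TARIFF = 34    # approximate N95 contract on 3, GBP per month
MONTHS = 18
IPHONE_TOTAL = 899     # iPhone total cost over the same period, GBP

diy_total = IPOD_TOUCH + MONTHLY_TARIFF * MONTHS
saving = IPHONE_TOTAL - diy_total
print(f"DIY total: £{diy_total}, saving: £{saving}")
```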

JoikuSpot is still a bit limited — rather than just routing packets, it proxies HTTP traffic and doesn’t support anything else, so e.g. IMAP isn’t going to work for the time being. I hope that will get fixed soon. I tested JoikuSpot briefly for plain web traffic on my E65 and it seems to be working.

I’m not going to rush out and buy all those things now, I just find this situation curious.

Sunday, 16 March 2008

One day without computers and digital stuff, is it possible? (Part 2)

Written by Johannes Hauser on Sunday, 16 March 2008, 13:13 GMT.
Filed under: power-off day, techie notes, user experience.

Having finished breakfast, I need to pack my stuff. It feels somewhat strange not to pack the usual stuff like the mobile phone and the MP3 player. (I even remove the LED lamp from my key ring.) On locking the door, it occurs to me that I was once even planning to install some electronic house access system which would render the house key obsolete. But it turned out that there is no fall-back system in case of a power cut. Who invents such bullshit?

On the bus to work I have to show my monthly ticket to the conductor. I have to admit the ticket has an integrated chip, but that one is used at a self-service station only, not for daily routines. I guess we can turn a blind eye to that. But I know that some other bus companies give out smart cards which you have to hold against a sensor on entering the bus. Luckily, mine hasn’t started that sort of stuff. Anyway, I could buy a single ticket which is printed on paper (by an onboard computer, sadly), paying with loose change. Dodging the fare is not advisable today because the inspectors use some sort of handheld computers where they would enter my data, requiring me to sign with a digital pen on a touchpad.

While entering my workplace, I realize that usually I would have to check in, using a smart card again. The good thing is, we are allowed to note down the time as an alternative if the check-in does not work or we’re working abroad. Today I declare it as ‘not working’, period. And now things are getting tricky: What do you do all day long if you are usually working on a computer or in a lab with high-tech equipment? First thing I do is to tidy up my desk and file away piles of old papers. This keeps me busy for about an hour and a half and leaves me with a certain good feeling and a blank desk. But there are at least two hours left until lunch break. Perfect time for reading some papers about my next task. I printed them the day before since after all paper is friendlier to read and easier to highlight. You see, I’m cheating again: It’s not that I’m not using computers at all, it’s just that I planned carefully to avoid computers today, having prepared for that beforehand.

All the time I’m happy the phone doesn’t ring, because it is (you may have guessed already) some highly integrated Voice-over-IP-based digital telephone system bling bling. I am certain my phone possesses more computing power than the machines which controlled the first space flight. Just for comparison: The first working prototype of a telephone was constructed in 1861, and one of the first sentences ever transmitted was “Das Pferd frißt keinen Gurkensalat” (The horse doesn’t eat any cucumber salad). Which has about the same amount of meaning as what usually comes out of my phone receiver.

Tuesday, 11 March 2008

One day without computers and digital stuff, is it possible? (Part 1)

Written by Johannes Hauser on Tuesday, 11 March 2008, 01:18 GMT.
Filed under: electronic devices, power-off day, techie notes, user experience.

Since it is Lent now, our Roman Catholic friends are doing without meat for 40 days. I am Protestant and vegetarian anyway, so this does not really mean much to me. Anyway, some of them asked me if I would also forswear something for that time. I usually answered that I’m trying to pass the days without golfing. This is not a great sacrifice since I don’t play golf anyway, but it usually leaves them sufficiently impressed.

But all that made me think of one thing: Would it be possible to spend at least one day without computers, integrated circuits, digital devices and all that? Let’s think it through. I use the following rule: I may not use any semiconductor-operated device at all. Other electric devices are ok, although I will try to avoid them. Also I will let others use digital devices for me. You may call that cheating, but I can only control myself, not others as well, and I cannot entirely shut down public life.

First, I would have to replace my radio-controlled alarm clock with some good old mechanical device. And there’s the first drawback: I cannot possibly sleep with a ticking clock around. Of course, I might wrap it in lots of fabric, but then its ringing will also be muffled. Not good. Also I would have to wind it from time to time, but that’s a minor problem, at least if it doesn’t stop in the middle of the night, which according to Murphy’s Law it will do the nights before important meetings and stuff. But my digital clock once let me down as well, because I failed to program not only the time but also the weekday. (Let’s call that a draw.)

Luckily my bathroom works on a more or less hydraulic basis; I even switched back to shaving foam some years ago after the electric razor broke and left me half-bearded one morning. But at the breakfast table I am unsure again about whether toasters and coffee machines are usually IC-controlled or not. On closer examination, the toaster has a simple bi-metal control, so I can use it safely. About the coffee machine, I still don’t know, so I decide to postpone the coffee until I am at work. (The machine there is certainly not under digital control, and even if it were: it’s usually my roommate who’s handling it because in earlier times I never got the amount of coffee powder right. But that’s another story.) Looking around in the kitchen, I notice that nearly all devices have at least an LCD display, which means there are semiconductors at work. Well, except for the fridge, which is so old that it’s strange it runs on electricity and not on steam power. Judging by its energy consumption and noise, it wasn’t invented long after that. Usually this annoys me but this morning I even feel something like gratitude. Good old fridge.

Note to self: Replace it.

Thursday, 3 January 2008

Imitating the iPhone User Agent in Firefox

Written by Martin Kleppmann on Thursday, 3 January 2008, 13:57 GMT.
Filed under: mobile web, techie notes.

There are a number of web sites out there which provide specifically optimised versions for the iPhone. I was curious to test them (and to look at their source code to see what they are doing), but don’t have an iPhone myself. Many sites will only give a visitor the iPhone version of their site if the web browser identifies itself as Safari Mobile. How do you get it?

The solution is the “user agent” — a string sent by the web browser to the server as part of every request. It contains the name and version of the browser software you are using, the operating system, and a few other bits and pieces. It’s a very useful piece of information to website administrators, who can use it to compile anonymous statistics about the people who visit their site.

Many people consider it to be bad practice to serve different versions of a site depending on the user agent, but it happens often enough anyway. And that’s exactly what is going on here. Fortunately there are tools which will let you modify the user agent, so you can see what you would get if you were using some other software. This is sometimes called “masquerading” as another browser. The technique described here is for Firefox, but it’s possible to do the same thing with other browsers too.

Download the User Agent Switcher add-on for Firefox, and restart Firefox. In the menu, go to Tools -> User Agent Switcher -> Options -> Options. Add a new user agent, with description “iPhone”, and the following entry in the user agent field:

Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420.1 (KHTML, like Gecko) Version/3.0 Mobile/3B48b Safari/419.3

The remaining fields (app version etc.) can stay empty. Now you can click Tools -> User Agent Switcher -> iPhone, and your browser instantly “becomes” an iPhone. If the site uses features which are not available in Firefox, it will not render correctly, but at least the site should serve you the same content as it would do to an iPhone. (The user agent above is taken from a real iPhone; there are probably many others which work too, but that one has worked for me.)
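The same masquerade can be done from a script, which is handy if you only want to look at a site’s source. The sketch below uses Python’s standard `urllib.request` to build a request carrying the user agent quoted above, without sending it; `choose_variant` is a made-up guess at the kind of naive check a site might run on its side, not any particular site’s logic, and the URL is a placeholder.

```python
from urllib.request import Request

# The user agent string quoted above, split over three lines for readability.
IPHONE_UA = ("Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) "
             "AppleWebKit/420.1 (KHTML, like Gecko) "
             "Version/3.0 Mobile/3B48b Safari/419.3")

def iphone_request(url):
    """Build (but do not send) a request that carries the iPhone user agent."""
    return Request(url, headers={"User-Agent": IPHONE_UA})

def choose_variant(user_agent):
    """Hypothetical server-side check: pick a site variant from the user agent."""
    return "iphone" if "iPhone" in (user_agent or "") else "default"

req = iphone_request("http://example.com/")
# urllib stores header names capitalised, hence "User-agent" here.
print(req.get_header("User-agent"))
print(choose_variant(req.get_header("User-agent")))
```

Because the header is set per request, nothing needs resetting afterwards, unlike the browser setting.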

One big caveat: you shouldn’t really be doing this! Use it only briefly for testing a site, then reset the user agent to the Firefox default. Otherwise you’ll end up sending the iPhone user agent to all other web sites you visit too, and that isn’t good for anybody. You may end up being locked out of certain web sites or getting the wrong version, and administrators of web sites will hate you because you mess up their statistics.

So please, please reset the user agent to the default when you’ve finished testing.

Friday, 30 November 2007

Find my nearest toilet, curry, whatever

Written by Martin Kleppmann on Friday, 30 November 2007, 20:57 GMT.
Filed under: mobile, techie notes.

Some interesting developments in so-called location based services have hit the news in the last few days:

Although these services can undoubtedly be very useful in many circumstances (I can certainly see myself using both the toilet service and the map service), these developments do raise questions: How do they know where I am? Does Google now know my location as well as my web searches, emails, contacts, diaries, YouTube video preferences and everything else? How easy is it for somebody to track where I am, and can they do it without me noticing?

In case you were wondering, this is not GPS. There are phones with built-in GPS, but they are still pretty rare and expensive. The remarkable thing about these location technologies is that they work pretty well with a far broader range of handsets (although Google Maps is more accurate if you have GPS).

So how does it work? As far as I know, there are the following ways of finding your location:

  • GPS (only on a few phones such as the Nokia N95)
  • Operator-based location lookup (as offered by MX Telecom, for example) — this is what SatLav uses.
  • Cell ID and cell location — this is what Google uses.

GPS I won’t discuss any further: it can be accessed only by applications installed on the phone, which need to be given permission to do so — the phone itself controls the information, so the chances of abuse are pretty low. (But see the Google-related caveat below.)

Operator lookup is a bit more concerning. To find out somebody’s location, you need to know their phone number. You send a location request for that number to the operator with whom you are registered. The operator sends a text message to the person you are trying to locate, to ask for their consent. If they agree to release the location, that information (latitude, longitude and an accuracy value) is sent back to you, the requester. (I think that’s how it works anyway; I’ve not seen it in action yet, and I can’t try it out since I’m on the only mobile network in the UK which has not yet implemented location requests.) The consent is valid for only one look-up, so you don’t need to be concerned about the toilet finder service being able to track you for the rest of your life just because you needed a loo in Westminster once.

The advantage of operator-based lookup is that it works on any phone, provided that phone’s network supports location lookups. (In the UK, Vodafone, T-Mobile, O2 and Orange all do.) No software needs to be installed, and it appears to be reasonably secure too. On the downside, the operators charge for the service — about £0.10 a go, plus a monthly fee. And if you want to use a location-based service (for example, to find your nearest xyz shop) you need to give that shop your mobile number, risking that you may receive unwanted text message advertising from them in future.

Cell location is a very different beast, and more difficult to understand too. You may know that the mobile phone network is split into cells, each cell being the area covered by one particular receiver/transmitter (e.g. on the roof of a building). Cells can be pretty small (a few dozen meters radius) in urban areas, and much larger (several kilometers radius) in the countryside. A mobile phone is usually locked onto one particular cell, and each cell has a unique identifier. On many handsets it is possible for an application running on the phone to find out the identifier of the cell to which it is connected.

So what does that give us? The cell ID on its own is not worth much. But if you have a big database which contains approximate locations for every cell in the world, you can make a pretty good guess at where you are (provided you’re in a small cell at least). The problem: there does not seem to be such a database. At least it’s not possible for normal people to get hold of it. The operators (who have built all those cells) know where they are of course, but they won’t simply give away that valuable information.

A number of collaborative projects are attempting to gather location information of cells by combining many volunteering users’ contributions. Among these are CellSpotting, GSM Location and Navizon. The general idea here is: people who have GPS in their handsets walk/drive around, and every time the phone comes across a new cell, it sends the identifier of that cell together with the GPS coordinates to the database. Over time, the database gets a pretty good idea of the range of locations in which you lock onto a particular cell. Then people who don’t have GPS can send their cell ID to the database to get an averaged value of their probable location.
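To make the idea concrete, here is a minimal sketch in Python of how such a collaborative database could work. This is my own illustration, not the actual implementation of any of the projects named above: the class name, the method names and the cell identifier are all made up, and it simply averages latitude and longitude, which is only reasonable over small areas.

```python
from collections import defaultdict


class CellDatabase:
    """Hypothetical crowd-sourced store of cell ID -> GPS sightings."""

    def __init__(self):
        # cell_id -> list of (latitude, longitude) pairs reported by GPS users
        self._sightings = defaultdict(list)

    def report(self, cell_id, latitude, longitude):
        """Called by a GPS handset whenever it is locked onto cell_id."""
        self._sightings[cell_id].append((latitude, longitude))

    def lookup(self, cell_id):
        """Called by a non-GPS handset: returns the averaged position, or None."""
        points = self._sightings.get(cell_id)
        if not points:
            return None  # nobody with GPS has reported this cell yet
        # Plain averaging is an assumption that only holds for small areas.
        avg_lat = sum(lat for lat, _ in points) / len(points)
        avg_lon = sum(lon for _, lon in points) / len(points)
        return (avg_lat, avg_lon)


db = CellDatabase()
# Two GPS users report the same (made-up) cell from slightly different spots.
db.report("example-cell-1", 51.5, -0.125)
db.report("example-cell-1", 51.25, -0.25)

print(db.lookup("example-cell-1"))  # averaged guess for a non-GPS user
print(db.lookup("unknown-cell"))    # None: no sightings recorded
```

The more sightings a cell collects, the better the average reflects the area in which handsets actually lock onto it.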

(A note on the side: people talk a lot in theory about using triangulation, i.e. measuring signal strengths, angles of directional antennas, signal timings from several adjacent cells and so on. In principle, these techniques could be used to provide a location which is more accurate than simply “you are in cell X, and cell X covers this and that area”. In practice, I don’t think triangulation is feasible on phones for all sorts of reasons (software limitations, hardware support etc.). The operator-based location lookup, which uses the cells rather than the handsets to measure timings, may well use it, but I don’t know.)

Now how does Google Maps get its location information for non-GPS handsets? I have not yet heard a definite answer, but the general suspicion is that they use precisely one of these databases. They might have bought it off the operators, but that’s a bit unlikely. Chances are they merged together several open source projects, and also drove around in a car themselves, mapping cells to GPS locations. And now that Google have released the application to the public, they do exactly the same as Google always does: collect the data from as many users as possible. Most probably, those people with GPS handsets who use Google Maps are unknowingly helping to expand Google’s cell ID database. When a GPS user encounters a new cell, Google learns both the location and the cell ID. Over time, their cell coverage and location accuracy will increase for the benefit of non-GPS users.

So, does Google know where you are? Yes. If you do a location lookup, at least. They claim to anonymise that data, so you can only hope that they are telling the truth.

One final note: the mobile web does not come into this at all. That means, if a phone accesses a website, there is in general no way of telling where that user is located (unless they explicitly give the site their phone number and the site performs an operator location lookup).

Friday, 23 November 2007

Why the mobile web is so slow

Written by Martin Kleppmann on Friday, 23 November 2007, 12:25 GMT.
Filed under: mobile web, techie notes, user experience.

On the “desktop/laptop web” (in contrast to the mobile web), we’ve become used to a page loading in about a second. On mobile phones, we are still far from seeing anything near that responsiveness — for most people, the mobile web experience is still agonisingly slow. Personally, I start to get irritated after about five seconds, and after as little as ten seconds there is a strong chance that I will go away and do something else. And currently it’s still a challenge to get a page to load within that timeframe on a mobile.

Ever wondered why it is so slow? Yesterday I was reading a few articles, and I repeatedly came across people who made a very simplistic assumption: namely that the available network bandwidth is fully used. For example, they will say: on a standard 3G/UMTS connection, each subscriber can transfer 384 kbit/s, therefore a 24 kB web page (text and a few small pictures) will load in 0.5 seconds. Honestly, if you’ve ever tried to access even a simple web page from a phone you know that this figure is just laughably wrong. It is never that fast.
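For reference, this is the naive calculation those articles are doing, as a tiny Python sketch (the function name is my own). It divides the page size by the nominal bandwidth and ignores latency, packet loss and protocol overhead entirely, which is exactly why it is wrong in practice.

```python
def naive_transfer_seconds(page_kilobytes, bandwidth_kbit_per_s):
    """Bandwidth-only estimate: assumes the link is fully used, with zero latency."""
    page_kilobits = page_kilobytes * 8  # 1 byte = 8 bits
    return page_kilobits / bandwidth_kbit_per_s


# A 24 kB page over a nominal 384 kbit/s 3G/UMTS link
print(naive_transfer_seconds(24, 384))  # 0.5
```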

Finally, after lots of searching, I found this paper by Pablo Rodriguez, Sarit Mukherjee and Sampath Rangarajan. And they give some good reasons why this is the case:

  • The round-trip time for packets is between 400 and 1000 ms in a typical 3G cell.
  • Packet loss is inevitable in wireless transmissions because of radio interference. There are two approaches to packet loss: either let TCP deal with retransmission (in which case it thinks the network is congested, and reduces the transfer rate), or retransmit lost packets in a low-level protocol (in which case the round-trip times observed by TCP can vary wildly, which confuses TCP and also reduces the transfer rate).

In a nutshell, TCP is really not up to the job, but it’s so widely used that there is basically no chance that it is going to be replaced anytime soon. (WAP included a layer called the Wireless Transaction Protocol, which attempted to be a better replacement for TCP. But we know what happened to WAP: nobody wants to use it.) In their paper, Rodriguez et al. go on to describe a method for partially getting round the problem by rewriting DNS and HTTP responses. It’s not ideal, but at least it removes some of the worst problems.

The real problem here is that round-trip time though. On a normal broadband connection I’d expect a round-trip time between 25 ms (within the UK) and 125 ms (across the Atlantic). On 3G, even though the bandwidth is not that much lower than broadband, we’ve got a round-trip time 8 to 15 times higher. And this time really becomes noticeable every time you click on a link:

  1. Send a DNS request for the hostname we want to contact, and wait for the response. (Unless we’ve cached the DNS record on the handset.)
  2. Send a TCP SYN packet to the server we want to contact, and wait for the response.
  3. Send the HTTP query over the established TCP connection.

This means that every time we click a link, we have to wait 2 or 3 round-trip times (i.e. between 0.8 and 3.0 seconds) until we get the very first few bytes of the page we requested. That’s assuming that none of those first few packets got lost (because if one of them was lost, there would be no way of telling; all you can do is wait for a timeout and try again). And then we still have to transfer the whole page, and have all the TCP issues to deal with.
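The arithmetic can be sketched in a few lines of Python. The function name is my own, and the model is deliberately simplistic: it counts only the DNS lookup, the TCP handshake and the HTTP request as one round trip each, and assumes no packets are lost.

```python
def wait_for_first_byte_ms(rtt_ms, dns_cached):
    """Milliseconds until the first bytes of the response arrive.

    One round trip each for the TCP handshake and the HTTP request,
    plus one more for DNS unless the record is cached on the handset.
    """
    round_trips = 2 if dns_cached else 3
    return round_trips * rtt_ms


# 3G cell, using the 400 to 1000 ms round-trip times quoted above
print(wait_for_first_byte_ms(400, dns_cached=True))    # 800  (best case)
print(wait_for_first_byte_ms(1000, dns_cached=False))  # 3000 (worst case)

# Broadband, using my expected 25 to 125 ms round-trip times
print(wait_for_first_byte_ms(25, dns_cached=True))     # 50
print(wait_for_first_byte_ms(125, dns_cached=False))   # 375
```

Even in the worst broadband case the wait stays well under half a second, whereas on 3G it is seconds before any of the page has arrived.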

I just hope that mobile phones nowadays use persistent HTTP connections or pipelining, which would remove some of the overheads. Does anybody know if they do? I’d like to hear from you.

PS. I thought I had accidentally deleted this post — fortunately I managed to find it again, hidden away in a binary mess, in the MySQL log file! :-)