This is the archive for February 2010. Recent posts can be found at the main blog page.
Wednesday, February 17, 2010 ★ 20:10 ★ Category Programming ★ Permanent url
I’m pleased to see that the people at the ILPS group of the University of Amsterdam have released Ssscrape, a project I worked on in the past, under an open source license (LGPL).
Ssscrape, short for Syndicated and Semi-Structured Content Retrieval and Processing Environment, is a system for collecting and processing dynamic web data: it provides a framework for crawling and processing content such as RSS/Atom feeds. It is mostly implemented in Python and MySQL, but processing tasks can be implemented in any programming language, since Ssscrape simply invokes external executables. More information can be found at the Ssscrape website.
I am no longer involved in the project, but a quick glance shows that many things have changed since I last touched the code, about one and a half years ago. It’s really nice to see that the project is still alive.
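This post does not describe Ssscrape’s actual task interface, so the following is only a generic Python sketch of the underlying idea: a framework written in one language can delegate processing to an external executable written in any language. The function name `run_task` and the convention of passing input on stdin and reading the result from stdout are assumptions made for illustration, not Ssscrape’s API.

```python
import subprocess
import sys


def run_task(command, payload, timeout=60):
    """Run an external processing task and return its standard output.

    Hypothetical convention (not Ssscrape's real protocol): the task reads
    its input from stdin and writes its result to stdout. Because the
    framework only spawns a process, the task can be written in any language.
    """
    result = subprocess.run(
        command,
        input=payload,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    # Here the "external executable" is simply another Python interpreter.
    upper = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
    print(run_task(upper, "<rss>...</rss>"))
```

The framework never needs to know what language the task is written in; it only cares about the process’s input, output and exit status.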
Oh, and yes, I actually invented the bizarre acronym. I still think it’s a really cool and appropriate name.
Sunday, February 7, 2010 ★ 21:36 ★ Category Programming ★ Permanent url
When a web application needs to display many items, e.g. search results or large lists of records, it is often desirable to chunk the total list of items into equal-sized pages for easy navigation. This process is called pagination. Alternative techniques like continuous scrolling might also be worth considering, but this blog article is just about pagination.
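The chunking itself is simple arithmetic: the number of pages is the item count divided by the page size, rounded up, and each page is a slice of the full list. A minimal Python sketch, assuming 1-based page numbers and a hypothetical `per_page` setting (neither is prescribed by this post):

```python
def page_count(total_items, per_page):
    """Number of pages needed to show total_items, per_page items at a time.

    Assumption: an empty result still gets one (empty) page, so that
    page 1 always exists.
    """
    if per_page < 1:
        raise ValueError("per_page must be positive")
    # Integer ceiling division: 95 items at 10 per page need 10 pages.
    return max(1, -(-total_items // per_page))


def page_items(items, page, per_page):
    """Return the items on a 1-based page; the last page may be shorter."""
    start = (page - 1) * per_page
    return items[start:start + per_page]


if __name__ == "__main__":
    results = list(range(1, 96))  # 95 dummy search results
    print(page_count(len(results), 10))  # → 10
    print(page_items(results, 10, 10))   # → [91, 92, 93, 94, 95]
```

Note that “equal-sized” holds for every page except possibly the last one, which holds whatever is left over.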
If multiple pages of results are available, navigation links should be displayed on the output pages so that users can browse to other result pages. The list of links is what I call a pagination control. A pagination control could look something like this, where each item is a link to the corresponding page.
previous 1 … 5 6 [7] 8 9 … 15 next
In my examples the active page is shown in square brackets. I also set the display width to 9 (see below). For all examples the total number of pages is assumed to be 15, unless stated otherwise.
Controls like these are quite intuitive to use, and many websites (e.g. search engines) use pagination controls similar to this one, with subtle differences in their implementations. For example, some have ‘first’ and ‘last’ links and others don’t, and there are many other choices to make as well.
Implementing pagination controls like the above seems trivial at first sight, but there are a few corner cases to consider, and it takes some thinking to get all cases right.
In my implementation, I assume a fixed number of links, so that the output always has roughly the same size; this is useful because the control then looks roughly the same on every page. I use the term display width for this number. In the example above, the display width is 9. Gaps (shown as an ellipsis) count towards the display width, since they take up about as much space as a link to a page. (The optional ‘previous’ and ‘next’ links are not counted.)
A small exception to the fixed display width: if there are fewer pages than the display width, the complete list of pages is shown. For example, with only 8 pages in total, the control looks like this:
previous 1 2 3 4 5 6 [7] 8 next
Note that the display width should be an odd number to ensure a nicely balanced output. (For even display widths, the algorithm shows one extra link after the active page rather than before it, since a user making their way through many pages is much more likely to navigate forward.)
The control should always show the active, first and last pages, which makes three items. In the remaining space, it should show as much context around the active page as the display width permits.
Gaps in the range of page numbers should be easy to spot, so that it is clear that more pages exist than the visible links suggest. Gaps should be avoided where possible: when the active page is close to the first or the last page, the control should align the numbers so that only one side of the control has a gap. The examples below (display width 9, 15 pages) illustrate this:
[1] 2 3 4 5 6 7 … 15
1 [2] 3 4 5 6 7 … 15
1 2 [3] 4 5 6 7 … 15
1 2 3 [4] 5 6 7 … 15
1 2 3 4 [5] 6 7 … 15
1 … 4 5 [6] 7 8 … 15
1 … 5 6 [7] 8 9 … 15
1 … 6 7 [8] 9 10 … 15
1 … 7 8 [9] 10 11 … 15
1 … 8 9 [10] 11 12 … 15
1 … 9 10 [11] 12 13 14 15
1 … 9 10 11 [12] 13 14 15
1 … 9 10 11 12 [13] 14 15
1 … 9 10 11 12 13 [14] 15
1 … 9 10 11 12 13 14 [15]
So, given these requirements, how do we decide which links to show in the pagination control? The problem is defined by three variables: the display width, the total number of pages and, of course, the active page.
I wrote an algorithm that, as far as I can see, satisfies all of the constraints above for every display width of at least 7. A display width of less than 7 does not make any sense; the reason why is left as an exercise to the reader. (Hint: pagination controls are designed for navigating to other pages.) A fairly clean Python implementation, which I hereby put in the public domain, can be obtained here:
Download pagination.py
Just run the script to see some example output. Porting this code to other languages should be trivial. Rendering nice XHTML out of the resulting list of numbers is very application-specific and hence left as an exercise to the reader.
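The downloadable script is not reproduced in this post. As an independent sketch (not the author’s pagination.py), the rules above can be expressed in a few lines of Python, assuming pages are numbered from 1 and a gap is represented by `None`; the names `pagination` and `render_text` are hypothetical. The idea: when there are more pages than the display width, the general layout is first page, gap, a window around the active page, gap, last page, which leaves `width - 4` slots for the window. If the left (or right) gap would hide at most one page, the control is aligned to that end instead.

```python
def pagination(active, total, width=9):
    """Return the page numbers to show, with None marking a gap.

    First, last and active page are always shown; the result has `width`
    items (gaps included) unless there are fewer pages; gaps are avoided
    near either end.
    """
    if width < 7:
        raise ValueError("display width must be at least 7")
    if not 1 <= active <= total:
        raise ValueError("active page out of range")
    if total <= width:
        # Everything fits: show all pages.
        return list(range(1, total + 1))
    window = width - 4                 # slots left after first, last and two gaps
    before = (window - 1) // 2         # context before the active page
    after = window - 1 - before        # even widths get one extra link here
    if active - before <= 3:
        # A left gap would hide at most one page: align to the start.
        return list(range(1, width - 1)) + [None, total]
    if active + after >= total - 2:
        # A right gap would hide at most one page: align to the end.
        return [1, None] + list(range(total - width + 3, total + 1))
    return ([1, None] + list(range(active - before, active + after + 1))
            + [None, total])


def render_text(pages, active):
    """Render in this post's notation: [n] for the active page, … for gaps."""
    parts = []
    for page in pages:
        if page is None:
            parts.append("…")
        elif page == active:
            parts.append("[{}]".format(page))
        else:
            parts.append(str(page))
    return " ".join(parts)


if __name__ == "__main__":
    for active in range(1, 16):
        print(render_text(pagination(active, 15), active))
```

Running it prints one row per active page for 15 pages and a display width of 9, which can be compared with the examples above. Because the left-aligned and right-aligned conditions can only both hold when the total fits within the display width, the three cases never conflict.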
With this approach, showing ‘previous’ and ‘next’ links only when appropriate is trivial. If the current page is greater than 1, a ‘previous’ link should be included. Similarly, if the current page is smaller than the number of pages, a ‘next’ link should be shown. ‘First’ and ‘last’ links should not be rendered: page 1 and the last page are always included in the output, so extra links would not offer the user anything that the existing links don’t.
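Combining the rendering remark and the ‘previous’/‘next’ rules, here is a hedged sketch of turning a list of page numbers (with `None` for gaps) into markup. The function name `render_html`, the `?page={}` URL template and the class names are hypothetical placeholders; a real application would use its own URL scheme and markup.

```python
import html


def render_html(pages, active, total, url="?page={}"):
    """Render a pagination control as XHTML-style markup.

    `pages` is a list of page numbers with None marking a gap. The URL
    template and class names are placeholders, not a fixed convention.
    """
    def link(page, label):
        # Escape the URL so that e.g. '&' becomes '&amp;', as XHTML requires.
        return '<a href="{}">{}</a>'.format(html.escape(url.format(page)), label)

    parts = []
    if active > 1:
        parts.append(link(active - 1, "previous"))
    for page in pages:
        if page is None:
            parts.append('<span class="gap">…</span>')
        elif page == active:
            parts.append('<strong class="active">{}</strong>'.format(page))
        else:
            parts.append(link(page, page))
    if active < total:
        parts.append(link(active + 1, "next"))
    # No 'first'/'last' links: page 1 and the last page are already in `pages`.
    return " ".join(parts)


if __name__ == "__main__":
    print(render_html([1, None, 5, 6, 7, 8, 9, None, 15], 7, 15))
```

The active page is rendered as plain emphasized text rather than a link, since linking to the current page offers the user nothing.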
Tuesday, February 2, 2010 ★ 20:15 ★ Category Gnome ★ Permanent url
GUADEC (pronounced GWAH-DECK) is an acronym for the GNOME Users’ And Developers’ European Conference. Held annually in cities around Europe, GUADEC is the world’s largest get-together of GNOME users, developers, foundation leaders, individuals, governments and businesses. GNOME is the free and open source software stack that drives the user interface of many Linux-based devices, from smartphones to your home PC.
GUADEC 2010, the eleventh edition, will be held in The Hague, the Netherlands, from July 24 to July 30.

The organisation team calls you to arms! A community conference like GUADEC only happens when the community puts its weight behind it.
This is your chance to be part of this event. Whether you are a conference rookie or a seasoned GUADEC veteran, your help is much appreciated.
As a volunteer at the conference, you may enjoy special benefits such as a free, limited-edition volunteer shirt and free food and drinks during your volunteering hours.
Wouter Bolsterlee, also known as uws, a postmodern geek living in the Netherlands.
Unless stated otherwise, all material on this site is available under a Creative Commons Share-Alike license.