This is a category view. Only posts in category Programming are shown. Remove the filter to view posts in all categories.

Plyvel, Pythonic bindings for LevelDB

I’m happy to introduce a new open source project of mine, Plyvel. Plyvel is a fast and feature-rich Python interface to LevelDB, a fast key-value storage library written at Google.

The Plyvel documentation contains a lot of information on its usage and installation. Visit Plyvel on Github for sources and issue tracker.

Introducing HappyBase

I’m happy to announce HappyBase, a developer-friendly Python library to interact with Apache HBase. HappyBase is designed for use in standard HBase setups, and offers application developers a Pythonic API to interact with HBase.

More information is available from the HappyBase documentation, including an installation guide, tutorial, and API docs. The HappyBase sources are on Github. The PyPI page on HappyBase makes it easy to find with tools like pip.

Ssscrape released as open-source software

I’m pleased to see that the people at the ILPS group of the University of Amsterdam have released a project I have been working on in the past under an open source license (LGPL).

Ssscrape is a system for collecting and processing dynamic web data. Ssscrape stands for Syndicated and Semi-Structured Content Retrieval and Processing Environment, and provides a framework for crawling and processing dynamic web data such as RSS/Atom feeds. Ssscrape is mostly implemented in Python and MySQL, but it should be noted that processing tasks can be implemented in any programming language, since Ssscrape simply invokes external executables. More information about Ssscrape can be found at the Ssscrape website.

I am no longer involved in the project, and from a quick glance I see that many things have changed since I last touched the code about one and a half years ago… really nice to see that the project is still alive.

Oh, and yes, I actually invented the bizarre acronym. I still think it’s a really cool and appropriate name.

Calculating the contents of fixed size pagination controls

When a web application needs to display many items, e.g. search results or large lists of records, it is often desirable to chunk the total list of items into equal-sized pages for easy navigation. This process is called pagination. Alternative techniques like continuous scrolling might also be worth considering, but this blog article is just about pagination.

If multiple pages of results are available, navigation links should be displayed on the output pages so that users can browse to other result pages. The list of links is what I call a pagination control. A pagination control could look something like this, where each item is a link to the corresponding page.

previous 1  5 6 [7] 8 9  15 next

In my examples the active page is shown in square brackets. I also set the display width to 9 (see below). For all examples the total number of pages is assumed to be 15, unless stated otherwise.

Controls like these are quite intuitive to use, and many websites (e.g. search engines) use pagination controls similar to this one, with subtle differences in their implementations. For example, some have ‘first’ and ‘last’ links, some don’t. There are many other choices to make.

Implementing pagination controls like the above seems trivial at first sight, but there are a few corner cases to consider, and it takes some thinking to get all cases right.

Display width

In my implementation, I assume a fixed number of links, so that the resulting output is always has more or less the same size, which I find very useful since the control would look roughly the same on all pages. I use the term display width to denote this value. In the example above the display width is set to 9. The gaps (shown with an ellipsis) are also considered, since those take roughly the same space as the links to the pages. (Optional ‘previous’ and ‘next’ links are not counted.)

A small exception to the fixed display width is that if there are less pages than the display width of the control, the complete list of pages is shown. For example, if there are only 8 pages in total, it looks like this:

previous 1 2 3 4 5 6 [7] 8 next

Note that the display width should be set to an uneven number to ensure a nicely balanced output. (For even display widths, the algorithm favours showing one extra link after the active page, since if a user is making its way through many pages, it is much more likely the user navigates in forward order.)

Which links to show?

The control should always show the active, first and last pages, which make for three items in the list. In the remaining space, the control should show as as much context around the current page as space (defined by the display width) permits.

Gaps within the range of page numbers should be easy to spot to make it clear there are more pages available than the visible links. Gaps should be avoided if possible, so when the active page is close to the first or the last page, the control should try to align the numbers so that only one side of the control has a gap. The example below should clarify this:

[1]  2   3    4   5     6    7       15
 1  [2]  3    4   5     6    7       15
 1   2  [3]   4   5     6    7       15
 1   2   3   [4]  5     6    7       15
 1   2   3    4  [5]    6    7       15
 1      4    5  [6]    7    8       15
 1      5    6  [7]    8    9       15
 1      6    7  [8]    9   10       15
 1      7    8  [9]   10   11       15
 1      8    9  [10]  11   12       15
 1      9   10  [11]  12   13   14   15
 1      9   10   11  [12]  13   14   15
 1      9   10   11   12  [13]  14   15
 1      9   10   11   12   13  [14]  15
 1      9   10   11   12   13   14  [15]

So, given these requirements, how to decide which links to show in the pagination control? The problem at hand is defined by three variables: the display width, the total number of pages, and of course the active page.

I wrote an algorithm that (as far as I can see) satisfies all constraints expressed above for all display widths of at least 7, since a display width of less than 7 items does not make any sense the reason why is left as an exercise to the reader. (Hint: pagination controls are designed for navigating to other pages.) A quite clean Python implementation, which I hereby put in the public domain, can be obtained here:

Download pagination.py

Just run the script to see some example output. Porting this code to other languages should be trivial. Rendering nice XHTML out of the resulting list of numbers is very application-specific and hence left as an exercise to the reader.

With this approach showing back and forward buttons only if appropriate is trivial. If the current page is larger than 1, a ‘previous’ link should be included. Similarly, if the current page is smaller than the number of pages, a ‘next’ link should be shown. ‘First’ and ‘last’ links should not be rendered, since page 1 and the last page are always included in the output and extra links would not offer the user anything that the other links already offer.

MySQL last insert id performance issues

After suffering from severe performance problems for a rather data intensive application (which was explicitly designed with scalability in mind), the culprit was found: a often executed piece of code did not execute SELECT LAST_INSERT_ID(), but SELECT LAST_INSERT_ID() FROM some_table. Hard to spot, but very important.

The problem? Well, the LAST_INSERT_ID (see the MySQL manual on LAST_INSERT_ID) function just returns the resulting primary key value from the last INSERT statement in the current MySQL connection thread, regardless of the table into which the row was inserted. Erroneously appending FROM some_table to this statement will result in a query that returns rows from a table, and these rows only contain the same constant value over and over again, once for each row in the table. However, if you only retrieve one row from the result set, you won’t initially notice the bug… at least not until your application mysteriously gets slower. If the table grows significantly over time, which was clearly the case here, since many records are inserted every day, executing this very simple query will become slower and slower, since it returns as many rows as are in the table, and each subsequent row yields exactly the same information as the previous one. How much slower? Well, up to the point that multiple quad core machines with plenty of memory were constantly suffering from such a high load that they became almost unresponsive. Yes, we were running this piece of code many times in parallel since it is supposed to be not CPU bound, but network latency bound.

The solution? Well, a very easy fix: after dropping the FROM some_table part from the query the load on the machines instantly dropped back to almost 0, which was how this applications was designed since it’s mostly network latency bound, which does not use much resources other than occupying some file descriptors for open sockets.

The lesson learnt here? A speed-up of many orders of magnitude, just by removing two words in the code… yes, sometimes performance optimization is completely in the nitty-gritty details. Unfortunately we had to find out the hard way.

(Yay for the person who found this. You know who you are.)

Trac SiblingNav plugin

Recently I’ve become a fan of using Trac for various projects I’m involved with, both professional and personal. The only issue I’m repeatedly having is the lack of a useful navigation box for hierarchical wiki pages. Existing navigation macros and plugins just don’t fit my needs, since they require you to build a menu yourself or make wrong assumptions. Fortunately, writing a plugin is not that hard…

I’m presenting you my shiny new SiblingNav plugin for Trac (tested with version 0.10.4). It looks like this on a subpage in a hierarchial wiki structure:

Trac SiblingNav plugin

This plugin adds a [[SiblingNav]] wiki macro resulting in a listing of the wiki path leading to the current page and a list of sibling pages. This is only useful for a hierarchic wiki structure, e.g. Meeting/20080218 is a subpage of Meeting. This plugin does the right thing for both top pages and subpages.

The macro accepts an optional argument with the page prefix for which to build a navigation menu, e.g. [[SiblingNav(Meeting)]] If omitted, the current page name is used. This covers the vast majority of use cases.

The output is modeled after the standard trac navigation and some other plugins, i.e. it uses the wiki-toc class on the surrounding HTML div, so it should show up as a familiar navigation pane.

You can get this plugin by either branching the Trac SiblingNav Bazaar branch like this:

bzr branch http://uwstopia.nl/geek/projects/trac/SiblingNav.uws/

...or alternatively by just copying the SiblingNav.py file to your wiki-macros/ directory.

I’ve listed the plugin at the Trac Hacks site as well so it’s more easily found by fellow Trac users: SiblingNav at TracHacks

Give it a try if you’re looking for something like this! Feedback is welcome!

A simple preprocessor for CSS files

Anewt, the Almost No Effort Web Toolkit (a collection of PHP libraries designed to aid rapid web development, mostly written by me), offers a simple preprocessor for CSS files, implemented as a Makefile. It is available from the scripts/ directory in the anewt.uws bzr branch, but for your convenience I’ve put up a snapshot over here: css-cpp.make

This Makefile acts as a simple preprocessor for CSS files. It uses the C preprocessor (cpp) to do the actual work. Using this file allows you to use variables (eg. aliases for color names, lengths, sizes, or sets of common properties) in your CSS files.

Since cpp and CSS both use the # character for different purposes, simple substitution (with a very unlikely string) will be done before actually invoking cpp. The % character is used to call cpp directives, so #define and #include become %define and %include, respectively.

Sample input:

%define MY_SHADE_OF_RED  rgb(196,   0,  12)
%define MY_SHADE_OF_BLUE rgb( 10,  42, 240)
html {
background-color: MY_SHADE_OF_RED;
color: MY_SHADE_OF_BLUE;
}

Sample output from the above sample input:

html {
  background-color: rgb(196,   0,  12)
  color: rgb( 10,  42, 240)
}

Usage is simple. First you need to create files with a .csst (CSS Template) extension. The corresponding .css files will be generated automatically, overwriting the existing .css file with the same name. This file may contain defines, includes, and so on.

Now, symlink or copy this file as “Makefile” into the directory containing the .csst and .css files. Then run make:

$ ln -s /path/to/anewt/scripts/css-cpp.make Makefile
$ make css

Alternatively, call make with a parameter:

$ make -f /path/to/anewt/scripts/css-cpp.make css

Or just execute this file from the right directory:

$ /path/to/anewt/scripts/css-cpp.make css

If your own project also uses a Makefile, you can include this file. To do that, put the following line into your own Makefile:

include /path/to/css-cpp.make

Then you have your own targets depend on the targets in this file:

all: css
clean: clean-css

Enjoy!

(This message is based on the comments in the source file, which may be updated in the future. This snapshot is posted to my blog in response to Nicolas Trangez who asked for a CSS preprocessor)

Text-to-Freemind released!

I’m happy to announce the first (and perhaps even last) version of text-to-freemind, a simple text to FreeMind conversion program, licensed under the GPL.

This program converts tab-indented text files into an XML format suitable for display by FreeMind. It was written out of annoyance with the FreeMind user interface, and the lack of ‘merging’ capabilities when collaborating with other people.

Input like this (indentation should be done using tabs)...

uws
    superhero
    hacker
        gnome
        web

...results in:

Text-to-Freemind sample output

Usage

Usage is like most command line programs. Proceed as below. To convert a single text file into a Freemind file, use:

text-to-freemind input-file.txt.mm > output-file.mm

You can use it as a filter (using shell pipes) as well:

cat some-text-data.mm.txt | text-to-freemind > output-file.mm

A Makefile snippet is also included to convert all *.mm.txt files into their *.mm counterparts. First copy or symlink the makefile, than run make.

cp /path/to/text-to-freemind/text-to-freemind.make Makefile
make

Alternatively:

ln -s /path/to/text-to-freemind/text-to-freemind.make Makefile
make

Or execute the Makefile directly if you don’t want to copy files around:

/path/to/text-to-freemind/text-to-freemind.make

Of course you have to chmod +x text-to-freemind.make first.

Requirements

The conversion program is written in Python (tested with 2.4 and 2.5) and requires an ElementTree implementation. Install python-elementtree or python-celementtree (included in Python 2.5) if you run into programs.

The Makefile snippet obviously depends on the make utility. GNU/Make is known to work.

Get it!

Code is available from the Text-to-Freemind bazaar branch (use bzr branch on that url to get it).

If you don’t feel like messing with version control systems, just copy the text-to-freemind and text-to-freemind.make files somewhere in your $PATH (make sure you set the executable bit on both).

Issues, problems, and feedback

This program is a quick hack, so don’t expect too much of it. If you feel like contacting me with problems or suggestions, please mail me. Thanks.

Update: I’ve improved the error message syntax and the input encoding assumptions (always UTF-8 now).

Which in python

If you want the functionality of which in a Python program, i.e. you want to look up the file system path for a given executable, you may use this simple function:

def which(e): return ([os.path.join(p, e) for p in os.environ['PATH'].split(os.pathsep) if os.path.exists(os.path.join(p, e))] + [None])[0]

The return value is the full path or None if the executable was not found. Easy enough!

Verkiezingskijker online!

Today a new version of the verkiezingskijker website was launched. Verkiezingskijker.nl (which means something like ‘elections viewer’ in English) is a topic-guided search engine to search verkiezingsprogramma’s (English: electoral manifestos written by participating political parties) targeted at the upcoming elections for the Provinciale Staten, next month (March 7) in the Netherlands.

Verkiezingskijker

Why this post? Well, I wrote parts of the backend code as part of a student project at the University of Amsterdam.

Older posts in this category