These programs all solve the same basic problem:

  read stdin, tokenize into words
  for each word count how often it occurs
  output words and counts, sorted in descending order by count
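
As a rough illustration of the three steps above, here is a minimal
Python sketch.  It is not one of the submitted programs.  Several
details are assumptions, since the problem statement leaves them open:
words are taken to be whitespace-separated, ties in count are broken
alphabetically, and each output line is "count word".  The function
names count_words and sorted_counts are made up for this example.

```python
import sys
from collections import Counter


def count_words(lines):
    """Count whitespace-separated tokens across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Assumption: a "word" is any run of non-whitespace characters.
        counts.update(line.split())
    return counts


def sorted_counts(counts):
    """Return (word, count) pairs in descending order by count.

    Assumption: words with equal counts are ordered alphabetically.
    """
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def main():
    # Assumption: one "count word" pair per output line.
    for word, count in sorted_counts(count_words(sys.stdin)):
        print(count, word)


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as wp.py, it would be run as:

  python3 wp.py < logfile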

The idea is to measure, for each language, how fast it solves this
problem, how much memory it takes, and how elegant the solution looks.

The test is for scripting languages, but I will gladly take
non-scripting languages, too, as long as I can compile / run them on my
Linux laptop on my test data to get comparable results.  The data will
be several hundred megabytes of web logs, so the start-up time should
not matter much.  I will preload the interpreter / VM for each language
before the benchmark so disk I/O does not factor in, and the test data
will come from a tmpfs (a RAM disk, basically).

The idea is to solve this problem the way you would normally solve it
in the language.  So, for Perl, for example, you'd use a hash, not
implement the whole thing as a C extension to gain speed.  In C++ you'd
use the STL and iostreams.  Since I am not an expert in every
programming language, do send me enhancements if you think one
implementation here is unfairly making a language look bad.  I did not
write all of these implementations myself, btw.

Off the top of my head, I'd like to see implementations in:

  - JavaScript (there are JavaScript interpreters you can run from the
    command line, right?)

Send your submissions to felix (dash) wp (at) fefe (dot) de