Knuth and McIlroy Approach a Problem
One of my favorite stories about software comes from a competition between Donald Knuth and Doug McIlroy. It illustrates two completely different takes on solving an algorithmic problem. I read about it originally on Dr. Drang's blog.
Knuth is best known for his contributions to the analysis of algorithms, but his achievements can be all over computer science. McIlroy is equally accomplished, having contributed to the fundamental patterns of how we think about software components and programming languages.
A computer scientist was writing a column about Literate Programming – one of Knuth's ideas on how documentation and code should live side-by-side. So he asked both Knuth and McIlroy to write a program:
Given a text file and integer k, print the k most common words in the file (and the number of their occurrences) in decreasing frequency.
Knuth wrote a clever program in WEB, a language that combined Pascal and TeX. The algorithm incorporated a trie, interleaved 26-element arrays, and insertion-sort.
McIlroy wrote his own.
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
His program came with the following description:
1. Make one-word lines by transliterating the complement (-c) of the alphabet into newlines (note the quoted newline), and squeezing out (-s) multiple newlines.
2. Transliterate upper case to lower case.
3. Sort to bring identical words together.
4. Replace each run of duplicate words with a single representative and include a count (-c).
5. Sort in reverse (-r) numeric (-n) order.
6. Pass through a stream editor; quit (q) after printing the number of lines designated by the script’s first parameter (${1})
He talks about the problem: How often does one have to do this task? Will the requirements change slightly over time?
The instinct behind how we solve problems is rooted in our culture – Knuth came from algorithms, McIlroy came from Unix.
And finally
Knuth has shown us here how to program intelligibly, but not wisely. I buy the discipline. I do not buy the result. He has fashioned a sort of industrial strength Faberge egg – intricate, wonderfully worked, refined beyond all ordinary desires, a museum piece from the start.
Read the original column with Knuth's complete source code and McIlroy's response.
While the story illustrates the differences between the two approaches and highlights the Unix philosophy, it might be unfair to Knuth. He was trying to demonstrate his ideas on Literate programming. His lengthy explanation reads more Socratic than verbose.