Let’s look at an old application my department started in 2001 and developed sporadically until finally launching it in 2007:
[me@desktop old]$ find . | grep -E '(php|js)$' | xargs wc -l | tail -n 1
53063 total
[me@desktop old]$ find . | grep -E '(html|css)$' | xargs wc -l | tail -n 1
9726 total
Now let’s switch over to the replacement application I started in January 2009 and launched in May 2009:
[me@desktop new]$ find . | grep -E '(php|js)$' | xargs wc -l | tail -n 1
5955 total
[me@desktop new]$ find . | grep -E '(html|css)$' | xargs wc -l | tail -n 1
2302 total
For those unfamiliar with find, grep, regular expressions, xargs, or tail, that means the old application took 53,063 lines of PHP and JavaScript to do what I did in 5,955. The old used 9,726 lines of HTML and CSS; I used only 2,302.
So, basically, the entire rewritten application has fewer total lines than its predecessor had in markup and styles alone. That’s fantastic!
Of course, some of you are thinking that’s just because I write much longer lines of code, right? And you naturally want me to compute the average number of characters per line used in each application to compare, right? And you demand — demand — that it be done with a single command chain in UNIX? As you wish!
[me@desktop old]$ (find . | grep -E '(html|css|php|js)$' | tee temp | xargs wc -l | tail -n 1 | awk '{print $1}' ; cat temp | xargs wc -c | tail -n 1 | awk '{print $1}') | sed 'N;s/\n/ /' | awk '{print $2 " / " $1 " = " $2 / $1}'
1523405 / 51063 = 29.8338
[me@desktop new]$ (find . | grep -E '(html|css|php|js)$' | tee temp | xargs wc -l | tail -n 1 | awk '{print $1}' ; cat temp | xargs wc -c | tail -n 1 | awk '{print $1}') | sed 'N;s/\n/ /' | awk '{print $2 " / " $1 " = " $2 / $1}'
252837 / 8257 = 30.6209
See? I used only one extra character per line! On the other hand, I spent a good 30 minutes writing that command: 1 minute composing what I put there and the other 29 trying to figure out a way to do it without either running the find twice or writing anything to a temp file. (You’ll notice I gave up and threw in a tee halfway through.)
UPDATE: I was too focused on the chaining problem to recognize that I could just have wc calculate the number of characters and lines at the same time. This would have worked just as well, with no temp file, and with far less complexity:
find . | grep -E '(php|js|html|css)$' | xargs wc -l -c | tail -n 1 | awk '{print $2 " / " $1 " = " $2/$1}'
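One caveat with any of these pipelines: when the file list is long, xargs may split it across several wc invocations, and tail -n 1 then sees only the last batch's total. File names containing spaces also get split, and a pattern like '(php|js)$' matches any name that merely ends in those letters. Here is a rough sketch of a variant that sidesteps all three by concatenating the files into a single wc. The function name avg_chars_per_line is made up for illustration, and it assumes a find and xargs that support -print0 and -0 (GNU and BSD both do):

```shell
# Hypothetical helper, not from the original post: prints
# "<bytes> / <lines> = <average>" for the PHP, JS, HTML and CSS
# files under a directory (default: the current one).
avg_chars_per_line() {
  find "${1:-.}" -type f \
      \( -name '*.php' -o -name '*.js' -o -name '*.html' -o -name '*.css' \) \
      -print0 |
    # NUL-separated names survive spaces; cat merges every batch into
    # one stream, so wc reports a single pair of totals.
    xargs -0 cat |
    wc -l -c |
    # wc prints lines first, then bytes; skip the division when there
    # are no lines at all.
    awk '$1 > 0 { print $2 " / " $1 " = " $2 / $1 }'
}

# Usage: avg_chars_per_line path/to/app
```

Because everything goes through one wc, there is no "total" line to fish out with tail, and the numbers no longer depend on how xargs happens to batch the arguments.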
I too have been able to reduce code size dramatically for my department. I don’t know how you do it, but I follow three simple rules:
Make the code speak for itself
It’s astonishing how “chatty” some programmers can be. Explaining their code, embedding test cases, documenting revisions. Stuff like “//do not allow future dates!!!”, or “/*bad idea but allow blanks per Bobo Jones*/”.
Make the compiler earn its keep
At least half of the “code bloat” I see is due to excessively long names. The compiler doesn’t care if you call something “postedDate” or “pd”. Similarly, method name “CalculateElapsedWorkTime” can be renamed to “cewt” without changing the results.
One word: GIGO
I always find tons of code devoted to validating data. As a matter of system design, data should be checked once and as close to its source as possible. So what’s closer to the data source than the user him- or herself? I make users responsible for validation and warn them of the consequences of errors.