• Programming 25.06.2009

    Let’s look at an old application my department started in 2001 and developed sporadically until finally launching it in 2007:

    [me@desktop old]$ find . | grep -E '(php|js)$' | xargs wc -l | tail -n 1
    53063 total
    [me@desktop old]$ find . | grep -E '(html|css)$' | xargs wc -l | tail -n 1
    9726 total

    Now let’s switch over to the replacement application I started in January, 2009 and launched in May, 2009:

    [me@desktop new]$ find . | grep -E '(php|js)$' | xargs wc -l | tail -n 1
    5955 total
    [me@desktop new]$ find . | grep -E '(html|css)$' | xargs wc -l | tail -n 1
    2302 total

    For those unfamiliar with find, grep, regular expressions, xargs, or tail, that means the old application took 53,063 lines of PHP and JavaScript to do what I did in 5,955.  The old used 9,726 lines of HTML and CSS; I used only 2,302.
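    For anyone who wants the counting pipeline spelled out, here is a minimal sketch of the same idea as a shell function.  The name count_lines and its arguments are hypothetical and not part of the commands above.  It makes two changes that are assumptions about what is wanted rather than the original method: it matches real extensions with find's -name test instead of grepping for paths that merely end in "php" or "js", and it feeds every file through cat so that a single wc sees all the input.  The second change matters on large trees, where xargs can split a long file list across several wc runs, each printing its own "total" line.  It relies on -print0 and xargs -0, which GNU and BSD tools support but POSIX does not require.

```shell
# count_lines DIR EXT...   (hypothetical helper, not from the original post)
# Prints the total number of lines in all regular files under DIR whose
# names end in one of the given extensions.
count_lines() {
    dir=$1
    shift
    total=0
    for ext in "$@"; do
        # -print0 / -0 keep filenames containing spaces intact.
        # cat merges all matching files into one stream, so wc prints
        # exactly one number no matter how xargs batches the arguments.
        n=$(find "$dir" -type f -name "*.$ext" -print0 | xargs -0 cat | wc -l)
        total=$((total + n))
    done
    echo "$total"
}
```

    Called as "count_lines . php js", it should print a single number comparable to the totals above.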

    So, basically, the entire rewritten application (8,257 lines of code, markup, and styles combined) is smaller than its predecessor’s markup and styles alone (9,726 lines).  That’s fantastic!

    Of course, some of you are thinking that’s just because I write much longer lines of code, right?  And you naturally want me to compute the average number of characters per line used in each application to compare, right?  And you demand — demand — that it be done with a single command chain in UNIX?  As you wish!

    [me@desktop old]$ (find . | grep -E '(html|css|php|js)$' | tee temp | xargs wc -l | tail -n 1 | awk '{print $1}' ; cat temp | xargs wc -c | tail -n 1 | awk '{print $1}') | sed 'N;s/\n/ /' | awk '{print $2 " / " $1 " = " $2 / $1}'
    1523405 / 51063 = 29.8338

    [me@desktop new]$ (find . | grep -E '(html|css|php|js)$' | tee temp | xargs wc -l | tail -n 1 | awk '{print $1}' ; cat temp | xargs wc -c | tail -n 1 | awk '{print $1}') | sed 'N;s/\n/ /' | awk '{print $2 " / " $1 " = " $2 / $1}'
    252837 / 8257 = 30.6209

    See?  I used less than one extra character per line!  On the other hand, I spent a good 30 minutes writing that command: 1 minute composing what I put there and the other 29 trying to figure out a way to do it without either running the find twice or writing anything to a temp file.  (You’ll notice I gave up and threw in a tee halfway through.)

    UPDATE: I was too focused on the chaining problem to recognize that I could just have wc calculate the number of characters and lines at the same time.  This would have worked just as well, with no temp file, and with far less complexity:

    find . | grep -E '(php|js|html|css)$' | xargs wc -l -c | tail -n 1 | awk '{print $2 " / " $1 " = " $2/$1}'
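    As a further sketch along the same lines, the whole calculation can be wrapped in a function that stays correct even when the file list is too long for one wc invocation.  The name avg_line_length is hypothetical, and the use of find's -name tests with -print0 and xargs -0 is an assumption about the available tools (GNU or BSD), not something from the commands above.  Note that wc -c counts bytes; wc -m would count characters, which only differs for multi-byte encodings.

```shell
# avg_line_length DIR   (hypothetical helper, not from the original post)
# Prints "BYTES / LINES = AVERAGE" for all php, js, html and css files
# found under DIR.
avg_line_length() {
    find "$1" -type f \
        \( -name '*.php' -o -name '*.js' -o -name '*.html' -o -name '*.css' \) \
        -print0 |
        xargs -0 cat |
        wc -l -c |
        # wc always prints lines before bytes, whatever the option order,
        # so $1 is the line count and $2 is the byte count.
        awk '{ if ($1 > 0) print $2 " / " $1 " = " $2 / $1
               else print "no lines found" }'
}
```

    Running "avg_line_length ." in each application's directory should give output in the same shape as the results above.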

    Posted by Ben @ 8:49 pm

  • One Response

    • just pixels Says:

      I too have been able to reduce code size dramatically for my department. I don’t know how you do it, but I follow three simple rules:

      Make the code speak for itself
      It’s astonishing how “chatty” some programmers can be. Explaining their code, embedding test cases, documenting revisions. Stuff like “//do not allow future dates!!!”, or “/*bad idea but allow blanks per Bobo Jones*/”.

      Make the compiler earn its keep
      At least half of the “code bloat” I see is due to excessively long names. The compiler doesn’t care if you call something “postedDate” or “pd”. Similarly, a method named “CalculateElapsedWorkTime” can be renamed to “cewt” without changing the results.

      One word: GIGO
      I always find tons of code devoted to validating data. As a matter of system design, data should be checked once and as close to its source as possible. So what’s closer to the data source than the user him- or herself? I make users responsible for validation and warn them of the consequences of errors.
