Code Formatting Conventions

I have vague (and probably inaccurate) memories of an article I read maybe 25 years ago. Please remember that every remembered “fact” is probably wrong…

Some time in the ’80s, Adobe (or some company) did extensive studies to figure out what makes text readable. They looked at the font, the size and style, character and line spacing, line length, foreground & background colors, etc. The fact that stuck in my head was that yellow or green text on a black background could be read 30% faster than black text on a white background. Wow! That’s a huge number. If I could code 30% faster, I think that would justify a 30% raise.

This was about the time that “Paper White” displays were just starting to become affordable, and everything started using white backgrounds, including the latest version of the editor I used at the time. I actually had to turn down the brightness on my monitor because it gave me headaches. The Adobe article made me think, though… Sure, paper is white, but my monitor isn’t a piece of paper. All the cool people are using white backgrounds, but I’ve never been very cool. What if I used a black background…

I changed my editor’s settings to use yellow text on a black background. The difference was amazing. I don’t know if I could code 30% faster, but my head ached at least 70% less. (I’ve spent the last 25 years wondering why anyone uses black text on a white background for anything. I’m sure you’re wondering why this post is black on white. I’m wondering the same thing, but that’s what WordPress defaulted to, and I haven’t wanted to spend the time figuring out how to change it.)

So the question recently occurred to me “is there an objectively better way to format code?” It seems like there must be. There may not be a style which is better for every individual, but there must be a style which is better on average.

When Adobe did these studies, doing something like that was a fairly expensive undertaking (at least I assume it was). I picture them getting thousands of people to come in, read some piece of text (while some guy in a lab coat times them), and take a comprehension test. They probably had to pay the people something for their trouble. They had to pay a bunch of guys (working in parallel) in lab coats to sit there with a stopwatch. They had to pay someone to recruit test subjects. They had to pay someone to grade the tests and interpret the results. They had to pay for office space and coffee and who knows what else.

The Internet changes all this. I don’t have to recruit you or pay you. You don’t have to come to my office. I don’t have to sit around with a stopwatch. Heck, I don’t even have to buy a lab coat. All I have to do is convince you (and a few thousand other people) to spend some time taking my little quiz.

I should rephrase that last bit. All I have to do is decide what makes a formatting convention objectively better, and figure out how to measure that, and create a quiz that does measure it, and then I have to convince you to take my quiz. Fortunately, I did those other parts before writing this blog post.

What makes a formatting convention objectively better? I think it’s nothing more than the speed at which you can read and understand unfamiliar code. Speed is important – you can understand anything if you spend long enough. And if unfamiliar code is more clear to you, then your own code should be clearer, too.

Take the quiz!

Postscript:

My quiz has been available for a little while now. I don’t have enough data for the result to be statistically meaningful, but the data I have is very suggestive: Yes, code formatting affects understandability. The effect is dramatic. Thirty percent raise, here I come! (Unless, of course, the effect disappears when I get more data…)

jsgrep

jsgrep is like grep, but instead of working on characters, it works on a JavaScript token stream. It’s on github at https://github.com/sfrancisx/jsgrep.

I’m a front-end engineer on Yahoo! Mail. I wrote jsgrep for my own use in mid-2010, and I’ve been (lackadaisically) promoting jsgrep within the Yahoo! Mail team since late 2010. Honestly, I have not met with much success. “It works on tokens instead of characters” seems like a subtle distinction, and it doesn’t really convey how powerful jsgrep can be. I don’t like to blow my own horn, but I have to say that it is just about the coolest command line tool I’ve ever used. I probably use it an average of 20 times a day.

Yahoo! Mail is a big program, with a lot of people working on it. The portion that I work on has 9179 functions in about a quarter million lines of code (well, 117,064 lines have code on them. The rest is whitespace or comments.) I am very familiar with a small part of the code, slightly familiar with a slightly larger part of the code, and just about completely clueless about most of it. Unfortunately, I might have to work on a bug that occurs anywhere. Being able to search the code quickly and easily is extremely important.

Enter jsgrep

Part of what makes jsgrep so cool is that it’s so convenient to use. I have defined file sets for different areas of Mail’s code. My default file set includes only the code that I work in (including the parts that I’m clueless about). By far the most common thing I do with jsgrep is to find function definitions or function calls, so I’ve defined macros for both (along with 29 other macros that I use less often.) Compare jsgrep to grep when finding a function called onUpdate:

jsgrep
   $ jsgrep F:onUpdate

grep
   $ find ~/dev/yahoo/ymail/src -name '*.js' -exec grep -r -E '([^A-Za-z0-9]onUpdate[ ]*(=|:)[ ]*(new)?[ ]*function)|(function[ ]+onUpdate[^A-Za-z0-9])' {} ";"

It took me a half hour or so to figure out that grep command. It only takes me about 30 seconds to lose my train of thought, so the grep command is essentially useless to me. Simply grepping for onUpdate is a lot easier, but it also returns a lot of stuff, including comments and references to ‘actionUpdate’ and ‘onUpdatesReady’. And, of course, ‘onUpdate’ is not a common string – doing a simple grep for ‘set’ returns 16,570 matches.
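To make the character-versus-token distinction concrete, here is a minimal sketch. It is not jsgrep's actual implementation: `tokenize`, `countSubstring`, `findDefinitions` and the sample source are all hypothetical names made up for illustration, and the tokenizer is deliberately simplified (it ignores regex literals and template strings). It shows how a substring search picks up comments, string contents and longer identifiers, while a token search sees only the identifier itself.

```javascript
// Hypothetical, simplified tokenizer (an assumption, not jsgrep's own code).
// It drops comments and whitespace and returns the remaining tokens as strings.
function tokenize(source) {
  const pattern = /\/\/[^\n]*|\/\*[\s\S]*?\*\/|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|\s+|[^\s]/g;
  const tokens = [];
  for (const match of source.matchAll(pattern)) {
    const text = match[0];
    // Comments and whitespace never reach the token stream.
    if (/^\s/.test(text) || text.startsWith("//") || text.startsWith("/*")) continue;
    tokens.push(text);
  }
  return tokens;
}

// Count raw substring hits, the way a plain character search sees the file.
function countSubstring(source, needle) {
  return source.split(needle).length - 1;
}

// Find definitions of `name` in the same shapes the grep pattern above targets:
//   function name ...   name = function ...   name : function ...
function findDefinitions(tokens, name) {
  const hits = [];
  tokens.forEach((tok, i) => {
    if (tok !== name) return;
    const declared = tokens[i - 1] === "function";
    const sep = tokens[i + 1] === "=" || tokens[i + 1] === ":";
    const fn = tokens[i + 2] === "function" ||
      (tokens[i + 2] === "new" && tokens[i + 3] === "function");
    if (declared || (sep && fn)) hits.push(i);
  });
  return hits;
}

// Made-up sample source, for illustration only.
const sampleSource = [
  "// onUpdate is called when the model changes",
  "var view = {",
  "  onUpdate : function (e) { actionUpdate(e); },",
  "  onUpdatesReady: function () { this.onUpdate(); }",
  "};",
  "function actionUpdate(e) { log('onUpdate fired'); }",
].join("\n");

const sampleTokens = tokenize(sampleSource);
const substringHits = countSubstring(sampleSource, "onUpdate");
const exactTokenHits = sampleTokens.filter((t) => t === "onUpdate").length;
const definitionHits = findDefinitions(sampleTokens, "onUpdate").length;

console.log({ substringHits, exactTokenHits, definitionHits });
```

The substring count includes the comment, the string literal, actionUpdate and onUpdatesReady. The token count includes only the places where onUpdate appears as an identifier, and the definition search narrows that to the one place it is defined.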

Although finding functions is my most common use, I do often use it for more complex tasks. Here are some real life examples:

  1. Developers occasionally accidentally check in debugger or console.log() statements:

    $ jsgrep (console.log)|debugger

    At the moment, I have 10 console.log() statements and 6 debugger statements in my source. Some of these are local changes, or they’re in debug code, but a couple of them look like they need to be removed.

  2. Trailing commas in object initializers break IE 7:

    $ jsgrep ,}

    This happens more often than you’d think. When you define an object’s functions inline, the last comma may be hundreds of lines away from the closing brace. Comment out the last function and you’ve created a very hard-to-see problem.

  3. Our code coverage tool can’t handle for statements that don’t have braces:

    $ jsgrep 'for (LPAREN) .* C:1 (!LBRACE)'

    LPAREN matches the open paren token. The macro is defined as \(. If you type \( on the command line without quoting it, the shell thinks you’re escaping the parenthesis for it, and it removes the backslash. To get the shell to pass \( to jsgrep, you have to type \\\( (or quote \() on the command line. Using this macro will always work, and it just seems easier.

    C:1 is a cool feature with an awful syntax. It matches the closing paren, bracket or brace for capture #1.

  4. I’d like a quick & dirty way to find unused code:

    $ jsgrep -m -l- -n- F:NAME | sort | uniq > all_funcs

    $ jsgrep -m -l- -n- NAME LPAREN | sort | uniq > called_funcs

    $ diff called_funcs all_funcs

    This is quick and very dirty. The task got de-prioritized before I figured out if the results were too noisy to be useful.

  5. extends is a future reserved word in JavaScript, but most browsers allowed it to be used as an identifier until August of 2011, when some of my code mysteriously broke:

    $ jsgrep class|const|enum|export|extends|import|super

    class, const, etc. are also future reserved words.

  6. We ship compressed code. It’s not uncommon to reproduce a bug in production, and to know exactly where it’s occurring, but to be unable to find the source. I recently had a problem isolated to x=c.getAttribute(d):

    $ jsgrep NAME=NAME.getAttribute LPAREN NAME

    There were several matches, but it happened to be obvious which one I wanted. If it hadn’t been obvious, I would have added a few more tokens.

  7. I don’t know if this is a real life example, but I’m strangely interested in statistical trivia.

    6295 of Mail’s 9179 functions are named, although 1122 of the named functions are anonymous functions assigned to a variable. The most commonly called function name is get (7.1% of function calls), followed by one (5.8%), set (3.3%), push (3.2%) and on (2.6%). We have 10 functions named get in Mail’s code, and there are another 30 in YUI. 33 functions take 5 or more parameters.

    The most common token in our code is “.” (18.5% of all tokens) followed by “(” and “)” (tied (whew!) at 15% of tokens). The most common name token is _this (3.3% of tokens and the 10th most common token overall).

    We have 17,149 bytes of code in 484 log statements, and another 3,412 bytes in 88 assertions.

    We have 266 calls to setTimeout or Y.later. 38 of them have a timeout of 0.
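The trailing-comma search above (`,}`) is a good illustration of why matching tokens helps: once comments and whitespace are out of the stream, the comma and the closing brace sit next to each other no matter how many lines separate them in the file. The sketch below shows the idea under stated assumptions. It is not jsgrep's code; `tokenizeWithLines`, `findTrailingCommas` and the sample object are hypothetical, and the tokenizer is a simplified one that does not handle regex literals or template strings.

```javascript
// Hypothetical, simplified tokenizer (an assumption, not jsgrep's own code).
// It records the line each token starts on and drops comments and whitespace.
function tokenizeWithLines(source) {
  const pattern = /\/\/[^\n]*|\/\*[\s\S]*?\*\/|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|[A-Za-z_$][\w$]*|\d+(?:\.\d+)?|\s+|[^\s]/g;
  const tokens = [];
  let line = 1;
  for (const match of source.matchAll(pattern)) {
    const text = match[0];
    const skip = /^\s/.test(text) || text.startsWith("//") || text.startsWith("/*");
    if (!skip) tokens.push({ text, line });
    // Advance the line counter past any newlines inside this match.
    line += (text.match(/\n/g) || []).length;
  }
  return tokens;
}

// Report every comma token that is immediately followed by a closing brace token.
function findTrailingCommas(tokens) {
  const hits = [];
  for (let i = 0; i + 1 < tokens.length; i++) {
    if (tokens[i].text === "," && tokens[i + 1].text === "}") {
      hits.push({ commaLine: tokens[i].line, braceLine: tokens[i + 1].line });
    }
  }
  return hits;
}

// Made-up sample: the last function was commented out, leaving a dangling comma.
const objectSource = [
  "var handlers = {",           // line 1
  "  open: function () {",      // line 2
  "    return 1;",              // line 3
  "  },",                       // line 4
  "  /* close: function () {",  // line 5
  "    return 2;",              // line 6
  "  } */",                     // line 7
  "};",                         // line 8
].join("\n");

const trailingCommas = findTrailingCommas(tokenizeWithLines(objectSource));
console.log(trailingCommas);
```

A character-based pattern would have to allow for arbitrary whitespace and comments between the comma and the brace. On the token stream the check is a comparison of two adjacent tokens, and the recorded line numbers still point back to where each one lives in the file.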