jsgrep

jsgrep is like grep, but instead of working on characters, it works on a JavaScript token stream. It’s on GitHub at https://github.com/sfrancisx/jsgrep.

I’m a front-end engineer on Yahoo! Mail. I wrote jsgrep for my own use in mid-2010, and I’ve been (lackadaisically) promoting jsgrep within the Yahoo! Mail team since late 2010. Honestly, I have not met with much success. “It works on tokens instead of characters” seems like a subtle distinction, and it doesn’t really convey how powerful jsgrep can be. I don’t like to blow my own horn, but I have to say that it is just about the coolest command line tool I’ve ever used. I probably use it an average of 20 times a day.

Yahoo! Mail is a big program, with a lot of people working on it. The portion that I work on has 9179 functions in about a quarter million lines of code (117,064 of those lines contain code; the rest are whitespace or comments). I am very familiar with a small part of the code, slightly familiar with a slightly larger part, and just about completely clueless about most of it. Unfortunately, the bug I have to fix could be anywhere. Being able to search the code quickly and easily is extremely important.

Enter jsgrep

Part of what makes jsgrep so cool is how convenient it is to use. I have defined file sets for different areas of Mail’s code; my default file set includes only the code that I work in (including the parts that I’m clueless about). By far the most common thing I do with jsgrep is find function definitions or function calls, so I’ve defined macros for both (along with 29 other macros that I use less often). Compare jsgrep with grep for finding the definition of a function called onUpdate:

jsgrep
   $ jsgrep F:onUpdate

grep
   $ find ~/dev/yahoo/ymail/src -name '*.js' -exec grep -r -E '([^A-Za-z0-9]onUpdate[ ]*(=|:)[ ]*(new)?[ ]*function)|(function[ ]+onUpdate[^A-Za-z0-9])' {} ";"

It took me a half hour or so to figure out that grep command. It only takes me about 30 seconds to lose my train of thought, so the grep command is essentially useless to me. Simply grepping for onUpdate is a lot easier, but it also returns a lot of noise, including comments and references to ‘actionUpdate’ and ‘onUpdatesReady’. And ‘onUpdate’ is a relatively rare string; a simple grep for ‘set’ returns 16,570 matches.
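The difference between matching characters and matching tokens is easy to see in a small sketch. The Python below is not jsgrep’s code: it assumes a deliberately simplified tokenizer (no regex literals, template strings or other details a real JavaScript lexer needs), and `find_definitions` is a hypothetical helper that looks for roughly the shapes the grep command above targets: `function onUpdate`, `onUpdate = function` and `onUpdate: function`. Because comments are dropped and strings become string tokens, they can’t produce false matches, and `actionUpdate` is simply a different name token rather than a substring hit.

```python
import re

# Simplified JavaScript tokenizer (assumption: ignores regex literals,
# template strings and many other details a real tokenizer must handle).
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<name>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<punct>===|!==|==|!=|<=|>=|&&|\|\||\+\+|--|[-+*/%=<>!&|^~?:;,.(){}\[\]])
""", re.VERBOSE | re.DOTALL)


def tokenize(src):
    """Return (kind, text, line) tuples, skipping whitespace and comments."""
    tokens, pos, line = [], 0, 1
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected {src[pos]!r} on line {line}")
        if m.lastgroup not in ("ws", "comment"):
            tokens.append((m.lastgroup, m.group(), line))
        line += m.group().count("\n")
        pos = m.end()
    return tokens


def find_definitions(tokens, name):
    """Lines where `name` is defined as a function, matched on tokens."""
    texts = [text for _, text, _ in tokens]
    lines = []
    for i, (kind, text, line) in enumerate(tokens):
        if kind != "name" or text != name:
            continue
        declared = i > 0 and texts[i - 1] == "function"      # function onUpdate
        assigned = texts[i + 1:i + 3] in (["=", "function"],  # onUpdate = function
                                          [":", "function"])  # onUpdate: function
        if declared or assigned:
            lines.append(line)
    return lines


def naive_grep(src, needle):
    """Character-level search: every line containing the substring."""
    return [n for n, text in enumerate(src.splitlines(), 1) if needle in text]


SOURCE = """
// onUpdate is called when the model changes
var actionUpdate = function () {};
obj.onUpdate = function (e) { log("onUpdate"); };
var handlers = { onUpdatesReady: function () {}, onUpdate: function () {} };
function onUpdate(x) { return x; }
onUpdate(1);
"""

print(naive_grep(SOURCE, "onUpdate"))                  # [2, 3, 4, 5, 6, 7]
print(find_definitions(tokenize(SOURCE), "onUpdate"))  # [4, 5, 6]
```

The substring search reports every line from the comment down, including the `actionUpdate` line and the plain call, while the token version reports only the three definitions.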

Although finding functions is my most common use, I do often use it for more complex tasks. Here are some real life examples:

  1. Developers sometimes accidentally check in debugger or console.log() statements:

       $ jsgrep '(console.log)|debugger'

    At the moment, I have 10 console.log() statements and 6 debugger statements in my source. Some of these are local changes or are in debug code, but a couple of them look like they need to be removed.

  2. Trailing commas in object initializers break IE 7:

       $ jsgrep ,}

    This happens more often than you’d think. When you define an object’s functions inline, the last comma may be hundreds of lines away from the closing brace. Comment out the last function and you’ve created a problem that is very hard to see.

  3. Our code coverage tool can’t handle for statements whose bodies aren’t wrapped in braces:

       $ jsgrep 'for (LPAREN) .* C:1 (!LBRACE)'

    LPAREN is a macro that matches the open paren token; it is defined as \(. If you type \( on the command line without quoting it, the shell treats the backslash as an escape meant for the shell and removes it, so jsgrep receives only the parenthesis. To get the shell to pass \( to jsgrep, you have to type \\\( or quote it ('\('). The macro always works, and it just seems easier.

    C:1 is a cool feature with an awful syntax. It matches the closing paren, bracket or brace for capture #1. Here capture #1 is (LPAREN), so C:1 matches the paren that closes the for header, and (!LBRACE) requires that the next token not be an open brace.

  4. I’d like a quick & dirty way to find unused code:

       $ jsgrep -m -l- -n- F:NAME | sort | uniq > all_funcs
       $ jsgrep -m -l- -n- NAME LPAREN | sort | uniq > called_funcs
       $ diff called_funcs all_funcs

    This is quick and very dirty. The task got de-prioritized before I could figure out whether the results were too noisy to be useful.

  5. extends is a future reserved word in JavaScript, but most browsers allowed it to be used as an identifier until August of 2011, when some of my code mysteriously broke:

       $ jsgrep 'class|const|enum|export|extends|import|super'

    The other words in the pattern (class, const, enum, export, import and super) are also future reserved words.

  6. We ship compressed code. It’s not uncommon to reproduce a bug in production and know exactly where in the compressed code it occurs, but be unable to find the corresponding source. I recently had a problem isolated to x=c.getAttribute(d):

       $ jsgrep NAME=NAME.getAttribute LPAREN NAME

    There were several matches, but it happened to be obvious which one I wanted. If it hadn’t been obvious, I would have added a few more tokens.

  7. I don’t know whether this counts as a real-life example, but I’m strangely interested in statistical trivia.

    6295 of Mail’s 9179 functions are named, although 1122 of the named functions are anonymous functions assigned to a variable. The most commonly called function name is get (7.1% of function calls), followed by one (5.8%), set (3.3%), push (3.2%) and on (2.6%). We have 10 functions named get in Mail’s code, and there are another 30 in YUI. 33 functions take 5 or more parameters.

    The most common token in our code is “.” (18.5% of all tokens), followed by “(” and “)” (tied (whew!) at 15% of tokens each). The most common name token is _this (3.3% of tokens, and the 10th most common token overall).

    We have 17,149 bytes of code in 484 log statements, and another 3,412 bytes in 88 assertions.

    We have 266 calls to setTimeout or Y.later. 38 of them have a timeout of 0.
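The C:1 feature used in the brace-less for search needs to find the bracket that closes a capture, and on a token stream the usual way to do that is a stack of expected closing brackets. The Python sketch below applies that idea to the brace-less for check and also shows the ,} trailing-comma check. It is an illustration under the same simplified-tokenizer assumption, not jsgrep’s implementation, and the helper names (`matching_close`, `braceless_fors`, `trailing_commas`) are hypothetical.

```python
import re

# Simplified JavaScript tokenizer (assumption: ignores regex literals,
# template strings and many other details a real tokenizer must handle).
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<name>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<punct>===|!==|==|!=|<=|>=|&&|\|\||\+\+|--|[-+*/%=<>!&|^~?:;,.(){}\[\]])
""", re.VERBOSE | re.DOTALL)


def tokenize(src):
    """Return (kind, text, line) tuples, skipping whitespace and comments."""
    tokens, pos, line = [], 0, 1
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected {src[pos]!r} on line {line}")
        if m.lastgroup not in ("ws", "comment"):
            tokens.append((m.lastgroup, m.group(), line))
        line += m.group().count("\n")
        pos = m.end()
    return tokens


OPEN_TO_CLOSE = {"(": ")", "[": "]", "{": "}"}


def matching_close(texts, i):
    """Index of the bracket that closes the opener at texts[i] (stack-based)."""
    stack = []
    for j in range(i, len(texts)):
        t = texts[j]
        if t in OPEN_TO_CLOSE:
            stack.append(OPEN_TO_CLOSE[t])   # remember which closer we expect
        elif stack and t == stack[-1]:
            stack.pop()
            if not stack:                    # back to the starting depth
                return j
    raise SyntaxError(f"unbalanced {texts[i]!r}")


def braceless_fors(tokens):
    """Lines of `for` statements whose header is not followed by `{`."""
    texts = [text for _, text, _ in tokens]
    found = []
    for i, (kind, text, line) in enumerate(tokens):
        if kind == "name" and text == "for" and texts[i + 1:i + 2] == ["("]:
            close = matching_close(texts, i + 1)   # like C:1 for (LPAREN)
            if texts[close + 1:close + 2] != ["{"]:
                found.append(line)
    return found


def trailing_commas(tokens):
    """Lines of commas immediately followed by `}` (the ,} pattern)."""
    return [tokens[i][2] for i in range(len(tokens) - 1)
            if tokens[i][1] == "," and tokens[i + 1][1] == "}"]


SOURCE = """
var cfg = {
    a: function () {},
    // b: function () {}
};
for (var i = 0; i < f(n); i++) total += i;
for (var k in cfg) { use(k); }
"""

tokens = tokenize(SOURCE)
print(braceless_fors(tokens))   # [6]
print(trailing_commas(tokens))  # [3]
```

The nested `f(n)` in the first for header is why a stack (or a depth count) is needed: stopping at the first `)` would misjudge where the header ends. The trailing comma is reported even though the `}` is two lines later, behind a commented-out function, because the comment never becomes a token.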
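The unused-code recipe and the token statistics above both come down to counting tokens. Here is a rough Python sketch of both, again assuming a simplified tokenizer and not reflecting how jsgrep computes anything: `collections.Counter` tallies token texts, and a set difference of defined names minus called names plays the role of `diff called_funcs all_funcs`. The helper names are hypothetical. Like the shell version, it is quick and dirty: keywords such as if and for that are followed by ( land in the called set (harmless for the difference), and calls made through variables or computed property names are invisible.

```python
import re
from collections import Counter

# Simplified JavaScript tokenizer (assumption: ignores regex literals,
# template strings and many other details a real tokenizer must handle).
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+)
  | (?P<comment>//[^\n]*|/\*.*?\*/)
  | (?P<string>"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')
  | (?P<name>[A-Za-z_$][A-Za-z0-9_$]*)
  | (?P<number>\d+(?:\.\d+)?)
  | (?P<punct>===|!==|==|!=|<=|>=|&&|\|\||\+\+|--|[-+*/%=<>!&|^~?:;,.(){}\[\]])
""", re.VERBOSE | re.DOTALL)


def tokenize(src):
    """Return (kind, text, line) tuples, skipping whitespace and comments."""
    tokens, pos, line = [], 0, 1
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected {src[pos]!r} on line {line}")
        if m.lastgroup not in ("ws", "comment"):
            tokens.append((m.lastgroup, m.group(), line))
        line += m.group().count("\n")
        pos = m.end()
    return tokens


def token_counts(tokens):
    """Tally how often each token text appears."""
    return Counter(text for _, text, _ in tokens)


def unused_functions(tokens):
    """Names defined as functions that never appear directly before `(`."""
    texts = [text for _, text, _ in tokens]
    defined, called = set(), set()
    for i, (kind, text, _) in enumerate(tokens):
        if kind != "name":
            continue
        if i > 0 and texts[i - 1] == "function":
            defined.add(text)                  # function name(...)
        elif texts[i + 1:i + 3] in (["=", "function"], [":", "function"]):
            defined.add(text)                  # name = function / name: function
        elif texts[i + 1:i + 2] == ["("] and text != "function":
            called.add(text)                   # roughly NAME LPAREN
    return defined - called


SOURCE = """
function used(a) { return helper(a); }
var helper = function (x) { return x.get(); };
var api = { unused: function () {}, run: function () { used(1); } };
api.run();
"""

tokens = tokenize(SOURCE)
counts = token_counts(tokens)
total = sum(counts.values())
for tok, n in counts.most_common(3):
    print(f"{tok!r}: {100 * n / total:.1f}% of tokens")
print(sorted(unused_functions(tokens)))  # ['unused']
```

In this tiny sample “(” and “)” tie for the most common token, which is what balanced parentheses should produce in any file.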