Flickr Badge

Sunday, February 04, 2007

Can your programmers read code?

Surprising as it might seem, programmers spend a majority of the time reading code, not writing code. Take my case. Even though I am working alone, on a new project, I spend the majority of my time reading code. That's not just my code, but also the code in the various open source modules that I'm using. At some point or the other, I've had to delve headlong into the source of Django, Dojo, Mochikit and ToofPy, and try to make out what the code is doing by ignoring the docs (if there are any!) and reading the source. If you are working in a team, the effect is magnified even more.

This was no sudden revelation. When I was hiring, I had a standard question to see if candidates could read code. Here is the code snippet.

int function f(somestruct *a) {
int x = 0;
if (NULL == a) {
return 0;
}
x += f(a->left);
x += 1;
x += f(a->right);
return x;
}
Candidates had to try to figure out what function f was doing.

First, I expected the candidate to see the left and right fields and guess that "a" was a pointer to a binary tree of some sort. I then expected the candidate to see the pattern of the recursive calls and guess that it was a tree traversal (inorder in this case). Once that's done, its simple to try out the function with some data and figure out what it does (count the number of nodes).

Given that everyone learns about binary trees, and especially inorder traversal, you would have expected many to get it right. Surprise! Only a few have ever got it correct.

Most answers were something related to linked lists or arrays. It seemed that every programmer when confronted with a pointer in any form assumed it was related to linked lists!!

To be successful at code reading, it is the ability to execute the program in our mind that needs to be developed. This is a learned skill, developed by just reading lots of code.

One conclusion is that many programmers don't spend enough time reading code. It also explains why in many teams, most code becomes spaghetti — programmers can't read the original code, so they hack their fixes.

So, do you check to see if programmers can read code? As an aside, you can now find out how most programmers can't write code either.

20 comments:

Anonymous said...

hey that took only 10 seconds! you need to use something harder than that :)

Siddhi said...

Exactly. And everyone gets it wrong. What does that say?

Anonymous said...

that you are interviewing a very inferior kind of candidate
I honestly cant believe a majority get this wrong

Anonymous said...

Don't you mean preorder traversal? Left/Self/Right.

Anonymous said...

no thats in-order.
self-left-right is pre-order

Anonymous said...

Surprisingly enough, not many people who call themselves programmers, understand basic concepts.
For example, I was surprised with extremely easy questions for my current role, however, my interviewer told me that only 20-25% of people get an idea of many-to-many relationship in databases. About 30% will work it out if hinted and about half wouldn't get it anyway. And that was a hard question :) for an experienced software developer to work in a major bank.

Siddhi said...

Especially trees and recursion. I've found that many people get confused by basic trees and recursion.

Depressing.

Anonymous said...

The fact that this function counts nodes is very obvious.

To the (small) credit of those not recognizing binary trees, a linked list can certainly have members called "left" and "right" to post to the other list entires. On the other hand, this would likely hang this function (e.g. going left, then right, then left...) so is a pretty silly idea.

So someone thinking linked lists as a first impression, but who notices that that it will go horribly bad (even after prompting like "are you sure?") should still be a decent candidate, if they can also go from that to the idea of a tree (which is, after all, the next option, if you think of something like a linked list but that doesn't have either left or right to point backward) by themselves.

Anonymous said...

I've been writing software code for about 7 years now, and some of my applications are in production.

I have never needed to write a binary tree. I normally deal with code at the 3rd gen language level where I expect code for binary searches or lists to be a part of the API and all I need is to call the code.

Anonymous said...

To the anonymous poster above...

The basic concepts of recursion and binary tree data structures should be fundamental to any decent software developer. I have no doubt that all software developers utilize an environment's libraries in most of their work. However, if a candidate for a job at my company demonstrated that he did not understand recursion or binary trees, it would be hard to imagine that the candidate could solve more difficult problems that a library might not have an answer for.

A software developer who does not understand these concepts does not belong in the field, in my opinion.

Siddhi said...

We were writing web browsers for low end mobile phones. The code was in C, and it makes heavy use of custom multi-child tree data structures (for example to store the parsed HTML). It was very important that the candidate could read and recognise at least basic tree and recursion code.

Besides, like Hox said, this is trees 101, possibly the simplest tree example around.

Anonymous said...

Anon and Hox, you both make good points.

Anon: I've never written a binary tree either. Questions about databases (such as multi-multi relationships) are probably better as they are far more relevant to the typical business developer.

Hox: However, you are correct that good programmers should be able to understand curve balls that come to everyone from time to time. Or--a better way to say the same thing is that a good programmer should be able to quickly get past incorrect options so they can zero in on finding the right answer. (One way perhaps is to ask the interviewee to describe their thoughts verbally as they look at the code. Good programmers should quickly start narrowing down the options rather than guessing at high-probability results.)

For example--I missed the binary tree connection--however, in seconds I could see that the code snippet was some kind of traversal--and it was obviously not a typical array/linked list traversal. With a bit more context (a definition of the struct for example) it would be trivial to understand (and maintain) the code this came from.

(Over the years as a maintenance programmer I've learned some tricks about picking up context that make it much easier. That would be a good blog topic if anyone wants to pick it up. :-)

Anonymous said...

A formal definition is sort of extraneous in this regard. You can already put together a definition of the struct based on what is in the function.

As far are this function is concerned, only the left and right members are useful, and since they are passed into the f function they must also be somestruct objects.

struct somestruct
{
struct somestruct *left;
struct somestruct *right;

// other data members
};

Anonymous said...

Hox,
Turns out your opinion is quite worthless, because there are plenty of programmers who don't write binary trees and are writing very useful software for a number of customers.

Anonymous said...

I'm slightly sleep deprived right now.
I havent written any tree traveling code.
I havent even seen any tree code for years, and never for C.
I did find out what it did by executing it in my head.
And it should be obvious for everyone what it does.

Its a test and getting past it only requires basic programming knowledge, and some capability of deducing how the code operates.
It doesn't require much of a knowledge about data structures. You learn it in 2 different courses required for engineers *NOT* enrolled for computer science atleast in our university. [Like mechanical engineering students, electrical , chemists... ]
This thing just counts the number of nodes in a tree.

Anonymous said...

Oh I'm too tired to read ENGLISH properly.
You said what it does in article too...
Well I'm not professional programmer just electrical engineering student and I pass this.

Anonymous said...

Just clicking past and stopped here…

If the point of this entry was to demonstrate that it is smart to test candidates programming abilities then I think it makes sense. You should test candidates’ knowledge technical knowledge in the area in which they will be performing. As to whether this piece of code represents some kind of fundamental test I am considerably less certain. Lots of folks make a living and never touch C. Why test them in it? Furthermore my organization will not promote code like this so it is very unlikely that someone is ever going to need to maintain an unnamed and uncommented function. [Sometimes I start to feel sad for terrible fate that unerringly strikes those responsible for promoting code like the example, but then it passes.] The cost of software ownership is in maintenance and extension. It is simply too expensive to try and maintain systems of code like the example. I find that I need people who write well commented and well named code and can maintain and extend the same.

Later.

Siddhi said...

Hi Anonymous,

The point is to see if they can read code, and if they can execute small pieces of code in their head. It need not be trees, it could be anything.

In our case, the codebase that they would be working on was in C and used a lot of trees and recursion - hence this question was very important for us.

Josh said...

I haven't done any C programming in 7 years. I haven't written a tree data structure in 8 years. I got this in about 5 seconds. Even if they had never seen a tree before, they should be able to reason out what it does. The recursive calls and "left/right" are both dead giveaways for "tree." I might have been confused because the question seemed so clear to me, that I must have missed some important detail that made the problem more difficult than it appears initially.

More interesting would be to get someone to translate this into the programming language of their choice. They get to write a little code and let you know what language they feel most comfortable in.

(define (f a)
(cond ((null? a) 0)
(else (+ (f (node-left a))
1
(f (node-right a))))))
;; assuming a structure node with fields left and right

The person should be able to do this in about a minute (that includes debug time). If everybody writing on this blog and reading this blog gets this so quickly, who are the people not getting it? Could it be that people who are constantly reading up on various programming topics are simply better prepared to deal with pointers and recursion than people who aren't?

Anonymous said...

its absurd to expect a person to be able to understand a tree traversal or any other stuff if he hasnt used it for a long time.
most of the times ppl think around things which they are in touch recently.
so it makes sense to ask something basics and see if he can easily relate and grasp the problem. if he can, then it shd be possible for him to brush up anything he has done in the past and pick it up later during the work.