Participate in a Scientific Study about Code Readability

greldak

I'd have to agree on this. A large amount of this was heavily influenced by knowledge of the syntax of particular languages, as opposed to the readability of the code in general. To properly reflect a readability metric, similar examples of more complex syntax need to be included from a wider variety of languages. As it stands, this survey is heavily skewed by the choice of languages, the majority being from the C family.

Sander Rossel

greldak wrote:

the majority all being from the C family.

Or did I miss something? :rolleyes:

It's an OO world.

public class Naerling : Lazy<Person>{
public void DoWork(){ throw new NotImplementedException(); }
}

Peric Zeljko

I agree with Naerling. It is also interesting that the survey has no variation in text size or font type at all, which for me is the most important factor in text readability. All the best, Perić Željko

Daniel Vlasceanu

Hi, Will the results be made public somehow?

Nathan Nowak

To my knowledge, the snippets were Java, CUDA, and Python. Overall the survey has around 300 code snippets, of which each participant is given a random sample of 20 to rate. I had many different questions about why the test was set up the way it was. I assumed there was some method to the madness. Wes was kind enough to explain some of the rationale behind the test design in a forum post at Udacity. I still might have considered doing things differently, but after reading his explanations my concerns over the soundness of the approach were relieved. A large part of the reason the test is set up this way is that it replicates an earlier study that only had around 100 participants. You can read about that study if you like: Raymond P.L. Buse, Westley Weimer: Learning a Metric for Code Readability. IEEE Trans. Software Engineering Vol. 36 No. 4, 546-558, July/Aug 2010. There is some statistics jargon in there, but it is pretty approachable overall. Part of the thing to realize is that they are using real-world code from large or mature open source projects. Not only are they trying to develop some quantifiable measure of readability, but they are also trying to determine whether measuring readability is actually useful. They do this by looking at the correlation between readability scores and bug counts. They then look to see if changes in readability over time correlate with changes in the number of bugs over time. They are trying to get at the question of whether taking the time to make code more readable pays off. There are lots of different ways we could go about studying readability, and if you read the report you will probably come up with even more ideas than you have now. It is actually kind of surprising how many questions in this area nobody has even attempted to answer. Wes and company are just focusing on one small area and trying to move the ball forward. For better or worse, such is the way with academics.
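For anyone who wants to see the two mechanics mentioned above in concrete form, here is a minimal sketch. It assumes a pool of 300 snippet ids with 20 drawn per participant (the figures quoted in this post) and uses the Pearson coefficient for the correlation step. The thread does not say which statistic the researchers actually use, so Pearson is an assumption, and the function names `sample_snippets` and `pearson` are made up for illustration.

```python
import math
import random
from statistics import mean


def sample_snippets(pool_size=300, per_participant=20, seed=None):
    """Draw distinct snippet ids for one participant (hypothetical helper)."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), per_participant)


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences.

    Assumption: Pearson is used here only as a stand-in; the study's
    actual statistic is not stated in this thread.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Feeding `pearson` one readability score and one bug count per module would give a value between -1 and 1; a clearly negative value would mean that more readable code tends to go with fewer reported bugs, which is the kind of relationship the study is looking for.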

Nathan Nowak

I'm sure they are headed towards publishing a paper about their findings, but I believe they will also make the data set public, as they did in their first study that only had about 100 participants. Here is a reference to their earlier study if you are interested: Raymond P.L. Buse, Westley Weimer: Learning a Metric for Code Readability. IEEE Trans. Software Engineering Vol. 36 No. 4, 546-558, July/Aug 2010. If you have the time, the paper is only 14 pages long and, other than some statistics jargon, is pretty approachable. Having a little context for the current study helps answer a lot of questions. Thanks for taking the time to participate.

Nathan Nowak

Well, in defense of the researchers, not that anyone really seems to be attacking them, it is probably not possible to study all of the factors that impact code readability at one time. They selected a modest group of factors that they thought were most pertinent to the questions they were trying to answer. However, I really do like your idea, both for its simplicity and its testability. My personal hope is that one of two things will happen. One, researchers will realize that the power of the internet isn't the huge pool of test subjects it makes available but the huge number of quality ideas the community can generate: ideas not just about how to test but about what to test. Two, communities like CodeProject will move a little towards doing something more like traditional research rather than just occasional opinion polls. If a decentralized community can come together to create a semi-respected resource like Wikipedia, there is no reason a similar thing couldn't happen for research. There really is no reason CodeProject couldn't do its own readability study.

Sander Rossel

*Shivers* This report reminds me so much of my university time... It isn't very nice to look at :) Thanks for the reply and the report though. I might even read the report when I have some spare time. There is always a method. I guess comparing C to Basic is like comparing apples to pears. I just had not expected C only when I started. It's an interesting study. Please keep us up to date. Perhaps you could post a news article about it when it is done :)

Peric Zeljko

I agree with you. Speaking of Wikipedia, here is a link to its article explaining what 'readability' is, with some interesting external links to different studies of this problem. Some of these studies are interesting because they find a correlation between readability and comprehension of the text. [Readability survey](http://en.wikipedia.org/wiki/Readability_survey) All the best, Perić Željko

greldak

That may explain why I got the impression that a couple of the snippets were duplicated. It may be interesting to compare how often those cases got the same rating from the same person, and also to correlate the rating with the position at which the snippet was served up. Our impressions will change as we work through the snippets, so our ratings of later samples are likely to be influenced by earlier ones.
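A rough sketch of how that check could be run, assuming the responses were available as `(participant, position, snippet_id, rating)` tuples. That record layout and the function names are hypothetical, and whether the survey really serves the same snippet twice to one person is only my impression.

```python
from collections import defaultdict


def repeated_ratings(responses):
    """Find snippets that one participant rated more than once.

    responses: iterable of (participant, position, snippet_id, rating).
    Returns (participant, snippet_id, first_pos, later_pos, rating_change)
    tuples, comparing each later rating with that participant's first one.
    """
    seen = defaultdict(list)
    for participant, position, snippet_id, rating in responses:
        seen[(participant, snippet_id)].append((position, rating))

    pairs = []
    for (participant, snippet_id), entries in seen.items():
        entries.sort()  # order by position in the survey
        first_pos, first_rating = entries[0]
        for pos, rating in entries[1:]:
            pairs.append(
                (participant, snippet_id, first_pos, pos, rating - first_rating)
            )
    return pairs


def agreement_rate(pairs):
    """Fraction of repeated ratings identical to the first; None if no repeats."""
    if not pairs:
        return None
    return sum(1 for p in pairs if p[4] == 0) / len(pairs)
```

The position columns in each tuple are there so the rating change can also be compared against how far apart the two presentations were, which is the ordering effect I was wondering about.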