We’re trying to learn more about mutual funds, which we find quite frightening, so let’s start by breaking down some terms, like R-squared, a measure of volatility. Here’s what Vanguard says:

R-squared measures how much a fund’s past returns can be explained by the returns from its benchmark index.

If a fund’s total returns were precisely synchronized with the index’s return, its R-squared would be 1.00 (100%). If a fund’s returns bore no relationship to the index’s returns, its R-squared would be 0.

The higher the R-squared, the more the fund’s return can be explained by the performance of the index, and so the performance of the market or market segment. The lower the R-squared, the more the return can be explained by the fund manager’s decisions.

So no-load index funds, which we’re interested in, are handled completely by computers that attempt to sync with the performance of a benchmark index like the S&P 500, and they should have an R-squared of 1 (or very close to it). The Vanguard 500 Index is one example. By contrast, the “Vanguard Capital Value” fund, which seeks “companies that are out of favor with investors and that are trading at prices below what the stocks are worth compared to potential earnings,” has an R-squared of 50.88%.

(Don’t think we have a crush on Vanguard or anything, we just have an account there so that’s the easiest place to go for us for this information)

This is also referred to as the “coefficient of determination” and can be determined using scary Greek symbols.

Luckily for mere mortals, places like Google Finance figure out the R-squared for ya.

In addition, the closer the R-squared is to zero, the lower you should demand your expense ratios to be – your returns are not due to a manager, but the market as a whole.

As R-squared diverges from one, you can expect to see expense ratios go up, as the funds are more actively managed.

And i have a huge crush on Vanguard… it’s ok.

Sigh…liberal arts majors…

Technically, r-squared is not a measure of “volatility” per se but rather a measure of goodness-of-fit, or more plainly, how much of the variability can be attributed to the statistical model. In this case, it measures how much of the fund’s returns can be accounted for by the stock index’s returns.
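To make the goodness-of-fit idea concrete, here is a rough Python sketch. The return figures are made up purely for illustration (they are not real fund data), and `r_squared` is a hypothetical helper name. It computes R-squared as the squared correlation between a fund's returns and an index's returns, which matches the regression definition when there is a single benchmark.

```python
def r_squared(fund, index):
    """Squared Pearson correlation between two equal-length return series."""
    n = len(fund)
    mean_f = sum(fund) / n
    mean_i = sum(index) / n
    # Co-movement of the two series, and the spread of each one on its own.
    cov = sum((f - mean_f) * (i - mean_i) for f, i in zip(fund, index))
    var_f = sum((f - mean_f) ** 2 for f in fund)
    var_i = sum((i - mean_i) ** 2 for i in index)
    return cov ** 2 / (var_f * var_i)

# Hypothetical monthly returns in percent (invented for this example).
index_returns = [1.0, -2.0, 3.0, 0.5, -1.5, 2.0]
tracker_fund = [0.9, -2.1, 2.9, 0.4, -1.6, 1.9]  # index minus 0.1 every month
active_fund = [2.0, 1.0, -1.0, 3.0, -2.0, 0.5]   # goes its own way

print("tracker R^2:", round(r_squared(tracker_fund, index_returns), 4))
print("active  R^2:", round(r_squared(active_fund, index_returns), 4))
```

The tracker's returns are an exact linear function of the index's, so its R-squared comes out at 1 (up to floating-point rounding), while the made-up active fund lands near 0.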

Let’s not forget the saying: “Correlation does not equal causation.”

R-squared values go from 1 to -1. This is a measure of correlation. It means that if r^2 is -1, then the items are inversely correlated.

This only tells you about history and is in no way predictive.

@baa:

Just to be clear, the closer the R-squared is to ONE (not to zero), the lower the fees should be, as the fund is more closely tracking the index.

Some research suggests that the instances in which fund managers improve on (and fall short of) the market return are due almost entirely to chance. That is, there aren’t many good fund managers out there who can consistently beat the market (those that do are lucky or cheating). Maybe a good use of R^2 here is: higher coefficient = higher long-term return.

Also check out http://www.morningstar.com for more info on mutual funds. Lots of info available with a free membership.

@ baa & Justaguy2

While R-squared values closer to 1 DO likely indicate less active work is being done by a manager, R-squared values should be (mostly) irrelevant to fees. You should seek lower fees regardless of the R-squared value. It’s like asking two pharmacies which one pays lower hourly wages to decide where to buy your prescription. While there MAY be some correlation, why don’t you just ask which has the cheaper price instead?

Correlation is an abused tool. If you correlate ice cream sales with shark attacks, it will tell you that when ice cream sales are high, shark attacks are more frequent. To the masses this means ice cream sales cause shark attacks.

I have to disagree with your definition of R^2…Having just taken a college Statistics class, I feel I am “certified” to explain…

First, in statistics, R^2 is derived from R. R is the variable given to the number that describes how strong a linear relationship is between points. It is limited to the range -1 to 1 inclusive. For example, if we take the points (2,10) (4,20) (6,30) we have a perfect linear relationship because if we were to draw a regression line, it would pass through all of those points. Therefore the R value is 1. The sign (+ or -) of the R value determines if it is a positive linear relationship or a negative linear relationship.

In real life situations, nothing will ever have a PERFECT linear relationship so therefore, we use R to describe HOW STRONG/WEAK the POSITIVE/NEGATIVE linear relationship is.

The R^2 value is derived by simply multiplying R by itself. R * R = R^2.
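For anyone who wants to see R and R^2 side by side, here is a small Python sketch. The function name `pearson_r` and the two sets of points are just illustrative choices: three points on a rising straight line and three on a falling one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient R for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Three points on a rising line and three on a falling line.
rising = pearson_r([2, 4, 6], [10, 20, 30])
falling = pearson_r([2, 4, 6], [30, 20, 10])

print("rising:  R =", rising, " R^2 =", rising * rising)
print("falling: R =", falling, " R^2 =", falling * falling)
```

R keeps the sign of the slope (1 for the rising line, -1 for the falling one), but squaring throws the sign away, so R^2 is 1 in both cases and can never be negative.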

A contextual explanation using mathematics of how R^2 is derived:

It can also be derived by taking a set of points from data and drawing a regression line using statistical software (a TI-83 or later calculator will do).

When the regression line is drawn, there will most likely be points above and below the regression line. Those points are called the “actual values”. The regression line contains all of the “expected values”.

Next one needs to find the residuals. Take the “expected/estimated value” (y value) and subtract the “actual value” (y value) for each point, then square that result. The sum of these squares is the residual (unexplained) sum of squares.

Then perform the same operation, except take the “actual value” (y value) of each point and subtract the mean of the y values. Square each difference and sum them up, and you then have the total sum of squares.

After that, take the “expected/estimated value” (y value) and subtract the mean of the y values (actual/observed values). Square and sum these differences to get the explained sum of squares.

Using the formula provided at the start of the pages to find R^2 from the other individual formulas would yield the R^2 value: the explained sum of squares divided by the total sum of squares, or equivalently 1 minus the residual sum of squares divided by the total sum of squares.
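Here is one way the sums-of-squares route could be sketched in Python. The data points and the helper names (`fit_line`, `sums_of_squares`) are made up for illustration, and the names used are the standard ones: residual sum of squares for (actual minus expected), total for (actual minus mean), explained for (expected minus mean). It assumes an ordinary least-squares line with an intercept.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def sums_of_squares(xs, ys):
    """Return (residual, total, explained) sums of squares."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    expected = [slope * x + intercept for x in xs]  # points on the line
    ss_res = sum((y - e) ** 2 for y, e in zip(ys, expected))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_exp = sum((e - mean_y) ** 2 for e in expected)
    return ss_res, ss_tot, ss_exp

# Made-up data points, not from any fund.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ss_res, ss_tot, ss_exp = sums_of_squares(xs, ys)

print("R^2 from explained/total:  ", round(ss_exp / ss_tot, 4))
print("R^2 from 1 - residual/total:", round(1 - ss_res / ss_tot, 4))
```

Both routes give the same number (0.6 for these points), because for a least-squares line the explained and residual sums of squares add up to the total.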

R^2 basically gives those that extrapolate from the regression line the amount of “correctness” and how reliable the regression line is.

“If a fund’s returns bore no relationship to the index’s returns, its R-squared would be 0.”

It is important to understand that just because the R value (or R^2) is VERY close to 0 IT DOES NOT MEAN THAT THERE IS NO RELATIONSHIP. It just means there is NO LINEAR RELATIONSHIP.
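A quick Python sketch of that point, using invented data: y is completely determined by x (y = x^2), yet because the x values are symmetric around zero the linear correlation comes out at zero. The `pearson_r` helper is just an illustrative name.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient R for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# A perfect, but non-linear, relationship: every y is exactly x squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print("R =", r, " R^2 =", r * r)
```

The left half of the parabola slopes down and the right half slopes up, so the two cancel out and R is 0 even though knowing x tells you y exactly.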

Oh BTW, I scored a 4 out of 5 on my Advanced Placement Statistics Exam…

People, we’re not trying to correlate anything. We’re just trying to define what R-squared means.

It’s really easy BTW to use a TI calc to derive these values.

If you are using the 83/84 series, open the Catalog and turn DiagnosticOn.

Then press “STAT” then go to “EDIT”. Enter your values for the horizontal axis (the x-values) in L1 and the corresponding y-values in the L2 field. Then go to “STAT” then “CALC” and perform a linear regression (LinReg). You will get the values for the slope and the y-intercept. Look down a bit more and you get an r and r^2 value!

Play around with it. Make points that are linear, and you will see it become 1 or -1 depending on the slope. The more non-linear your graph is, the closer to 0 it becomes.

In my previous comment, I mentioned that an R value of 0 means there is no LINEAR relationship. There are ways to make the data more linear, such as taking the natural or common logarithm of the dependent variable (y), then fitting another regression line. You may be surprised that R becomes closer to -1 or 1.
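As a rough illustration of the logarithm trick, here is a Python sketch with made-up data that doubles at every step (exponential growth). The helper name `pearson_r` is hypothetical. R is computed once on the raw y values and once on their natural logs.

```python
from math import log, sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient R for paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: y doubles each step, so the raw points curve upward.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 8, 16, 32]

raw_r = pearson_r(xs, ys)
log_r = pearson_r(xs, [log(y) for y in ys])  # ln(y) is a straight line in x

print("R on raw y:", round(raw_r, 4))
print("R on ln(y):", round(log_r, 4))
```

The raw data already correlates fairly strongly because it only ever goes up, but after the log transform the points fall on a straight line and R is 1 (up to rounding). The transform only helps when the underlying relationship really is roughly exponential.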

I really wish people at Vanguard, especially on mathematical and statistical topics, would refrain from getting too detailed unless they have someone who knows what they are talking about…

But in relation to the article…the closer R^2 is to 1 (100%), the stronger the linear relationship between the fund’s returns and the index’s returns.

@Ben Popken:

R squared is a measure of correlation. It’s Stats 102.

Jkinnyc, R^2 is always between 0 and 1 (inclusive). You’re thinking of just plain old r, the correlation coefficient, when you say it’s between -1 and 1.

heh, I’m TAing an intro stats class this summer, and was just prepping a bit on correlation for Monday’s class.

Another fun “Correlation (Association) is Not Causation” one: There’s a very strong correlation between the number of firefighters who respond to a blaze and the dollar value of damage done by that fire. So, obviously, we should just send one truck out to all fires, cos less damage is done that way.

@qitaana:

Yeah, you’re right, my brain is scrambled today.

@Skiffer:

Another liberal arts major here. Math make head hur…OMG! Ponies! Look!

@qitaana:

Very true that Correlation Does Not Equal Causation. As my stats prof was fond of saying, however, “Correlation Should Make You Pretty Damn Suspicious.”

@JustAGuy2:

Your stats professor was wrong. Without significant other evidence, correlation means nothing. There was once a study correlating the relationship of MBA graduates and lactating mothers in Africa.

@JustAGuy2:

Sorry, your stats prof was wrong. Correlation alone means nothing except a linear relationship. Which can be completely coincidental.

Weird. My first post didn’t show up until the second one did.

now i kind of want to get an MBA, just so i can be correlated to lactating african mothers.

@JKinNYC: Sigh…that wasn’t a “sigh…(i’m a) liberal arts majors…” it was a “sigh…(i can’t believe anyone’s even asking this question, I guess they’re all a bunch of) liberal arts majors…”

Luckily with my engineering degree I make enough that I don’t have to worry about saving or investing…

Ok, I’ll stop being facetious and utterly arrogant and get back to my CAD drawings…and all you liberal arts majors can go back to your parties…and leave me here…alone…just like in college :P

A great resource for trying to understand investing/finance lingo and concepts is http://www.investopedia.com. That site got me through many of my finance classes that I encountered while working on my MBA. Unfortunately though I never learned how to correlate numbers in relation to “lactating african mothers”.

@JKinNYC:

Yes, of course, without an explanation of _why_ two variables are correlated, the correlation is worthless. He was saying (which is very valid) that when you find a correlation between two numbers, it’s worth investigating.