In some investment circles you often hear statistics such as "90% of actively managed investment funds failed to beat the market" cited in support of simply buying index funds. For whatever it's worth, I like index funds too and think indexing is a good strategy. But sometimes these statistics are stretched too far, to suggest that it is very unlikely that you could beat the market. I'll call the people who argue this way dogmatic indexers. Sometimes it's even implied that the statistic means you have only a 10% chance of beating the market. Intuitively this just doesn't make sense.
In response to this idea that an average person has a minuscule chance of beating the market, I ask myself, "If I were to pick one stock, isn't there a fair chance that it will beat the market?" Warren Buffett has famously asserted that "diversification is a protection against ignorance." We may not have the knowledge of Mr. Buffett, but we can be confident that a minimally diversified portfolio has a high chance of deviating from the market average. The question at this point is the probability that it strays in the desired direction.
These claims about beating the market are so common that I decided to test the question I asked myself. My goal is to find the answer over a period dating back as far as possible, for a set of "normal" companies. The universe of companies is large and their profiles vary. I don't mean finding the next Apple or Amazon, but simply betting on an established, well-known company. Nothing special or complicated. To that end, I will look at DIA, the oldest exchange-traded fund tracking the Dow Jones Industrial Average, and its components, to see what portion of them outperformed the index as a whole since 1998.
from common import *
import matplotlib.pyplot as plt
pd.set_option('display.max_rows', 10)
Uses functions defined in common.py to prepare our dataset.
df = pd.read_csv("data/dow.csv", index_col=[0,1])
df_processed = (df
    .pipe(startPipeline)
    .pipe(clean)
    .pipe(trim)
    .pipe(flatten_date)
    .pipe(remove_outliers)
    .pipe(add_percent_change)
)
df_processed
startPipeline: runtime=0:00:00, end shape=(63, 7)
clean: runtime=0:00:00.000979, end shape=(62, 7)
trim: runtime=0:00:00.000997, end shape=(62, 1)
flatten_date: runtime=0:00:00.007032, end shape=(31, 4)
remove_outliers: runtime=0:00:00, end shape=(31, 4)
add_percent_change: runtime=0:00:00.001985, end shape=(31, 5)
Symbol | Start Date | End Date | Start Close | End Close | Percent Change |
---|---|---|---|---|---|
AA | 1998-01-20 | 2021-01-14 | 29.530121 | 25.090000 | -15.035904 |
AXP | 1998-01-20 | 2021-01-14 | 17.698372 | 123.779999 | 599.386359 |
BA | 1998-01-20 | 2021-01-14 | 27.150482 | 209.910004 | 673.135454 |
CAT | 1998-01-20 | 2021-01-14 | 9.994546 | 197.399994 | 1875.077159 |
COKE | 1998-01-20 | 2021-01-14 | 42.274349 | 258.570007 | 511.647517 |
... | ... | ... | ... | ... | ... |
T | 1998-01-20 | 2021-01-14 | 12.724303 | 29.290001 | 130.189428 |
TRV | 1998-01-20 | 2021-01-14 | 23.375343 | 142.320007 | 508.846704 |
UK | 1998-01-20 | 2021-01-14 | NaN | NaN | -200.000000 |
WMT | 1998-01-20 | 2021-01-14 | 14.052621 | 146.970001 | 945.854737 |
XOM | 1998-01-20 | 2021-01-14 | 15.515985 | 50.310001 | 224.246252 |
31 rows × 5 columns
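The Percent Change column appears to be the simple percent change between Start Close and End Close. The internals of add_percent_change in common.py are not shown here, so the sketch below is an assumption about that step rather than its actual code, and the function name percent_change is hypothetical. It recomputes the AA and AXP rows from the table above, which should agree with the table to within rounding.

```python
def percent_change(start_close, end_close):
    """Simple percent change from start_close to end_close."""
    return (end_close - start_close) / start_close * 100

# Start and end closes copied from the AA and AXP rows of the table
aa = percent_change(29.530121, 25.090000)
axp = percent_change(17.698372, 123.779999)

print(f"AA:  {aa:.2f}%")
print(f"AXP: {axp:.2f}%")
```

A negative result, as for AA, means the stock ended the period below where it started.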
def benchmark_percent_change(df):
    # Percent change of DIA, the fund used as the benchmark
    return df.loc["DIA", "Percent Change"]
def print_benchmark_info(df):
    dia_pct_change = benchmark_percent_change(df)
    # Report the full period: start date to end date
    print(f"The Dow Jones has risen {dia_pct_change:.2f}% from "
          f"{df.loc['DIA', 'Start Date']} to {df.loc['DIA', 'End Date']}")
print_benchmark_info(df_processed)
The Dow Jones has risen 549.56% from 1998-01-20 to 2021-01-14
As we can see, DIA has risen 549.56% since 1998-01-20. Now we can compare this figure with each company's percent change to see how many companies rose more.
def outperformers(df):
    # Rows whose percent change is strictly above DIA's
    return df[df["Percent Change"] > benchmark_percent_change(df)]
def underperformers(df):
    # Rows whose percent change is strictly below DIA's
    return df[df["Percent Change"] < benchmark_percent_change(df)]
def print_outperformers_info(df):
    total_companies = len(df) - 1  # exclude DIA
    df_outperform = outperformers(df)
    total_above_dia = len(df_outperform)
    share_string = "%.2f" % (total_above_dia / total_companies * 100)
    median_outperform_string = "%.2f" % df_outperform["Percent Change"].median()
    print(f"{share_string}% performed better than DIA ({total_above_dia}/{total_companies})")
    print(f"{median_outperform_string}% was the median percent change for the outperformers")
    print(f"Outperformers:\n{df_outperform.index.to_list()}")
print_outperformers_info(df_processed)
43.33% performed better than DIA (13/30)
711.66% was the median percent change for the outperformers
Outperformers:
['AXP', 'BA', 'CAT', 'CVX', 'DIS', 'HON', 'JNJ', 'JPM', 'MCD', 'MMM', 'MO', 'RTX', 'WMT']
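To make the comparison concrete, here is a minimal sketch of the same filter in plain Python, without pandas, so that it runs on its own. It uses only the ten rows visible in the truncated table above, so its share will not match the 13/30 reported for the full dataset. The names visible_rows and BENCHMARK are introduced for this illustration, and the benchmark is the 549.56% reported for DIA.

```python
# Percent changes copied from the rows visible in the table above
# (a subset of the 30 components, so the share differs from 13/30)
visible_rows = {
    "AA": -15.035904,
    "AXP": 599.386359,
    "BA": 673.135454,
    "CAT": 1875.077159,
    "COKE": 511.647517,
    "T": 130.189428,
    "TRV": 508.846704,
    "UK": -200.0,
    "WMT": 945.854737,
    "XOM": 224.246252,
}
BENCHMARK = 549.56  # DIA percent change reported above

def outperformers(changes, benchmark):
    """Return the symbols whose percent change exceeds the benchmark."""
    return [symbol for symbol, pct in changes.items() if pct > benchmark]

winners = outperformers(visible_rows, BENCHMARK)
share = len(winners) / len(visible_rows) * 100
print(f"{share:.2f}% of the visible rows beat DIA: {winners}")
```

Note that COKE and TRV each rose more than 500% and still count as underperformers, because the only test is whether a stock finished above the benchmark.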
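def print_std(df):
    # Sample standard deviation of the components' percent changes (DIA excluded)
    std = df.drop("DIA")["Percent Change"].std()
    print(f"The standard deviation of the Dow Jones components' percent changes is {std}")
print_std(df_processed)
The standard deviation of the Dow Jones components' percent changes is 504.0683919778865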
43.33% (13/30) of the companies that were in the Dow Jones in 1998 beat DIA, the fund tracking the index, over the same period. The median outperformer rose 711.66%, compared with 549.56% for DIA. The components' percent changes have a standard deviation of 504.07 percentage points.
A few of these companies went bankrupt and were delisted. These are represented by NaN in our dataframe because there was no stock data for their start, their end, or both. The notebook records them with a percent change of -200% so that they can still be included in the standard deviation. Strictly speaking, a fall from any price to 0 is a -100% change, so this placeholder overstates the loss and makes the standard deviation somewhat larger than it would otherwise be. Either way, they are part of the 17 companies that did not beat the benchmark.
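How much the choice of placeholder matters can be checked with a short sketch. A fall from any positive price to 0 is a -100% change, while the dataframe uses -200% for delisted companies. The example below takes only the nine surviving companies visible in the truncated table, adds one delisted company, and compares the sample standard deviation under each placeholder. Because it uses a subset of the data, its figures will not match the 504.07 reported above, and the names survivors and spread_with_placeholder are introduced for this illustration.

```python
from statistics import stdev

# Percent changes of the surviving companies visible in the table
# (a subset only, so these figures will not match 504.07)
survivors = [-15.035904, 599.386359, 673.135454, 1875.077159, 511.647517,
             130.189428, 508.846704, 945.854737, 224.246252]

def spread_with_placeholder(placeholder):
    """Sample standard deviation with one delisted company added."""
    return stdev(survivors + [placeholder])

std_minus_200 = spread_with_placeholder(-200.0)  # value used in the dataframe
std_minus_100 = spread_with_placeholder(-100.0)  # a total loss

print(f"with -200%: {std_minus_200:.2f}")
print(f"with -100%: {std_minus_100:.2f}")
```

statistics.stdev computes the sample standard deviation, the same convention as the pandas default. The placeholder affects only the spread; the count of outperformers is the same under either value.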
This suggests that if you were to invest in a single Dow Jones company, your chance of beating the market is likely much higher than the minuscule 10% sometimes implied by dogmatic indexers.
Instead of suggesting that one's chances of beating the market are necessarily low, I think it's better to conclude that most people are not willing to accept the risk of straying from the market average.