Problem

In some investment circles you often hear statistics such as "90% of actively managed investment funds failed to beat the market" cited in support of simply buying index funds. For whatever it's worth, I like index funds too and think indexing is a good strategy. But dogmatic indexers sometimes stretch these statistics too far, suggesting that it is very unlikely that you could beat the market, or even implying that an individual investor has only a 10% chance of doing so. Intuitively, this just doesn't make sense.

In response to this idea that an average person has a minuscule chance of beating the market, I ask myself, "If I were to pick 1 stock, isn't there a fair chance that it will beat the market?" Warren Buffett has famously asserted that "diversification is a protection against ignorance." We may not have the knowledge of Mr. Buffett, but we can be confident that a minimally diversified portfolio has a high chance of deviating from the market average. The question at this point is the probability that it strays in the desired direction.

These claims about beating the market are so common that I decided to test the question I asked myself. My goal is to answer it over a period dating back as far as possible, for a set of "normal" companies. The universe of companies is large and their profiles vary. I don't mean finding the next Apple or Amazon, but simply betting on an established, well-known company. Nothing special or complicated. In support of this, I will look at DIA, the oldest ETF tracking the Dow Jones Industrial Average, and its components to see what portion of them outperformed the index as a whole since 1998.

Limitations of this approach

Only one time period is examined (1998-01-20 to 2021-01-14). We are limited by access to data here.
Only the change in price between the start and end dates is considered. This says nothing about what performance would be like if you were to invest at intervals during this period, such as with dollar-cost averaging.
Dividends are not factored in.
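
To make the second limitation concrete, here is a minimal sketch contrasting a lump-sum purchase with equal periodic purchases. The price path is made up for illustration and the helper names `lump_sum_return` and `dca_return` are hypothetical; none of this comes from the dataset used here.

```python
def lump_sum_return(prices):
    """Percent change from the first price to the last price."""
    return (prices[-1] - prices[0]) / prices[0] * 100


def dca_return(prices, amount_per_period=100.0):
    """Percent return from investing a fixed amount at every price except the last.

    The final price is only used to value the accumulated shares.
    """
    buy_prices = prices[:-1]
    shares = sum(amount_per_period / p for p in buy_prices)
    invested = amount_per_period * len(buy_prices)
    final_value = shares * prices[-1]
    return (final_value - invested) / invested * 100


# Hypothetical price path: the price dips before recovering.
prices = [10.0, 5.0, 10.0, 20.0]

print(f"Lump sum: {lump_sum_return(prices):.2f}%")               # Lump sum: 100.00%
print(f"Dollar-cost averaging: {dca_return(prices):.2f}%")       # Dollar-cost averaging: 166.67%
```

With the same start and end prices, the two approaches give different returns, which is why the start-to-end comparison in this notebook should not be read as a statement about periodic investing.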

Data Pipeline

Uses functions defined in common.py to prepare our dataset.

In [15]:

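import pandas as pd

# Pipeline steps are defined in common.py
from common import (startPipeline, clean, trim, flatten_date,
                    remove_outliers, add_percent_change)

# Load the raw price data, indexed by the first two columns
df = pd.read_csv("data/dow.csv", index_col=[0,1])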

df_processed = (df
 .pipe(startPipeline)
 .pipe(clean)
 .pipe(trim)
 .pipe(flatten_date)
 .pipe(remove_outliers)
 .pipe(add_percent_change)
)

df_processed

startPipeline:
  runtime=0:00:00, end shape=(63, 7)
clean:
  runtime=0:00:00.000979, end shape=(62, 7)
trim:
  runtime=0:00:00.000997, end shape=(62, 1)
flatten_date:
  runtime=0:00:00.007032, end shape=(31, 4)
remove_outliers:
  runtime=0:00:00, end shape=(31, 4)
add_percent_change:
  runtime=0:00:00.001985, end shape=(31, 5)

Out[15]:

	Start Date	End Date	Start Close	End Close	Percent Change
Symbol
AA	1998-01-20	2021-01-14	29.530121	25.090000	-15.035904
AXP	1998-01-20	2021-01-14	17.698372	123.779999	599.386359
BA	1998-01-20	2021-01-14	27.150482	209.910004	673.135454
CAT	1998-01-20	2021-01-14	9.994546	197.399994	1875.077159
COKE	1998-01-20	2021-01-14	42.274349	258.570007	511.647517
...	...	...	...	...	...
T	1998-01-20	2021-01-14	12.724303	29.290001	130.189428
TRV	1998-01-20	2021-01-14	23.375343	142.320007	508.846704
UK	1998-01-20	2021-01-14	NaN	NaN	-200.000000
WMT	1998-01-20	2021-01-14	14.052621	146.970001	945.854737
XOM	1998-01-20	2021-01-14	15.515985	50.310001	224.246252

31 rows × 5 columns
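
The per-step log lines above (step name, runtime, end shape) look like the output of a logging decorator wrapped around each pipeline function. common.py is not shown here, so the following is only a sketch of how such a decorator might be written; `log_step` and `drop_missing` are hypothetical names, not the author's code.

```python
import datetime as dt
from functools import wraps

import pandas as pd


def log_step(func):
    """Print the step name, its runtime, and the shape of the returned DataFrame."""
    @wraps(func)
    def wrapper(df, *args, **kwargs):
        start = dt.datetime.now()
        result = func(df, *args, **kwargs)
        runtime = dt.datetime.now() - start
        print(f"{func.__name__}:\n  runtime={runtime}, end shape={result.shape}")
        return result
    return wrapper


@log_step
def drop_missing(df):
    """Hypothetical cleaning step: drop rows with any missing value."""
    return df.dropna()


# Toy data, only to show the decorator working with DataFrame.pipe
toy = pd.DataFrame({"Close": [1.0, None, 3.0]})
cleaned = toy.pipe(drop_missing)
```

Because each step takes a DataFrame and returns one, decorated functions can be chained with `.pipe` as in the cell above, and the shape printed after each step makes it easy to see where rows or columns are dropped.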

Benchmark (DIA) Performance Since 1998

In [16]:

def benchmark_percent_change(df):
    # Percent change of the DIA benchmark row
    return df.loc["DIA"]["Percent Change"]

def print_benchmark_info(df):
    dia_pct_change = benchmark_percent_change(df)
    # Report the benchmark's gain over the full start-to-end window
    print(f"The Dow Jones has risen {dia_pct_change:.2f}% from "
          f"{df.loc['DIA']['Start Date']} to {df.loc['DIA']['End Date']}")

print_benchmark_info(df_processed)

The Dow Jones has risen 549.56% from 1998-01-20 to 2021-01-14

As we can see, DIA rose 549.56% between 01/20/1998 and 01/14/2021. Now we can compare this to each company to see how many rose more.
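
The Percent Change values in the table are consistent with the usual formula, (end - start) / start × 100. The source of add_percent_change is in common.py and is not shown, so treat this as an assumption about what it computes; `percent_change` is a hypothetical helper. The sketch reproduces the AA and CAT rows from the displayed start and end closes.

```python
def percent_change(start, end):
    """Standard percent change from a start price to an end price."""
    return (end - start) / start * 100


# Start Close and End Close copied from the AA and CAT rows of the table
aa = percent_change(29.530121, 25.090000)
cat = percent_change(9.994546, 197.399994)

print(f"AA:  {aa:.2f}%")   # AA:  -15.04%
print(f"CAT: {cat:.2f}%")  # CAT: 1875.08%
```

Both results match the table to two decimal places; small differences further out come from the rounding of the displayed prices.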

Percent of Individual Dow Companies that Outperformed DIA

In [17]:

def outperformers(df):
    return df[df["Percent Change"] > benchmark_percent_change(df)]

def underperformers(df):
    return df[df["Percent Change"] < benchmark_percent_change(df)]

def print_outperformers_info(df):
    total_companies = len(df) - 1  # exclude the DIA benchmark row
    df_outperform = outperformers(df)
    total_above_dia = len(df_outperform)

    # Share of components that beat DIA, and the median gain among them
    pct_above_dia = total_above_dia / total_companies * 100
    median_outperform = df_outperform["Percent Change"].median()

    print(f"{pct_above_dia:.2f}% performed better than DIA ({total_above_dia}/{total_companies})")
    print(f"{median_outperform:.2f}% was the median percent change for the outperformers")
    print(f"Outperformers:\n{df_outperform.index.to_list()}")
    
print_outperformers_info(df_processed)

43.33% performed better than DIA (13/30)
711.66% was the median percent change for the outperformers
Outperformers:
['AXP', 'BA', 'CAT', 'CVX', 'DIS', 'HON', 'JNJ', 'JPM', 'MCD', 'MMM', 'MO', 'RTX', 'WMT']
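
The same share can be computed directly from a boolean mask, since the mean of a boolean Series is the fraction of True values. The sketch below uses only the rows visible in the truncated table above (values rounded to two decimals), so it illustrates the technique on a subset and does not reproduce the 13/30 result.

```python
import pandas as pd

# Percent changes copied (rounded) from the visible rows of the table;
# the elided rows are missing, so this is a subset of the 30 components.
changes = pd.Series({
    "AA": -15.04, "AXP": 599.39, "BA": 673.14, "CAT": 1875.08, "COKE": 511.65,
    "T": 130.19, "TRV": 508.85, "UK": -200.00, "WMT": 945.85, "XOM": 224.25,
})
benchmark = 549.56  # DIA percent change reported above

beat = changes > benchmark   # True where a stock beat DIA
share = beat.mean() * 100    # fraction of True values, as a percentage

print(f"{share:.2f}% of this subset beat DIA ({beat.sum()}/{len(beat)})")
print(changes[beat].index.to_list())
```

The four symbols selected here (AXP, BA, CAT, WMT) all appear in the full list of outperformers printed above.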

In [18]:

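def print_std(df):
    # Sample standard deviation of the components' percent changes (DIA excluded)
    std = df.drop("DIA")["Percent Change"].std()
    print(f"The standard deviation of percent change across Dow components is {std:.2f}")

print_std(df_processed)

The standard deviation of percent change across Dow components is 504.07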

Conclusions

43.33% (13/30) of the companies in the Dow Jones in 1998 beat DIA, the ETF tracking the Dow Jones Industrial Average, over the same time period. The median outperformer rose 711.66%, compared to 549.56% for DIA. The percent changes of the Dow components have a standard deviation of 504.07 percentage points.

A few of these companies went bankrupt and were delisted. These are represented by NaN in our dataframe because there was no stock data for their start date, end date, or both. So that they still count toward the standard deviation, they are assigned a percent change of -200%. Note that a fall from any positive price to 0 is a -100% change under the usual formula, so this placeholder somewhat overstates the spread. Either way, they are part of the 17 companies that did not beat the benchmark.

This suggests that if you were to invest in a single Dow Jones company, your chance of beating the market is likely much higher than the minuscule 10% sometimes implied by dogmatic indexers.

Instead of suggesting that one's chances of beating the market are necessarily low, I think it's better to conclude that most people are not willing to accept the risk of straying from the market average.