Reynald Oliveria - GDP and Markets

Tracking the Economy

In the high school economics course, we measured the (macro)economic performance of a nation in terms of GDP. Gross Domestic Product, or GDP, is the total dollar value of all the final goods and services produced by a nation. Final goods are goods that are ready to be sold to consumers, like a bicycle, as opposed to producer goods, which manufacturers buy to produce a final good, like bicycle chains.

While GDP is a great measure of how the economy performs as a whole, the economic effects more noticeable (and pertinent) to the average person are the movements of wages and prices. For wages, we can measure the disposable personal income of the population. This metric covers the income that an individual may choose to spend or save. And for prices, we can use the Consumer Price Index (CPI), which tracks over the years the cost of a fixed list of goods and services an average household might purchase.

Because GDP and disposable income are measured in dollars, we will track the "real" versions, which adjust for inflation. CPI, on the other hand, measures how much a dollar can buy. And thus, CPI need not be adjusted for inflation, as its change is itself a measure of inflation.
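
As a rough illustration of what adjusting for inflation means, the sketch below deflates a nominal dollar series by a price index and compares nominal growth with real growth. The numbers and the helper name `to_real` are made up for illustration; the series used in this study are already published in real terms.

```python
import pandas as pd

def to_real(nominal: pd.Series, price_index: pd.Series, base: float = 100.0) -> pd.Series:
    """Express nominal dollars in base-period dollars using a price index."""
    return nominal / price_index * base

# Hypothetical quarterly values, for illustration only.
nominal_income = pd.Series([1000.0, 1030.0, 1060.9])  # grows 3% per quarter
cpi = pd.Series([100.0, 102.0, 104.04])               # prices rise 2% per quarter

real_income = to_real(nominal_income, cpi)

nominal_growth = nominal_income.pct_change()
real_growth = real_income.pct_change()
print(nominal_growth.round(4).tolist())
print(real_growth.round(4).tolist())
```

In this toy case, income grows 3% per quarter in dollar terms, but just under 1% once the 2% rise in prices is taken out.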

Now, it's important to note that the exact numbers of these metrics are not important to the problem. For the stock market to indicate economic performance, we want to see how these metrics "react", or move with, the stock market. In other words, what is more important is the movement of these metrics, rather than the raw numbers. Looking at this another way, asking "If an Amazon stock is worth $400, how much will I get paid today?" is a less interesting question than "If the stock market is doing well, should I expect a raise?"

And so, the metrics we will use to track economic performance will be Real GDP Growth, Real Disposable Income Growth, and the change in CPI. The raw data for these metrics are sourced from the St. Louis Federal Reserve's website and are linked in their names above.

Tracking the Market

On the other side of this: how do we track the performance of the stock market? Aren't there like billions of stocks in the stock market? The overall performance of the stock market can be calculated by looking at the prices and the number of shares companies have available on the market. It turns out, only a handful of companies make up the vast majority of the market. So, if we just imagine we're holding a portfolio with a mix of the stocks of these companies, proportionate to how large they are, then the performance of our portfolio is basically the performance of the whole market.

Luckily, companies and stock exchanges do this for us. You might have heard of the S&P 500 and the Dow Jones. Well, those do just that. For this study, I opt to use the Russell 3000 index (^RUA), which takes into account the performance of the top 3000 companies. The Dow Jones takes into account 30 companies, and the S&P 500 takes into account, you guessed it, the top 500.
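
To make the "portfolio proportionate to size" idea concrete, here is a minimal sketch with three made-up companies. Prices and share counts are hypothetical, and real indices apply further adjustments that I am not modeling here.

```python
import numpy as np

# Hypothetical market: share prices and shares outstanding for three companies.
prices_then = np.array([100.0, 50.0, 10.0])
prices_now = np.array([110.0, 45.0, 12.0])
shares = np.array([1_000.0, 500.0, 200.0])

# Market capitalization = price * shares outstanding.
cap_then = prices_then * shares
weights = cap_then / cap_then.sum()

# Each company's own return, and the size-weighted average across the portfolio.
returns = prices_now / prices_then - 1
portfolio_return = float(weights @ returns)

# The same number falls out of comparing total market value directly.
market_return = float((prices_now * shares).sum() / cap_then.sum() - 1)

print(round(portfolio_return, 4), round(market_return, 4))
```

The two numbers agree, which is the sense in which the weighted portfolio "is" the market: the largest company dominates the result even though the smallest one moved the most.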

But wait, we can track the performance of the stock market by making a portfolio that invests across the largest companies in a stock market. Can't we make a portfolio that invests in the largest companies in the real estate market and track the performance of that market? Yes, we can! And, financial service companies already do that for us. And so, we can include in this study the performance of the real estate market by including the Dow Jones Real Estate Index (DJUSRE) and the performance of the bond market by including the Vanguard Total Bond Market Index (VBTLX). Again, the data for these index prices can be found through the links on their names.

Rephrasing the Problem

Okay, now that we have all the data, let's rephrase the problem in terms of the data so we can tackle it. We want to know whether the stock (and real-estate and bond) market is a good indicator of the economy. What does it mean to be a "good indicator?" What should we be looking for?

Imagine, for a moment, that you and your friends decide to go to a field to play some volleyball. Now, you find that the field is a little muddy. The mud indicates that it rained recently. Then, after playing a few sets, you notice that dark clouds begin to gather in the sky. So you decide to leave as these dark clouds indicate that it will rain soon.

Here, we see two types of indication. One indicates something that happened in the past: the mud. This type of indication is called "lagging indication." And the other indicates something that will happen in the future: the dark clouds. This type of indication is called "leading indication." But, at the same time, neither of these "indicators" necessarily means that it rained or that rain is coming. Maybe someone just watered the lawn, and maybe the dark clouds never break out into rain.

But they are certainly indicators. How do we know? Well, we've seen rain and found mud afterwards many, many times. And so, from our experiences, the presence of mud makes us more willing to bet that it rained beforehand. Likewise, the gathering of dark clouds, while it will not certainly lead to rain, makes us more willing to bet that it will rain. Again, this is because we've seen rain come after dark clouds more often than rain coming without those dark clouds.

So, we can ask: "Does the movement of the stock (or real-estate or bond) market give us information about the future (or past) economic performance of our nation?" Or perhaps more precisely, "Can we use the movement of the markets to predict economic performance of our nation or vice-versa?"

Now, there is a third type of indication. For example, if I suspect that I have the flu, I check if my body temperature is high. And if it is high, it indicates that I currently have the flu, rather than indicating that I had the flu or that I will have the flu. Like the other types of indication, high body temperature does not always mean that I have the flu, but it points in that direction. This type of indication, which gives information about present conditions, is called "coincident indication."

This raises another question analogous to the ones before: "Does the movement of the stock (or real-estate or bond) market give us information about the current economic performance of our nation?" Now, we have our data, and we have questions to answer with the data. We know what we're looking for, so let's dive in.
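
One informal way to picture the three kinds of indication is to compare one series against shifted copies of another. The sketch below uses synthetic data in which the "economy" deliberately echoes the "market" two periods later; the series, the helper `lagged_corr`, and the lag range are all assumptions for illustration, not part of the analysis that follows.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic "market" series, and an "economy" series that echoes it two periods later.
market = pd.Series(rng.normal(size=n))
economy = market.shift(2) + 0.5 * pd.Series(rng.normal(size=n))

def lagged_corr(indicator: pd.Series, target: pd.Series, lag: int) -> float:
    """Correlation between target at time t and indicator at time t - lag."""
    return target.corr(indicator.shift(lag))

# lag > 0: indicator moves first (leading); lag == 0: coincident;
# lag < 0: indicator moves afterwards (lagging).
corrs = {lag: lagged_corr(market, economy, lag) for lag in range(-3, 4)}
best_lag = max(corrs, key=lambda k: abs(corrs[k]))
print(best_lag)
```

Here the strongest correlation shows up at a positive lag, so in this toy world the market would be a leading indicator. The statistical tests later in this write-up ask the same kind of question more carefully.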

Managing the Data

All the sources for the data were linked in the description above. After downloading the data, I renamed the files so that they are easier to keep track of. I will also refer to the Disposable Income metric as DPI from now on. As usual, first we have to import some standard data science packages. I also import a package to perform some Granger causality tests for leading and lagging indication later on.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from statsmodels.tsa.stattools import grangercausalitytests

Then, I read in the economic performance metric data into a single pandas data frame. These were quarterly growth rates, so apart from starting on different quarters, they had the same dates. The csvs had two columns, one called DATE, and the other with the serial id for the data series which contained the growth rates. I renamed DATE to date and converted the entries to datetime objects. I also renamed the second column to the name of the metric which they represented, and divided by 100 to normalize the percentages.

econ = ['GDP', 'DPI', 'CPI']
econ_df = None
for e in econ:
    curr = pd.read_csv("%s.csv" % e)
    curr[e] = curr[curr.columns[1]]/100
    curr['date'] = pd.to_datetime(curr['DATE'])
    curr = curr[['date', e]]

    if econ_df is None:
        econ_df = curr
    else:
        econ_df = pd.merge(econ_df, curr, how = 'inner', on='date')

Now it's time for the market indices. VBTLX is an index fund, so it can sometimes provide dividends or be subject to stock splits. Each time such an event occurs, we must adjust the price, so we use the Adjusted Closing Price as the price for VBTLX. ^RUA also has an Adjusted Closing Price column, which coincides with the Closing Price column as ^RUA is not traded. So, we will use the Adjusted Closing Price column for ^RUA as well. The data for DJUSRE did not come from the same site as the other two, so to make coding easier, I renamed its "Price" column to "Adj Close" to match the other two indices.

markets = ['^RUA', 'DJUSRE', 'VBTLX']
markets_df = None
for m in markets:
    curr = pd.read_csv("%s.csv" % m)
    curr[m] = curr['Adj Close']
    curr['date'] = pd.to_datetime(curr['Date'])
    curr = curr[['date', m]]

    if markets_df is None:
        markets_df = curr
    else:
        markets_df= pd.merge(markets_df, curr, how = 'inner', on='date')

The market index data is given as monthly prices instead of quarterly growth rates. So, to convert, we first calculate the growth rate from three months ago as quarters are three months long. After calculating the growth rate, we have rolling quarterly data, updated monthly. So then, we inner join the market index data with the economic performance data so that we only have quarterly data, updated on the same quarters as the economic performance data. One row is dropped as it has NaN entries for the market index data.

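# Quarterly growth: change in price relative to three months earlier
markets_df[markets] = (markets_df[markets] - markets_df[markets].shift(3))/markets_df[markets].shift(3)
# Keep only the dates that appear in both tables (the quarterly dates)
df = pd.merge(econ_df, markets_df, how = 'inner', on = 'date')
# The first row has no price from three months earlier, so its growth rates are NaN
df = df.drop(axis = 0, index = 0)
all_dat = econ + markets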
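
For readers who want to see the conversion on a small scale, here is a sketch on made-up monthly prices. The column name `index_price` and the values are hypothetical; `pct_change(periods=3)` is used as a shorthand for the shift-and-divide computation above, and filtering on quarter-start months stands in for the inner join against the quarterly data.

```python
import pandas as pd

# Hypothetical month-start prices, for illustration only.
dates = pd.date_range("2020-01-01", periods=7, freq="MS")
prices = pd.DataFrame({
    "date": dates,
    "index_price": [100.0, 102.0, 101.0, 110.0, 108.0, 112.0, 121.0],
})

# Growth relative to three months earlier; same as (p - p.shift(3)) / p.shift(3).
prices["growth"] = prices["index_price"].pct_change(periods=3)

# Keep only rows dated at the start of a quarter, then drop rows that
# have no price from three months earlier.
quarter_starts = prices[prices["date"].dt.month.isin([1, 4, 7, 10])]
quarterly = quarter_starts.dropna(subset=["growth"]).reset_index(drop=True)
print(quarterly)
```

Using `dropna` rather than dropping a fixed row index is a design choice of this sketch: it removes whichever rows lack a three-month history, however many there happen to be.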

Exploring the Data

My usual process begins with trying to visualize the data to find evidence of what we are trying to look for. Because we are looking for some sort of indication, let's begin by plotting these metrics on the same time axis and look for visual evidence.

metric_to_analyze = 'GDP'
fig, axs = plt.subplots(len(markets) + 1, sharex=True)
fig.suptitle('Growth Rates vs. Time')

axs[0].plot(df['date'], df[metric_to_analyze])
axs[0].set_ylabel(metric_to_analyze)
axs[0].set_ylim(
    [df[metric_to_analyze].min()-0.1 * np.abs(df[metric_to_analyze].min()), 
    df[metric_to_analyze].max()+0.1 * np.abs(df[metric_to_analyze].max())])

for i in range(len(markets)):
    axs[i+1].plot(df['date'], df[markets[i]])
    axs[i+1].set_ylabel(markets[i])
    axs[i+1].set_ylim(
        [df[markets[i]].min()-0.1 * np.abs(df[markets[i]].min()), 
        df[markets[i]].max()+0.1 * np.abs(df[markets[i]].max())])

fig.set_size_inches(9,4)

for ax in axs:
    ax.vlines(pd.to_datetime(['2008-10-01', '2014-01-01', '2020-04-01', '2020-07-01']),
        -1, 1, colors = 'r')

In the plot generated above, the red lines mark the most noticeable minima in the GDP growth vs. time plot. By looking at the red lines on the other graphs, we see that they seem to coincide with similar extrema in ^RUA and DJUSRE. There does not seem to be the same pattern in VBTLX. As far as I can tell, nothing here signifies leading or lagging indication.

Now of course, these are just the extrema, but if there were any indication, we would expect the most evident signifiers to be found near the extrema. We will later look at statistical tests to confirm our observations. We can perform a similar analysis for the other metrics.

metric_to_analyze = 'DPI'
fig, axs = plt.subplots(len(markets) + 1, sharex=True)
fig.suptitle('Growth Rates vs. Time')

axs[0].plot(df['date'], df[metric_to_analyze])
axs[0].set_ylabel(metric_to_analyze)
axs[0].set_ylim(
    [df[metric_to_analyze].min()-0.1 * np.abs(df[metric_to_analyze].min()), 
    df[metric_to_analyze].max()+0.1 * np.abs(df[metric_to_analyze].max())])

for i in range(len(markets)):
    axs[i+1].plot(df['date'], df[markets[i]])
    axs[i+1].set_ylabel(markets[i])
    axs[i+1].set_ylim(
        [df[markets[i]].min()-0.1 * np.abs(df[markets[i]].min()), 
        df[markets[i]].max()+0.1 * np.abs(df[markets[i]].max())])

fig.set_size_inches(9,4)

for ax in axs:
    ax.vlines(pd.to_datetime(['2008-07-01', '2013-01-01', '2020-04-01',
                              '2020-07-01','2021-01-01', '2021-04-01']),
        -1, 1, colors = 'r')

For DPI, I don't particularly see any strong evidence for indication.

metric_to_analyze = 'CPI'
fig, axs = plt.subplots(len(markets) + 1, sharex=True)
fig.suptitle('Growth Rates vs. Time')

axs[0].plot(df['date'], df[metric_to_analyze])
axs[0].set_ylabel(metric_to_analyze)
axs[0].set_ylim(
    [df[metric_to_analyze].min()-0.1 * np.abs(df[metric_to_analyze].min()), 
    df[metric_to_analyze].max()+0.1 * np.abs(df[metric_to_analyze].max())])

for i in range(len(markets)):
    axs[i+1].plot(df['date'], df[markets[i]])
    axs[i+1].set_ylabel(markets[i])
    axs[i+1].set_ylim(
        [df[markets[i]].min()-0.1 * np.abs(df[markets[i]].min()), 
        df[markets[i]].max()+0.1 * np.abs(df[markets[i]].max())])

fig.set_size_inches(9,4)

for ax in axs:
    ax.vlines(pd.to_datetime(['2006-10-01','2008-04-01','2008-10-01', '2011-04-01', 
                              '2014-10-01','2021-04-01', '2022-04-01']),
        -1, 1, colors = 'r')

For CPI, it seems to me that each maximum is preceded by a minimum in both the ^RUA and DJUSRE plots. This may be evidence of leading indication. Again, VBTLX does not show evidence of indication.

We will use a significance level of 5% for the tests that will be performed.

Coincident Indication

To check for coincident indication, we check the movement of our economic metrics against the movement of our markets. We can perform an F-test to test for the significance of a linear model. In less technical terms, the most basic thing that we can do if we had to "guess" the movement of the economy at a given point in our dataset is to just guess the mean. This, of course, will produce some errors. But, what we can try to do is to integrate the information of the coinciding movement of the markets. Then, we'll check if the errors are significantly reduced by integrating this information.

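to_disp = pd.DataFrame(index = markets, columns = econ)
for e in econ:
    for m in markets:
        # pearsonr returns (correlation, p-value); keep the p-value as a percentage
        result = stats.pearsonr(df[m], df[e])
        # Assign with a single .loc call; chained indexing may write to a copy
        to_disp.loc[m, e] = "%.3f%%" % (result[1] * 100)
to_disp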

	GDP	DPI	CPI
^RUA	0.000%	82.213%	0.281%
DJUSRE	0.004%	45.931%	0.598%
VBTLX	84.583%	22.067%	3.642%

Above is a table of p-values for attempting a linear regression between economic metrics and market performance. The interpretation of the 0.281% for the CPI and ^RUA entry is as follows: If we assume that integrating information provided by ^RUA does not help in predicting CPI, then there is only a 0.281% chance to observe data as contradictory or more contradictory to that assumption than the data we had.

The predetermined significance level of 5% means that a p-value below 5% is treated as sufficient evidence in the data to reject the assumption that the market's movement carries no information about the metric. This means that from the table, we can conclude that there is coincident indication between GDP and each of ^RUA and DJUSRE, as well as coincident indication between CPI and each of the three market indices.
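
The p-values in the table come from `stats.pearsonr` rather than from an explicit F-test. With a single predictor, the two give the same p-value, and the sketch below checks this on synthetic data by computing the "guess the mean" errors and the "use a fitted line" errors directly. The data and variable names are made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 60

# Synthetic growth rates: y depends weakly on x plus noise.
x = rng.normal(0.02, 0.08, size=n)
y = 0.01 + 0.3 * x + rng.normal(0.0, 0.05, size=n)

# Baseline: always guess the mean of y.
sse_mean = np.sum((y - y.mean()) ** 2)

# Alternative: guess using a line fitted on x.
fit = stats.linregress(x, y)
sse_line = np.sum((y - (fit.intercept + fit.slope * x)) ** 2)

# F statistic: error reduction from the one added parameter,
# relative to the error that remains per degree of freedom.
f_stat = (sse_mean - sse_line) / (sse_line / (n - 2))
p_f = stats.f.sf(f_stat, 1, n - 2)

# With one predictor, this matches the p-value reported by pearsonr.
p_r = stats.pearsonr(x, y)[1]
print(p_f, p_r)
```

So reading the table as "does adding the market's movement significantly reduce our errors?" and reading it as "is the correlation significantly different from zero?" are two views of the same test.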

These p-values however do not particularly tell us about how exactly we should read these indicators. We can get better interpretations of how the indicators work by looking at the slopes of the linear regression.

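to_disp = pd.DataFrame(index = markets, columns = econ)
for e in econ:
    for m in markets:
        # Regress the economic metric (y) on the market index (x)
        slope, intercept, _, _, _ = stats.linregress(df[m], df[e])
        # Assign with a single .loc call; chained indexing may write to a copy
        to_disp.loc[m, e] = "%.3f" % (slope)
to_disp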

	GDP	DPI	CPI
^RUA	0.406	-0.035	0.039
DJUSRE	0.258	-0.084	0.026
VBTLX	0.053	0.565	-0.081

The above table displays the slopes of the linear regressions between each pair of variables. Slopes whose entries in the p-value table exceed the 5% significance level are not statistically significant: here, that is the whole DPI column and the VBTLX entry for GDP. An interpretation of the 0.258 entry in the DJUSRE row and the GDP column is that a 1 percentage point increase in the quarterly growth of DJUSRE leads us to predict, on average, a 0.258 percentage point increase in GDP growth.

A step that perhaps should have been performed prior to the actual test is to check the assumptions necessary for the test to apply. More precisely, we check whether we can find evidence that the assumptions are violated. The assumptions are rather technical, but a good way to check for their violation is to look at scatterplots and residual plots for unusual patterns.

fig, axs = plt.subplots(len(econ),len(markets))
for i in range(len(markets)):
    for j in range(len(econ)):
        axs[j, i].scatter(
            df[markets[i]], df[econ[j]])
            
        axs[j ,i].set_title(
            '%s vs. %s' % (econ[j], markets[i]))
        axs[j,i].tick_params(left = False, right = False , labelleft = False , 
                labelbottom = False, bottom = False) 

fig.set_size_inches(8,8)

The scatterplots above all either show no pattern or a linear pattern. There are a few outliers, but none of these are compelling evidence that the assumptions to perform the tests we did above were broken by this dataset. Next, we can check the residual plots which plot the error of a prediction against the prediction. We should find no patterns here.

fig, axs = plt.subplots(len(econ),len(markets))
for i in range(len(markets)):
    for j in range(len(econ)):
        slope, intercept, _, _, _ = stats.linregress(df[markets[i]], df[econ[j]])
        predicted = slope * df[markets[i]] + intercept
        axs[j,i].scatter(predicted, predicted - df[econ[j]])
        axs[j,i].set_title(
            '%s vs. %s' % (econ[j], markets[i]))
        axs[j,i].tick_params(left = False, right = False , labelleft = False , 
                labelbottom = False, bottom = False)
        axs[j,i].axhline(y = 0, linestyle = '--', color = 'black') 

fig.set_size_inches(8,8)

The residual plots seem to show no patterns and very few outliers relative to the amount of data we have. That is, most of the residuals are centered around 0, which is marked by the dashed black line on the charts. Again, we find no compelling evidence that the assumptions to perform the statistical tests above were broken by this dataset.

Leading Indication

Testing for leading indication is slightly more complicated than testing for coincident indication. For now suppose that performance of the economy (as measured by our chosen metrics) in the next quarter is partially explained by the current performance of the economy or its performance in recent quarters. In other words, the current and recent performance of the economy can help inform a prediction for the performance of the economy over the next quarter.

Since the performance of the economy coincides with the performance of some investment markets, then we should also expect that the current and recent performance of investment markets can help inform a prediction for the performance of the economy over the next quarter. But, this does not make the markets good leading indicators for economic performance if they act only as a proxy for actual economic performance since these data are equally accessible. Instead, what we should check is whether including the performance of markets with actual economic performance makes for better predictions for future economic performance than only using current and recent economic performance alone.

And this is precisely what a Granger Causality test does. First, it tries to create a prediction model for economic performance using previous economic performance. Then, it adds to the model previous market performance. After which, a p-value is calculated by comparing the change in errors between the two models.

granger_results = {}

for e in econ:
    granger_results[e] = {}
    for m in markets:
        granger_results[e][m] = grangercausalitytests(
            df[[e,m]], 7, verbose = False)

for lag in range(1,8):
    print('Looking back at most %d quarter(s)' % lag)
    for e in econ:
        for m in markets:
            print("%s Granger causes %s: p = %.3f%%" % 
                (m, e, 
                granger_results[e][m][lag][0]['ssr_ftest'][1] * 100))
        print()
    
    print('-------')

I investigated models going back up to 7 quarters to predict the following quarter. Not many p-values were less than 10%. Using the preceding 2, 3, 4, and 6 quarters of ^RUA to help predict GDP yields p-values between 7.195% and 7.549%. And using 5 preceding quarters of ^RUA to predict GDP yields a p-value of 5.919%, and using the same to help predict CPI yields a p-value of 8.481%. Full results can be found here.

We can interpret this p-value of 5.919% as follows: If we were to assume that adding the information of 5 preceding quarters of ^RUA to the information given by the 5 preceding quarters of GDP does not make better predictions than just using the 5 preceding quarters of GDP, then there is only a 5.919% chance to observe data as contradictory or more contradictory to that assumption than the data we had.
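
To show the mechanics behind that interpretation, here is a hand-rolled sketch of the comparison the test makes: fit one model on the target's own past, fit another that also includes the other series' past, and run an F-test on the reduction in squared error. This is my own simplified version on synthetic data, in the spirit of the 'ssr_ftest' entry used above; the helper names are hypothetical, and the statsmodels function remains the source of the p-values reported here.

```python
import numpy as np
from scipy import stats

def lag_matrix(series: np.ndarray, lags: int) -> np.ndarray:
    """Columns are series[t-1], ..., series[t-lags] for t = lags, ..., len-1."""
    n = len(series)
    return np.column_stack([series[lags - k : n - k] for k in range(1, lags + 1)])

def sse(X: np.ndarray, y: np.ndarray) -> float:
    """Sum of squared errors of a least-squares fit of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def granger_p(target: np.ndarray, other: np.ndarray, lags: int) -> float:
    """p-value for 'past values of other improve predictions of target'."""
    y = target[lags:]
    const = np.ones((len(y), 1))
    own = lag_matrix(target, lags)
    extra = lag_matrix(other, lags)
    sse_restricted = sse(np.hstack([const, own]), y)    # own past only
    sse_full = sse(np.hstack([const, own, extra]), y)   # own past + other's past
    dof = len(y) - (2 * lags + 1)
    f_stat = ((sse_restricted - sse_full) / lags) / (sse_full / dof)
    return float(stats.f.sf(f_stat, lags, dof))

# Synthetic data in which `other` really does lead `target` by one period.
rng = np.random.default_rng(2)
n = 200
other = rng.normal(size=n)
target = np.zeros(n)
target[1:] = 0.5 * other[:-1] + rng.normal(size=n - 1)

p_forward = granger_p(target, other, lags=2)  # does other help predict target?
p_reverse = granger_p(other, target, lags=2)  # does target help predict other?
print(p_forward, p_reverse)
```

In this constructed example, the forward direction should come out clearly significant, since the lead was built into the data.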

None of these p-values, using the significance level of 5%, lead us to conclude that the data is sufficient evidence that the performances of investment markets are leading indicators of economic performance. However, let's introduce a little bit of humanity into this data science. In my experience working with financial data, p-values this low will at least mark potential indicators as interesting and warrant further study, even in the absence of statistically significant p-values.

Lagging Indication

Luckily, the step from leading indication to lagging indication is less steep than the one from coincident to leading. If A is a leading indicator of B, then B is a lagging indicator of A, and vice versa. And so, all we need to do is check whether economic performance is a leading indicator of market performance.

granger_results = {}

for e in econ:
    granger_results[e] = {}
    for m in markets:
        granger_results[e][m] = grangercausalitytests(
            df[[m,e]], 3, verbose = False)

for lag in range(1,4):
    print('Looking back at most %d quarter(s)' % lag)
    for e in econ:
        for m in markets:
            print("%s Granger causes %s: p = %.3f%%" % 
                (e, m, 
                granger_results[e][m][lag][0]['ssr_ftest'][1] * 100))
        print()
    
    print('-------')

I investigated models going back up to 3 previous quarters. We find p-values below 1% when testing for VBTLX and ^RUA being lagging indicators for CPI using 2 and 3 previous quarters, and also for VBTLX being a lagging indicator for CPI in 1 previous quarter. Additionally, we find a p-value of 3.446% when testing whether ^RUA is a lagging indicator of DPI using only 1 previous quarter, and a p-value of 3.823% using 3 previous quarters. Full results can be found here.

Somewhere within the sections collapsed by default, I began to refer to the growth rate of Real Disposable Income as DPI. So, I note that here in case the reader has skipped those sections. Below is a table that summarizes what we found in statistical testing.

	GDP	DPI	CPI
^RUA	Coincident	Lagging	Coincident Lagging
DJUSRE	Coincident		Coincident
VBTLX			Coincident Lagging

The entry in the VBTLX row and the CPI column is interpreted as "The quarterly growth rate of VBTLX was found to be a statistically significant coincident indicator and lagging indicator for the quarterly growth rate of CPI."

Was Mr. Hinkle Correct?

Technically, no. From the table above, we see that in fact, investment markets are indicators for the metrics of economic performance. Not only that, Mr. Hinkle spoke specifically about the stock market and was probably referring to GDP in his initial statement. We see that the performance of the stock market is a statistically significant coincident indicator for GDP.

But in another sense, he was also correct. A notable feature of our findings is that there is no statistically significant evidence in the data that leads us to conclude that any of the investment markets are leading indicators of the economic metrics. An economic pundit would do best to deal in leading indicators to forecast the economy. The coincidence of growth in GDP and the good performance of the stock market is more a piece of evidence in support of economic theories than a tool for pundits to predict future economic performance.

There is a third answer here also. It's true that I found no statistic demonstrating that the data contained significant evidence for the investment markets being a leading indicator of economic performance. However, I also noted that there were some interesting statistics (relatively low p-values) that pointed toward the performance of the stock market perhaps being a leading indicator of GDP. This warrants further investigation. Especially with the stock market index and GDP, we can find data with finer resolution than quarterly: monthly for GDP, and perhaps even down to every hour or every minute for the stock market.

Higher-resolution data may give more definitive conclusions. However, this is not a trivial extension. Whoever conducts this study must consider the random noise that is more evident in higher-resolution data. And, if they choose to compare GDP and the performance of the stock market at different resolutions, how to aggregate the higher-resolution data must also be considered.
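
As one example of what "how to aggregate" could look like, the sketch below collapses made-up daily prices to quarterly values in two different ways using pandas resampling. Both the data and the choice between the two rules are assumptions for illustration; which rule is appropriate would need to be argued for in an actual study.

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices rising by 1 per day, for illustration only.
dates = pd.date_range("2021-01-01", "2021-06-30", freq="D")
prices = pd.Series(np.linspace(100.0, 280.0, len(dates)), index=dates)

# Option 1: the last price observed in each quarter (labeled by quarter start).
quarter_close = prices.resample("QS").last()

# Option 2: the average price over each quarter, which smooths daily noise.
quarter_mean = prices.resample("QS").mean()

# Quarter-over-quarter growth under each aggregation rule.
growth_from_close = quarter_close.pct_change()
growth_from_mean = quarter_mean.pct_change()
print(growth_from_close.iloc[-1], growth_from_mean.iloc[-1])
```

Even on this perfectly smooth toy series, the two rules give different growth rates for the same quarter, which is why the choice deserves explicit consideration.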

Further Back and Another Way

We can also expand the data and reach more conclusive findings by using a larger time window. The window for this project was chosen simply to be the highest-resolution and most expansive data we could find for the chosen indices and economic metrics. There was not a serious effort to find indices and metrics that had existed for larger time windows or in higher resolution. But such data exist. The Russell 3000, for example, was launched in 1984, but the S&P 500 was launched in 1957 with predecessors going back to 1923. It would be interesting to re-perform this study with more data.

But, perhaps we are looking at this in a limited way. Mr. Hinkle kind of set us up here. Well, we can't blame him as his focus is the economy. For many of us, these macroeconomic trends are not that interesting. Maybe, especially those of us who are younger and just started earning an income, we are more interested in how to invest. And in that sense, we should be looking for indicators of the investment markets, rather than indicators of macroeconomic trends.

The data here suggest that the performance of some investment markets is a lagging indicator of economic performance. From an investor's point of view though, this means that economic performance is a leading indicator of the performance of investment markets. In other words, we can use the performance of the economy at large to inform our investment decisions.

A study down this path will be rather different from the one performed here. Here we created models to attempt to analyze the relationship between economic performance and the performance of investment markets in the past. But, to convert this into a useful study for investors, we have to move away from analysis to prediction. And that presents new challenges, as we somehow have to "test" our prediction model without having future data. Further, in such a study, a goal would also be to turn this prediction model into an investment strategy, which also needs to be tested.
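
One common way to "test" a prediction model without future data is to pretend, at each point in the past, that the later data does not exist yet: fit on everything before quarter t, predict quarter t, and move forward. The sketch below does this on synthetic data with a simple fitted line and compares it against always guessing the historical mean. Everything here (the data, the helper `walk_forward_errors`, the choice of a linear model) is an assumption for illustration, not a result of this study.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Synthetic data: market growth depends on the previous quarter's economic growth.
econ_growth = rng.normal(size=n)
market_growth = np.zeros(n)
market_growth[1:] = 0.4 * econ_growth[:-1] + 0.5 * rng.normal(size=n - 1)

x = econ_growth[:-1]    # known at the end of quarter t-1
y = market_growth[1:]   # realized in quarter t

def walk_forward_errors(x: np.ndarray, y: np.ndarray, min_train: int) -> np.ndarray:
    """Predict y[t] from x[t] with a line fitted only on observations before t."""
    errors = []
    for t in range(min_train, len(y)):
        slope, intercept = np.polyfit(x[:t], y[:t], deg=1)
        errors.append(y[t] - (intercept + slope * x[t]))
    return np.array(errors)

min_train = 40
model_errors = walk_forward_errors(x, y, min_train)

# Benchmark: always guess the mean of what has been seen so far.
naive_errors = np.array([y[t] - y[:t].mean() for t in range(min_train, len(y))])

model_mse = float(np.mean(model_errors ** 2))
naive_mse = float(np.mean(naive_errors ** 2))
print(model_mse, naive_mse)
```

Because each prediction only uses data from before the quarter being predicted, the resulting errors are a fairer stand-in for how the model might have fared in real time than errors measured on the data it was fitted on.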

Poker for the Rich?

Analyzing the relationship between trends in macroeconomic performance and investment markets
