Calculating correlation between two stocks with Python! by toalsty

![Intrfo.png](https://cdn.steemitimages.com/DQmfP47hHVTeHTHSxnz9gvBYrUe3ZLZqSYovzPKWojfmT66/Intrfo.png)

Today we'll look at how we can use Python to calculate the correlation between two stocks. Our goal is to write a command line program that takes in two .csv files with the closing prices. We also want to be able to specify the start and end dates between which the correlation is calculated.

Here is the link to the script we are talking about:

https://github.com/toslaty/steem/blob/master/corr.py

You might ask yourself where you can get the data. For now we'll just use the pricing data that can be downloaded from Yahoo Finance. Here are the links to the files we'll use in this tutorial.

https://finance.yahoo.com/quote/AAPL/history?p=AAPL
https://finance.yahoo.com/quote/GOOGL/history?p=GOOGL

There are definitely better ways to acquire the data. If you want to know more, you can search the web for the pandas-datareader package. For now we'll run with the csv files from Yahoo.

First we import all the libraries we'll use in this tutorial. Those are the following:

![impos.png](https://cdn.steemitimages.com/DQmYWsnFEUzsSEQ9H3sWWdb3QrRqWHFY2bTGcHvq9DnVKti/impos.png)

First we import Pandas, which is a library for analyzing and working with data. The next two lines might look a bit confusing. It looks like we are importing datetime twice, but that isn't true. The first import is the import of the module datetime. The second one refers to the class, which is also called datetime. Later we'll look at why we imported it this way.
After that we import argparse, which you might know from [this tutorial](https://steemit.com/technology/@toalsty/a-little-python-script-for-evaluating-subdomains). Argparse is an easy to use module that helps us create a command line interface. We'll also import math to help us with the square root later on.
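Since the imports are only shown as a screenshot, here is a minimal sketch of what the two datetime lines do. The import order is taken from the description above; the date string and the `%Y-%m-%d` format are assumptions made for illustration and may differ from the original script.

```python
import datetime                 # binds the name `datetime` to the module
from datetime import datetime   # rebinds the same name to the class inside that module

# After both imports the name refers to the class, so strptime() can be
# called directly on it without writing datetime.datetime.
parsed = datetime.strptime("2019-01-02", "%Y-%m-%d")
print(type(parsed).__name__, parsed.year, parsed.month, parsed.day)
```

Because the second import shadows the first, only the class import is strictly needed for this use.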


Now to the main function:

![main.png](https://cdn.steemitimages.com/DQmQ7ViA4JbSpbuXgbuwr9QjmV7Xv6oMaL1rgaw6zeZV1fE/main.png)

We define four arguments with the *add_argument()* method so that we can call our script with the following parameters.

-f is the first company

-s is the second company

-st is the start date in the format YY-MM-DD

-et is the end date in the format YY-MM-DD

![clicmd.png](https://cdn.steemitimages.com/DQmeJpxKk5NTyv9DHEq43Rrs7SdSqSRtgzjWk7Ta6uNKN2e/clicmd.png)

After that we parse the arguments with the *parse_args()* method. Now we define two variables called start and end. Here we use datetime with its *strptime()* method to convert the date strings we were given into datetime objects. *strptime()* takes in two arguments. The first one is the string and the second one is the format specifier.
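The argument handling described above could look roughly like the sketch below. It is not the author's exact code: the function name `parse_cli`, the help texts and the constant `DATE_FORMAT` are made up for this example, and the format `%Y-%m-%d` (four-digit year, as in the Date column of Yahoo's csv files) is an assumption.

```python
import argparse
from datetime import datetime

# Assumed date format; adjust it if your dates are written differently.
DATE_FORMAT = "%Y-%m-%d"

def parse_cli(argv=None):
    """Parse the four options and convert both dates to datetime objects."""
    parser = argparse.ArgumentParser(description="Correlation between two stocks")
    parser.add_argument("-f", help="csv file of the first company")
    parser.add_argument("-s", help="csv file of the second company")
    parser.add_argument("-st", help="start date")
    parser.add_argument("-et", help="end date")
    args = parser.parse_args(argv)  # argv=None means: read sys.argv

    # strptime(string, format) turns the text into a datetime object
    start = datetime.strptime(args.st, DATE_FORMAT)
    end = datetime.strptime(args.et, DATE_FORMAT)
    return args, start, end

args, start, end = parse_cli(
    ["-f", "AAPL.csv", "-s", "GOOGL.csv", "-st", "2018-01-02", "-et", "2018-12-31"]
)
print(args.f, args.s, start.date(), end.date())
```

Passing a list to `parse_cli()` is only done here so the example runs without a real command line; called without arguments it reads the options from the shell.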

The next two variables called one and two each call our prep_data() function that we defined in line 9. 

![prep.png](https://cdn.steemitimages.com/DQmcK6L6erg6i5cRBiBAybbpQTT1GkszMpuTqokWzFyTCoz/prep.png)

The function takes in three arguments (the stock as a .csv file, the start date and the end date). The first variable we define is called name, where we simply strip the '.csv' part of the string.

Now we define our dataframe with the fr variable. It calls the pandas *read_csv()* function to import our data from the .csv file.
Then we use *drop()* to drop the columns we don't need, because we just want the "Adj Close" column. In the next line we rename the "Adj Close" column to the stock name.
We now define the rng variable that will be returned by the function. It uses the loc indexer, to which we pass the start and end date, so that we only get the timeframe we specified on the command line.
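As a rough sketch of the steps above, *prep_data()* might look like this. The variable names name, fr and rng come from the description; the list of dropped columns, the use of the Date column as a parsed index and the `os.path` handling of the file name are assumptions based on the usual layout of Yahoo's csv files, not the author's confirmed code.

```python
import os
import pandas as pd

def prep_data(stock, start, end):
    """Return the 'Adj Close' prices of one csv file between start and end."""
    # 'AAPL.csv' -> 'AAPL' (also works if a directory is part of the path)
    name = os.path.splitext(os.path.basename(stock))[0]

    # Use the Date column as a DatetimeIndex so we can slice by date later
    fr = pd.read_csv(stock, index_col="Date", parse_dates=True)

    # Drop everything except 'Adj Close' (assumed Yahoo column names)
    fr = fr.drop(columns=["Open", "High", "Low", "Close", "Volume"])

    # Name the remaining column after the stock
    fr = fr.rename(columns={"Adj Close": name})

    # Label-based slice; both the start and the end date are included
    rng = fr.loc[start:end]
    return rng
```

The slice with loc expects the dates in the file to be sorted, which is the case for the files downloaded from Yahoo.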

Back in our *main()* function we now call the pandas *concat()* function, which concatenates the two datasets into the ind variable. We also specify the axis along which to concatenate, so that the two stocks end up side by side as columns.
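Here is a small sketch of that step. The two frames stand in for the results of *prep_data()*; the prices are made-up numbers for illustration only, and `axis=1` is an assumption that matches the two-column table described in the post.

```python
import pandas as pd

# Toy stand-ins for the two prepared datasets (prices are invented)
dates = pd.to_datetime(["2019-01-02", "2019-01-03", "2019-01-04"])
one = pd.DataFrame({"AAPL": [10.0, 11.0, 12.0]}, index=dates)
two = pd.DataFrame({"GOOGL": [20.0, 19.0, 21.0]}, index=dates)

# axis=1 places the frames next to each other and aligns them on the date index
ind = pd.concat([one, two], axis=1)
print(ind)
```

With `axis=0` the rows of the second stock would instead be appended below the first one, which is not what we need for a row-by-row comparison.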

If you now *print()* the ind variable you should get the following.

![table.png](https://cdn.steemitimages.com/DQmQxGqd4kEQ6nDBVQQVrnWGF5eRjARWji6MS5R9SRmxhHZ/table.png)

In the last line of the *main()* function we simply print the sentence "The correlation between the stocks is :" followed by the result of the *corr_stocks()* function that is defined in line 21.

![calc.png](https://cdn.steemitimages.com/DQmbRpGbKms281nN1xwA4JbKmUToWHdDxNfteT2cfQDEcmk/calc.png)

What happens here? In general, we broke down the following formula for calculating the correlation into some smaller steps.

Correlation Function:

Corr = (n * Sum(X,Y) - Sum(X) * Sum(Y)) / SquareRoot((n * Sum(X^2) - Sum(X)^2) * (n * Sum(Y^2) - Sum(Y)^2))

Where:

n is the number of days in our case
Sum is the sum of whatever is inside the parentheses

In our function *corr_stocks()* we break that down into several smaller steps. First we calculate five different sums.
The first one is the sum of all the values in column 0 (Stock A). The second one does the same for the values of column 1 (Stock B).
The third one is Sum(X,Y) from our formula above. It multiplies the two values in each row and then gives us the sum of these products.
The fourth and fifth are the sums of the squared values of each column.

After that we simply define t as the length of the index. That gives us the n in the above formula. We then calculate the values for each side of the division (the numerator and the denominator) and finally define and return the correlation between the two stocks.
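The steps above can be sketched as follows. Only the names *corr_stocks()*, ind and t come from the post; the other variable names are chosen for this example, and the toy prices are invented so the result is easy to check by hand.

```python
import math
import pandas as pd

def corr_stocks(ind):
    """Correlation of the first two columns of ind, computed from the five sums."""
    x = ind.iloc[:, 0]  # column 0 (Stock A)
    y = ind.iloc[:, 1]  # column 1 (Stock B)

    sum_x = x.sum()             # Sum(X)
    sum_y = y.sum()             # Sum(Y)
    sum_xy = (x * y).sum()      # Sum(X,Y): row-wise products, then summed
    sum_x2 = (x ** 2).sum()     # Sum(X^2)
    sum_y2 = (y ** 2).sum()     # Sum(Y^2)

    t = len(ind.index)          # n, the number of days

    numerator = t * sum_xy - sum_x * sum_y
    denominator = math.sqrt((t * sum_x2 - sum_x ** 2) * (t * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Toy data: invented prices, not real quotes
ind = pd.DataFrame({"A": [10.0, 11.0, 12.0], "B": [20.0, 19.0, 21.0]})
print("The correlation between the stocks is :", corr_stocks(ind))
```

For the toy data the numerator is 3 * 661 - 33 * 60 = 3 and the denominator is the square root of 6 * 6, so the correlation comes out as 0.5. As a cross-check, the built-in `ind["A"].corr(ind["B"])` from pandas should give the same value.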

![result.png](https://cdn.steemitimages.com/DQmU3FFd3ULQBuDonXUDSfxKW3pz9XDrzq5eiPTQsEh144Y/result.png)

So that's it for today! You can leave questions in the comments if you want to.
@luueetang ·
Wow. This is something great that you have made. I will check it out.
