Linear Regression – Natural Gas Prices and Weather

I used to be a trader and scheduler at an energy company that supplied natural gas to residential and commercial customers.  I knew nothing about commodity trading before joining the firm, but quickly learned that pricing becomes complex when financial instruments are combined with physical products.  We were a small company and handled all our customer care and marketing in-house, so when asked what goes into natural gas pricing, our blanket answer would be “weather, supply, and demand.”  Natural gas is mostly used for industrial purposes, to generate electricity, or to heat homes and businesses, so demand is seasonal and weather-dependent.

Plotted in the chart below are the daily spot prices of Henry Hub, the main pricing point for the North American natural gas market.  The sudden spike you see in January 2018 was, in part, caused by extreme cold weather.




As I developed my data science skills at Metis, I wanted to quantify how much weather really impacted natural gas prices.  To do this, I first had to gather some data.  I got three years of natural gas spot prices (Oct 1, 2015 through Sep 30, 2018) for five natural gas hubs from the Wall Street Journal.

I combined these prices with daily weather for the nearest physical locations, pulled from the Dark Sky API.  For each day I collected the high and low temperatures (in Fahrenheit) and, where applicable, the type of precipitation and the amount of snow.

Natural Gas Hub      | Nearest Physical Location
Henry Hub            | Erath, Louisiana
Transco Zone 3       | Beauregard Parish, Louisiana
Transco Zone 6 NY    | Linden, New Jersey
Panhandle East       | Haven, Kansas
Opal                 | Lincoln County, Wyoming
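To make the data-compilation step concrete, here is a minimal sketch of how price rows could be matched to weather rows by location and date, followed by a one-variable least-squares fit.  It is not the code used for the project: the field names, the helper functions `join_prices_with_weather` and `fit_line`, and the sample numbers are all placeholders made up for illustration, and the split of "Panhandle East" from "Haven, Kansas" is my reading of the table above.

```python
from datetime import date

# Hub-to-location pairs as read from the table above.
# (The "Panhandle East" / "Haven, Kansas" split is an assumption.)
HUB_LOCATIONS = {
    "Henry Hub": "Erath, Louisiana",
    "Transco Zone 3": "Beauregard Parish, Louisiana",
    "Transco Zone 6 NY": "Linden, New Jersey",
    "Panhandle East": "Haven, Kansas",
    "Opal": "Lincoln County, Wyoming",
}


def join_prices_with_weather(prices, weather):
    """Attach to each price row the weather for the hub's location and date.

    prices:  list of dicts with keys hub, date, price (hypothetical schema)
    weather: list of dicts with keys location, date, temp_high, temp_low
    Price rows with no matching weather row are dropped.
    """
    weather_by_key = {(w["location"], w["date"]): w for w in weather}
    joined = []
    for p in prices:
        location = HUB_LOCATIONS[p["hub"]]
        w = weather_by_key.get((location, p["date"]))
        if w is None:
            continue
        joined.append({
            **p,
            "location": location,
            "temp_high": w["temp_high"],
            "temp_low": w["temp_low"],
        })
    return joined


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Placeholder numbers for illustration only, NOT real prices or temperatures.
prices = [
    {"hub": "Henry Hub", "date": date(2018, 1, 1), "price": 4.0},
    {"hub": "Henry Hub", "date": date(2018, 1, 2), "price": 3.0},
    {"hub": "Henry Hub", "date": date(2018, 1, 3), "price": 2.0},
]
weather = [
    {"location": "Erath, Louisiana", "date": date(2018, 1, 1),
     "temp_high": 40, "temp_low": 20},
    {"location": "Erath, Louisiana", "date": date(2018, 1, 2),
     "temp_high": 50, "temp_low": 30},
    # No weather row for Jan 3, so that price row is dropped by the join.
]

rows = join_prices_with_weather(prices, weather)
slope, intercept = fit_line(
    [r["temp_low"] for r in rows],
    [r["price"] for r in rows],
)
```

A negative slope here would mean that lower temperatures go with higher prices.  A real analysis would use more than one weather feature and far more than two matched days.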


MTA Turnstile Analysis

Our first project at Metis was assigned on the first day of class and was due at the end of the week.  Fortunately, this exploratory data analysis project was also the only one done as a group, and the dataset we used was a popular one.

Setting the Scene

Our group was to consult for a fictional non-profit organization that was fundraising and raising awareness for a “women in tech” gala in early summer by stationing teams outside subway stations to pass out flyers and gather emails.  We used the MTA’s turnstile data to determine which stations were busiest and at which times of day.  The project was great practice for cleaning a large and quirky dataset.
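Part of what makes the turnstile data quirky is that each turnstile reports a running counter rather than a count per time window, so traffic has to be recovered by differencing consecutive readings.  Below is a minimal sketch of that idea, not our group's actual code.  The row schema (`turnstile`, `station`, `hour`, `entries`), the function name, the `MAX_PLAUSIBLE_DELTA` threshold, and the sample readings are all assumptions for illustration.

```python
from collections import defaultdict

# Guess at an upper bound for entries through one turnstile between
# two consecutive readings; anything larger is treated as bad data.
MAX_PLAUSIBLE_DELTA = 10_000


def entries_by_station_and_hour(readings):
    """Sum interval entries per (station, hour of the later reading).

    readings: list of dicts with keys turnstile, station, hour, entries
    (hypothetical schema). `entries` is a cumulative counter, and the rows
    for each turnstile are assumed to be in chronological order.
    """
    totals = defaultdict(int)
    last_seen = {}
    for r in readings:
        previous = last_seen.get(r["turnstile"])
        last_seen[r["turnstile"]] = r["entries"]
        if previous is None:
            continue  # first reading has nothing to diff against
        delta = r["entries"] - previous
        # Skip counter resets (negative) and implausibly large jumps.
        if delta < 0 or delta > MAX_PLAUSIBLE_DELTA:
            continue
        totals[(r["station"], r["hour"])] += delta
    return dict(totals)


# Made-up readings for illustration only.
readings = [
    {"turnstile": "A-1", "station": "UNION SQ", "hour": 8, "entries": 1000},
    {"turnstile": "A-1", "station": "UNION SQ", "hour": 12, "entries": 1300},
    {"turnstile": "A-1", "station": "UNION SQ", "hour": 16, "entries": 5},  # counter reset
    {"turnstile": "A-1", "station": "UNION SQ", "hour": 20, "entries": 405},
    {"turnstile": "B-1", "station": "OTHER STATION", "hour": 8, "entries": 50},
    {"turnstile": "B-1", "station": "OTHER STATION", "hour": 12, "entries": 250},
]

totals = entries_by_station_and_hour(readings)
busiest = max(totals, key=totals.get)
```

Dropping resets and outliers loses a little traffic, but it keeps a single bad reading from dominating a station's total.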

In addition, we cross-referenced the stations with the most foot traffic against New York Census data to determine which neighborhoods had 1) a higher average income, as we were soliciting donors, 2) a higher ratio of females to males, as we were looking for both supporters of and participants in women in tech, and 3) a larger proportion of transit riders, as we needed to actually come across these people.

Finally, we also researched which areas of Manhattan contained a higher concentration of tech jobs and companies.

Image: Union Square signage.  Photo by Ana Paula Nardini.
