Regression Analysis for Business

AI Code for Business


Regression analysis is a statistical approach for quantifying the relationship between a dependent variable and one or more independent variables. In other words, it measures how changes in the independent (predictor) variables affect the dependent variable. Regressions are either simple or multiple. When a single independent variable is used to explain the dependent variable, the model is a simple regression. When several independent variables influence one dependent variable, the model is a multiple regression.
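The difference between the two types can be sketched in a few lines of Python. This is only an illustration: the data below is synthetic, and the names x1, x2, and the "true" coefficients are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (made up for illustration): y depends on two predictors.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=200)
x2 = rng.uniform(0, 5, size=200)
y = 3.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 0.5, size=200)

# Simple regression: one independent variable explains y.
simple = LinearRegression().fit(x1.reshape(-1, 1), y)

# Multiple regression: several independent variables explain y together.
multiple = LinearRegression().fit(np.column_stack([x1, x2]), y)

print("simple slope:", simple.coef_)
print("multiple slopes:", multiple.coef_, "intercept:", multiple.intercept_)
```

With both predictors included, the fitted slopes should land close to the values used to generate the data, which is what makes the multiple model easier to interpret when more than one factor is at work.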

Why Regression Analysis is Important

Regression analysis helps a business make sense of the data points it has collected and, in particular, the relationships among them. That understanding leads to better business decisions, from predicting sales to managing inventory levels and balancing supply and demand. Among the analysis techniques used in machine learning, regression is often cited as one of the most significant. Business analysts and data professionals frequently use it to extract the relevant data and build reports for department heads, management teams, sales units, board members, or anyone looking for evidence to guide or support a decision. The analysis can surface many kinds of patterns in the data, and the resulting insights can be extremely valuable in identifying what makes a measurable difference to your business.

Benefits

Predictive Analytics

Organizations are turning to predictive analytics to help solve difficult problems and uncover new opportunities and risks. Predictive (or demand) analytics is also used to anticipate customer responses or purchases and to promote cross-sell opportunities. Predictive models help businesses attract, retain, and grow their most profitable customers. However, demand is not the only dependent variable in business prediction, and the analysis can go far beyond forecasting the impact on direct revenue. For example, forecasting can be used to estimate the number of shoppers who will pass in front of a particular billboard, which in turn helps determine the maximum worth bidding for an advertisement on that spot. Airlines use predictive analytics to set ticket prices. Hotels try to predict the number of guests on any given night to maximize occupancy and increase revenue. Insurance companies rely heavily on regression analysis to estimate the credit standing of policyholders as well as the likely number of claims in a given time period. In each case, predictive analytics enables the organization to gain insights from its data and improve its efficiency.

Operational Efficiency

Operational efficiency is a measure of how much profit is earned relative to operational cost. Understanding the relationships between business outcomes and the variables that drive operations is therefore important, and regression models are used regularly to optimize business processes. A factory manager, for example, can build a statistical model to understand the impact of oven temperature on the shelf life of the cookies baked in those ovens. In a call center, we can analyze the relationship between caller wait times and the number of complaints. Such data-driven analysis helps remove guesswork, untested assumptions, and corporate politics from decision making. By highlighting the areas with the greatest impact on operational efficiency and revenue, it helps improve business performance.

Example: Bicycle Traffic Prediction with Regression Analysis

For this exercise, we will combine a bicycle count dataset with a weather dataset to determine the extent to which weather and seasonal factors (temperature, precipitation, and daylight hours) affect the volume of bicycle traffic through a corridor. The bicycle dataset is the Fremont Bridge Bicycle Counter, made publicly available by the city of Seattle, WA. The corresponding weather dataset comes from NOAA, which publishes daily weather station data.


Follow the steps below and run the code in the Colab notebook linked here. (To run the code, click the round ▶️ button next to each cell.)

Cell 1: Imports the Python libraries needed.

import pandas as pd

from pandas.tseries.holiday import USFederalHolidayCalendar

import numpy as np

import matplotlib.pyplot as plt

from sklearn.linear_model import LinearRegression

Cell 2: Downloads a copy of the datasets, unzips, and loads the bicycle and weather datasets into pandas dataframe variables.

!gdown --id 1vog09z5NCXqTpU-2vEyQt82m8v481HwD

!unzip Fremont_Bridge_Bicycle_Weather_Data.zip


counts = pd.read_csv('Fremont_Bridge_Bicycle_Counter.csv', index_col='Date', parse_dates=True)

weather = pd.read_csv('2942484.csv', index_col='DATE', parse_dates=True)

Cell 3: Computes the total daily bicycle traffic, puts it into its own dataframe, and displays the dataframe.

daily = counts.resample('D').sum()

daily['Total'] = daily.sum(axis=1)

daily = daily[['Total']]

daily
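To make the resampling step concrete, here is a minimal sketch on a toy frame. The column names East and West and the hourly values are hypothetical stand-ins, not the real counter data. One thing worth checking on the real file with counts.columns: if the CSV already contains a combined-total column alongside the per-sidewalk columns, summing across all columns would count each crossing twice.

```python
import pandas as pd

# Hypothetical hourly counts for two sidewalks over two full days (not real data).
idx = pd.date_range("2021-01-01", periods=48, freq=pd.Timedelta(hours=1))
counts = pd.DataFrame({"East": 1, "West": 2}, index=idx)

# Collapse hourly rows into one row per calendar day.
daily = counts.resample("D").sum()

# Add the per-sidewalk columns together and keep only the combined total.
daily["Total"] = daily.sum(axis=1)
daily = daily[["Total"]]

print(daily)
```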

Cell 4: Because usage patterns generally vary by day of the week, we derive the day of the week from the date and add it to the dataframe as seven binary indicator columns. Display the updated dataframe.

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

for i in range(7):

    daily[days[i]] = (daily.index.dayofweek == i).astype(float)

daily

Cell 5: Similarly, we expect riders to behave differently on holidays and account for this with an additional indicator. Display the updated dataframe.

cal = USFederalHolidayCalendar()

holidays = cal.holidays('2012', '2022')

daily = daily.join(pd.Series(1, index=holidays, name='holiday'))

daily['holiday'] = daily['holiday'].fillna(0)

daily
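The join-then-fill pattern used for the holiday flag can be checked on a single week. The sketch below uses a toy frame (the Total column is a placeholder) covering the first week of July 2019. It assigns the filled column back to the dataframe rather than filling in place on a column selection, since in-place changes on a selection may not reach the parent dataframe in newer pandas versions.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

# Toy daily frame for one week; "Total" is a placeholder column.
days = pd.date_range("2019-07-01", "2019-07-07", freq="D")
toy = pd.DataFrame({"Total": 0}, index=days)

# Federal holidays falling inside that week.
holidays = USFederalHolidayCalendar().holidays("2019-07-01", "2019-07-07")

# Joining leaves NaN on non-holiday rows, so fill those with 0.
toy = toy.join(pd.Series(1, index=holidays, name="holiday"))
toy["holiday"] = toy["holiday"].fillna(0)

print(toy)
```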

Cell 6: Another factor to take into consideration is the number of daylight hours on any given day. We make use of the standard astronomical calculation, add this info to the dataframe, and graph the results. As the plot shows, at the Fremont Bridge location in Seattle, WA, the amount of daylight cycles from a low of ~8 hours in winter to a high of ~16 hours in summer.

def hours_of_daylight(date, axis=23.44, latitude=47.61):

    """Compute the hours of daylight for the given date"""

    days = (date - pd.Timestamp(2000, 12, 21)).days

    m = (1. - np.tan(np.radians(latitude))

         * np.tan(np.radians(axis) * np.cos(days * 2 * np.pi / 365.25)))

    return 24. * np.degrees(np.arccos(1 - np.clip(m, 0, 2))) / 180.


daily['daylight_hrs'] = list(map(hours_of_daylight, daily.index))

daily[['daylight_hrs']].plot(figsize=(16,7))

plt.ylim(8, 17)
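As a quick sanity check on the daylight calculation, the function can be evaluated at the two solstices. The sketch below repeats the function so that it runs on its own, using pd.Timestamp for the reference date because pd.datetime is not available in recent pandas releases. The two test dates are chosen here for illustration.

```python
import numpy as np
import pandas as pd

def hours_of_daylight(date, axis=23.44, latitude=47.61):
    """Compute the hours of daylight for the given date."""
    # Days elapsed since a reference winter solstice.
    days = (date - pd.Timestamp(2000, 12, 21)).days
    m = (1. - np.tan(np.radians(latitude))
         * np.tan(np.radians(axis) * np.cos(days * 2 * np.pi / 365.25)))
    return 24. * np.degrees(np.arccos(1 - np.clip(m, 0, 2))) / 180.

# Shortest and longest days of the year at the default (Seattle) latitude.
winter = hours_of_daylight(pd.Timestamp("2020-12-21"))
summer = hours_of_daylight(pd.Timestamp("2021-06-21"))
print(round(winter, 2), round(summer, 2))
```

Both values should fall inside the 8 to 17 hour range used for the plot's y-axis.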

Cell 7: Next, we can add the average temperature and total precipitation to the dataframe. In addition to the inches of precipitation, we add a flag that indicates whether a day is dry (has zero precipitation). Display the updated dataframe.

# temperatures are in 1/10 deg C; convert to C

weather['TMIN'] /= 10

weather['TMAX'] /= 10

weather['Temp (C)'] = 0.5 * (weather['TMIN'] + weather['TMAX'])


# precip is in 1/10 mm; convert to inches

weather['PRCP'] /= 254

weather['dry day'] = (weather['PRCP'] == 0).astype(int)


daily = daily.join(weather[['PRCP', 'Temp (C)', 'dry day']])

daily
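The unit conversions are easy to get wrong, so it can help to run them on a couple of rows with known answers. The values below are hypothetical and written in the raw units the cell assumes (tenths of a degree Celsius and tenths of a millimeter). Depending on how a NOAA file was exported, the real columns may already be in whole units, so a look at weather[['TMIN', 'TMAX', 'PRCP']].describe() before converting is a reasonable precaution.

```python
import pandas as pd

# Hypothetical rows in the raw units assumed by the cell above.
weather = pd.DataFrame(
    {"TMIN": [50.0, -20.0], "TMAX": [150.0, 40.0], "PRCP": [254.0, 0.0]},
    index=pd.to_datetime(["2021-01-01", "2021-01-02"]),
)

# Tenths of a degree C -> degrees C, then average the daily min and max.
weather["TMIN"] /= 10
weather["TMAX"] /= 10
weather["Temp (C)"] = 0.5 * (weather["TMIN"] + weather["TMAX"])

# Tenths of a mm -> inches (1 inch = 25.4 mm = 254 tenths of a mm).
weather["PRCP"] /= 254
weather["dry day"] = (weather["PRCP"] == 0).astype(int)

print(weather[["PRCP", "Temp (C)", "dry day"]])
```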

Cell 8: In this step, we add a counter that increases from the first day in the data and measures how many years have passed. This makes it possible to observe annual increases or decreases in daily crossings. Display the first 5 rows of the updated dataframe.

daily['annual'] = (daily.index - daily.index[0]).days / 365

daily.head()

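Cell 9: The dataframe is prepared for linear regression by dropping any rows with null values. The model is then fitted with the X (independent) variables and the y (dependent) variable. The intercept is turned off (fit_intercept=False) because the seven day-of-week indicators already act as day-specific intercepts.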

# Drop any rows with null values

daily.dropna(axis=0, how='any', inplace=True)


column_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun', 'holiday',

                'daylight_hrs', 'PRCP', 'dry day', 'Temp (C)', 'annual']

X = daily[column_names]

y = daily['Total']


model = LinearRegression(fit_intercept=False)

model.fit(X, y)
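Once a model like this is fitted, its coefficients can be read as the estimated effect of each variable. The sketch below shows the idea on synthetic data, with a simplified weekday/weekend split instead of seven day columns. The column names and the "true" effects (1000, 400, 50) are invented for the example and say nothing about the real bridge data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
idx = pd.date_range("2021-01-04", periods=112, freq="D")  # 16 weeks, starting on a Monday

# Indicator columns that together cover every row, plus one continuous variable.
X = pd.DataFrame(index=idx)
X["weekday"] = (idx.dayofweek < 5).astype(float)
X["weekend"] = (idx.dayofweek >= 5).astype(float)
X["temp"] = rng.uniform(0, 25, size=len(idx))

# Made-up ridership: a base level per day type, plus 50 riders per degree, plus noise.
y = (1000 * X["weekday"] + 400 * X["weekend"] + 50 * X["temp"]
     + rng.normal(0, 20, size=len(idx)))

# No separate intercept: the indicator columns play that role.
model = LinearRegression(fit_intercept=False).fit(X, y)

# Label each coefficient with its column name for easier reading.
params = pd.Series(model.coef_, index=X.columns)
print(params.round(1))
```

On the real dataframe, the same pd.Series(model.coef_, index=X.columns) call would list one coefficient per column in column_names.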

Cell 10: Using the fitted multiple regression model, we generate predictions and plot them against the actual total bicycle count for each date.

daily['predicted'] = model.predict(X)

daily[['Total', 'predicted']].plot(alpha=0.5, figsize=(16,10));
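A plot gives a visual impression of fit, and a couple of numbers can complement it. The sketch below is one possible way to do that, not part of the original notebook: it holds out the most recent 20% of rows, fits on the rest, and reports R² and RMSE on the held-out part. The data is synthetic, and the column names and effect sizes are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(1)
n = 400

# Synthetic predictors; "const" stands in for the day-of-week indicators.
X = pd.DataFrame({
    "const": 1.0,
    "temp": rng.uniform(0, 25, n),
    "dry day": rng.integers(0, 2, n).astype(float),
})
y = 500 * X["const"] + 60 * X["temp"] + 300 * X["dry day"] + rng.normal(0, 100, n)

# Chronological split: train on the first 80% of rows, test on the last 20%.
split = int(0.8 * n)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

model = LinearRegression(fit_intercept=False).fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)                              # share of variance explained
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))  # typical error, in riders
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.1f}")
```

Scoring on rows the model has not seen gives a more honest picture than scoring on the training rows, which is what the in-sample plot shows.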


To generate an up-to-date comparison, download and use the latest bicycle count and weather data using the links provided above.


Conclusion

The exercise above gives an overview of how a multiple regression machine learning model can reliably predict the number of riders on a given day and show how different conditions and parameters affect ridership. In a broad sense, regression analysis is applicable to all types of companies and their functions.

Check out the other articles to see more applications and related code on machine learning. If you need support or would like to find out more, get in touch using the contact link.