Hany Fahmy

# Rationale

Graduate students engaging in quantitative research need to transform a verbal statement or a process (their research question and hypotheses), $W$, into a set $X$ of concrete measurable variables, $X_1,X_2,X_3,…$, which might have some kinds of relations among them. The student researcher must quantify and test the empirical validity/reality of any relationships that might exist. Therein lies the significance of statistics as a powerful analysis tool for working professionals who are doing action-oriented applied research.

I feel the two best places to start learning about data exploration are descriptive statistics and regression analysis. These two subjects remain among the most popular subjects taught in graduate business programs due to their importance and vast applications in applied research (Rose et al., 1988). My choice of one foundation subject (descriptive statistics) and one application subject (regression analysis) is intended to emphasize the importance of including a proper business statistics sequence in graduate programs in business, the social sciences, and the humanities.

# Overview: Descriptive Statistics and Regression Analysis

Once random variables pertaining to a particular process $W$ are defined,[1] the next step is to perform a preliminary analysis, known as descriptive statistics, to present and summarize the data. This description could be in the form of graphical representations of data, e.g., bar charts, pie charts, and histograms, or through summary statistics, e.g., means, standard deviations, or pairwise correlations (Thorndike, 1982). The descriptive and summary statistics reveal patterns among the variables which are often useful, but not sufficient, in hypothesizing the relation among these variables.
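As a small illustration of the summary statistics mentioned above, the following Python sketch computes the mean, the sample standard deviation, and the pairwise correlation of two variables. The values of `x1` and `x2` are made-up numbers chosen only for illustration, and the helper `pearson_correlation` is a hypothetical name, not something taken from the sources cited here.

```python
import math
import statistics

# Hypothetical observations on two variables (made-up values, illustration only).
x1 = [2.0, 4.0, 6.0, 8.0, 10.0]
x2 = [1.0, 3.0, 2.0, 5.0, 4.0]


def pearson_correlation(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_a = statistics.mean(a)
    mean_b = statistics.mean(b)
    # Sums of cross-products and squared deviations around the means.
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)


summary = {
    "mean_x1": statistics.mean(x1),
    "sd_x1": statistics.stdev(x1),  # sample standard deviation (n - 1)
    "mean_x2": statistics.mean(x2),
    "sd_x2": statistics.stdev(x2),
    "corr_x1_x2": pearson_correlation(x1, x2),
}

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

A correlation close to 1 or -1 hints at a linear pattern worth investigating further, but, as noted above, it is not sufficient on its own to establish the form of the relation.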

Hypothesizing relations among research variables is not an easy task. Although descriptive statistics as a subject is useful in suggesting the type of the relation, e.g., linear or non-linear, the researcher should develop their knowledge of various applied statistical techniques, such as regression analysis, time series analysis, or analysis of variance, in order to select the proper way of modelling the relation under consideration. Therein lies the importance of introducing graduate students to applied statistics as part of their quantitative training in graduate programs. In any of these forms of analysis, the student’s intent must be translated into a measurable research question. This process is an active learning exercise that requires a sound thought process from the student and much-needed guidance from the instructor. The student is expected to conduct a thorough literature review on the subject before selecting a particular technique of analysis.

Linear regression analysis is a statistical technique used to estimate linear relationships among variables (Fahmy, 2017; Gujarati, 2003; Johnston and DiNardo, 1997; Pindyck and Rubinfeld, 1998; Studenmund, 2006). My intent here is not to discuss regression analysis per se, but to use the subject to demonstrate how this thought process can be translated into a measurable (and solvable) research question.

## Application: assessing the performance risk of search and rescue operations in Canada’s polar regions

This case addresses a Canadian helicopter company that provides aeronautical and maritime Search and Rescue (SAR) in Canada’s polar regions.[2] To reduce the cost of crew transportation, the company seeks to extend the scheduled tour of duty for crews beyond the normal 14-day duration to a period of up to 30 days. The company, however, is keen to maintain a target SAR call-out dispatch time of 29 minutes.[3] To better understand the performance risk of extending the tour of duty beyond 14 days, the company has tasked its safety, quality, and flight operations department with conducting an empirical study to determine the risk that this extension poses to the 29-minute target SAR call-out dispatch time.

## The thought process: defining the key study variables

This case study, part of a doctoral dissertation by Chris Burt in the School of Business at Royal Roads University, seeks to assess the impact of extending the scheduled on-call duty beyond 14 days. This broad statement is considered the process $W$. The two key variables that map $W$ into a meaningful relation are performance and dispatch time. The latter is properly defined and has a measurable scale, i.e., minutes. The former, however, is too broad and not clearly defined. In this case, the researcher should think of a proxy variable that captures performance. Since SAR performance may be influenced by increasing the duration of the on-call shift beyond 14 days, it is sensible to use the on-call duration in days as a proxy to measure the fatigue level, and hence the performance, of the on-call team.

Let $D_0=14$ days be the original on-call shift duration. Define $d=D-D_0$ as the number of on-call days, $D$, in excess of $D_0=14$ days. The previous thought process, therefore, suggests two key study variables: the excess on-call shift duration in days, $d$, as a proxy for performance, and the SAR call-out dispatch time in minutes, which will be denoted by $T$. We can now formally specify the research question (RQ) as follows: Will $d$ have a negative impact on $T$, that is, will dispatch time rise as $d$ increases? This is the mathematical formalization of the process $W$. The RQ simply examines whether extending the duration of the on-call shift beyond 14 consecutive days will negatively impact the performance of the on-call team, resulting in a SAR call-out dispatch time that exceeds 29 minutes.

The previous thought process (as well as the parameterization exercise and the statistical analysis that will follow) is the added value the instructor brings to the teaching of statistics. Active learning is achieved when students attempt to apply these processes in their own research.

### The parameterization exercise: linear regression

As mentioned above, the result of the thought process is a clear definition of the study variables and the RQ. The next step is to use data and statistical analysis to examine the RQ. At this stage, descriptive statistics can be used to deduce patterns and infer relations between the study variables, i.e., $d$ and $T$ in the present context. If a linear relation is suspected, linear regression analysis can then be used to see if a linear model will capture the relation between the variables.

In the present case, one way to capture the causality from $d$ to $T$ is to fit a simple linear regression model, where $T$ acts as the dependent variable (the variable that changes due to changes in the independent variable) and $d$ as the independent variable (the variable that is set by the circumstances of the study, in this case the extended shift duration).
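In generic form, the simple linear regression just described can be written as follows. The symbols $\beta_0$ (intercept), $\beta_1$ (slope), and $\varepsilon_t$ (error term) are notation assumed here for illustration; they do not appear in the original case.

$T_t=\beta_0+\beta_1 d_t+\varepsilon_t, \quad t=1,\dots,n.$

Under this assumed notation, $\beta_0$ is the expected dispatch time when the excess duration is zero, and $\beta_1$ is the change in expected dispatch time, in minutes, associated with one additional excess day. The RQ then amounts to asking whether $\beta_1$ is positive.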

After specifying the model, the next step is to see if it fits the data. In this example, the data pertains to a particular company and, therefore, is not publicly available. Burt, however, was able to simulate the performance of one SAR crew and record the SAR call-out dispatch time, $T$, every time a call was made, together with the corresponding number of on-duty days in excess of 14 (for shifts of up to 30 days), $d$, for the crew under study. By doing that, he was able to obtain a sample of 25 observations on $d$ and $T$.

Using this sample data, Burt, following my suggestion, was able to estimate the following equation:

$T_t = 29 + 0.3\,d_t, \quad t = 1, \dots, n. \qquad (1)$

The results in Equation 1 show that when $d_t=0$, dispatch time is predicted to achieve the target of 29 minutes; that is, an on-call crew with an excess shift duration of zero days is predicted to take 29 minutes to respond to a SAR call. This is precisely the target SAR response time under the standard 14-day shift. The estimated value of 0.3 for the parameter next to the variable $d_t$ means that an increase in the shift duration by one additional day (in excess of the standard 14 days) will increase the SAR dispatch time by 0.3 minutes (18 seconds). This value is important, for it represents a quantification of the marginal performance risk that results from extending the crew shift by one more day, which is the objective of the study.
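The calculation behind estimates such as those in Equation 1 can be sketched in a few lines of Python, assuming ordinary least squares is the estimation method (the case does not state this explicitly). Because the study data are not publicly available, the sample below is synthetic: it is generated from an assumed line plus random noise and is not Burt's sample, so the printed estimates will only be close to, not equal to, the values in Equation 1. The function name `fit_simple_ols`, the seed, and the noise level are all illustrative assumptions.

```python
import random
import statistics


def fit_simple_ols(d, T):
    """Return (intercept, slope) of the least-squares line T = b0 + b1 * d."""
    if len(d) != len(T) or len(d) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_d = statistics.mean(d)
    mean_T = statistics.mean(T)
    s_dd = sum((di - mean_d) ** 2 for di in d)
    if s_dd == 0:
        raise ValueError("d must vary across observations")
    s_dT = sum((di - mean_d) * (Ti - mean_T) for di, Ti in zip(d, T))
    slope = s_dT / s_dd                     # closed-form OLS slope
    intercept = mean_T - slope * mean_d     # line passes through the means
    return intercept, slope


# SYNTHETIC data, NOT the study's sample: 25 draws of excess days d in 0..16
# (shifts of 14 to 30 days) with dispatch times scattered around an assumed line.
rng = random.Random(0)
n = 25
d_sample = [rng.randint(0, 16) for _ in range(n)]
T_sample = [29 + 0.3 * di + rng.gauss(0, 0.5) for di in d_sample]

b0, b1 = fit_simple_ols(d_sample, T_sample)
print(f"estimated intercept: {b0:.2f} minutes")
print(f"estimated slope:     {b1:.2f} minutes per excess day")
```

In practice a statistical package would also report standard errors and goodness-of-fit measures, which are needed to judge whether the estimated slope is statistically distinguishable from zero.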

It is worth noting that quantifying the parameters of the model not only captures the effect of excess duration on dispatch time, but also provides a concrete way of predicting this effect. For instance, it is easy to predict from Equation 1 that an increase in shift duration by 5 days will result in a SAR dispatch time $T$ of $29+(0.3×5)=30.5$ minutes. Thus, the risk of this extension is a 90-second delay in dispatch.
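The prediction step can be sketched as a small Python function that evaluates Equation 1 for any excess duration. The coefficients 29 and 0.3 and the 29-minute target come from the case; the function and constant names are hypothetical and introduced only for this illustration.

```python
# Coefficients taken from Equation 1; names are illustrative assumptions.
INTERCEPT_MIN = 29.0        # predicted dispatch time at d = 0, in minutes
SLOPE_MIN_PER_DAY = 0.3     # added minutes per excess on-call day
TARGET_MIN = 29.0           # company's target dispatch time, in minutes


def predicted_dispatch_time(excess_days):
    """Predicted SAR dispatch time T (minutes) for d excess on-call days."""
    return INTERCEPT_MIN + SLOPE_MIN_PER_DAY * excess_days


def predicted_delay_seconds(excess_days):
    """Predicted delay beyond the target, converted from minutes to seconds."""
    return (predicted_dispatch_time(excess_days) - TARGET_MIN) * 60


# d = 0 is the standard 14-day shift, d = 5 the worked example in the text,
# and d = 16 the longest extension considered (a 30-day shift).
for d in (0, 5, 16):
    print(
        f"d = {d:2d} days -> T = {predicted_dispatch_time(d):.1f} min, "
        f"delay = {predicted_delay_seconds(d):.0f} s"
    )
```

For $d=5$ this reproduces the 30.5 minutes and the 90-second delay worked out above. Such predictions are most trustworthy within the range of excess durations actually observed in the sample.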

# Reflection

During the development and execution of the thought process pertaining to any appropriate applied study, it is essential that the critical thinking stage (the stage where the researcher is contemplating how to approach a research problem) be used to define the key variables of the study and the relation that ought to be tested empirically. This mapping, from a general statement or process to a more precise set of measurable variables, is founded on mathematical reasoning. I find it challenging to teach how to think like a mathematician, for it is a learning-by-doing skill. Nevertheless, only a solid foundation of elementary algebra and a bit of mathematical logic is needed for this process to be successful. You do not need a degree in mathematics to think like a mathematician; however, you need to know the tools of the trade.

Once variables and relations are defined, the next step is to carry out the statistical analysis. A thorough review of the literature on the subject, coupled with mastery of the statistical applications, should enable the researcher to identify which technique is suitable for the analysis. Mastering the applications, however, requires a clear understanding of the statistical foundations; hence the importance of the business statistics sequence in graduate programs (Rose et al., 1988).

To sum up, approaching any quantitative study requires the thought process of a mathematician and the actions of a statistician. Acquiring these skills is challenging, especially for graduate students who lack the proper foundations. Despite this challenge, it is not impossible to acquire this knowledge and carry out credible research.

### References

Fahmy, H. (2017). The mathematics of statistical modelling: abstract to specific. Waterloo, ON. HF Consulting. https://viurrspace.ca/handle/10613/13117

Gujarati, D. N. (2003). Basic Econometrics (Fourth Edition). McGraw Hill.

Johnston, J., & DiNardo, J. (1997). Econometric Methods. McGraw Hill.

Pindyck, R. S., & Rubinfeld, D. L. (1998). Econometric Models and Economic Forecasts. McGraw Hill.

Rose, E. L., Machak, J. A., & Spivey, W. A. (1988). A survey of the teaching of statistics in M.B.A. programs. Journal of Business & Economic Statistics, 6(2), 273-282.

Studenmund, A. H. (2006). Using Econometrics: A Practical Guide. Pearson, Addison Wesley.

Thorndike, R. M. (1982). Data Collection and Analysis. Gardner Press, Inc.

1. See Chapter 15 for more on the thought process of parameterizing a researchable question.
2. I would like to thank Chris Burt for providing this case as part of his dissertation.
3. SAR call-out dispatch time, also known as "wheels-off time," is a time metric measured in minutes from the time a SAR Dispatch call-out is received from the Rescue Coordination Center to the actual aircraft airborne time.

Hany Fahmy is an Associate Professor and the Finance Intellectual Lead of the Faculty of Management at Royal Roads University. His research interests include Energy Finance, Energy Economics, Financial Economics, Financial Econometrics, and Climate Finance and Economics. Fahmy’s work has been published in top tier peer-reviewed journals in economics and finance. His research papers have been accepted and presented at top economics and finance conferences such as the Canadian Economics Association Annual Conference. His work has been featured in the local media and other news outlets.