In research studies, confounding variables influence both the cause and the effect that the researchers are assessing. Consequently, if the analysts do not include these confounders in their statistical model, the results can exaggerate or mask the real relationship between the two variables of interest. When confounding variables are omitted, the statistical procedure is forced to attribute their effects to variables in the model, which biases the estimated effects and confounds the genuine relationship. Statisticians refer to this distortion as omitted variable bias.
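
To make the idea concrete, here is a minimal simulation sketch. Everything in it is an assumption for illustration: the variable names, the sample size, and the coefficients (a true effect of 2 for `x` and 3 for the confounder `z`) are made up, and the helper functions `cov` and `simulate` are hypothetical. It fits one model that omits the confounder and one that includes it, then compares the slope on `x`.

```python
import random
import statistics


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


def simulate(n=20_000, seed=1):
    """Return the slope on x without and with the confounder z in the model."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]        # confounder
    x = [0.8 * zi + rng.gauss(0, 1) for zi in z]   # z influences the cause
    y = [2.0 * xi + 3.0 * zi + rng.gauss(0, 1)     # z also influences the effect
         for xi, zi in zip(x, z)]

    # Model 1: y ~ x (confounder omitted)
    naive_slope = cov(x, y) / cov(x, x)

    # Model 2: y ~ x + z, slope on x from the two-predictor least squares solution
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    adjusted_slope = (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)
    return naive_slope, adjusted_slope


naive_slope, adjusted_slope = simulate()
print(f"slope with z omitted:  {naive_slope:.2f}")
print(f"slope with z included: {adjusted_slope:.2f}")
```

With these made-up settings, the model that leaves out `z` should report a slope well above the true value of 2, because it credits `x` with part of the confounder's effect. Including `z` should bring the estimate back close to 2.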

[Read more…] about Confounding Variables Can Bias Your Results

# Blog

## Assessing Normality: Histograms vs. Normal Probability Plots

Because histograms display the shape and spread of distributions, you might think they’re the best type of graph for determining whether your data are normally distributed. However, I’ll show you how histograms can trick you! Normal probability plots are a better choice for this task, and they are easy to use. A normal probability plot is a type of quantile-quantile plot, or Q-Q plot for short!
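
As a rough sketch of what a normal probability plot does under the hood, the code below pairs each sorted observation with the standard normal quantile for its rank. The helper names `qq_points` and `straightness` are hypothetical, the plotting position `(i - 0.5) / n` is just one common convention, and the two samples are simulated. Using the correlation of the plotted points as a "straightness" score is my shortcut here, not a formal normality test.

```python
import random
from statistics import NormalDist, fmean


def qq_points(data):
    """Pair each sorted observation with the standard normal quantile for its rank."""
    n = len(data)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(data)))


def straightness(points):
    """Pearson correlation of the plotted points; values near 1 mean a straight line."""
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(42)
normal_sample = [rng.gauss(50, 10) for _ in range(500)]        # simulated normal data
skewed_sample = [rng.expovariate(1 / 10) for _ in range(500)]  # simulated right-skewed data

r_normal = straightness(qq_points(normal_sample))
r_skewed = straightness(qq_points(skewed_sample))
print(f"normal sample: {r_normal:.3f}")
print(f"skewed sample: {r_skewed:.3f}")
```

If the data follow a normal distribution, the points fall close to a straight line, so the score for the normal sample should sit very near 1 while the skewed sample scores lower.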

[Read more…] about Assessing Normality: Histograms vs. Normal Probability Plots

## Sample Statistics Are Always Wrong (to Some Extent)!

Here’s some shocking information for you—sample statistics are *always* wrong! When you use samples to estimate the properties of populations, you never obtain the correct values exactly. Don’t worry. I’ll help you navigate this issue using a simple statistical tool! [Read more…] about Sample Statistics Are Always Wrong (to Some Extent)!

## Luck and Statistics: Do You Feel Lucky, Punk?

Luck, statistics, and probabilities go hand in hand. Clint Eastwood, playing Dirty Harry, famously asked a bad guy who was about to reach for his rifle whether he felt lucky. I’m quite sure that the crook carefully pondered the nature of luck, probabilities, and expected outcomes before deciding not to grab his rifle!

A while ago, I did something shocking . . . something that I hadn’t done for several decades. Just like the thief in the Dirty Harry movie, I started thinking about luck. Yes, you guessed it: I bought a lottery ticket for the record-breaking Mega Millions Jackpot. This purchase *is* shocking for someone like me who knows statistics and is fully aware of how unlikely it is to win. Did I feel lucky? Or was I just a punk? [Read more…] about Luck and Statistics: Do You Feel Lucky, Punk?

## Populations, Parameters, and Samples in Inferential Statistics

Inferential statistics lets you draw conclusions about populations by using small samples. Consequently, inferential statistics provides enormous benefits because typically you can’t measure an entire population.

However, to gain these benefits, you must understand the relationship between populations, subpopulations, population parameters, samples, and sample statistics.

In this blog post, I discuss these concepts, and how to obtain representative samples using random sampling.

**Related post**: Difference between Descriptive and Inferential Statistics

[Read more…] about Populations, Parameters, and Samples in Inferential Statistics

## Types of Errors in Hypothesis Testing

Hypothesis tests use sample data to make inferences about the properties of a population. You gain tremendous benefits by working with random samples because it is usually impossible to measure the entire population.

However, there are tradeoffs when you use samples. The samples we use are typically a minuscule percentage of the entire population. Consequently, they occasionally misrepresent the population severely enough to cause hypothesis tests to make errors.

In this blog post, you will learn about the two types of errors in hypothesis testing, their causes, and how to manage them. [Read more…] about Types of Errors in Hypothesis Testing

## Practical vs. Statistical Significance

You’ve just performed a hypothesis test and your results are statistically significant. Hurray! These results are important, right? Not so fast. Statistical significance does not necessarily mean that the results are practically significant in a real-world sense of importance.

In this blog post, I’ll talk about the differences between practical significance and statistical significance, and how to determine if your results are meaningful in the real world.

[Read more…] about Practical vs. Statistical Significance

## The Gauss-Markov Theorem and BLUE OLS Coefficient Estimates

The Gauss-Markov theorem states that if your linear regression model satisfies the first six classical assumptions, then ordinary least squares (OLS) regression produces unbiased estimates that have the smallest variance of all linear unbiased estimators. [Read more…] about The Gauss-Markov Theorem and BLUE OLS Coefficient Estimates

## 7 Classical Assumptions of Ordinary Least Squares (OLS) Linear Regression

Ordinary Least Squares (OLS) is the most common estimation method for linear models—and that’s true for a good reason. As long as your model satisfies the OLS assumptions for linear regression, you can rest easy knowing that you’re getting the best possible estimates. [Read more…] about 7 Classical Assumptions of Ordinary Least Squares (OLS) Linear Regression

## Normal Distribution in Statistics

The normal distribution is a continuous probability distribution that is symmetrical around its mean. Most of the observations cluster around the central peak, and the probabilities for values further away from the mean taper off equally in both directions. Extreme values in both tails of the distribution are similarly unlikely.

As with any probability distribution, the normal distribution describes how the values of a variable are distributed. It is the most important probability distribution in statistics because it fits many natural phenomena. For example, heights, blood pressure, measurement error, and IQ scores approximately follow the normal distribution. It is also known as the Gaussian distribution and the bell curve.

In this blog post, you’ll learn how to use the normal distribution, what its parameters and the Empirical Rule tell you, and how to calculate Z-scores to standardize your data and find probabilities. [Read more…] about Normal Distribution in Statistics
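
Here is a small sketch of those ideas using Python's built-in `statistics.NormalDist`. The IQ scale with a mean of 100 and a standard deviation of 15 is an assumption chosen for illustration, and `z_score` and `within_k_sd` are hypothetical helper names.

```python
from statistics import NormalDist


def z_score(x, mean, sd):
    """Standardize a value: how many standard deviations it lies from the mean."""
    return (x - mean) / sd


def within_k_sd(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(k) - std_normal.cdf(-k)


iq = NormalDist(mu=100, sigma=15)    # assumed scale, for illustration only
z = z_score(130, iq.mean, iq.stdev)  # Z-score for a score of 130
p_below = iq.cdf(130)                # probability of scoring 130 or lower

print(f"Z-score: {z:.1f}")
print(f"P(X <= 130): {p_below:.4f}")
for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
```

The loop at the end reproduces the Empirical Rule: roughly 68%, 95%, and 99.7% of values fall within one, two, and three standard deviations of the mean.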

## Understanding Probability Distributions

Probability distributions are statistical functions that describe the likelihood of obtaining possible values that a random variable can take. In other words, the values of the variable vary based on the underlying probability distribution.

Suppose you draw a random sample and measure the heights of the subjects. As you measure heights, you create a distribution of heights. This type of distribution is useful when you need to know which outcomes are most likely, the spread of potential values, and the likelihood of different results.

In this blog post, you’ll learn about probability distributions for both discrete and continuous variables. I’ll show you how they work and give examples of how to use them. [Read more…] about Understanding Probability Distributions

## Interpreting Correlation Coefficients

Correlation coefficients measure the strength and direction of the relationship between two variables. A correlation between variables indicates that as one variable changes in value, the other variable tends to change in a specific direction. Understanding that relationship is useful because we can use the value of one variable to predict the value of the other variable. For example, height and weight are correlated: as height increases, weight also tends to increase. Consequently, if we observe an individual who is unusually tall, we can predict that their weight is also above average. [Read more…] about Interpreting Correlation Coefficients
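
For a quick illustration, the sketch below computes Pearson's correlation coefficient, the most common type, from scratch. The heights and weights are made-up numbers, not real measurements, and `pearson_r` is a hypothetical helper name.

```python
from statistics import fmean


def pearson_r(xs, ys):
    """Pearson correlation: the covariance scaled by both standard deviations."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Made-up example data: heights in cm and weights in kg
heights = [150, 160, 165, 170, 175, 180, 190]
weights = [52, 58, 63, 66, 72, 77, 88]

r = pearson_r(heights, weights)
print(f"r = {r:.3f}")
```

Values close to +1 indicate that the two variables tend to rise together, values close to -1 indicate that one tends to fall as the other rises, and values near 0 indicate little linear relationship.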

## Estimating a Good Sample Size for Your Study Using Power Analysis

Determining a good sample size for a study is always an important issue. After all, using the wrong sample size can doom your study from the start. Fortunately, power analysis can find the answer for you. Power analysis combines statistical analysis, subject-area knowledge, and your requirements to help you derive the optimal sample size for your study.

Statistical power in a hypothesis test is the probability that the test will detect an effect that actually exists. As you’ll see in this post, both under-powered and over-powered studies are problematic. Let’s learn how to find a good sample size for your study! [Read more…] about Estimating a Good Sample Size for Your Study Using Power Analysis
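
As a rough sketch of the kind of calculation involved, the function below uses the textbook normal approximation for comparing two group means with a two-sided test. It assumes equal group sizes and a known, common standard deviation, and `n_per_group` is a hypothetical name. Statistical software typically uses a t-based calculation, which gives slightly larger answers.

```python
import math
from statistics import NormalDist


def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a given difference in means."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = std_normal.inv_cdf(power)          # quantile matching the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Illustrative inputs: detect a 5-unit difference when the standard deviation is 10
print(n_per_group(effect=5, sd=10))
```

With these illustrative inputs, the approximation calls for 63 subjects per group. Halving the effect size you want to detect roughly quadruples the required sample size, which is why your subject-area judgment about a meaningful effect matters so much.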

## Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation

A measure of variability is a summary statistic that represents the amount of dispersion in a dataset. How spread out are the values? While a measure of central tendency describes the typical value, measures of variability define how far away the data points tend to fall from the center. We talk about variability in the context of a distribution of values. A low dispersion indicates that the data points tend to be clustered tightly around the center. High dispersion signifies that they tend to fall further away.

In statistics, variability, dispersion, and spread are synonyms that denote the width of the distribution. Just as there are multiple measures of central tendency, there are several measures of variability. In this blog post, you’ll learn why understanding the variability of your data is critical. Then, I explore the most common measures of variability—the range, interquartile range, variance, and standard deviation. I’ll help you determine which one is best for your data. [Read more…] about Measures of Variability: Range, Interquartile Range, Variance, and Standard Deviation
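
Here is a short sketch that computes all four measures with Python's standard library. The data values are made up for illustration. Note that software packages use different conventions for quartiles, so the interquartile range can differ slightly from one tool to another.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12]  # made-up values for illustration

data_range = max(data) - min(data)           # highest value minus lowest value
q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (default "exclusive" method)
iqr = q3 - q1                                # spread of the middle half of the data
variance = statistics.variance(data)         # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)             # square root of the sample variance

print(f"range: {data_range}")
print(f"IQR: {iqr:.2f}")
print(f"variance: {variance:.2f}")
print(f"standard deviation: {std_dev:.2f}")
```

The standard deviation is in the same units as the original data, which usually makes it easier to interpret than the variance.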

## Measures of Central Tendency: Mean, Median, and Mode

A measure of central tendency is a summary statistic that represents the center point or typical value of a dataset. These measures indicate where most values in a distribution fall and are also referred to as the central location of a distribution. You can think of it as the tendency of data to cluster around a middle value. In statistics, the three most common measures of central tendency are the mean, median, and mode. Each of these measures calculates the location of the central point using a different method.

Choosing the best measure of central tendency depends on the type of data you have. In this post, I explore these measures of central tendency, show you how to calculate them, and help you determine which one is best for your data.
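
As a quick sketch, here are the three measures computed on a small, made-up dataset with one extreme value. The numbers are hypothetical and chosen only to show how differently the measures can behave.

```python
import statistics

# Made-up values with one extreme observation at the high end
values = [30, 32, 32, 35, 38, 41, 250]

mean = statistics.mean(values)        # arithmetic average, pulled toward the extreme value
median = statistics.median(values)    # middle value of the sorted data
modes = statistics.multimode(values)  # most frequent value(s), returned as a list

print(f"mean: {mean:.2f}")
print(f"median: {median}")
print(f"mode(s): {modes}")
```

In this example the mean lands far above most of the observations, while the median stays near the bulk of the data. That is one reason the median is often preferred for skewed distributions.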

[Read more…] about Measures of Central Tendency: Mean, Median, and Mode

## Difference between Descriptive and Inferential Statistics

Descriptive and inferential statistics are two broad categories in the field of statistics. In this blog post, I show you how both types of statistics are important for different purposes. Interestingly, some of the statistical measures are similar, but the goals and methodologies are very different. [Read more…] about Difference between Descriptive and Inferential Statistics

## Guide to Data Types and How to Graph Them in Statistics

In the field of statistics, data are vital. Data are the information that you collect to learn, draw conclusions, and test hypotheses. After all, statistics is the science of learning from data. However, there are different types of variables, and they record various kinds of information. Crucially, the type of information determines what you can learn from it, and, importantly, what you cannot learn from it. Consequently, it’s essential that you understand the different types of data. [Read more…] about Guide to Data Types and How to Graph Them in Statistics

## Maximize the Value of Your Binary Data with the Binomial and Other Probability Distributions

Binary data occur when you can place an observation into only two categories. They tell you that an event occurred or that an item has a particular characteristic. For instance, an inspection process produces binary pass/fail results. Or, when a customer enters a store, there are two possible outcomes: sale or no sale. In this post, I show you how to use the binomial, geometric, negative binomial, and hypergeometric probability distributions to glean more information from your binary data. [Read more…] about Maximize the Value of Your Binary Data with the Binomial and Other Probability Distributions
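
To give a flavor of these calculations, the sketch below covers two of the four distributions using the pass/fail inspection scenario. The 10% failure rate and the batch of 10 items are hypothetical numbers, and the function names are mine.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k events in n independent trials with event probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def geometric_pmf(k, p):
    """Probability that the first event occurs on trial k."""
    return (1 - p) ** (k - 1) * p


p_fail = 0.10  # hypothetical chance that an item fails inspection
n_items = 10   # hypothetical number of items inspected

p_no_failures = binomial_pmf(0, n_items, p_fail)
p_at_least_one = 1 - p_no_failures
p_first_fail_on_third = geometric_pmf(3, p_fail)

print(f"P(no failures in {n_items} items): {p_no_failures:.4f}")
print(f"P(at least one failure): {p_at_least_one:.4f}")
print(f"P(first failure on the 3rd item): {p_first_fail_on_third:.4f}")
```

Both functions assume that trials are independent and that the event probability stays constant from one trial to the next.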

## Learn How Anecdotal Evidence Can Trick You!

Anecdotal evidence is a story told by individuals. It comes in many forms that can range from product testimonials to word of mouth. It’s often testimony, or a short account, about the truth or effectiveness of a claim. Typically, anecdotal evidence focuses on individual results, is driven by emotion, and is presented by individuals who are not subject-area experts. [Read more…] about Learn How Anecdotal Evidence Can Trick You!

## The Importance of Statistics

The field of statistics is the science of learning from data. Statistical knowledge helps you use the proper methods to collect the data, employ the correct analyses, and effectively present the results. Statistics is a crucial process behind how we make discoveries in science, make decisions based on data, and make predictions. Statistics allows you to understand a subject much more deeply. [Read more…] about The Importance of Statistics