How to Calculate Precision of Data
Today, more than at any other time in history, small businesses and large businesses alike have access to staggering amounts of data. Using website analytics software, you can get hard data on customer buying behaviors, where they go on your website, what they like and what they do not like. Using customer relationship management (CRM) software, you can look even deeper at customer data to determine what products or services they may be interested in next year based on their behavior today.
However, with this vast amount of data and the potential to use it for making business decisions, it is more important than ever to understand how precise and accurate your data is. Without understanding standard deviations or margins of error, you could find yourself making decisions on the wrong data.
The words accuracy and precision are often used interchangeably; however, they do in fact refer to two different things. Accuracy refers to how close a measured value is to a standard value or a known value. Precision refers to how close two or more measurements are to each other.
Accuracy and precision are independent of each other, so it is quite possible to be precise but inaccurate. For example, if a golfer hits five balls at the same hole and they all land on the green, his shots would be accurate. If every ball landed within a few inches of the others but in the water, his shots would be precise but inaccurate. If each ball landed within a few inches of the hole, the shots would be both accurate and precise.
Rarely, and only in specific cases, will you have sufficient data on every single person in a given population in order to make conclusions about them. For example, if you are analyzing your website traffic, your analytics software should give you information from every visitor so that you can determine exactly how long the average visitor stays on your website. However, if you want to know why they stayed, or why they left, or what would have caused them to make a purchase, your website software will not have those answers. You will need to contact them.
Not every customer will reply to your questions, but those who do will give you a sample of customers from which you can make inferences about your entire customer base. The same principle is used by pollsters. Instead of contacting every voter in the United States, they contact a sample of voters and infer, with varying degrees of precision, how the entire population would respond to the same questions.
Whenever you use samples to gather data, there are two areas where precision can vary. The first is how well a single summary figure represents the individual responses: because data from any individual can fall over a range of different values, a sample is usually described by its average (mean) value, and that average can hide a great deal of variation. The second is how well the sample represents the population it was drawn from.
As an example, suppose you asked 10 people to rate a product's quality using a scale of one to 10. Five people gave it a one and five people gave it a 10. The average score would be 5.5 and, if it was not given with any additional explanation, the average score would suggest customers felt the product quality was average, which is not the case at all. To gauge the precision of the average compared to individual responses, you should calculate the standard deviation.
Similarly, a sample of 10 people is quite small if you have thousands of customers who did not complete the survey. The precision of the sample mean compared to the population as a whole is measured by the standard error. In statistics, 25 is the generally accepted minimum size for a sample to be reflective of the population, as long as the sample is selected in an unbiased fashion.
A standard deviation can range from zero to infinity. The lower the standard deviation is, the more precise the average is compared to individual results. For example, if everyone in a survey gave the same answer, no single answer would deviate from the average at all, so the standard deviation would be zero. In the example above, where half of the customers gave the product a perfect score of 10 and half gave it the lowest score of one, the average of 5.5 would have a standard deviation of 4.5. (This is the population form of the calculation, which divides by the number of responses. Excel's STDEV.S, which divides by one less than the number of responses, returns a slightly higher 4.74 for the same data.)
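As a rough check on the figures above, here is a minimal Python sketch using only the standard library. The variable names are illustrative and not part of the original example; the data is the five-ones-and-five-tens survey described earlier.

```python
import statistics

# Ratings from the example: five people scored 1, five people scored 10.
ratings = [1] * 5 + [10] * 5

mean = statistics.mean(ratings)        # arithmetic average of the ratings
pop_sd = statistics.pstdev(ratings)    # population SD: divides by n
sample_sd = statistics.stdev(ratings)  # sample SD: divides by n - 1, like Excel's STDEV.S

print(f"mean = {mean}")                   # mean = 5.5
print(f"population SD = {pop_sd:.2f}")    # population SD = 4.50
print(f"sample SD = {sample_sd:.2f}")     # sample SD = 4.74
```

The two standard deviations differ only in the divisor, and the gap shrinks as the number of responses grows.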
Calculating Standard Deviation in Excel
The standard deviation formula is quite complicated to do manually, but most spreadsheet programs, including Microsoft Excel, can calculate it for you with just a few clicks.
- To begin, put all of your data in a single column in an Excel worksheet.
- Next, have Excel calculate the mean average. Click the cell below your data and then click the arrow beside the AutoSum button in the Home ribbon and select Average.
- Click another empty cell where you want the standard deviation to appear. Click the arrow beside the AutoSum button again and select More Functions. Type STDEV in the Search field and then select STDEV.S (the .S stands for sample). A dialog box opens with empty fields for the function's arguments.
- Click the first empty field and drag the cursor over the cells containing the raw data to select them. Do not include the cell containing the average: STDEV.S works out the mean itself, and an extra cell would be treated as another data point. Click OK. Excel calculates the standard deviation for your data set.
The standard error of the mean (SEM) represents how close the sample mean is likely to be to the population mean, while the standard deviation represents the spread of the data around the mean. The formula for calculating standard error is somewhat easier than the formula for calculating standard deviation. The SEM is equal to the standard deviation divided by the square root of the number of values in your dataset. In our current working example, there were 10 responses, and the square root of 10 is 3.16. Dividing the standard deviation of 4.5 by 3.16, you get a standard error of 1.42.
If there were 20 responses instead of 10, with half giving a perfect score and half giving the lowest score, the mean average would still be 5.5 and the standard deviation would still be 4.5, but the standard error, based on the square root of 20 (4.47), would be smaller, at about 1.01, or roughly one.
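The arithmetic above can be sketched in a few lines of Python. The function name `standard_error` is a hypothetical helper introduced for illustration, and the standard deviation of 4.5 is simply the figure quoted in the text.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: the SD divided by the square root of n."""
    return sd / math.sqrt(n)

sd = 4.5  # standard deviation quoted in the text for the 1-or-10 ratings

sem_10 = standard_error(sd, 10)  # 10 responses
sem_20 = standard_error(sd, 20)  # 20 responses, same spread

print(f"{sem_10:.2f}")  # 1.42
print(f"{sem_20:.2f}")  # 1.01
```

Doubling the number of responses does not halve the standard error, because the sample size sits under a square root. It takes four times as many responses to cut the standard error in half.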
Calculating SEM in Excel
The relative simplicity of the SEM formula is the good news. The bad news is that Excel does not have an SEM function, so you will have to do it yourself. To do this, you need three functions:
- STDEV.S calculates the standard deviation of your sample compared to the mean average.
- SQRT calculates the square root.
- COUNT literally counts the number of data points in your sample.
The Excel formula you would enter is: =STDEV.S(sample)/SQRT(COUNT(sample)) where "sample" is the range of cells containing your sample data, such as A1:A20.
If you have already calculated the standard deviation as in the example in the section above, you can apply the other two functions to it in a new cell. For example, if your STDEV.S formula is already in place in cell A21, then you would enter: =A21/SQRT(COUNT(A1:A20))
Margin of error, which is half the width of the confidence interval, tells you within how many percentage points you can expect your sample result to be of the true population value. The smaller the margin of error is, the less your sample result is likely to differ from that of the whole population. For example, if you asked a sample of customers if they'd buy from you again and 75 percent responded "yes," with a margin of error of 5 percentage points, then you could expect that between 70 and 80 percent of all of your customers would buy from your company again.
Calculating the margin of error is a complex process, even if you are using Excel. To simplify matters, there are many online margin of error calculators you can use for free. There are three numbers you need to know:
- The size of your sample.
- The size of the entire population.
- The confidence level for your sample (standard is 95 percent).
The confidence level essentially means that if you took many different samples from your entire population and calculated the margin of error for each one, the confidence level is the percentage of those samples whose range (the sample result plus or minus the margin of error) would contain the true population value. Fortunately, in the world of marketing, this is not something you normally need to calculate: The industry standard is 95 percent.
With the margin of error included, you get a better description of your data and estimates. For example, with a confidence level of 95% and a margin of error of 6%, you can expect the true population value to be within 6 percentage points of your sample statistic in 95% of samples.
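The text does not say which formula the online calculators use. A common choice, assumed here, is the normal approximation for a proportion with a finite population correction, where n is the sample size, N is the population size, z is the score for the chosen confidence level (about 1.96 for 95 percent) and p is the expected proportion (0.5 is the most cautious assumption). The second line plugs in an illustrative sample of 20 from a population of 5,000.

```latex
\[
\text{MOE} = z \sqrt{\frac{p(1-p)}{n}} \cdot \sqrt{\frac{N-n}{N-1}}
\]
\[
1.96 \sqrt{\frac{0.5 \times 0.5}{20}} \cdot \sqrt{\frac{5000-20}{5000-1}} \approx 0.219
\]
```

Under these assumptions, a sample of 20 from 5,000 carries a margin of error of roughly 22 percentage points.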
Example of Margin of Error
Suppose you have 5,000 customers who bought a product from you in the past year and you want to get their opinion on a new logo you have designed. You send them all an email and after one hour, 12 respond saying they prefer it, while eight respond saying they like your old logo better. This means that 60% like the new logo better.
At first, this may seem like good news; however, you have not calculated how precise this result is in light of the margin of error. With a sample of 20 people from a population of 5,000 and a confidence level of 95%, you have a margin of error of 22%. This means that if all of your customers responded, the share who like the new logo better could be anywhere between 38% and 82%. It would probably be better to wait a few more hours until you get a larger sample.
The larger your sample size is, the more precise your data will be when compared to the whole population. While a sample of 20 people from 5,000 (at a 95% confidence level) has a margin of error of 22%, increasing that sample drastically reduces the margin of error. For example:
- A sample of 25 of 5,000 has a margin of error of 20%.
- A sample of 50 of 5,000 has a margin of error of 14%.
- A sample of 100 of 5,000 has a margin of error of 10%.
- A sample of 500 of 5,000 has a margin of error of 4%.
- A sample of 1,000 of 5,000 has a margin of error of 3%.
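Figures like these can be approximated with a short Python sketch. It assumes the normal approximation for a proportion, a worst-case proportion of 0.5, a z-score of 1.96 for 95 percent confidence and a finite population correction. The online calculators mentioned earlier may use a slightly different method, and `margin_of_error` is a hypothetical helper name.

```python
import math

def margin_of_error(n: int, population: int, z: float = 1.96, p: float = 0.5) -> float:
    """Approximate margin of error for a proportion, as a fraction (0.22 = 22%)."""
    standard_error = math.sqrt(p * (1 - p) / n)
    # Finite population correction: shrinks the error as the sample
    # becomes a larger share of the whole population.
    fpc = math.sqrt((population - n) / (population - 1))
    return z * standard_error * fpc

# Sample sizes discussed in the text, all drawn from 5,000 customers.
for n in (20, 25, 50, 100, 500, 1000):
    moe = margin_of_error(n, 5000)
    print(f"sample of {n:>5,}: {moe:.1%}")
```

Under these assumptions the sample of 20 comes out near 22% and the sample of 1,000 near 2.8%. The gains from each additional respondent get smaller as the sample grows.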
In many instances, like when you are sending emails to your own clients, getting the largest sample size possible makes the most sense. However, if you are paying money to gather data, or if you are doing a survey and paying for ads to get respondents, you will want to find a balance that gives you a good margin of error for the lowest investment.
While all of the above techniques will help you to measure the precision of your data, it is always important to remember that this will not necessarily mean your data is accurate. There are innumerable factors that could influence the accuracy of your data.
The age of your data can be a cause for inaccuracy. If there was a drastic change in the economy last month, for example, any data you have from last year on consumer buying habits may now be highly inaccurate. How you word questions in a survey, where you approach people and what time you approach them can also affect the accuracy of your data. Imagine doing a survey on consumer confidence the day before a stock market crash.
As another example, suppose you decided to send a survey to your clients asking them about their opinion on your new organic food products. In appreciation for their time, you offer them a $10 coupon to the local farmers market. While this may be a kind gesture, you need to realize that you have immediately skewed your result to favor people who would go to the farmers market. Those who may not want to spend more for organic foods may not bother answering your survey, making it biased and inaccurate.