How to Find Z-Score?

Written by Prerit Jain

What is z-score?

In statistics, the z-score of a data point is the number of standard deviations by which the data point lies above or below a measure of central tendency, usually the arithmetic mean. A z-score of 0 means that the data point is equal to the mean. The z-score is positive if the data point is greater than the mean and negative if it is smaller.

Investors and statisticians use z-scores to judge how tightly data clusters around its mean: the narrower the z-score range that contains a high percentage of the data points, the more consistent the data. For normally distributed data, about 99.7% of the data points have a z-score between -3 and 3, but investors often prefer to look at the narrower range -1.5 to 1.5 when searching for more reliable data.

How to calculate the z-score?

A z-score is a number associated with a particular data point in a given data set.

Let X = \{ {x_i}:i = 1,2,3, \ldots, n,\ n \in \mathbb{N}\} be a collection of n points that represent a data set, with \mu as its arithmetic mean and \sigma as its standard deviation. Then we can calculate the z-score, {z_i}, of a data point {x_i} \in X using the following formula,

    \[{z_i} = \frac{{{x_i} - \mu }}{\sigma }\]

Step-by-step procedure to find the z-scores of a given data set

Let X = \{ {x_i}:i = 1,2,3, \ldots, n,\ n \in \mathbb{N}\} represent a data set.

Then to find the z-score we first need to find the arithmetic mean \mu using the following formula,

    \[\mu = \frac{\text{Sum of all the data points}}{\text{Number of data points}} = \frac{{\sum\limits_{i = 1}^n {{x_i}} }}{n}\]

Once we have the mean, we can move on to finding the standard deviation \sigma, given by the formula,

    \[\sigma  = \sqrt {{\rm{Var}}\left( X \right)} \]

Where Var(X) represents the Variance of X given by the formula,

    \[{\rm{Var}}\left( X \right) = \mu \left( {{X^2}} \right) - {\left( {\mu \left( X \right)} \right)^2}\]

Here, \mu \left( X \right) and \mu \left( {{X^2}} \right) represent the means of the data sets X and {X^2} respectively, where {X^2} = \{ x_i^2:{x_i} \in X\} is the collection of squares of all the points.

Once we have the variance, we take its square root to get the standard deviation.
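
The shortcut formula for the variance can be checked against the usual definition of the population variance as the mean squared deviation from \mu. That definition is an assumption here, since the article states only the shortcut. Using \frac{1}{n}\sum {x_i} = \mu, expanding the square gives,

    \[\begin{array}{l}{\rm{Var}}\left( X \right) = \frac{1}{n}\sum\limits_{i = 1}^n {{{\left( {{x_i} - \mu } \right)}^2}} \\ \Rightarrow {\rm{Var}}\left( X \right) = \frac{1}{n}\sum\limits_{i = 1}^n {x_i^2} - 2\mu \cdot \frac{1}{n}\sum\limits_{i = 1}^n {{x_i}} + {\mu ^2}\\ \Rightarrow {\rm{Var}}\left( X \right) = \mu \left( {{X^2}} \right) - 2{\mu ^2} + {\mu ^2}\\ \Rightarrow {\rm{Var}}\left( X \right) = \mu \left( {{X^2}} \right) - {\left( {\mu \left( X \right)} \right)^2}\end{array}\]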

After we have both mean and standard deviation, we can find the z-score of all the data points in X using the formula,

    \[{z_i} = \frac{{{x_i} - \mu }}{\sigma },\left( {\forall i = 1,2,3, \ldots, n} \right)\]
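
The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `z_scores` and the sample data are made up for the example, and it uses the population formulas given in this section (dividing by n).

```python
import math

def z_scores(data):
    """Return the z-score of every point in `data` (population formulas)."""
    n = len(data)
    mean = sum(data) / n
    # Variance as the mean of the squares minus the square of the mean.
    mean_of_squares = sum(x * x for x in data) / n
    variance = mean_of_squares - mean ** 2
    sigma = math.sqrt(variance)
    # Note: constant data has sigma == 0, so this division would fail.
    return [(x - mean) / sigma for x in data]

# Illustrative data with mean 5 and standard deviation 2.
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The point equal to the mean gets a z-score of 0, and points below the mean get negative z-scores, as described earlier.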

Examples of the z-score in action

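Let’s see how to calculate the z-scores of a data set using the following example, where {f_i} is the frequency of the value {x_i}.

| {x_i} | 5 | 10 | 15 | 20 | 25 | 30 | 35 | 40 |
|---|---|---|---|---|---|---|---|---|
| {f_i} | 2 | 5 | 6 | 9 | 12 | 8 | 6 | 2 |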

Let’s find the mean and standard deviation of the given data set,

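| {x_i} | {f_i} | {f_i}{x_i} | x_i^2 | {f_i}x_i^2 |
|---|---|---|---|---|
| 5 | 2 | 10 | 25 | 50 |
| 10 | 5 | 50 | 100 | 500 |
| 15 | 6 | 90 | 225 | 1350 |
| 20 | 9 | 180 | 400 | 3600 |
| 25 | 12 | 300 | 625 | 7500 |
| 30 | 8 | 240 | 900 | 7200 |
| 35 | 6 | 210 | 1225 | 7350 |
| 40 | 2 | 80 | 1600 | 3200 |
| | \Sigma {f_i} = 50 | \Sigma {f_i}{x_i} = 1160 | | \Sigma {f_i}x_i^2 = 30750 |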

Now we can find the Mean,

    \[\begin{array}{l}\mu (X) = \frac{{\Sigma {f_i}{x_i}}}{{\Sigma {f_i}}}\\ \Rightarrow \mu (X) = \frac{{1160}}{{50}}\\ \Rightarrow \mu (X) = 23.2\end{array}\]

And the Standard Deviation,

    \[\begin{array}{l}\sigma  = \sqrt {\mu \left( {{X^2}} \right) - {{\left( {\mu (X)} \right)}^2}} \\ \Rightarrow \sigma  = \sqrt {\frac{{\Sigma {f_i}x_i^2}}{{\Sigma {f_i}}} - {{\left( {23.2} \right)}^2}} \\ \Rightarrow \sigma  = \sqrt {\frac{{30750}}{{50}} - 538.24} \\ \Rightarrow \sigma  = \sqrt {615 - 538.24} \\ \Rightarrow \sigma  = \sqrt {76.76} \\ \Rightarrow \sigma  = 8.76\end{array}\]

Thus, we have the values of both mean and standard deviation, and now we can simply substitute the values of {x_i}, \mu, and \sigma in the following formula to find the respective z-score.

    \[{z_i} = \frac{{{x_i} - \mu }}{\sigma }\]

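| {x_i} | {f_i} | {z_i} |
|---|---|---|
| 5 | 2 | -2.08 |
| 10 | 5 | -1.51 |
| 15 | 6 | -0.94 |
| 20 | 9 | -0.37 |
| 25 | 12 | 0.21 |
| 30 | 8 | 0.78 |
| 35 | 6 | 1.35 |
| 40 | 2 | 1.92 |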
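
The frequency-table calculation can be reproduced with a short Python sketch. The values and frequencies are the ones used in the worked table (totals 50, 1160 and 30750). The function name `grouped_z_scores` is hypothetical, chosen only for this illustration.

```python
import math

# Values and frequencies from the worked table (sum of frequencies = 50).
values = [5, 10, 15, 20, 25, 30, 35, 40]
freqs = [2, 5, 6, 9, 12, 8, 6, 2]

def grouped_z_scores(values, freqs):
    """Mean, standard deviation and z-scores for frequency-weighted data."""
    total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / total
    mean_sq = sum(f * x * x for x, f in zip(values, freqs)) / total
    sigma = math.sqrt(mean_sq - mean ** 2)
    return mean, sigma, [(x - mean) / sigma for x in values]

mean, sigma, zs = grouped_z_scores(values, freqs)
print(round(mean, 2), round(sigma, 2))  # 23.2 8.76
for x, z in zip(values, zs):
    print(x, round(z, 2))
```

Because the code keeps the unrounded standard deviation, its last digit can differ from a hand calculation that rounds intermediate results.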

We can also use z-scores to identify a good distribution.

Let X and Y be two data sets, with 10 observations each. Let {Z_X} and {Z_Y} represent their z-scores, given by,

{Z_X} = \{  - 2.17, - 1.32, - 0.81, - 0.56, - 0.21,0.33,0.78,1.22,1.75,2.57\}

{Z_Y} = \{  - 1.97, - 1.32, - 0.85, - 0.42, - 0.17,0.13,0.56,0.76,1.25,1.65\}

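Here, we can see that all the z-scores of distribution Y lie in the range -2 to 2, while those of distribution X only fit in the wider range -3 to 3 (X contains the values -2.17 and 2.57). Thus, the data in Y is more tightly clustered around its mean, and Y is the good distribution of the two.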

Advantages and disadvantages of the z-score

Advantages

  1. The z-score gives an insight into the variation of data points in a given data set: it tells us how far a data point is from the mean of the data.
  2. The z-score expresses that distance in units of standard deviation, so it reflects the spread of the data.
  3. If we know the mean and standard deviation of the data, we can accurately place a data point from its z-score alone, without needing the data point itself.
  4. Z-scores are standardized, so we can compare two data points from different data sets that are not directly related. For example, if a student scored 87 and 84 in two exams, we cannot tell from the raw scores which exam the student did better in, since the two exams might have different difficulties. But if the z-scores of the two results are 1.2 and 1.5, then the student clearly performed better in the second exam, since that score is further above the average student's.
  5. Z-scores help determine whether data has too much or too little variability. A good distribution is said to follow the 68-95-99.7 rule.
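
Points 3 and 4 can be made concrete with a small Python sketch. The class means and standard deviations below are hypothetical: they are assumptions chosen only so that the scores 87 and 84 produce the z-scores 1.2 and 1.5 from the example. The helper names are also made up for this illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above the mean."""
    return (x - mean) / sd

def from_z_score(z, mean, sd):
    """Invert the z-score formula to recover the original value."""
    return mean + z * sd

# Hypothetical class statistics, chosen only for illustration.
exam1 = {"score": 87, "mean": 75, "sd": 10}
exam2 = {"score": 84, "mean": 69, "sd": 10}

z1 = z_score(exam1["score"], exam1["mean"], exam1["sd"])
z2 = z_score(exam2["score"], exam2["mean"], exam2["sd"])
print(z1, z2)  # 1.2 1.5

# The raw score is lower in exam 2, but the z-score is higher.
print(from_z_score(z2, exam2["mean"], exam2["sd"]))  # 84.0
```

The second function shows why the mean and standard deviation are needed to get back from a z-score to a raw value.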

Disadvantages

  1. Without standard deviation, we cannot determine exactly how far away a data point with a given z-score is from the mean.
  2. The original data-point values cannot be recovered from the z-score unless we know the mean and the standard deviation of the distribution.
  3. We cannot compute z-scores for the nominal or ordinal types of data.
  4. Interpreting z-scores as percentages of the data (for example, with the 68-95-99.7 rule) relies on the assumption that the data is roughly approximated by the normal distribution. But not all data is normally distributed.

Conclusion

This article gives a brief insight into the concept of the z-score. A z-score is a value assigned to a data point in a distribution that represents how many standard deviations away the data point is from a measure of central tendency, usually the arithmetic mean.

The z-score helps to find the position of a data point in a distribution without actually knowing the value of the data point. It also helps to compare data collected under different conditions and with different numbers of observations. However, z-scores are usually interpreted on the assumption that the data follows a normal distribution, which is not always true.

Solved Examples

Example 1: Find the z-scores for the following data:

X = 1, 1, 2, 2, 3, 3, 3, 4, 5, 6

Solution:

First, we need to find the mean of the data.

    \[\begin{array}{l}\mu  = \frac{{1 + 1 + 2 + 2 + 3 + 3 + 3 + 4 + 5 + 6}}{{10}}\\\mu  = \frac{{30}}{{10}}\\\mu  = 3\end{array}\]

Next, we will find the standard deviation,

{X^2} = 1, 1, 4, 4, 9, 9, 9, 16, 25, 36

    \[\begin{array}{l}\sigma  = \sqrt {\mu \left( {{X^2}} \right) - {{\left( {\mu (X)} \right)}^2}} \\\sigma  = \sqrt {\frac{{114}}{{10}} - {{(3)}^2}} \\\sigma  = \sqrt {11.4 - 9} \\\sigma  = \sqrt {2.4} \\\sigma  = 1.55\end{array}\]

Now that we have both mean and standard deviation, we can start calculating z score using the formula,

    \[{z_i} = \frac{{{x_i} - \mu }}{\sigma }\]

Then the respective z-scores are,

Z = -1.29, -1.29, -0.65, -0.65, 0, 0, 0, 0.65, 1.29, 1.94
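
As a cross-check, Example 1 can be run through Python's standard `statistics` module. One detail matters: the formula used in this article divides by n, which corresponds to `pstdev` (population standard deviation), not `stdev` (sample standard deviation, which divides by n - 1). The sketch below assumes that reading.

```python
from statistics import mean, pstdev, stdev

X = [1, 1, 2, 2, 3, 3, 3, 4, 5, 6]

mu = mean(X)
sigma = pstdev(X)  # population SD, matches sqrt(2.4)
print(mu, round(sigma, 2))  # 3 1.55

# z-scores with the population standard deviation.
zs = [(x - mu) / sigma for x in X]
print([round(z, 2) for z in zs])

# For comparison only: the sample SD is larger, so it would give
# slightly smaller z-scores in absolute value.
print(round(stdev(X), 2))  # 1.63
```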

Example 2: In the previous example, what percent of data lies within -1.5 to 1.5 z-score range? Can we call this distribution a good distribution?

Solution:

We have the z-scores of the given data as follows.

Z = -1.29, -1.29, -0.65, -0.65, 0, 0, 0, 0.65, 1.29, 1.94

Here the number of data points lying in the -1.5 to 1.5 z-score range is 9.

Thus 9/10, or 90%, of the data lies within the -1.5 to 1.5 z-score range, which comfortably exceeds the criterion for a good distribution, i.e., 68%. So we can call this distribution a good distribution.

Example 3: Suppose the data is given by the function F(n) = \left\{ {\begin{array}{*{20}{c}}{5n + 2}&{\text{if } n \text{ is odd}}\\{6n}&{\text{if } n \text{ is even}}\end{array}} \right., where n represents the position of the data point. Find the function that represents its z-score.

Solution:

Let’s assume that the data has N = 2k observations, so that we have k observations at odd positions and k at even positions. We write N for the number of observations to keep it separate from the position n.

Let’s find the mean function. For i = 1, 2, \ldots, k, the odd positions are n = 2i - 1, where F = 5(2i - 1) + 2 = 10i - 3, and the even positions are n = 2i, where F = 6(2i) = 12i.

    \[\begin{array}{l}\mu = \frac{{\sum\limits_{i = 1}^k {(10i - 3)} + \sum\limits_{i = 1}^k {12i} }}{{2k}}\\\mu = \frac{{10\left( {\frac{{k(k + 1)}}{2}} \right) - 3k + 12\left( {\frac{{k(k + 1)}}{2}} \right)}}{{2k}}\\\mu = \frac{{5k(k + 1) - 3k + 6k(k + 1)}}{{2k}}\\\mu = \frac{{5k + 5 - 3 + 6k + 6}}{2}\\\mu = \frac{{11}}{2}k + 4\end{array}\]

Replacing 2k = N, we have the mean function,

    \[\mu (N) = \frac{{11}}{4}N + 4\]

Standard deviation is given by,

    \[\begin{array}{l}\sigma = \sqrt {\mu ({X^2}) - {{(\mu (X))}^2}} \\\sigma = \sqrt {\frac{{\sum\limits_{i = 1}^k {{{(10i - 3)}^2}} + \sum\limits_{i = 1}^k {{{(12i)}^2}} }}{{2k}} - {{\left( {\frac{{11}}{2}k + 4} \right)}^2}} \\\sigma = \sqrt {\frac{{\sum\limits_{i = 1}^k {(100{i^2} - 60i + 9)} + \sum\limits_{i = 1}^k {144{i^2}} }}{{2k}} - {{\left( {\frac{{11}}{2}k + 4} \right)}^2}} \\\sigma = \sqrt {\frac{{244\left( {\frac{{k(k + 1)(2k + 1)}}{6}} \right) - 60\left( {\frac{{k(k + 1)}}{2}} \right) + 9k}}{{2k}} - \left( {\frac{{121}}{4}{k^2} + 44k + 16} \right)} \\\sigma = \sqrt {\frac{{61}}{3}(k + 1)(2k + 1) - 15(k + 1) + \frac{9}{2} - \frac{{121}}{4}{k^2} - 44k - 16} \\\sigma = \sqrt {\frac{{122}}{3}{k^2} + 61k + \frac{{61}}{3} - 15k - 15 + \frac{9}{2} - \frac{{121}}{4}{k^2} - 44k - 16} \\\sigma = \sqrt {\frac{{125}}{{12}}{k^2} + 2k - \frac{{37}}{6}} \end{array}\]

Replacing 2k = N,

    \[\sigma (N) = \sqrt {\frac{{125}}{{48}}{N^2} + N - \frac{{37}}{6}} \]

As a quick check, with N = 4 the data is 7, 12, 17, 24, which has mean 15 = \frac{{11}}{4}(4) + 4 and variance 39.5 = \frac{{125}}{{48}}(16) + 4 - \frac{{37}}{6}.

Then, the function for the z-score of the data point at position n, in a data set of N observations, is given by,

    \[Z(n) = \left\{ {\begin{array}{*{20}{c}}{\frac{{20n - 11N - 8}}{{4\sqrt {\frac{{125}}{{48}}{N^2} + N - \frac{{37}}{6}} }}}&{\text{if } n \text{ is odd}}\\{\frac{{24n - 11N - 16}}{{4\sqrt {\frac{{125}}{{48}}{N^2} + N - \frac{{37}}{6}} }}}&{\text{if } n \text{ is even}}\end{array}} \right.\]
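
Closed-form algebra of this kind is easy to get wrong, so a brute-force numerical check is a useful companion. The sketch below generates the first N values of F and computes their z-scores directly, without using any closed form. The helper names are hypothetical, and N is assumed to be the total number of observations.

```python
from statistics import mean, pstdev

def F(n):
    """Data value at position n, as defined in Example 3."""
    return 5 * n + 2 if n % 2 == 1 else 6 * n

def brute_force_stats(N):
    """Mean, population SD and z-scores of F(1), ..., F(N)."""
    values = [F(n) for n in range(1, N + 1)]
    mu, sigma = mean(values), pstdev(values)
    return mu, sigma, [(v - mu) / sigma for v in values]

mu, sigma, zs = brute_force_stats(4)   # data: 7, 12, 17, 24
print(mu, round(sigma ** 2, 2))        # 15 39.5
print([round(z, 2) for z in zs])
```

Any closed-form expression for the mean, standard deviation or z-score can be evaluated at the same N and compared with these values.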

Frequently asked questions (FAQs)

What is a good distribution?

In this article, a good distribution is defined as a distribution in which at least 68% of the data points have a z-score in the range -1.5 to 1.5.

How is the z-score used by investors to their benefit?

Investors and statisticians use the z-score to compare data from different companies over the past few years. They chart the changes in each company's stock market value, then look for the data with a high mean, low variability, and z-scores that are dense around the mean, i.e., a high percentage of data points within a certain z-score range. After doing the calculations, the company that shows a high average with a dense z-score range and low variability is assumed to be the most profitable and a good investment.

What is the 68-95-99.7 rule?

For a large, normally distributed data set, about 68% of the data points have a z-score in the range -1 to 1, about 95% in the range -2 to 2, and about 99.7% in the range -3 to 3.
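
These percentages can be computed rather than memorized. For a normal distribution, the fraction of values with a z-score between -k and k equals erf(k / √2), a standard identity. The short sketch below evaluates it with Python's `math.erf`; the function name `normal_coverage` is made up for the illustration.

```python
import math

def normal_coverage(k):
    """Fraction of a normal distribution with z-score between -k and k."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.5, 2, 3):
    print(k, round(100 * normal_coverage(k), 1))
# 1 68.3
# 1.5 86.6
# 2 95.4
# 3 99.7
```

For reference, the range -1.5 to 1.5 used elsewhere in this article covers about 86.6% of a normal distribution.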

What is a good z-score?

In most normal distributions, 99.7% of the data points have a z-score within the range -3.0 to 3.0. But a higher or lower z-score is not necessarily bad. It all depends on the context and the type of data.
For example, suppose students’ scores in an exam are recorded. A student with a high (positive) z-score scored above the average of all the students, which is a good z-score in this context.
As another example, investors look for companies whose data has a high number of data points with a z-score within the range -1.5 to 1.5, because z-scores that are too high or too low indicate that the market isn’t stable, which is not good for investment.

Can we calculate the z-score about other central tendencies? If yes, then why do we prefer the mean over the others?

Yes, we can find a z-score about other central tendencies. The mode and the median are the two other central tendencies, but in practice they fit less naturally than the mean, because the standard deviation is itself defined in terms of deviations from the mean. The mean is the average of the data and represents its central point, more or less, whereas the mode can occur at the extreme ends of the data, and the median tends to shift toward one side if that side of the data has a high density.
