Handbook of statistical distributions
Following the steps of Section 1. For these cumulative probabilities, standard normal quantiles are computed, and they are given in the fourth column of Table . If the data are from a normal population, then the pairs (xj, zj) will be approximately linearly related.
The plot of the pairs, the Q–Q plot, is given in Figure . The Q–Q plot is nearly a straight line, suggesting that the data are from a normal population. If a graphical technique does not give a clear-cut result, a rigorous test, such as the Shapiro–Wilk test or the correlation test, can be used. We shall use the test based on the correlation coefficient to test the normality of the exposure data. At the level 0.
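The correlation-based normality check described above can be sketched in Python with scipy (the book's examples use StatCalc; the exposure data are not reproduced in this excerpt, so a simulated sample stands in). `scipy.stats.probplot` returns the correlation between the ordered observations and the standard normal quantiles, which is the statistic this test uses:

```python
# Sketch of the Q-Q correlation check for normality: compute the
# correlation r between the order statistics and the corresponding
# standard normal quantiles; values near 1 support normality.
# The sample below is simulated, not the exposure data from the text.
import numpy as np
from scipy import stats

def qq_correlation(x):
    """Correlation between ordered observations and normal quantiles."""
    (osm, osr), (slope, intercept, r) = stats.probplot(x, dist="norm")
    return r

rng = np.random.default_rng(1)
sample = rng.normal(loc=100.0, scale=20.0, size=50)  # illustrative data
r = qq_correlation(sample)
print(round(r, 3))  # close to 1 for normal data
```

In practice, r would be compared with the tabulated critical value for the given sample size, as the text does.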
Since the observed correlation coefficient is larger than the critical value, we have further evidence for our earlier conclusion that the data are from a normal population.

Tail Probabilities, Percentiles, and Moments
Test and Confidence Interval for the Mean [Section ]
Power of the t-test [Section ]
Test and Confidence Interval for the Variance [Section ]
Two-Sample t-test and Confidence Interval [Section ]
Power of the Two-Sample t-test [Section ]
Tolerance Intervals for Normal Distribution [Section ]
Tolerance Intervals Controlling both Tails [Section ]
Simultaneous Tests for Quantiles [Section ]

Example: Click [Mean].
Click [Std Dev]. To compute moments: Enter the values of the mean and standard deviation; click the [M] button. Assume that the life hours distribution is normal. Find the percentage of bulbs that will last at least h. Find the percentage of bulbs with lifetimes between and h. Find the 90th percentile of the life hours. To find the 90th percentile, enter for the mean, for the standard deviation, and 0. If the store stocks sacks every week, find the percentage of weeks that the store has overstocked onions.
Let X denote the weekly demand. We need to find the percentage of weeks in which the demand is less than the stock. Over a long period of time, it was found that the average amount packed was 3 lb with a standard deviation of 0. Assume that the weights of the packages are normally distributed. Find the percentage of packages weighing more than 3. Solution: Let X be the actual weight of a randomly selected package. Then X is normally distributed with mean 3 lb and standard deviation 0.
To find the percentage, enter 3 for the mean, 0. To get the value of the mean, enter 0. That is, the machine needs to be set at about 3. A bolt is useable if it is 3. Inspection of a sample of 50 bolts revealed that the average length is 3. Assume that the distribution of lengths is normal. Find the approximate proportion of bolts that are useable. Find the approximate proportion of bolts that are longer than 4. Find an approximate 95th percentile of the lengths of all bolts.
Since they are unknown, we can use the sample mean and standard deviation to find approximate solutions to the problem. The proportion of useable bolts is given by P 3. To find an approximate 95th percentile, enter 3. Click [p-values for] to get the p-values for various alternative hypotheses. That is, the p-value for testing the above hypotheses is 0. Now, the null hypothesis cannot be rejected at the level of significance 0.
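The normal-distribution calculations in the bolt example can be sketched with scipy in place of StatCalc. The mean, standard deviation, and cutoffs below are hypothetical stand-ins, since the actual numbers are elided in the text:

```python
# Tail probabilities and a percentile for a normal model of bolt lengths.
# mu, sigma, and the cutoffs 3.1, 3.9, 4.0 are hypothetical values.
from scipy.stats import norm

mu, sigma = 3.5, 0.25  # hypothetical sample mean and sd (inches)

p_useable = norm.cdf(3.9, mu, sigma) - norm.cdf(3.1, mu, sigma)  # P(3.1 < X < 3.9)
p_long = norm.sf(4.0, mu, sigma)                                 # P(X > 4.0)
pct95 = norm.ppf(0.95, mu, sigma)                                # 95th percentile
print(round(p_useable, 4), round(p_long, 4), round(pct95, 4))
```

The same three operations (cdf difference, survival function, inverse cdf) correspond to the [P] and [x] computations described for the StatCalc dialog box.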
The value of the t-test statistic for this problem is 1. Click [Sample Size for]. From the past study, he learned that the population standard deviation is 1. He decides to use the one-sample t-test, and wants to determine the sample size needed to attain a power of 0. That is, the interval 2. This means that the interval 2. Some Illustrative Examples: The following examples illustrate the one-sample inferential procedures for a normal mean. Assume that the incomes follow a normal distribution. Click [2-sided] to get . However, to understand the significance of the result, we formulate the following hypothesis testing problem.
Click [p-values for] to get 0. The manufacturer decides to test if the new method really increases the mean life hour of the bulbs. How many new bulbs should he test so that the test will have a power of 0. Click [Sample Sizes for] to get . Thus, nineteen bulbs should be manufactured and tested to check whether the new method would increase the average life hours of the bulbs.
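The sample-size search that StatCalc performs here can be sketched via the noncentral t distribution: the power of the one-sided one-sample t-test is a noncentral t tail probability, and we increase n until it reaches the target. The effect size and alpha below are hypothetical, since the example's numbers are elided:

```python
# Power and sample size for a one-sided one-sample t-test via the
# noncentral t distribution. effect = (mu1 - mu0)/sigma is hypothetical.
import math
from scipy.stats import t, nct

def power_one_sample_t(n, effect, alpha=0.05):
    """Power of the one-sided one-sample t-test for effect size 'effect'."""
    df = n - 1
    tcrit = t.ppf(1 - alpha, df)       # rejection cutoff under H0
    ncp = math.sqrt(n) * effect        # noncentrality under H1
    return nct.sf(tcrit, df, ncp)      # P(T > tcrit | H1)

def sample_size_for_power(effect, target_power=0.90, alpha=0.05):
    """Smallest n whose power reaches the target (simple upward search)."""
    n = 2
    while power_one_sample_t(n, effect, alpha) < target_power:
        n += 1
    return n

n_req = sample_size_for_power(effect=0.8, target_power=0.90)
print(n_req, round(power_one_sample_t(n_req, 0.8), 3))
```

This is the same trial-and-error idea described later in the text for the two-sample case, automated in a loop.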
Click [1-sided] to get 8. That is, the interval 8. The interval 0, This means that the interval 7. Since this p-value is not less than 0. We conclude that the summary statistics do not provide sufficient evidence to indicate that the true population variance is greater than 9.
Does this sample standard deviation indicate that the actual standard deviation of all the screws produced during that day is less than 0. Note that the sample variance is 0. To compute the p-value for the above test, enter 27 for the sample size, 0.
The computed p-value is 0. Since this p-value is not smaller than any practical level of significance, we cannot conclude that the standard deviation of all the screws made during that day is less than 0. He selected a sample of 18 plants for the study. After the harvest, he found that the mean yield was . Click [2-sided] to get 7. Thus, the true variance of tomato yields is somewhere between 7. When H0 is true, the F statistic in . Let F0 be an observed value of the F in . A sample of 11 observations from another normal population yielded a variance of 2.
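The two-sided interval for a normal variance used in the tomato-yield example is the usual chi-square pivot interval; a sketch from summary statistics (n = 18 as in the example, but the sample variance here is hypothetical since the yields are elided):

```python
# Two-sided confidence interval for a normal variance from summary
# statistics, using the pivot (n-1)s^2/sigma^2 ~ chi-square(n-1).
# s2 = 16.0 is a hypothetical sample variance.
from scipy.stats import chi2

def variance_ci(n, s2, conf=0.95):
    """CI for sigma^2: ((n-1)s^2/chi2_upper, (n-1)s^2/chi2_lower)."""
    df = n - 1
    lower = df * s2 / chi2.ppf(1 - (1 - conf) / 2, df)
    upper = df * s2 / chi2.ppf((1 - conf) / 2, df)
    return lower, upper

lo, hi = variance_ci(n=18, s2=16.0)
print(round(lo, 2), round(hi, 2))
```

Taking square roots of the endpoints gives the corresponding interval for the standard deviation.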
Click [1-sided] to get 0. This means that the interval 0. Click [2-sided] to get 0. To compute the p-value, enter the summary statistics in the dialog box, and click [p-values for] to get 0. Since the p-value is greater than 0. In practice, the equality of the variances is tested first using the F test given above. If the assumption of equality of variances is tenable, then we use the two-sample t procedures (see Section ). Remark: In general, many authors have suggested using the Welch test when the variances are unknown.
Nevertheless, for the sake of completeness and for illustrative purposes, we consider both approaches in the sequel. Let t20 be an observed value of t2. Since the inferential procedures given in this section are appropriate only when the population variances are equal, we first want to test whether the variances are indeed equal (see Section ). The test for equality of variances yielded a p-value of 0.
Since the p-value is less than 0. Click [Power]. To compute the power when each sample size is 27, enter 0. Sample Size Calculation: In practical applications, it is usually desired to compute the sample sizes required to attain a given power. This can be done by a trial-and-error method. Suppose in the above example we need to determine the sample sizes required to have a power of 0.
By trying a few sample sizes more than 27, we can find the required sample size as 32 from each population. In this case, the power is 0. The following examples illustrate the inferential procedures for the difference between two normal means.
The summary statistics are given in the following table.

              Male    Female
sample size     23        19
mean

Do these summary statistics indicate that the average salary of the male programmers is higher than that of the female programmers? Solution: Since the salaries are from normal populations, a two-sample procedure for comparing normal means is appropriate for this problem.
Furthermore, to choose between the two comparison methods (one assumes that the population variances are equal, the other does not), we need to test the equality of the population variances.
Therefore, the assumption that the variances are equal is tenable, and we can use the two-sample t procedures for the present problem. To compute the p-value for the above test, enter the sample sizes, means, and standard deviations, and click [p-values for] to get 2. Since this p-value is much less than any practical level, we reject the null hypothesis and conclude that the mean salary of male programmers is higher than that of female programmers. This approximate method is commonly used, and the results based on this method are very accurate even for small samples.
Specifically, this dialog box computes confidence intervals and p-values for hypothesis testing about the difference between two normal means when the population variances are unknown and arbitrary. Therefore, we should use the approximate degrees of freedom method described above. To get one-sided limits, click [1-sided]. To compute the p-values using StatCalc, click [p-values for] to get 0. Thus, we cannot conclude that the means are significantly different.
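The approximate degrees-of-freedom (Welch/Satterthwaite) procedure described here is what scipy implements when `equal_var=False`. A sketch from summary statistics, with the sample sizes taken from the programmer-salary table and the means and standard deviations as hypothetical placeholders:

```python
# Welch's two-sample comparison of normal means from summary statistics.
# Means and standard deviations are hypothetical; sample sizes 23 and 19
# follow the example in the text.
from scipy.stats import ttest_ind_from_stats

t_stat, p_value = ttest_ind_from_stats(
    mean1=28.5, std1=3.2, nobs1=23,   # hypothetical group 1 summary
    mean2=26.1, std2=4.1, nobs2=19,   # hypothetical group 2 summary
    equal_var=False,                  # Welch: variances unknown and arbitrary
)
print(round(t_stat, 3), round(p_value, 4))
```

With `equal_var=True` the same call gives the pooled-variance two-sample t-test used when the F test finds the variances tenable.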
An explicit expression for k is not available, so it has to be computed numerically. The one-sided tolerance limits have an interpretation similar to that of the two-sided tolerance limits. To compute the factors, enter 23 for [Sample Size n], 22 for [DF], 0. For example, if X follows a lognormal distribution, then ln X follows a normal distribution.
Therefore, the factors given in the preceding sections can be used to construct tolerance intervals for a lognormal distribution. Specifically, if the sample Y1, . . . In many practical situations one wants to assess the proportion of the data that fall in an interval or a region. For example, engineering products are usually required to satisfy certain tolerance specifications. The proportion of the products that are within the specifications can be assessed by constructing a suitable tolerance region based on a sample of products.
In order to save time and cost, typically a sample of items is inspected and a. In some situations, each item in the lot is required to satisfy only the lower specification.
In this case, a. If the lower tolerance limit is greater than or equal to L, then the lot will be accepted. Specifically, if the upper tolerance limit based on exposure measurements from a sample of employees is less than a permissible exposure limit (PEL), then it indicates that a majority of the exposure measurements are within the PEL, and hence exposure monitoring might be reduced or terminated until a process change occurs.
The tolerance factor for a. Using these numbers, we compute the tolerance interval as The one-sided upper limit is The one-sided lower tolerance limit is Tails] uses an exact method due to Owen for computing the tolerance factor k satisfying the above probability requirement.
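The exact two-sided tolerance factor in the text is computed numerically (Owen's method); a well-known closed-form alternative is Howe's approximation, sketched below. This is an approximation, not the book's exact factor, so the value it returns will differ slightly from tabulated exact factors:

```python
# Howe's approximation to the two-sided normal tolerance factor k
# (content p, confidence 1 - alpha):
#   k ~ z_{(1+p)/2} * sqrt( (n-1)(1 + 1/n) / chi2_{alpha, n-1} )
import math
from scipy.stats import norm, chi2

def tolerance_factor_howe(n, p=0.90, conf=0.95):
    """Approximate k so that xbar +/- k*s covers proportion p of the
    population with confidence conf."""
    z = norm.ppf((1 + p) / 2)
    df = n - 1
    chi_low = chi2.ppf(1 - conf, df)   # lower quantile of chi-square(df)
    return z * math.sqrt(df * (1 + 1 / n) / chi_low)

k = tolerance_factor_howe(n=15, p=0.90, conf=0.95)
print(round(k, 3))
```

For n = 15, p = 0.90, confidence 0.95 the approximation is close to 2.5, in line with exact tables; the factor shrinks toward z_{(1+p)/2} as n grows.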
Also, for this example, the sample size is 15, the sample mean is To compute the. We also observe that this equal-tail tolerance interval is wider than the tolerance interval Owen pointed out an acceptance sampling plan where a lot of items will be accepted if the sample provides evidence in favor of the alternative hypothesis given below: H0 : Hac vs. Test for Quantiles] uses an exact method due to Owen for computing the factor k satisfying the above probability requirement. We would like to test if the lower 2.
That is, our H0 : Hac vs. To find the factor k, enter 15 for the sample size, 0. To get the limits, click on [2-sided] to get Thus, we have enough evidence to conclude that the lower 2. The approximate tolerance limit in The data are given in the following table.
X1j X2j X3j X4j X5j 2. The above formulas and the method of constructing one-sided tolerance lim- its are also applicable for the balanced case i. Let U1 and U2 be independent uniform 0,1 random variables. Let Z be a standard normal random variable. Let X and Y be independent normal random variables with common vari- ance but possibly different means.
Then, U and V are independent normal random variables. Gamma: Let Z be a standard normal random variable. For more results and properties, see Patel and Read. Then X1 and X2 are independent N(0, 1) random variables. There are several other methods available for generating normal random numbers (see Kennedy and Gentle, Section 6).
The above Box—Muller transformation is simple to implement and is satisfactory if it is used with a good uniform random number generator. The following algorithm due to Kinderman and Ramage ; correction Vol.
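The Box–Muller transformation just described is short enough to sketch directly: two independent uniform(0, 1) variates are mapped to two independent standard normal variates.

```python
# Minimal Box-Muller sketch: (U1, U2) uniform on (0,1) are transformed to
# two independent N(0, 1) variates via R = sqrt(-2 ln U1) and angle 2*pi*U2.
import math
import random

def box_muller(rng=random):
    """Return a pair of independent N(0, 1) variates."""
    u1 = 1.0 - rng.random()   # shift into (0, 1] so log(u1) is defined
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2 * math.pi * u2), r * math.sin(2 * math.pi * u2)

random.seed(7)
sample = [z for _ in range(5000) for z in box_muller()]
mean = sum(sample) / len(sample)
var = sum(z * z for z in sample) / len(sample) - mean ** 2
print(round(mean, 2), round(var, 2))  # near 0 and 1
```

As the text notes, the quality of the output depends on the underlying uniform generator; here Python's Mersenne Twister is used.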
For better accuracy, double precision may be required. The output x is a N(0, 1) random number. The above method is supposed to give probabilities accurate to 14 decimal places. The following Fortran function subroutine for evaluating the standard normal cdf is based on the above computational method. An infinite series expression for the cdf is given in Section . Plots in Figure . For degrees of freedom greater than , a normal approximation to the chi-square distribution is used to compute the cdf as well as the percentiles.
To compute percentiles: Enter the values of the degrees of freedom and the cumulative probability, and click [x].
To compute the df: Enter the values of the cumulative probability and x, and click [DF]. To compute moments: Enter the value of the df and click [M]. Specifically, if X1, . . . See Section 1. F and Beta: Let X and Y be independent chi-square random variables with degrees of freedom m and n, respectively. Beta: If X1, . . . Laplace: See Section . Uniform: See Section 9.
Let Z denote the standard normal random variable. The approximation b is satisfactory even for small n. Algorithm . For this reason, the F distribution is also known as the variance ratio distribution. We observe from the plots of pdfs in Figure . Moment Generating Function: does not exist. To compute percentiles: Enter the values of the degrees of freedom and the cumulative probability; click [x]. To find this value, enter the other known values in the appropriate edit boxes, and click on [Den DF].
To compute moments: Enter the values of the numerator df, denominator df, and click [M]. Fm,n,p Binomial: Let X be a binomial n, p random variable.
This approximation is satisfactory only when both degrees of freedom are greater than or equal to . This approximation is satisfactory even for small degrees of freedom.
For other degrees of freedom, an algorithm for evaluating the beta distribution can be used. We observe from the plots that for large n, tn is distributed as the standard normal random variable. Series expansions for computing the cdf of tn are given in Section . To compute probabilities: Enter the value of the degrees of freedom (df) and the observed value x; click [x].
To compute percentiles: Enter the value of the degrees of freedom, and the cumulative probability; click [x]. The percentiles of X are useful for constructing simultaneous confidence intervals for the treatment effects and orthogonal estimates in the analysis of variance, and to test extreme values.
Once the null hypothesis is rejected, it may be desired to estimate all the treatment effects simultaneously. To compute probabilities: Enter the values of the number of groups k, the df, and the observed value x of X defined in . To compute percentiles: Enter the values of k, the df, and the cumulative probability; click [x]. The required critical point is 2. The t distribution is symmetric about 0. Let X and Y be independent chi-square random variables with degrees of freedom 1 and n, respectively.
XY Relation to beta distribution: see Section Let X denote the waiting time until the first event to occur. To compute percentiles: Enter the values of a, b, and the cumulative probability; click [x]. To compute other parameters: Enter the values of the cumulative probability, one of the parameters, and a positive value for x; click on the parameter that is missing.
To compute moments: Enter the values of a and b; click [M]. Weibull: See Section Extreme Value Distribution: See Section Geometric: Let X be a geometric random variable with success probability p. The distribution defined by It should be noted that The gamma distribution with a positive integer shape parameter a is called the Erlang Distribution.
The gamma probability density plots in Figure To compute other Parameters: Enter the values of the probability, one of the parameters, and a positive value for x; click on the parameter that is missing. This distribution has applications in reliability and queuing theory.
Examples include the distribution of failure times of components, the distribution of times between calibrations of instruments that need re-calibration after a certain number of uses, and the distribution of waiting times of k customers who will arrive at a store.
The gamma distribution can also be used to model the amounts of daily rainfall in a region. For example, the data on daily rainfall in Sydney, Australia (October 17 – November 7; years – ) were modeled by a gamma distribution. A gamma distribution was postulated because precipitation occurs only when water particles can form around dust of sufficient mass, and the waiting time for such accumulation of dust is similar to the waiting time aspect implicit in the gamma distribution (Das; Stephenson et al.).
Find the percentage of summer rainfalls that exceed six inches. Solution: Let X denote the total summer rainfall in a year. To find a right endpoint, enter 3 for a, 2 for b, and 0. To find a lower endpoint, enter 0. Find E(X). These equations may yield reliable solutions if a is expected to be at least 2. Let S0 be an observed value of S. To get one-sided limits, click [1-sided] to get 0. This means that the true value of b is at least 0. To get the p-value, enter 0. Thus, we conclude that b is significantly greater than 0.
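The rainfall calculation above (shape a = 3 and scale b = 2, as entered into StatCalc) can be reproduced with scipy; the survival function gives the probability that the summer rainfall exceeds six inches:

```python
# Tail probability and a percentile for X ~ gamma(a = 3, b = 2),
# with b a scale parameter, matching the StatCalc entries in the text.
from scipy.stats import gamma

a, b = 3, 2
p_exceed = gamma.sf(6, a, scale=b)    # P(X > 6)
pct90 = gamma.ppf(0.90, a, scale=b)   # 90th percentile of rainfall
print(round(p_exceed, 4), round(pct90, 3))
```

Since a = 3 is an integer, the tail probability can be checked by hand from the Erlang form: P(X > 6) = e^{-3}(1 + 3 + 3^2/2) = 8.5 e^{-3}, about 0.423.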
An Identity: Let F(x; a, b) and f(x; a, b) denote, respectively, the cdf and pdf of a gamma random variable X with parameters a and b. Additive Property: Let X1, . . . Exponential: Let X1, . . . The series . A method of evaluating the continued fraction is given in Kennedy and Gentle, p. . The following Fortran function routine is based on the series expansion of the cdf in . We denote the above beta distribution by beta(a, b).
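The series evaluation of the gamma cdf that the Fortran routine implements can be sketched in Python. For the standardized gamma (scale 1), the regularized lower incomplete gamma function has the power series P(a, x) = x^a e^{-x} / Γ(a+1) · Σ_{k≥0} x^k / ((a+1)(a+2)···(a+k)), which converges quickly for moderate x:

```python
# Power-series evaluation of the regularized lower incomplete gamma
# function P(a, x), i.e. the cdf of a gamma(a, 1) random variable.
import math

def gamma_cdf_series(x, a, tol=1e-14, max_terms=10_000):
    """P(a, x) via the series x^a e^{-x}/Gamma(a+1) * sum of terms."""
    if x <= 0.0:
        return 0.0
    term, total, denom = 1.0, 1.0, a
    for _ in range(max_terms):
        denom += 1.0          # next factor (a+1), (a+2), ...
        term *= x / denom
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))

print(round(gamma_cdf_series(1.0, 1.0), 6))  # equals 1 - exp(-1) for a = 1
```

For a scale parameter b, evaluate the series at x/b. Like the book's routine, a continued-fraction expansion is preferable for large x.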
A situation where the beta distribution arises is given below. For equally large values of a and b, the cumulative probabilities of a beta distribution can be approximated by a normal distribution. To compute percentiles: Enter the values of a, b, and the cumulative probability; click [x]. To compute other parameters: Enter the values of one of the parameters, the cumulative probability, and the value of x; click on the missing parameter.
To compute moments: Enter the values of a and b and click [M]. Moment estimators can be used as initial values to solve the above equations numerically. Therefore, cumulative probabilities and percentiles of these distributions can be obtained from those of beta distributions. For example, as mentioned in Sections 3. Beta distributions are often used to model data consisting of proportions. Applications of beta distributions in risk analysis are mentioned in Johnson. Chia and Hutchinson used a beta distribution to fit the frequency distribution of daily cloud durations, where cloud duration is defined as the fraction of daylight hours not receiving bright sunshine.
They used data collected from 11 Australian locations to construct 11-station by 12-month empirical frequency distributions of daily cloud duration. Sulaiman et al. Nicas pointed out that beta distributions offer greater flexibility than lognormal distributions in modeling respirator penetration values over the physically plausible interval [0, 1].
An approach for dynamically computing the retirement probability and the retirement rate when the manpower age follows a beta distribution is given in Shivanagaraju et al. The coefficient of kurtosis of the beta distribution has been used as a good indicator of the condition of a gear (Oguamanam et al.).
Schwarzenberg-Czerny showed that the phase dispersion minimization statistic (a popular method for searching for nonsinusoidal pulsations) follows a beta distribution. In the following we give an illustrative example.
Daniel considered these data to demonstrate the application of a run test for testing randomness. We will fit a beta distribution to the data. Table . Using the computed mean and variance, we compute the moment estimators (see Section ). The observed quantiles qj (that is, the ordered proportions) for the data are given in the second column of Table . For example, when the observed quantile is 0. A comparison between the sample quantiles and the corresponding beta quantiles (see the Q–Q plot in Figure ) . Using this fitted beta distribution, we can estimate the probability that the sunshine period exceeds a given proportion on a November day in Atlanta.
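The moment estimators used to fit the beta distribution solve E[X] = a/(a+b) and Var[X] = ab/((a+b)^2 (a+b+1)) for a and b given the sample mean and variance. A sketch (the sunshine proportions are not reproduced, so illustrative values of the mean and variance are used):

```python
# Moment estimators for beta(a, b): invert the mean/variance formulas.
# mean = 0.6 and var = 0.04 are illustrative, not the Atlanta data.
def beta_moment_estimates(mean, var):
    """Solve E[X] = a/(a+b), Var[X] = ab/((a+b)^2 (a+b+1)) for (a, b)."""
    common = mean * (1 - mean) / var - 1   # equals a + b
    return mean * common, (1 - mean) * common

a_hat, b_hat = beta_moment_estimates(mean=0.6, var=0.04)
print(a_hat, b_hat)  # 3.0 2.0 for these illustrative inputs
```

As the text notes, these closed-form estimates also serve as starting values for solving the maximum likelihood equations numerically.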
Chi-square Distribution: Let X and Y be independent chi-square random variables with degrees of freedom m and n, respectively. Negative Binomial: Let X be a negative binomial r, p random variable. Gamma: Let X and Y be independent gamma random variables with the same scale parameter b, but possibly different shape parameters a1 and a2.
The following Fortran subroutine evaluates the cdf of a beta(a, b) distribution, and is based on the above method. Their algorithm uses a combination of the recurrence relations 1 e and 1 f in Section . For computing percentiles of a beta distribution, see Majumder and Bhattacharjee (b), Algorithm AS . It is clear from the density function . The plots of the noncentral chi-square pdfs in Figure . To compute percentiles: Enter the values of the df, the noncentrality parameter, and the cumulative probability; click [x].
To compute moments: Enter the values of the df and the noncentrality parameter; click [M]. The noncentral chi-square distribution is also useful in computing approximate tolerance factors for univariate (see Section ) . Normal Approximations: Let zp denote the pth percentile of the standard normal distribution.
The following algorithm is based on the additive property of the noncentral chi-square distribution given in Section To compute The following Fortran function subroutine computes the noncentral chi- square cdf, and is based on the algorithm given in Benton and Krishnamoorthy It is clear from the plots that the noncentral F distribution is always right skewed.
To compute moments: Enter the values of the numerator df, denominator df, and the noncentrality parameter; click [M]. StatCalc also computes one of the degrees of freedom or the noncentrality parameter given the other values. Let us consider the power function of the Hotelling T2 test for testing about a multivariate normal mean vector.
The noncentral F distribution also arises in multiple-use confidence estimation in a multivariate calibration problem. Then a. To compute the cdf, first compute the kth term in the series . We also observe from the plots of pdfs in Figure . To compute other parameters: Enter the values of one of the parameters, the cumulative probability, and x.
Click on the missing parameter. To compute moments: Enter the values of the df and the noncentrality parameter; click [M]. More specifically, powers of the t-test for a normal mean and of the two-sample t-test (Sections ) . The percentiles of noncentral t distributions are used to compute the one-sided tolerance factors for a normal population (Section ). This distribution also arises in multiple-use hypothesis testing about the explanatory variable in calibration problems [Krishnamoorthy, Kulkarni and Mathew, and Benton, Krishnamoorthy and Mathew].
Otherwise, forward computation of . The following Fortran function routine tnd(t, df, delta) computes the cdf of a noncentral t distribution. This program is based on the algorithm given in Benton and Krishnamoorthy. The Laplace distribution is also referred to as the double exponential distribution. To compute parameters: Enter the value of one of the parameters, the cumulative probability, and x; click on the missing parameter. The Laplace distribution can also be used to describe breaking strength data.
Korteoja et al. Sahli et al. In the following we see an example where the differences in flood stages are modeled by a Laplace distribution. The data were first considered by Gumbel and Mustafi, and later Bain and Engelhardt justified the Laplace distribution for modeling the data. Kappenman used the data for constructing one-sided tolerance limits.
Using these estimates, the population quantiles are estimated as described in Section 1. For example, to find the population quantile corresponding to the sample quantile 1.
The Q–Q plot of the observed differences and the Laplace . The Q–Q plot shows that the sample quantiles (the observed differences) and the population quantiles are in good agreement. For example, the percentage of differences in flood stages that exceed . Chi-square: If X1, . . . To compute percentiles: Enter the values of a, b, and the cumulative probability; click [x]. To compute other parameters: Enter the values of one of the parameters, the cumulative probability, and x; click on the missing parameter.
Explicit expressions for the MLEs of a and b are not available. These estimators may be used as initial values to solve the equations in . It is also used to analyze data related to stocks. Braselton et al. found that a logistic distribution provided the best fit for the data, even though the lognormal distribution has traditionally been used to model these daily changes.
An application of the logistic distribution in nuclear medicine is given in Prince et al. The logistic distribution is also used to predict soil-water retention based on the particle-size distribution of Swedish soil (Rajkai et al.). Scerri and Farrugia compared the logistic and Weibull distributions for modeling wind speed data. The applicability of a logistic distribution to studying citrus rust mite damage on oranges is given in Yang et al.
For more results and properties, see Balakrishnan. Since X is actually an antilogarithmic function of a normal random variable, some authors refer to this distribution as antilognormal.
This dialog box also computes the following. The inferential procedures are based on the generalized variable approach given in Krishnamoorthy and Mathew Then, T. Furthermore, exp T. For a given sample size, mean, and standard deviation of the logged data, StatCalc computes confidence intervals and the p-values for testing about a lognormal mean using Algorithm Illustrative Examples Example To find one-sided confidence limits, click [1] to get 5.
Handbook of statistical distributions, Jagdish K.

EDA Techniques 1. Probability Distributions 1. Detailed information on a few of the most common distributions is available below.
There are a large number of distributions used in statistical applications. It is beyond the scope of this Handbook to discuss more than a few of these. Two excellent sources for additional detailed information on a large array of distributions are Johnson, Kotz, and Balakrishnan and Evans, Hastings, and Peacock. Equations for the probability functions are given for the standard form of the distribution.
Formulas exist for defining the functions with location and scale parameters in terms of the standard form of the distribution. The sections on parameter estimation are restricted to the method of moments and maximum likelihood. This is because the least squares, PPCC, and probability plot estimation procedures are generic. The maximum likelihood equations are not listed if they involve solving simultaneous equations, because these methods require sophisticated computer software to solve.