Glossary

Statistics helps us understand large populations by analyzing small samples drawn from those populations. For a sample to support reliable inferences, it must be selected in a way that makes it representative of the population. The following glossary describes statistical terminology in the context of the mortgage industry to clarify why statistical concepts are important to quality control.

Bias – Bias is an influence in a sampling procedure that causes one type of loan to have a greater probability of being sampled than others. The following example will illustrate the concept in the context of mortgage quality control:

Often, lenders draw samples based on loan size (they want to get a feel for how their bigger investments are doing) and then attempt to draw inferences about the broader population. This practice results in a severe sampling bias. Assume that larger loans tend to have fewer defects (perhaps because larger loans go to more reputable borrowers and undergo more extensive underwriting by more qualified staff). Now assume that a lender is putting together a sample of loans for QC purposes and selects 70 large loans and 30 small ones for a total sample of 100. Assume that large loans have a defect rate of 5% and small ones have a defect rate of 10%. The expected defect rate of the sample would be 6.5% (3.5 defects expected among the 70 large loans plus 3 among the 30 small ones). Now assume that in the total population, the vast majority of loans (say, 80%) are small loans, while only 20% are large ones. Had the sample been truly random, we would expect to see about 80 small loans and 20 large ones, and the sample would have an expected defect rate of 9%. Because of the bias in sampling, the sampled defect rate understates the population rate by a full 2.5 percentage points.
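The arithmetic in this example can be checked with a short Python sketch. The function name and dictionary layout are illustrative choices; the loan counts and defect rates are the ones assumed in the example above.

```python
def blended_defect_rate(counts, defect_rates):
    """Expected defect rate of a sample, given loan counts and defect rates by loan type."""
    total_loans = sum(counts.values())
    expected_defects = sum(counts[kind] * defect_rates[kind] for kind in counts)
    return expected_defects / total_loans

# Defect rates assumed in the example above.
defect_rates = {"large": 0.05, "small": 0.10}

# The size-driven sample versus a sample that mirrors the 20/80 population mix.
biased_sample = {"large": 70, "small": 30}
representative_sample = {"large": 20, "small": 80}

biased_rate = blended_defect_rate(biased_sample, defect_rates)
representative_rate = blended_defect_rate(representative_sample, defect_rates)
shift_in_points = (representative_rate - biased_rate) * 100

print(f"Biased sample defect rate:         {biased_rate:.1%}")
print(f"Representative sample defect rate: {representative_rate:.1%}")
print(f"Understatement: {shift_in_points:.1f} percentage points")
```

Changing the counts in `biased_sample` shows how quickly the sampled rate drifts away from the population rate as the mix of loan sizes moves away from the true 20/80 split.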

There are other possible sources of bias as well. For example, if a sample calls for 100 loan records, but only 80 are actually sent in for review, then that will introduce a bias, because the records that are submitted are not necessarily statistically similar to the intended sample (it may be, for example, that the 20 records that did not come in were withheld at a particularly error-prone branch).

Chance Error – Chance error is the difference between a sample value and the true population value that arises from natural variation in which loans happen to be selected. In the long run, it is expected that chance errors will more or less cancel each other out.
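A small simulation can illustrate how chance errors behave. The Python sketch below is built entirely on assumed numbers: a hypothetical population of 10,000 loans with a true defect rate of 10%, samples of 100 loans, and 2,000 repeated draws.

```python
import random

def sample_defect_rate(population, sample_size, rng):
    """Defect rate observed in one random sample (1 = defective, 0 = clean)."""
    return sum(rng.sample(population, sample_size)) / sample_size

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 loans, 1,000 of them defective.
population = [1] * 1000 + [0] * 9000
true_rate = sum(population) / len(population)

# Chance error of each sample = sample value minus true value.
chance_errors = [
    sample_defect_rate(population, 100, rng) - true_rate
    for _ in range(2000)
]

average_error = sum(chance_errors) / len(chance_errors)
largest_error = max(abs(error) for error in chance_errors)

print(f"Average chance error over all samples: {average_error:+.4f}")
print(f"Largest single chance error:           {largest_error:.4f}")
```

Any single sample can miss the true rate by several percentage points, but the average of the errors over many samples sits very close to zero, which is what "cancel each other out" means here.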

Confidence Level – A confidence level measures how reliably a sampling procedure produces a range that contains the true population value. For example, if we analyzed a sample of loans and found that the sample defect rate was 12%, the confidence level would tell us how much trust to place in a range built around that 12% as an estimate of the population defect rate. Confidence levels are applied to a range of values rather than to a single value. In other words, you may see that the value for the defect rate is 12%, with 2% precision (plus or minus 2 percentage points) and a 95% confidence level. This gives a range of 10% to 14%, and it means that if the same sampling procedure were repeated many times, about 95% of the ranges it produced would contain the true value for the population. Many times, a 95% confidence level is used because it is considered accurate enough to produce reliable results. However, it is possible to use any confidence level below 100%, such as 90% or 99%. For a given sample, a higher confidence level requires a wider range.
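One common way to build such a range is the normal approximation for a proportion. The Python sketch below uses that approximation and assumes a sample of 1,000 loans; the glossary does not state a sample size, so that number is purely illustrative.

```python
from math import sqrt
from statistics import NormalDist

def defect_rate_interval(sample_rate, sample_size, confidence=0.95):
    """Normal-approximation confidence interval for a defect rate."""
    # z is the number of standard errors needed for the chosen confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = sqrt(sample_rate * (1 - sample_rate) / sample_size)
    margin = z * standard_error
    return sample_rate - margin, sample_rate + margin

# Assumed: 12% sample defect rate observed in a sample of 1,000 loans.
low_95, high_95 = defect_rate_interval(0.12, 1000, confidence=0.95)
low_99, high_99 = defect_rate_interval(0.12, 1000, confidence=0.99)

print(f"95% confidence: {low_95:.1%} to {high_95:.1%}")
print(f"99% confidence: {low_99:.1%} to {high_99:.1%}")
```

Under these assumptions the 95% range comes out to roughly 10% to 14%, and asking for 99% confidence from the same sample widens the range.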

Control Chart – A control chart is a chart that shows when a process is producing more errors than natural, expected variation would explain. In the Cogent system, control charts are typically used to identify particular branches or underwriters that are performing worse than the rest of the system. To read more about control charts, see our primer.
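The glossary does not describe how the Cogent system builds its charts, so the Python sketch below shows only one generic approach: a p-chart style check that flags any branch whose defect rate is more than three standard errors above the overall rate. The branch names, review counts, defect counts, and the three-sigma limit are all hypothetical.

```python
from math import sqrt

def branches_above_limit(defects, reviewed, sigmas=3):
    """Return branches whose defect rate exceeds the upper control limit."""
    overall_rate = sum(defects.values()) / sum(reviewed.values())
    flagged = {}
    for branch, loans_reviewed in reviewed.items():
        # Expected chance variation for a branch with this many reviewed loans.
        standard_error = sqrt(overall_rate * (1 - overall_rate) / loans_reviewed)
        upper_limit = overall_rate + sigmas * standard_error
        branch_rate = defects[branch] / loans_reviewed
        if branch_rate > upper_limit:
            flagged[branch] = branch_rate
    return flagged

# Hypothetical review results for four branches.
reviewed = {"A": 200, "B": 200, "C": 200, "D": 200}
defects = {"A": 18, "B": 22, "C": 20, "D": 45}

flagged = branches_above_limit(defects, reviewed)
for branch, rate in flagged.items():
    print(f"Branch {branch} is above the control limit with a {rate:.1%} defect rate")
```

With these made-up numbers only branch D is flagged; the other branches differ from the overall rate by amounts that chance alone could plausibly produce.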

Non-Sampling Error – Non-sampling errors are inconsistencies that are produced within the review process itself rather than by the way the sample was selected. For example, if a particular QC underwriter is more prone than others to finding errors and declaring loans defective, that will produce a non-sampling error.

Population – The population consists of all loan records. Ultimately, we would like to understand how the population works, but it is too large for us to analyze every loan and come to a stable conclusion.

Precision – Precision is a measure of how much variability a particular sampling methodology produces. The most significant variable that influences precision is sample size. The larger a sample is, the more precise its results are expected to be. Remember, precision is a measure of the effectiveness of a particular methodology. It deals with the question, “If I used this methodology over and over again, how close would all the results be to the value I expect?” For more on precision, click here.
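The effect of sample size on precision can be sketched in Python. The example assumes a 10% defect rate, a 95% confidence level (z of about 1.96), and the normal approximation for a proportion; none of these figures come from the glossary itself.

```python
from math import sqrt

def margin_of_error(defect_rate, sample_size, z=1.96):
    """Approximate plus-or-minus margin for a sampled defect rate."""
    return z * sqrt(defect_rate * (1 - defect_rate) / sample_size)

# Assumed defect rate of 10%; each sample is four times larger than the last.
margins = {n: margin_of_error(0.10, n) for n in (100, 400, 1600)}

for sample_size, margin in margins.items():
    print(f"Sample of {sample_size:>4} loans: plus or minus {margin * 100:.2f} percentage points")
```

Under this approximation, quadrupling the sample size cuts the margin in half, so each additional gain in precision costs progressively more loans to review.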

Sample – A sample is a subset of a population. In the field of mortgage quality control, a sample is a small number of loans selected for analysis. A random sample is a sample that is chosen such that every loan record in the population has an equal probability of being selected for the sample, and each loan record is chosen independently of the others. A targeted sample, on the other hand, is a sample that gives special consideration to variables about a loan record when determining whether it will be sampled (the origination source, or particular risk parameters, for example).
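The difference between the two kinds of sample can be sketched in Python. The loan records, the `source` field, and the sample size of 50 are hypothetical and exist only to make the example runnable.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

# Hypothetical population of 1,000 loan records; every fourth loan came through a broker.
loans = [
    {"id": loan_id, "source": "broker" if loan_id % 4 == 0 else "retail"}
    for loan_id in range(1, 1001)
]

# Random sample: every loan record has the same chance of selection.
random_sample = rng.sample(loans, 50)

# Targeted sample: only loans from a chosen origination source are eligible.
broker_loans = [loan for loan in loans if loan["source"] == "broker"]
targeted_sample = rng.sample(broker_loans, 50)

broker_share = sum(loan["source"] == "broker" for loan in random_sample) / len(random_sample)
print(f"Broker share in the random sample:   {broker_share:.0%}")
print("Broker share in the targeted sample: 100%")
```

`random.sample` draws without replacement, so no loan appears twice in a sample. The targeted sample is useful for examining broker loans specifically, but its results should not be generalized to the whole population.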

Standard Deviation – Standard deviation is a basic statistical measure of spread that is used to judge when variations can be attributed to chance and when other forces are most likely responsible. The standard deviation is calculated by taking the root mean square of the deviations from the average of a particular sample. This means that the average value for a sample is calculated, and then the deviations are measured by seeing how far each value in the sample is from the average. Those deviations are then squared, and the average of the squares is taken. The square root of that value is the size of one standard deviation.

You will most likely never have to calculate a standard deviation yourself. The important thing to remember about standard deviation is that it allows you to measure how much variation and error you can expect to encounter in your lending process. The larger your standard deviations, the more variability you would expect to see.
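For readers who do want to see the calculation, the steps described in the Standard Deviation entry can be sketched in Python. The monthly defect rates below are made-up numbers used only for illustration.

```python
from math import sqrt
from statistics import pstdev

def standard_deviation(values):
    """Root mean square of the deviations from the average."""
    average = sum(values) / len(values)
    squared_deviations = [(value - average) ** 2 for value in values]
    mean_of_squares = sum(squared_deviations) / len(squared_deviations)
    return sqrt(mean_of_squares)

# Hypothetical defect rates (in percent) from five monthly reviews.
monthly_defect_rates = [8.0, 10.0, 12.0, 9.0, 11.0]

by_hand = standard_deviation(monthly_defect_rates)
from_library = pstdev(monthly_defect_rates)  # standard library equivalent

print(f"Standard deviation by hand:   {by_hand:.3f}")
print(f"Standard deviation (library): {from_library:.3f}")
```

This version divides by the number of values, exactly as the entry describes. Many tools, including Python's `statistics.stdev`, divide by one less than the number of values when working from a sample, which gives a slightly larger result.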