## The CERTT Team Explains Precision & Recall

**Precision** is one measure of the effectiveness of some computer applications for finding search words, candidate terms, and other items. (The other common measure is recall.)

Precision is a measure of the proportion of results of a computer application that are considered to be pertinent or correct. For example, if a computer application is searching for terms in a document and finds 100 candidates, 65 of which really are terms (that is, there are 65 correct results out of 100 total), then the precision of the application’s results is 0.65. (The other 35 non-pertinent results are called noise.)
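As a quick sketch (not part of the CERTT materials), the precision calculation above can be written as a simple ratio of correct results to total results:

```python
def precision(correct_results: int, total_results: int) -> float:
    """Proportion of an application's results that are pertinent/correct."""
    return correct_results / total_results

# The example above: 65 correct terms out of 100 candidates found
print(precision(65, 100))  # 0.65
```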

Precision and recall generally vary inversely; that is, as precision increases, recall generally decreases, and vice versa. For this reason, it can be very difficult to achieve high recall and high precision simultaneously. Usually, computer applications try to give the best possible balance between the two, but according to different users’ needs or different applications, it may be preferable to maintain a higher degree of one or the other.

For further discussion, see the worked example at the end of this section.

**Recall** is one measure of the effectiveness of some computer applications for finding search words, candidate terms, and other items. (The other common measure is precision.)

Recall is a measure of the proportion of all possible correct results of a computer application that the application actually produces. For example, imagine that you are using a computer application to search for terms in a document that has 90 terms in it. (You know because you counted them.) If the application finds 65 of these terms, then the recall of the application is 65 out of 90, or 0.72. (The remaining 25 terms that the application did not find are called silences.)
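In the same sketch style (again, an illustration rather than part of the CERTT materials), recall divides the correct results the application produced by all the correct results that exist in the data:

```python
def recall(correct_results: int, total_correct: int) -> float:
    """Proportion of all possible correct results that the application found."""
    return correct_results / total_correct

# The example above: 65 terms found out of the 90 terms in the document
print(round(recall(65, 90), 2))  # 0.72
```

Note that the denominator (90) is a property of the data, not of the tool, which is why measuring recall requires knowing the full set of correct results in advance.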

Obviously, recall is not a measure that you apply every time you use a computer application, since counting all the correct results yourself before using a tool would defeat the purpose! However, testing recall can help you evaluate a tool's performance before you choose to use it, determine whether, how, and when a tool can be useful in future jobs, and adjust the settings of some tools to meet your needs (e.g. balancing recall and precision).

For further discussion, see the worked example at the end of this section.

When non-pertinent items appear in the results of a search, in a list of candidate terms, or in the product of other computer applications, we call these results **noise**. For example, if you search for the noun *lead* and, because of part-of-speech ambiguity, you find occurrences of the verb *lead*, these occurrences of the verb constitute noise.

Noise is one of the ways of measuring the effectiveness of computer applications. It forms the basis for the calculation of precision. It is also often contrasted with silences. Generally, the proportions of noise and silences in the results of computer applications vary inversely.

For further discussion, see the worked example at the end of this section.

**Silences** are correct results of a search, term extraction or other output of a computer application that are present in the data being analyzed, but that the application does not find or produce. They are used in the calculation of the measure of recall. (The proportion of silences in a tool’s output is often compared to the proportion of noise.)

Imagine that you have a document that contains 90 real terms. If a computer application searching for candidate terms finds 65 of them, then the 25 terms that the application does not identify constitute silences. Silences are a challenge for users of computer applications, because they may miss important information that is present in the data simply because the tool is unable to find it.

For further discussion, see the worked example at the end of this section.

It can be difficult to keep track of the concepts of noise and silences, precision and recall, and the connections between them. Here's an example to help you keep them straight.

Imagine a 10,000-word document. If you read the document yourself (a long process!), you will find that it contains 90 terms (and of course a number of words that are not terms). When you use a term extractor to process the document more quickly and easily, it finds 100 candidate terms, of which 65 are on your list of “real” terms. This means that:

- Of the 90 real terms in the document, the term extractor found 65. This means that its *recall* is 65 out of 90, or 0.72.
- Of the 100 candidate terms on the list, 65 are correct. This means that its *precision* is 65 out of 100, or 0.65.
- The 25 “real” terms that the term extractor did not find are called *silences*.
- The 35 candidate terms that are not correct are called *noise*.
- The 65 “real” terms that were found are considered to be *hits*, or correct results.
- Anything that both is not a term and was not found by the term extractor doesn’t really have a name, because it isn’t pertinent to any typical evaluations.
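The whole worked example fits in a few lines of arithmetic. The sketch below (an illustration, not part of the CERTT materials) derives all four measures from the three counts in the scenario:

```python
# Worked example: 90 real terms, 100 candidates, 65 hits.
real_terms = 90    # terms you found by reading the document yourself
candidates = 100   # candidate terms produced by the term extractor
hits = 65          # candidates that are also on the list of real terms

recall = hits / real_terms        # 65 / 90 ≈ 0.72
precision = hits / candidates     # 65 / 100 = 0.65
silences = real_terms - hits      # 25 real terms the extractor missed
noise = candidates - hits         # 35 non-pertinent candidates

print(f"recall={recall:.2f}, precision={precision:.2f}, "
      f"silences={silences}, noise={noise}")
# recall=0.72, precision=0.65, silences=25, noise=35
```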

These measures generally vary inversely. That is, if you adjust a tool to give very high precision (e.g. by targeting certain types of results more closely), you will probably end up increasing the proportion of silences (missed correct results), and therefore reducing the recall. If you adjust a tool to find more pertinent results (e.g. by including more candidates), you will probably also introduce more noise in the results and reduce the precision. It is very difficult to maximize both of these measures at once. Generally, it is necessary to try to find a balance that works well for a particular use or user.

*Copyright CERTT 2011*