active terminology recognition tools

traducteurs de vocabulaire

Active terminology recognition (ATR) tools automatically analyze texts in electronic format in order to identify occurrences of terms or other items in that text that are also present in a terminology database. When they identify occurrences of known terms, they may highlight them in the text, propose equivalents stored in the database for insertion in the text by the user, or automatically replace occurrences of these known terms with their equivalents from the database. ATR tools must be combined with a terminology base (usually stored in a terminology management system) and are often integrated with translation memory systems. Some examples of ATR tools include the pre-translation function in LogiTerm and the TermBase Agent in MultiTrans.
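The basic idea can be illustrated with a minimal sketch. It assumes a tiny, hypothetical term base held in a Python dictionary (real ATR tools consult a terminology management system), and the function names are invented for this illustration. It finds known terms in a text and proposes the stored equivalents in brackets.

```python
import re

# Hypothetical English-to-French term base; the entries are for illustration only.
TERM_BASE = {
    "translation memory": "mémoire de traduction",
    "term bank": "banque de terminologie",
}

def recognize_terms(text, term_base):
    """Return (start, end, term, equivalent) for each known term found in text."""
    hits = []
    for term, equivalent in term_base.items():
        # Match the term as a whole character string, ignoring case.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            hits.append((match.start(), match.end(), match.group(0), equivalent))
    return sorted(hits)

def annotate(text, term_base):
    """Insert the stored equivalent in brackets after each recognized term."""
    result = text
    # Work from the end of the text so that earlier offsets stay valid.
    for start, end, term, equivalent in reversed(recognize_terms(text, term_base)):
        result = result[:end] + " [" + equivalent + "]" + result[end:]
    return result

sample = "A translation memory is not a term bank."
annotated = annotate(sample, TERM_BASE)
print(annotated)
```

Because this sketch matches exact character strings, it would miss inflected forms such as term banks unless they were added to the term base.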




Ambiguity is a phenomenon in which a single item (e.g. a word form, a series of word forms) may be interpreted in two or more ways. Specific types of ambiguity include part-of-speech ambiguity, semantic ambiguity and structural ambiguity.




The base of a collocation (1) is the “main” word (lexical unit) in the combination. Generally, a speaker will choose the base freely, and the choice of words that combine with it (collocates) will be constrained by linguistic convention.

For example, once you decide to use the noun glance you will generally use the verb cast. If you choose to use the noun look, you will express essentially the same meaning with the verb take. Your choice of verbal collocate is conditioned by your choice of the base noun. 


bilingual concordancers

concordanciers bilingues

Bilingual concordancers allow users to search for occurrences of character strings (i.e. sequences of characters) in bitexts (i.e. original texts and their translations that have been aligned and are displayed either side-by-side or one above the other). They usually allow users to search in one or both languages, and offer advanced searching features (e.g. Boolean operators, wildcards). These tools can help users to find, study and compare various occurrences of the character string, and to identify and/or evaluate potential translations of lexical units, phrases, sentences or even structures. Some examples of bilingual concordancers include WeBiText, the bitext search function of LogiTerm and TransSearch.


bitext aligners

aligneurs de bitextes

Bitext aligners are used to create bitexts (i.e. to break down original (source) texts and their translations (target texts) into smaller segments and then to match corresponding source and target segments). These tools generally work on a sentence level, and match sentences according to their relative lengths, their place in the text, and sometimes their contents. Since these formal criteria and the nature of texts do not always allow for perfect alignment, most tools also offer functions to help users correct the alignment manually. Bitexts can be used to help in analyzing translations and translation techniques; they can be searched using bilingual concordancers and are also the starting point for the creation of translation memories (cf. the entry for translation memory systems). Examples of bitext aligners include the bitext creation function in LogiTerm, the TextBase Builder in MultiTrans, the aligner in Fusion Translate, and SDL Trados WinAlign. Some bilingual concordancers also include alignment functions.
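As a rough illustration of matching by position and relative length, the following sketch pairs sentences in order and flags pairs whose lengths differ sharply. It is not how any of the tools named above works; the sentence splitter, the function names and the threshold of 2.0 are all assumptions made for this example.

```python
import re

def split_sentences(text):
    """Very rough sentence splitter: break after ., ! or ? followed by a space."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def align_by_position(source_text, target_text, max_ratio=2.0):
    """Pair sentences in order and flag pairs whose lengths differ too much."""
    source = split_sentences(source_text)
    target = split_sentences(target_text)
    pairs = []
    # zip() stops at the shorter list, so unmatched sentences are dropped.
    for src, tgt in zip(source, target):
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        pairs.append((src, tgt, ratio > max_ratio))
    return pairs

source = "The cat sleeps. It is raining today."
target = "Le chat dort. Il pleut aujourd'hui."
for src, tgt, suspect in align_by_position(source, target):
    print(src, "|", tgt, "| check manually" if suspect else "")
```

A flagged pair is only a hint that the alignment may be wrong (for example, when one sentence was translated as two), which is why manual correction functions remain useful.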




A button is generally a grey square or rectangle that is labeled with the name of a function or an icon representing it, and allows you to activate that function. 


character string

chaîne de caractères

In its simplest definition, a character string is simply a series of characters (letters, numbers, etc.). The term character string is often used in computing to represent a series of characters that appear one after the other, sometimes delimited (defined) by surrounding spaces, punctuation marks, or symbols.

Since computer applications cannot recognize words by their forms alone (and sometimes, in cases of part-of-speech ambiguity, semantic ambiguity or structural ambiguity, neither can humans!), computers work by identifying and matching character strings. Sometimes character strings correspond to word forms, while in other cases they may represent only part of a word form, or more than one word form in combination.

Therefore, when you do a search using a computer tool, you enter what you think of as a word, or possibly a word form. But you must remember that the computer interprets this as a character string: a simple series of characters. This is why it cannot, for example, tell the difference between two words that are spelled the same (homographs). 
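A short sketch can make this concrete. The sample sentence is invented; the code uses Python's standard re module to compare a plain character-string search with one restricted to whole word forms.

```python
import re

text = "The lead pipe was misleading. Please lead the way."

# A plain substring search treats "lead" as a character string, so it also
# matches inside "misleading".
substring_hits = [m.start() for m in re.finditer("lead", text)]

# Requiring word boundaries on both sides restricts matches to the word form
# "lead", but still cannot separate the noun from the verb (homographs).
word_form_hits = [m.start() for m in re.finditer(r"\blead\b", text)]

print(len(substring_hits), len(word_form_hits))
```

The first search finds three occurrences and the second finds two, but neither can tell that one lead is a noun and the other a verb.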


check box

case à cocher

Check boxes are generally used when a specific function in a tool can be activated or deactivated. Generally, if a checkmark appears in a check box, the corresponding function is activated, and if the check box is empty, the function is deactivated. You can check or uncheck a check box simply by clicking on it. 




The term click is used to denote placing the cursor on the object you wish to activate or select and pressing the left button of the mouse once. This may allow you, for example, to select an entry from a list, to activate functions using buttons or check boxes, to open menus, or to select menu options.


click and drag


One way to move files or other objects on your desktop, between directories or sub-directories, and so on is using a click and drag (also sometimes called a drag and drop). To do this, click on the object you would like to move, but rather than letting the mouse button go, hold it down. Then, still holding down the button, use the mouse to slide the object across the screen to the new location. Once the object is where you want it, let go of the mouse button. The object will appear in its new location. 




The collocate in a collocation (1) is the word (lexical unit) that is chosen as a function of the base used, because of linguistic conventions.

For example, once you decide to use the noun glance you will generally use the verb cast. If you choose to use the noun look, you will express essentially the same meaning with the verb take. Your choice of verbal collocate is conditioned by your choice of the base noun. 




1) A collocation is a combination of two words (lexical units) that are generally closely related semantically and are very often used together with a direct syntactic link between them. The combination may be so frequent that it has become the usual or expected way of expressing a particular idea, and it may even sound odd or unnatural to hear the idea expressed in another way.

Collocations consist of a base, a lexical unit that is chosen freely to express an idea, and a collocate, a lexical unit that is commonly used to express something in combination with the base. Collocates often express ideas such as intensification or attenuation and typical actions.

Some examples include to cast a glance, to take a look, a big problem, and madly in love. In these cases, glance, look, problem and in love are the bases, and cast, take, big, and madly are the collocates.

2) In some contexts, and particularly computer applications, the term collocation can also be used with a simpler meaning. In this case, the term refers simply to words that commonly occur together. (Thus, in this use of the term, there may not be a specific type of semantic or syntactic link between the two units.) 
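The second, simpler sense can be sketched as a count of word forms that occur side by side. The sample text and the function name below are invented for illustration, and real tools usually apply statistical measures rather than raw counts.

```python
from collections import Counter
import re

def adjacent_pairs(text):
    """Count pairs of word forms that occur side by side in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Sentence boundaries are ignored, so pairs may span two sentences.
    return Counter(zip(words, words[1:]))

sample = "She cast a glance at him. He cast a glance back. Then she took a look."
pairs = adjacent_pairs(sample)
print(pairs.most_common(2))
```

Note that the most frequent pairs here include cast a and a glance: a purely frequency-based approach has no notion of base and collocate, and it would not link cast directly to glance because the two are not adjacent.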


compressed folder

dossier compressé

A compressed folder, also sometimes called a zip file, is a file that contains another file or series of files that have been compressed (or zipped) to make them smaller. Compressed folders are useful, for example, for making backups of files and for sending large or multiple files as attachments to e-mails.

In your courses and the CERTT tutorials and exercises, you will be asked to use the Windows compression function, which you can access by selecting a file or files in My Documents or My Computer, right-clicking, and choosing the option Send to… and then Compressed folder from the contextual menu that appears.

Compressed folders generally have the extension .zip. (However, there are other types of compressed folders that have extensions such as .rar. These types of compressed folders may require special programs for compression and decompression, so if you are going to use them you should be careful to check whether both you and anyone you are exchanging files with have the necessary programs.)

You will generally have to decompress (unzip) files (that is, remove them from the compressed file, generally by double-clicking on a compressed folder in My Computer or My Documents and using Windows’ Extract files function) before you can open or modify them in a program such as a word processor or translation tool. (However, you can often view the contents of files inside a compressed folder by double-clicking on the compressed folder and then the files inside in My Computer or My Documents.) 


contextual menu

menu contextuel

A contextual menu is a menu that provides access to a list of functions available in a specific situation. Contextual menus can be accessed by right-clicking on an item; the options that appear in the contextual menu will generally include those that are applicable to that item and are available in that context. 




Corpora are collections of electronic texts that have been assembled to assist users in studying language and its use. They are generally designed to provide a representative sample that gives an overview of a certain type of language (e.g. in a particular register, region, field, or type of text). These text collections may assist users in determining how lexical units are generally used or combined. Corpora are generally searched using either monolingual concordancers (e.g. in the case of monolingual or comparable corpora) or bilingual concordancers (in the case of parallel corpora, also called bitext corpora). Some well-known corpora that can be consulted online include the British National Corpus and Frantext.




The cursor (also sometimes called the pointer or mouse pointer) is represented by an icon that is usually in the shape of an arrow and indicates the position on the screen in which any operation will be carried out. Placing the cursor in the correct place will allow you to click to open a menu, choose a menu option, etc.

When resizing windows, the cursor may take the form of a double-headed arrow, allowing you to click and drag the borders of a window to a new position.

When you are entering text, e.g., in a word processor document, e-mail, form or text box, the cursor often takes the form of a vertical line, indicating the position in which text will be inserted when you type. 




The Windows desktop is what you see when you first start up or log on to a computer. It generally displays a background image (at the Writing Centre, this is the University of Ottawa logo) and will often also display some shortcuts to programs (such as Internet Explorer) and data storage locations (such as My Computer or My Documents). 


dialogue box

boîte de dialogue

A dialogue box is generally a smaller, secondary window that appears in addition to (usually in front of) the main window of a program and allows you to enter choices or information needed to carry out a specific task in a tool. Once you have finished entering information, the box will close and return you to the main tool window. 




Also sometimes called a folder, a directory is a division within a larger data storage device (e.g., a disk drive or server) in which files can be stored. 




The term double-click is used to denote clicking twice in quick succession in the same place. This often allows you to open programs, directories or files. 


dropdown list

liste déroulante

A dropdown list is a list of choices that are available for a specific purpose; for example, these may be directories or file formats that are available for saving or opening files, or lists of available options for filling in forms. An arrow to the right of the list allows you to see the options available and choose the one you want. 


electronic dictionaries

dictionnaires électroniques

Electronic dictionaries are becoming more and more popular alternatives to traditional paper dictionaries. These resources, which may be monolingual, bilingual or multilingual, may be available on CD-ROM or online. They often offer fast, easy and flexible access to the content of dictionary entries to help translators and other users find the information they need. It is important not to confuse these dictionaries with term banks, which, while also usually available online, generally have different purposes and organization. It is also important to note that not all electronic dictionaries are of the same quality; it is particularly important to be cautious when using free online dictionaries. The Oxford English Dictionary and the Dictionnaires de l’Académie française are very well-known dictionaries that can be consulted online (the latter through the Centre national de ressources textuelles et lexicales). Others that you may find useful include the Longman Dictionary of Contemporary English Online, the Random House Webster’s Unabridged, the Oxford-Hachette and the Nouveau Petit Robert.


grammatical category; part-of-speech category

catégorie grammaticale; partie du discours

Grammatical categories are classes of words (or more specifically, lexical units) that are defined by common patterns of behaviour and roles in sentence structures. For example, verbs, nouns, adjectives, adverbs and determiners behave similarly to other members of the same class (showing similar patterns of inflection, combination with other lexical units, etc.), but differently from the members of the other classes. (For example, the vast majority of verbs are conjugated to correspond to the person and number of their subjects, and thus vary in form. But these are the only lexical units that behave this way.) 




1) Homographs are two lexical units (or forms of lexical units) that are different, but are written the same. For example, the verb form lead (e.g. You can lead a horse to water) is a homograph of the noun form lead (e.g. this is so heavy it must be made of lead). The written forms of the two lexical units are identical (although in this case they are pronounced differently, i.e. are not homophones).

The verb lead is also a homograph of the noun lead (e.g. I walk my dog on a lead). In this case, the lexical units have identical written forms and pronunciations, so they are both homographs and homophones.

2) In the context of semantic ambiguity, homography is often contrasted with polysemy. From this perspective, the essential characteristics of homography are the existence of two distinct lexical units belonging to the same grammatical category that have identical written forms but have distinct meanings that are not closely related. For example, the nouns lead (e.g. this is so heavy it must be made of lead) and lead (e.g. I walk my dog on a lead) are homographs, because their meanings are not related. These types of meaning differences are often reflected by the existence of two separate dictionary entries for the two lexical units. 




Homophones are lexical units (or forms of lexical units) that are different, but are pronounced the same. For example, the past tense of the verb read (e.g. We read a story before bedtime last night) is a homophone of the adjective red (e.g. The cover of the book was red). In this case, the pronunciation is identical, although the written forms are different. So read and red are homophones but not homographs. The verb lead (e.g. You can lead a horse to water) is a homophone of the noun lead (e.g. I walk my dog on a lead). In this case, the lexical units also have identical written forms, so they are both homographs and homophones.




An icon is a graphic that often represents a specific function in a tool. These icons may be used to indicate the functions associated with buttons on a toolbar or elsewhere in a program’s interface. 




Words can vary in form according to the context in which they are used. For example, verbs are conjugated, so they differ in form according to person, number, tense, and so on. Nouns vary in number, having both singular and plural forms. Adjectives in some languages (such as French) also vary in form to agree with the nouns they modify. This modification in the form of words according to contextual factors is called inflection. Because of inflection, many words (lexical units) can have a number of different word forms.

It may be difficult for some computer applications to identify the links between inflected forms and the lexical unit to which they correspond. Two techniques that can help deal with this problem are lemmatization and stemming.




The term lemma is used to refer to the base form of a word (lexical unit). This is the form of the word that is generally found as a headword in a dictionary.

For example, the inflected (i.e. conjugated) verb forms go, goes, going, gone and went are all associated with the base form (lemma) go. The inflected forms word and words are both associated with the base form, or lemma, word.




In lemmatization, a software program identifies the lemma, or base form, to which inflected (e.g. conjugated) forms of words (lexical units) correspond.

For example, lemmatization would involve identifying that the word forms go, goes, going, gone and went are all associated with the lemma go. Note that all of these word forms belong to the same lexical unit, the verb to go. In order to carry out the process of lemmatization, the software needs to have and apply a list of all of the inflected forms of a particular lexical unit, or a set of rules for forming these inflected forms from the base form.

Lemmatization should not be confused with stemming, which groups forms of one or many lexical units together based on the sharing of a part of these forms. 
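The contrast can be sketched as follows. The lookup table and the list of endings are hypothetical and far too small for real use; they simply illustrate the two approaches, a list of inflected forms for lemmatization and the sharing of part of a form for stemming.

```python
# Hypothetical lookup table from inflected forms to lemmas (illustration only).
LEMMAS = {
    "go": "go", "goes": "go", "going": "go", "gone": "go", "went": "go",
    "word": "word", "words": "word",
}

def lemmatize(word_form):
    """Look up the lemma; return the form unchanged if it is not listed."""
    return LEMMAS.get(word_form.lower(), word_form)

def crude_stem(word_form):
    """Strip a few common English endings, without consulting any word list."""
    form = word_form.lower()
    for suffix in ("ing", "es", "s"):
        if form.endswith(suffix) and len(form) - len(suffix) >= 2:
            return form[: -len(suffix)]
    return form

for form in ["goes", "going", "went", "words"]:
    print(form, lemmatize(form), crude_stem(form))
```

The lookup links went to go because the table lists it, while the crude stemmer leaves went unchanged because it shares no part of its form with go.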


lexical unit

unité lexicale

The term lexical unit is often used rather than word to avoid some of the ambiguities (semantic, part-of-speech) associated with the latter term and refer more precisely to these linguistic units. While word may be interpreted in different ways, the term lexical unit always denotes an association of two important components: 1) a word form (or set of word forms, in the case of words that vary in form with inflection or for other reasons) and 2) a specific meaning (or set of closely related meanings, in the case of polysemous items). Using this definition allows us to more easily describe phenomena of semantic and part-of-speech ambiguity, as well as issues resulting from inflection. It also allows us to deal with units that have a very specific meaning but are composed of more than one character string (e.g., pomme de terre, chemin de fer).

Different lexical units can be distinguished by their grammatical category. For example, a verb is always a different lexical unit from a noun or an adjective, even if the forms and meanings of the two lexical units are similar or even identical. They can also be distinguished by their meanings: completely unrelated meanings generally signal distinct lexical units, even if the two units are homographs.  


localization tools

outils de localisation

Localization is a task that involves the translation and adaptation of a Web page, software application or other product to a particular linguistic and cultural community. Because the task often involves working with complex computer coding, many participants are generally involved in the process. A number of localization tools have been created to assist translators and other localization professionals in managing the complexities of the task (including managing workflow, providing accurate word counts and estimates, separating the textual components to be translated from computer code that must be preserved, and managing terminology and previously translated texts or versions of texts). In addition to dedicated localization tools, translation memory systems and terminology management systems are very useful for many localization projects. Localization tools include Catalyst, CatsCradle, Passolo, and WebBudget.


machine translation systems

systèmes de traduction automatique

Machine translation (MT) systems are unlike all of the other tools described in this glossary, because rather than assisting a human translator or other language professional in his or her work, MT systems take charge of the entire process of translating texts. However, this doesn’t mean that language professionals have no role to play. When MT systems are used, humans are most often involved in the revision (called post-editing) of the target text produced by the MT system to ensure that it is correct and adequate for its intended use. In some cases, humans may also be able to adjust the way the MT system works (e.g. by adding to or modifying the dictionaries it uses) or to prepare documents in such a way that they can be translated as successfully as possible by the system (called pre-editing). MT systems are most useful when source texts can be carefully prepared to be easily translated by the system (e.g. by clarifying ambiguous expressions, using short sentences), and/or when the target text is intended purely to assist in comprehension (and not, for example, for publication). There are different underlying techniques used by MT systems. Some try to imitate the ways in which humans process language (e.g. using grammar rules), while others operate using statistical probabilities or by taking examples of previously translated text as models. Just as human translators may produce slightly different versions of a target text, so will different MT systems. It is important not to mix up the short form MT for machine translation with TM, which stands for translation memory. You will find tools such as Systran and Reverso installed in the Writing Centre; a number of other systems, such as Babelfish and Google Translate, are also available online for free. (It is important to be particularly cautious when using free online tools, as they do not offer the same kinds of advantages as locally installed tools that can be adapted for use on specific texts!)




You will no doubt be familiar with menus from using Word, WordPerfect, and other software in a Windows or Mac environment. These menus allow you to access the various options available in a tool. 




A message appears to display important information, for example to signal that an operation has been carried out or that an error has occurred. You will not generally have to take any action or provide any information, although you may have to click a button to confirm that you have seen the message. 


monolingual concordancers

concordanciers unilingues

Monolingual concordancers are computer tools that help in the analysis of corpora. They are used primarily to find and display occurrences of character strings (i.e. sequences of characters) in corpora. They usually offer advanced searching features (e.g. Boolean operators, wildcards). Some present the occurrences retrieved in key-word-in-context (KWIC) format, which displays each occurrence on a separate line, with the character string the user searched for displayed in the centre. This presentation is designed to make comparing multiple occurrences easier and more efficient, particularly as most tools also allow the occurrences to be sorted using various criteria. Analyzing occurrences of character strings can help users to evaluate how words and phrases are used and combined. Many concordancers also offer additional corpus analysis functions (e.g. that can make lists of all of the word forms present in corpora and their frequencies, to help identify particularly pertinent items in a collection of texts). Examples of monolingual concordancers available in the Writing Centre include WordSmith Tools, and the full-text search feature of LogiTerm. Some other concordancers can be used and/or downloaded online for free; these include TextStat, WebConc, WebCorp, AntConc and Corsis.
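The KWIC display described above can be sketched in a few lines. This is a simplified illustration with an invented two-sentence corpus and a hypothetical function name, not the method used by any of the tools listed.

```python
import re

def kwic(text, search_string, width=20):
    """Return one line per occurrence, with the search string centred."""
    lines = []
    for match in re.finditer(re.escape(search_string), text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad the contexts so that the search string lines up in one column.
        lines.append(left.rjust(width) + " [" + match.group(0) + "] " + right.ljust(width))
    return lines

corpus = "You can lead a horse to water. I walk my dog on a lead every day."
for line in kwic(corpus, "lead"):
    print(line)
```

Lining up the occurrences in one column makes it easier to compare the words that appear immediately before and after the search string.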


My Computer

Poste de travail

This Windows directory allows you to quickly and easily access all of the data storage locations available on your workstation, or that you can access from that workstation. These will generally include a floppy disk drive, My Documents (also called the U: drive), which you will use to store your files, as well as a number of servers to which you can connect to access and store files if requested to do so by your professor or instructor.


My Documents

Mes documents

This is the directory in which you will always store your work and other files. This directory is private: only you can access the files you store in it.

At the uOttawa Writing Centre, it is also labeled as the Z: drive, because rather than a directory on the hard drive of the workstation you are using, it is actually a space of about 10 MB on a server at the Writing Centre, which you can access from any workstation in either the main Writing Centre or the Writing Centre classroom.

This means that you can work at any workstation and still access your files in exactly the same way. (However, to use your files outside the Writing Centre, you will still need to transfer your files to a USB key, floppy disk or other medium, or send a copy by e-mail.)  




When non-pertinent items appear in the results of a search, in a list of candidate terms, or in the product of other computer applications, we call these results noise. For example, if you search for the noun lead, and you find occurrences of the verb lead, because of part-of-speech ambiguity, these occurrences of the verb constitute noise.

Noise is one of the ways of measuring the effectiveness of computer applications. It forms the basis for the calculation of precision. It is also often contrasted with silences. Generally, the proportions of noise and silences in the results of computer applications vary inversely.

For further discussion, see the example here. 




Tool functions can usually be accessed by choosing one of the options (i.e., names of functions) that appear in a menu. In many cases, keyboard shortcuts that allow you to access functions without going through the menu are indicated to the right of the option. 


other Office tools

autres outils Office; autres outils de bureautique

The Microsoft Office Suite offers a number of software applications to assist with tasks such as creating, editing and managing presentations (PowerPoint), spreadsheets (Excel) and databases (Access). Translators, writers and revisers may use these tools to access documents for translation and to store and manage terminology, client information and other data. Professors may be interested in using such tools to prepare lectures or conference presentations, to store research data or to calculate grades.


part-of-speech ambiguity

ambiguïté catégorielle

Part-of-speech ambiguity involves the existence of two distinct lexical units, belonging to two different grammatical categories, but that have one or more identical word forms. For example, the form lead presents part-of-speech ambiguity, because the verb lead (e.g. You can lead a horse to water) and the noun lead (e.g. I walk my dog on a lead) share this form (and, for that matter, the form leads as well). Computer applications such as concordancers and term extractors may not be able to differentiate between occurrences of these two lexical units because of their identical forms. This may also be problematic when this ambiguity contributes to structural ambiguity.




Polysemy is a type of semantic ambiguity in which a single lexical unit has two or more meanings that are closely related. (It is important to differentiate between polysemy and homography, which describes the existence of two distinct lexical units with identical forms but with meanings that are not closely related.) The links between meanings are often based on metaphorical comparisons, figurative meanings that have evolved from literal meanings, and so on.

For example, the noun file has at least two distinct meanings: one corresponding to the “paper” version, and another that developed from an extension of the first meaning to refer to “electronic” or “computer” files. Polysemous lexical units are generally described in a single dictionary entry, but with separate definitions for each of the meanings. 




Precision is one measure of the effectiveness of some computer applications for finding search words, candidate terms, and other items. (The other common measure is recall.)

Precision is a measure of the proportion of results of a computer application that are considered to be pertinent or correct. For example, if a computer application is searching for terms in a document and finds 100 candidates, 65 of which really are terms (that is, there are 65 correct results out of 100 total), then the precision of the application’s results is 0.65. (The other 35 non-pertinent results are called noise.)
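The calculation in this example can be written out as a short sketch (the function and variable names are chosen for illustration only):

```python
def precision(correct_results, total_results):
    """Proportion of the results produced that are pertinent."""
    return correct_results / total_results

candidates_found = 100   # candidates proposed by the application
real_terms_found = 65    # candidates that really are terms

# Results that are not pertinent count as noise.
noise = candidates_found - real_terms_found
print(precision(real_terms_found, candidates_found), noise)
```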

Precision and recall generally vary inversely; that is, as precision increases, recall generally decreases, and vice versa. For this reason, it can be very difficult to achieve high recall and high precision simultaneously. Usually, computer applications try to give the best possible balance between the two, but according to different users’ needs or different applications, it may be preferable to maintain a higher degree of one or the other.

For further discussion, see the example here. 




Recall is one measure of the effectiveness of some computer applications for finding search words, candidate terms, and other items. (The other common measure is precision.)

Recall is a measure of the proportion of all possible correct results of a computer application that the application actually produces. For example, imagine that you are using a computer application to search for terms in a document that has 90 terms in it. (You know because you counted them.) If the application finds 65 of these terms, then the recall of the application is 65 out of 90, or 0.72. (The remaining 25 terms that the application did not find are called silences.)
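The same example can be expressed as a short sketch (the function and variable names are chosen for illustration only):

```python
def recall(correct_found, correct_available):
    """Proportion of all the correct results available that were found."""
    return correct_found / correct_available

terms_in_document = 90   # counted by hand beforehand
terms_found = 65         # terms the application identified

# Correct results that the application missed count as silences.
silences = terms_in_document - terms_found
print(round(recall(terms_found, terms_in_document), 2), silences)
```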

Obviously, recall is not a measure that is applied every time you use a computer application, because it would defeat the purpose to count all the correct results yourself before using a tool! However, testing recall can help to evaluate a tool’s performance before you choose to use it, to determine whether, how and when a tool can be useful in future jobs, and also to help you to adjust settings on some tools to adjust their performance to meet your needs (e.g. balancing recall and precision).

Precision and recall generally vary inversely; that is, as precision increases, recall generally decreases, and vice versa. For this reason, it can be very difficult to achieve high recall and high precision simultaneously. Usually, computer applications try to give the best possible balance between the two, but according to different users’ needs or different applications, it may be preferable to maintain a higher degree of one or the other.

For further discussion, see the example here. 



cliquer à droite

The term right-click is used to denote pressing once on the right button of the mouse (instead of the left, which is most often used). This often allows you to open a contextual menu.




In these tutorials and exercises, screen will be used to refer to what you see on your monitor. This may include windows, dialogue boxes, messages, the task bar, etc. 


search engines

moteurs de recherche

The best-known Internet search engine is Google, although there are others (including Yahoo!, Alta Vista, and Ask.com). Search engines analyze the contents of Web pages and create lists of occurrences of word forms found in pages or their URLs, in the information that Web page creators provide about their pages (the pages’ metadata), or in links to pages. This analysis facilitates and accelerates searching online using key words. Search engines also use sophisticated calculations to rank the results of searches in an attempt to present the pages that are most likely to be helpful to a user at the top of the results list. Many search engines also offer a number of specialized searching functions, and each search engine has its own particular syntax that must be learned in order to optimize searches. A different type of search tool, called a meta search engine (e.g. Dogpile), carries out searches in a number of search engines at once, and then synthesizes the results.
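The idea of listing occurrences of word forms so that key-word searches are fast can be sketched with a small index. The pages and URLs below are made up, and the sketch leaves out ranking, metadata and links entirely.

```python
from collections import defaultdict
import re

# Hypothetical pages, keyed by made-up URLs (illustration only).
PAGES = {
    "example.org/a": "Translation memory systems store translated segments",
    "example.org/b": "Term banks store terminology",
}

def build_index(pages):
    """Map each word form to the set of pages in which it occurs."""
    index = defaultdict(set)
    for url, content in pages.items():
        for word_form in re.findall(r"\w+", content.lower()):
            index[word_form].add(url)
    return index

def search(index, query):
    """Return the pages that contain every word form in the query."""
    word_forms = re.findall(r"\w+", query.lower())
    if not word_forms:
        return set()
    results = set(index.get(word_forms[0], set()))
    for word_form in word_forms[1:]:
        results &= index.get(word_form, set())
    return results

index = build_index(PAGES)
print(sorted(search(index, "store")))
```

Because the index records word forms exactly as they appear, a query for memories would find nothing in this sketch even though one page contains memory.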


semantic ambiguity

ambiguïté sémantique

Semantic ambiguity is a type of ambiguity that includes the sub-types of polysemy and homography. It is present when two or more distinct lexical units that belong to the same grammatical category share one or more word forms but have different meanings. 




Many tools offer multiple ways of accessing functions. One of these is keyboard shortcuts, which generally involve holding down one or more function keys on the keyboard (Shift, Ctrl, Alt) and pressing a letter key. The keyboard shortcut for a function is often indicated to the right of an option in a menu.

Note that in addition to keyboard shortcuts, you may also hear about Windows shortcuts, which are indicated by icons that appear on the desktop or task bar and allow you to open programs without going through the Start menu.




Silences are correct results of a search, term extraction or other output of a computer application that are present in the data being analyzed, but that the application does not find or produce. They are used in the calculation of the measure of recall. (The proportion of silences in a tool’s output is often compared to the proportion of noise.)

Imagine that you have a document that contains 90 real terms. If a computer application searching for candidate terms finds 65 of them, then the 25 terms that are not identified by the application constitute silences. Silences are a challenge for computer applications, because users may miss important information that is present in the data they are analyzing when a tool is not able to find it.
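
The arithmetic of this example can be written out as a short sketch. The variable names are illustrative only, and recall is taken here to be the proportion of the real terms present that the application finds.

```python
total_terms = 90   # real terms present in the document
found_terms = 65   # real terms the application identified

silences = total_terms - found_terms   # real terms the application missed
recall = found_terms / total_terms     # share of real terms that were found

print(silences)         # 25
print(f"{recall:.1%}")  # 72.2%
```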

For further discussion, see the example here. 


Start menu

menu démarrer

The Windows Start menu, accessible by clicking on the Start button on the task bar at the bottom left of the screen, offers access to the main functions in Windows, and to most programs. 


status bar

barre d'état

In many tools, a line at the bottom of the main window will display the current status of the tool (e.g., what options are activated, if there is a process underway and how it is progressing, or the result of the last process carried out). This line is called the status bar.




In stemming, a software program is used to identify word forms that share a common portion, or stem, so that they can be grouped together. For example, using stemming, it would be possible to group together forms such as thick, thicken, thickened, thickening and thickener.

Note that these forms all share a common stem, thick, but that they are associated with different lexical units that also belong to different grammatical categories: the adjective thick, the verb thicken (three forms), and the noun thickener. This sets stemming apart from lemmatization, as lemmatization is used to group together various forms of the same lexical unit. Also unlike lemmatization, stemming does not require the software to use a list of inflected forms of lexical units or a set of rules for forming them, because it works on the basis of matching character strings.

In addition, the processing of irregular forms is different in lemmatization and stemming. For example, for the verb to go, stemming (like lemmatization) would allow the forms go, goes, going and gone to be grouped together, because they share part of their form. However, unlike lemmatization it would miss went because its form is completely different from the others. This is a significant difficulty in dealing with irregular forms using stemming. 
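
The contrast can be sketched in a few lines. This is a deliberately naive illustration, not how any particular tool works: the stems are supplied by hand, the grouping simply checks whether a form begins with a stem, and the lemma table is a hypothetical mini-list of inflected forms.

```python
from collections import defaultdict

def group_by_stem(forms, stems):
    """Group word forms under the first listed stem they start with,
    using nothing but character-string matching."""
    groups = defaultdict(list)
    for form in forms:
        for stem in stems:
            if form.startswith(stem):
                groups[stem].append(form)
                break
        else:
            groups[form].append(form)  # no stem matched: the form stands alone
    return dict(groups)

# Lemmatization, by contrast, relies on a list of inflected forms
# (hypothetical mini-table covering one verb).
LEMMA_TABLE = {"go": "go", "goes": "go", "going": "go", "gone": "go", "went": "go"}

forms = ["thick", "thicken", "thickened", "thickening", "thickener",
         "go", "goes", "going", "gone", "went"]

stem_groups = group_by_stem(forms, stems=["thick", "go"])
print(stem_groups["thick"])  # ['thick', 'thicken', 'thickened', 'thickening', 'thickener']
print(stem_groups["go"])     # ['go', 'goes', 'going', 'gone']
print(stem_groups["went"])   # ['went']

lemma_groups = {LEMMA_TABLE.get(form, form) for form in forms[5:]}
print(lemma_groups)          # {'go'}
```

Note that string matching this simple would also group unrelated forms such as gold under the stem go, which is one reason stemming output needs to be checked.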


structural ambiguity; syntactic ambiguity

ambiguïté de structure; ambiguïté syntaxique

Structural or syntactic ambiguity is a type of ambiguity that occurs when a series of word forms can be interpreted in two or more ways that correspond to different combinations of words (lexical units) and/or to structures involving different links between the various lexical units. For example, in the sentence He saw the man with the telescope, who had the telescope? The sentence could be interpreted to mean that he had it, or that the man had it. Similarly, does the sentence Fruit flies like a banana mean that a particular type of fly likes bananas, or that fruit in general, when thrown through the air, will have a similar trajectory to a banana? While the human interpreter can generally make sense of these sentences by calling upon logic, reasoning, and his or her knowledge of the situation and the real world, these cases can be extremely difficult for computer applications to solve. 




A sub-directory is a subdivision of a directory. Sub-directories can be useful for keeping files of different types organized. 




Some menu options, rather than offering access directly to a function, require you to choose a more specific option from a shorter menu that appears beside the main menu. This shorter, more specific menu is called a sub-menu.




In some cases when tabs are used, there are divisions made even within a single group of functions on a tab. In these cases, you may see sub-tabs of a tab, which you can access by clicking on the sub-tab, as you would with a normal tab. 




Synonymy is present when two distinct lexical units with different written forms have extremely similar or identical meanings. For example, pinnacle and summit are synonyms. Note that in the case of polysemous lexical units, only one meaning may be shared. For example, the meaning “top, e.g. of a mountain” is common to both lexical units, but the sense of “high-level meeting, e.g. between government representatives” is specific to summit.

Some specialists in lexical semantics believe that there are always subtle differences in the meaning and/or usage of different lexical units, and that it is thus more accurate to speak of quasi-synonyms or near synonyms in most cases. 


tab (1)


In some tools, different functions may be divided into groups, and each group may be displayed separately. To save space, these different groups may be “superimposed” on one another (just as they would be in a binder, with tabs that allow you to turn to the section you want). To see a specific group of functions, you can click on the corresponding tab.  (Be careful not to mix up tab in this sense with the tabs used to separate data in word processor documents.) 


tab (2)


The tab key on the keyboard allows you to indent text on a single line by a pre-set distance (also called a tab). Tabs are often used to space data out on a single line. They are also useful for separating different kinds of data before importing into tables, spreadsheets or databases. In structured documents (e.g., tables, spreadsheets, forms), the tab key also allows you to move your cursor to the next field or cell. (Be careful not to mix up tab in this sense with tabs in programs’ interfaces.)
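
As an illustration of why tabs are useful before importing data, the sketch below reads a few tab-separated lines into rows and columns. The sample data are invented for the example; the reading is done with the csv module from the Python standard library.

```python
import csv
import io

# Hypothetical data: one record per line, with fields separated by tabs (\t).
raw = (
    "term\tequivalent\tdomain\n"
    "status bar\tbarre d'état\tinterface\n"
    "task bar\tbarre des tâches\tinterface\n"
)

# Each tab marks the boundary between two fields (columns).
rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))

for row in rows:
    print(row)
# ['term', 'equivalent', 'domain']
# ['status bar', "barre d'état", 'interface']
# ['task bar', 'barre des tâches', 'interface']
```

Because terms often contain spaces, a tab is a safer separator than a space: status bar stays in one field.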


task bar

barre des tâches

The Windows Task bar that appears at the bottom of the screen displays information about the applications available and/or running in Windows.

The Start button that allows you to access the Start menu appears at the left-hand side, and a clock at the right-hand side. Beside the clock on the right-hand side the task bar may also display icons that represent applications that are running in the background (but which are managed automatically by Windows and require no action from you).  On the left-hand side it displays Windows shortcuts that allow you to open some frequently used programs without going through the Start menu. Finally, in the middle it may display rectangles that represent the various programs that are currently open. 


term banks

banques de données terminologiques

Term banks are resources that assist in research on terms used in specialized language. They are particularly useful, for example, for specialized or technical translation and for technical writing. Usually bi- or multilingual, term banks are collections of term records, i.e. highly structured entries in databases that store data about concepts that are important in specialized fields (e.g. terms, their equivalents in other languages, definitions, contexts, sources, and observations). This concept-based structure and specialized orientation differentiate term banks from most electronic dictionaries, which are generally organized by lexical item. Two of the best-known Canadian term banks are TERMIUM® and the Grand dictionnaire terminologique (GDT). A well-known European term bank is IATE (InterActive Terminology for Europe), formerly known as Eurodicautom.


term extractors

dépouilleurs terminologiques; extracteurs de termes

Term extractors are computer tools that analyze texts in electronic format and identify candidate terms (i.e. lexical units that appear likely to be terms in a specific domain). Term extractors use a variety of methods to find candidate terms, all of them calling upon formal criteria such as frequency analyses or structures of word combinations. Note that this automated software does not produce perfect results. Some actual terms contained in the text may be overlooked, while some candidates that are not actually terms may be proposed. Therefore, the output of a term extractor must be verified by a language professional. Examples of term extractors include SDL Trados MultiTerm Extract and TermoStat. Translation environments such as MultiTrans, LogiTerm and Fusion Translate also include term extraction functions.
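
A minimal sketch of a frequency-based approach is given below. It is an assumption-laden illustration, not the method used by any of the tools named above: it only counts two-word sequences that contain no word from a small, hypothetical stop list, and keeps those that occur at least twice.

```python
import re
from collections import Counter

# Hypothetical stop list; real extractors use much richer linguistic information.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "are", "for", "with"}

def candidate_terms(text, min_freq=2):
    """Return two-word sequences without stop words that occur at least
    min_freq times, most frequent first."""
    words = re.findall(r"[a-z]+", text.lower())
    pairs = zip(words, words[1:])
    counts = Counter(" ".join(pair) for pair in pairs
                     if not (set(pair) & STOP_WORDS))
    return [term for term, n in counts.most_common() if n >= min_freq]

text = ("The translation memory stores segments. A translation memory is useful. "
        "Users often search the translation memory. Users often forget this.")

print(candidate_terms(text))  # ['translation memory', 'users often']
```

The output shows why verification is needed: translation memory is a plausible term, while users often is simply a frequent word combination.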


terminology management systems

gestionnaires de données terminologiques; systèmes de gestion de données terminologiques

Terminology management systems (TMSs) are tools similar to generic database management systems, but which are designed specifically to assist translators and other language professionals in storing and managing terminological data (e.g. terms, equivalents, domains, definitions, contexts, and sources). TMSs allow users to create, store, manage and search their own term records for items they feel will be useful in future work. They often suggest or even impose record structures to help users store various kinds of terminological data, and also offer a certain number of search functions to help users find the records they need quickly and easily. TMSs are often a more powerful alternative to more general Office tools (e.g. word processor documents with tables, spreadsheets or databases) for storing terminology. Another advantage is that TMSs can sometimes work in conjunction with an active terminology recognition tool and/or translation memory system as part of a larger translation environment. It is important to note the difference between TMSs (usually used to store personal records or records for a fairly small group of users) and term banks (which are generally organization-wide or even public or commercial products such as TERMIUM® or the GDT). Terminology management systems include SDL Trados MultiTerm and BeeText Term. Translation environments such as MultiTrans, LogiTerm and Fusion Translate also include terminology management functions.
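
To make the idea of a record structure concrete, here is a minimal sketch. The field names follow the kinds of data listed above, but the class, the search function and the sample records are hypothetical and do not reflect the structure of any particular TMS.

```python
from dataclasses import dataclass

@dataclass
class TermRecord:
    """One term record; only the term and its equivalent are required here."""
    term: str
    equivalent: str
    domain: str = ""
    definition: str = ""
    context: str = ""
    source: str = ""

def find_records(records, query):
    """Return the records whose term or equivalent contains the query string."""
    q = query.lower()
    return [r for r in records
            if q in r.term.lower() or q in r.equivalent.lower()]

# Sample records built from entries in this glossary.
records = [
    TermRecord("status bar", "barre d'état", domain="interface"),
    TermRecord("task bar", "barre des tâches", domain="interface"),
    TermRecord("text box", "champ de texte", domain="interface"),
]

print([r.term for r in find_records(records, "barre")])
# ['status bar', 'task bar']
```

Imposing the same fields on every record is what makes searching in either language straightforward.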


text box

champ de texte

A text box is a field in which you can type text (for example, a filename when saving a document, a comment in a form, or a name or description for a new item you are creating in a program). 


title bar

barre de titre

At the top of a window, you will see a coloured bar (blue by default in most versions of Windows) that generally indicates the name of the tool and often of the file that is open. You can carry out a number of tasks using the title bar: clicking on it will activate a window; clicking and dragging it will allow you to move the window; and double-clicking on it will display the window in full-screen mode, while double-clicking it again will reduce the window to a smaller size. 



toolbars

barre d'outils

Near the top of many tool windows, toolbars offer access to frequently used commands. These toolbars contain buttons that offer access to available functions identified using icons. By clicking on these buttons, you can activate the functions. You can often choose the toolbars you would like to display in the View menu of a tool. 


translation environments

environnements de traduction

In CERTT, we use the term translation environment (sometimes abbreviated TEnT for translation environment tool) to refer to systems that include a number of different tools for translators in one integrated package. These environments are usually centred around translation memory systems or similar tools, but usually also include bitext aligners, terminology management systems, term extractors, active terminology recognition tools and bilingual and/or multilingual concordancing functions, among other tools. Some even allow for integration of machine translation for some purposes. Translation environments include SDL Trados, Fusion Translate, LogiTerm, MultiTrans, WordFast and OmegaT.


translation memory systems

gestionnaires de mémoires de traduction; systèmes de gestion de mémoires de traduction

Translation memory (TM) systems are designed to save translators time and effort in translating documents that contain repetitions (either internal, within the same text, or external, in other similar documents). TM systems store segments of texts that have been translated, accompanied by their translations, usually in a type of database called a translation memory. (These matched segments, sometimes called translation units, can be created automatically as a user translates, or can be assembled using existing source and target texts matched using a bitext aligner.) Most TM systems link directly to text editors (e.g. word processors), so that they can be used while a translator is working on a translation as he or she normally would.

When a translator is working on a text in which a segment is similar or identical to one that has already been translated, the TM system can automatically suggest the previous translation for re-use. The translator can then decide whether this translation is appropriate for use in the new text. If so, he or she can simply insert it in the new translation, or can edit it as necessary and then insert it. If the suggestion is not appropriate, the translator can simply reject it and translate the segment from scratch.

Translation memory (TM) systems should not be confused with machine translation (MT) systems; translation memories allow humans to recycle segments of previous human translations, while MT systems carry out translations automatically and most often rely on humans for editing after the fact. While these two types of tools may be integrated in some translation processes or even in a single translation environment, they function in quite different ways. Translation memory systems form the core of translation environments such as Déjà Vu, Fusion Translate and SDL Trados (specifically the Translator’s Workbench tool). (The latter two tools are installed in the Writing Centre.) Similar tools are also found in the MultiTrans and LogiTerm translation environments.
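
The way a TM system proposes a previous translation for an identical or similar segment can be sketched as follows. This is only an illustration under stated assumptions: similarity is measured with difflib.SequenceMatcher from the Python standard library, the 0.75 threshold is arbitrary, and the memory contents are invented. Commercial systems use their own matching methods.

```python
from difflib import SequenceMatcher

def best_match(segment, memory, threshold=0.75):
    """Return (score, stored source, stored translation) for the most similar
    stored segment, or None if nothing reaches the threshold."""
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best

# Hypothetical translation memory: source segment -> stored translation.
memory = {
    "Click the Start button.": "Cliquez sur le bouton Démarrer.",
    "Save the file.": "Enregistrez le fichier.",
}

exact = best_match("Click the Start button.", memory)   # identical segment
fuzzy = best_match("Click on the Start button.", memory)  # similar segment
nothing = best_match("Open the window.", memory)        # no useful match

print(exact[0], exact[2])
print(round(fuzzy[0], 2), fuzzy[2])
print(nothing)
```

The suggestion is only a proposal: as described above, the translator still decides whether to insert it, edit it or reject it.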


web tools

outils sur le Web

The category of Web tools is a general one, including various types of tools useful for language professionals that can be used online. Some of these are difficult to fit into classical tool classes (e.g. Diatopix, a tool that allows a user to do a number of Web searches limited by region simultaneously, and creates a graph to compare the results, which can be helpful in identifying whether a term or expression is more common in one region than another). Others (e.g. the library catalogue, ORBIS) are more general tools that can nevertheless be useful for translators and others in the language industry. In CERTT, this category is distinguished from electronic dictionaries and term banks, which, while also generally available online, form their own classes of tools.




The main interface of a tool appears in a window. You can manipulate the size and placement of windows using the buttons with the line, square and X icons that appear in the right-hand corner of the title bar, or by placing your cursor on the title bar, side or corner of the window and clicking and dragging to the location you want. 




Windows is a computer operating system, a collection of software programs that manages computer resources and coordinates the operation of programs and the management of files and computer-human interaction. Windows functions covered in CERTT that may be useful to translators include creating and extracting files from compressed (i.e. zipped) folders, and using keyboard shortcuts.




Some tools offer small programs, or wizards, that provide step-by-step instructions for carrying out complex tasks. Generally, wizards appear in dialogue boxes and break down the required actions into small sets, allowing you to move from one to the next as you complete each set of tasks. 




The word word is notoriously problematic in linguistics, and is generally replaced by lexical unit or word form when it is necessary to express a specific meaning in this field. While we use word easily and usually have the impression that its meaning is clear, it does not allow us to precisely describe a number of possible variations in forms and meanings. For example, if several forms of the verb to go exist (e.g. because of inflection), is each one a different word, i.e., are go, goes, going, gone and went different words? Similarly, what about lead (e.g. You can lead a horse to water) and lead (e.g. This is so heavy it must be made of lead)? They have the same form, but are they the same word? What about meanings that are represented by a multi-word form, such as pomme de terre or chemin de fer? Do the three distinct character strings make up a single word, or are they three words?

Because of all of these ambiguities, we generally prefer to use lexical unit to refer to an association of a meaning and a form or set of forms (e.g., to go – including the forms go, goes, going, gone, went; pomme de terre – including the forms pomme de terre and pommes de terre). We use word form to refer to character strings (e.g. lead) that represent one or more lexical units (i.e. lead (v.) and lead (n.)).


word form

mot-forme; mot graphique

Word forms are character strings that represent a specific lexical unit (or potentially more than one lexical unit) in different types of contexts, e.g. various inflected forms. A single lexical unit may be associated with many word forms: for example, the verb to go can be represented by the word forms go, goes, going, gone and went. Similarly, some distinct lexical units may have some word forms in common, e.g. in the case of homography and part-of-speech ambiguity.

It may be difficult for some computer applications to identify the links between different word forms that correspond to a single lexical unit. Two techniques that can help deal with this problem are lemmatization and stemming.
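
The many-to-many relationship between lexical units and word forms can be pictured with a small lookup structure. The mini-lexicon below is hypothetical and far from complete; it only shows that one lexical unit can have several word forms and that one word form can belong to several lexical units.

```python
from collections import defaultdict

# Hypothetical mini-lexicon: lexical unit -> its word forms.
LEXICAL_UNITS = {
    "go (v.)": ["go", "goes", "going", "gone", "went"],
    "lead (v.)": ["lead", "leads", "leading", "led"],
    "lead (n.)": ["lead"],
}

def units_by_form(lexical_units):
    """Invert the lexicon: word form -> the lexical units it may represent."""
    index = defaultdict(list)
    for unit, forms in lexical_units.items():
        for form in forms:
            index[form].append(unit)
    return dict(index)

form_index = units_by_form(LEXICAL_UNITS)
print(form_index["went"])  # ['go (v.)']
print(form_index["lead"])  # ['lead (v.)', 'lead (n.)']
```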


word processors

traitements de texte

Word processors are software programs that help users to enter, edit, format and save text documents. Most also offer additional functions that can help translators, writers and revisers to compare or revise documents, to save information in different formats or layouts (e.g. as tables), and to convert files into different file formats. Translation memory systems and other translation tools often interact with word processors to provide assistance to translators directly in the word processor environment.