
MultiTrans PRISM Term Extractor Tutorial, Level I

 



 

Term extractors are software tools that analyze texts in electronic format and identify candidate terms (i.e. lexical units that appear likely to be terms in a specific domain). Term extractors use a variety of methods to find candidate terms, all of which rely on formal criteria such as frequency analyses or the structure of word combinations. Note that these automated tools do not produce perfect results: some actual terms contained in the text may be overlooked, while some candidates that are not actually terms may be proposed. The output of a term extractor must therefore be verified by a language professional. Examples of term extractors include SDL Trados MultiTerm Extract and TermoStat. Translation environments such as MultiTrans, LogiTerm and Fusion Translate also include term extraction functions.

 

 

I. Introduction


 

In MultiTrans, you have the option of creating a Terminology Extraction file. For each of the TextBase languages, MultiTrans then automatically creates a list of sequences of two or more word forms (candidate multi-word terms) that occur at least twice in the TextBase. When you next open the TextBase in MultiTrans (as described in the MultiTrans TextBase Builder Tutorial, Level I), the Term Count tab, which is empty by default, will list these candidate compound terms and their frequency in the TextBase.
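
To make the idea of "sequences of two or more word forms that occur at least twice" more concrete, here is a minimal Python sketch of frequency-based extraction. It is only an illustration: the function name, the tokenization rule and the sample text are assumptions made for this example, and MultiTrans's actual algorithm is not documented in this tutorial.

```python
import re
from collections import Counter

def candidate_terms(text, min_words=2, max_words=25, min_freq=2):
    """Return every sequence of min_words..max_words consecutive word
    forms that occurs at least min_freq times in the text.

    Hypothetical helper for illustration only; it is not MultiTrans code.
    """
    # Very rough tokenization: lowercase words, keeping internal hyphens/apostrophes.
    words = re.findall(r"\w+(?:[-']\w+)*", text.lower())
    counts = Counter()
    for n in range(min_words, max_words + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return {seq: freq for seq, freq in counts.items() if freq >= min_freq}

sample = ("The term extractor lists candidate terms. "
          "A term extractor cannot replace a language professional.")
print(candidate_terms(sample))  # {'term extractor': 2}
```

Note that this sketch looks only at repetition. It has no notion of grammar or domain, which is one reason the output of any such tool still needs to be checked by a language professional.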

 

To see how the term extractor fits into the translation process in MultiTrans, consult the MultiTrans Work Flow diagram.

 

You can find out more about MultiTrans by consulting the MultiCorpora website at http://www.multicorpora.com. You can also read information about MultiTrans's different functions by selecting MultiTrans Help from the Help menu, once you have opened MultiTrans.

 

II. Getting ready


 

  1. Locate the files you will need for the tutorial.
    1. If you have completed the MultiTrans TextBase Builder Tutorial, Level I, your TextBase and TermBase will be saved in their respective folders on the Home (H:) drive (H:\MultiTrans).
    2. If you have not created a TextBase and TermBase, create a sub-directory of your Home (H:) drive called MultiTrans Term Extraction or another name that you wish. (For instructions, see the file Creating a sub-directory in Windows.)
    3. Download the folder MultiTrans Bases - Extractor and TermBase Manager and extract its contents to the sub-directory you created above. (For instructions, see the file Extracting files from a compressed folder.)
  2. Open MultiTrans.
    1. MultiTrans opens and asks you to enter a username. Enter the name of your choice and click the Start button.
    2. The main page you see is referred to as the Start Screen. It displays icons for various MultiTrans functions (e.g. Start Session, Analysis) and allows you to navigate between them. (See Note 1.)

III. Activating the terminology extraction


 

  1. From the Start Screen, click on the Manage TMs icon.
  2. In the TM Manager window that appears, check the box to the left of the name of the TextBase you are using for this tutorial, and click the Extract Terminology button. (If your TextBase does not appear in the list, click on the Add... button, locate the TextBase, then click Open. Your TextBase will be located either in the H:\MultiTrans\TextBases folder or the sub-directory you created at the beginning of this tutorial.)
  3. The Extract Terminology window appears, displaying various extraction options.
    1. Beside TextBase name, you will see the name of the TextBase you have just selected.
    2. Maximum length refers to the maximum length for candidate terms extracted. The default is 25 words, but you can specify a different length. For the moment we will leave the default setting.
    3. Checking the Use exclude list checkbox allows you to specify a list of word forms that will be excluded from extraction (e.g. that will not be identified as beginnings or ends of candidate terms). This list contains many common words, such as "the", "and", "in" and "on". Ensure this checkbox is checked.
    4. The languages associated with your TextBase and the candidate-term lists that you will be creating are displayed in the Languages: box. Ensure that English and French appear, and that the checkboxes next to these languages are checked.
  4. Click on Schedule.
  5. The Schedule Operation window that opens allows you to identify when you would like the extraction to occur. If you wanted MultiTrans to perform the extraction later, you would check the checkbox beside Scheduled Execution and specify the date/time for the extraction. For this tutorial, ensure the checkbox beside Immediate Execution is checked, and click OK.
  6. A message appears telling you that the extraction has been successfully scheduled. Click OK.
  7. From the TM Manager window, click Close.
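
The effect of the Maximum length and Use exclude list options described in step 3 can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of MultiTrans's internals: the function name, the tiny exclude list and the sample sentence are all invented for the example.

```python
from collections import Counter

# Hypothetical sample; the real MultiTrans exclude list is much longer.
EXCLUDE = frozenset({"the", "and", "in", "on", "of", "a"})

def extract(words, max_length=25, exclude=EXCLUDE, min_freq=2):
    """Count repeated word sequences of 2..max_length words, skipping any
    sequence that begins or ends with a word on the exclude list."""
    counts = Counter()
    for n in range(2, max_length + 1):
        for i in range(len(words) - n + 1):
            seq = words[i:i + n]
            if seq[0] in exclude or seq[-1] in exclude:
                continue  # excluded words cannot start or end a candidate
            counts[" ".join(seq)] += 1
    return {s: f for s, f in counts.items() if f >= min_freq}

words = ("the quality of life and the quality of care "
         "in the quality of life study").split()

print(extract(words))                       # {'quality of life': 2}
print(len(extract(words, exclude=set())))   # 6 candidates without the exclude list
print(extract(words, max_length=2))         # {} (three-word candidates are cut off)
```

In this sketch an excluded word may still appear inside a candidate ("quality of life"), but noisy sequences such as "the quality" or "of life" are filtered out, and lowering the maximum length removes longer candidates altogether.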

IV. Viewing the terminology extraction list created by MultiTrans


 

  1. Open your TextBase and TermBase. (For instructions, see Opening a TextBase and TermBase in MultiTrans.)
  2. Click on the Terminology tab from the TextBase Search view.
  3. The extraction results are listed on the Term Count sub-tab (at the bottom) (see Note 2). The list shows you candidate compound terms (identified automatically by MultiTrans) and their frequency in the TextBase. A paperclip icon indicates that the term can be found in the TermBase. For the moment, since the TermBase is empty, none of the candidate terms will have a paperclip next to it.
  4. The Word Count sub-tab shows you the list of word forms in the TextBase and their frequencies. A paperclip icon indicates that the word form is found in the TermBase. For the moment, since the TermBase is empty, none of the word forms will have a paperclip next to it.
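
As a rough picture of what the Word Count sub-tab holds, the following Python sketch builds a list of word forms with their frequencies and a flag playing the role of the paperclip icon. The function name and the sample data are assumptions made for illustration; they do not reflect how MultiTrans stores this information.

```python
from collections import Counter

def word_count(words, termbase=()):
    """Return (word form, frequency, in_termbase) rows, most frequent first.

    The boolean stands in for the paperclip icon. Hypothetical helper only.
    """
    in_termbase = {term.lower() for term in termbase}
    counts = Counter(word.lower() for word in words)
    return [(w, freq, w in in_termbase) for w, freq in counts.most_common()]

words = "health policy and health care".split()

# With an empty TermBase, no row is flagged.
for row in word_count(words):
    print(row)

# Once a term has been added to the TermBase, its row is flagged.
print(word_count(words, termbase=["Health"])[0])  # ('health', 2, True)
```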

V. Evaluating the kinds of items that are extracted and analyzing the usefulness of the list


 

  1. Look through the TextBase and identify five single-word items in the source-language (French) text that you think might be pertinent for terminological or other research.
  2. Look at the Word Count sub-tab of the Terminology tab.
    1. What kinds of units are identified (parts of speech, forms, etc.)? How are they organized?
    2. Click on the headings of the columns. What does this do? How can this help you to find items that interest you?
    3. Do you find all of the single-word items you identified as potentially pertinent for research? Where are they in the lists?
    4. What proportion of the items identified do you think might be useful for doing research? Where are they located in the results?
  3. Look through the TextBase and identify five multi-word items in the source-language (French) text that you think might be pertinent for terminological or other research.
  4. Look at the Term Count sub-tab.
    1. What kinds of units (or series of word forms) are identified (parts of speech, structures, etc.)? How are they organized?
    2. Click on the headings of the columns. What does this do? How can this help you to find items that interest you?
    3. Do you find all of the multi-word items you identified as potentially pertinent for research? Where are they in the lists?
    4. What proportion of the candidate-terms identified do you think might be useful for doing research? Where are they in the lists?
    5. Do you observe challenges in the identification of these multi-word units? What are they?
    6. Do you observe any potential challenges that MultiTrans handled well? How do you think it did so?
  5. From the File menu, select Close Session. You will be returned to the MultiTrans Start Screen.
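
If you would like to put numbers on the comparison made in steps 2 and 4 above (how many of your own items the extractor found, and what proportion of its candidates are useful), one common approach is to compute precision and recall. The Python sketch below is an optional aid, not part of MultiTrans; the function name and the sample lists are invented for the example, and you would replace them with your own observations.

```python
def evaluate(extracted, reference):
    """Compare the extractor's candidates with a manually compiled list.

    precision: share of extracted candidates that are on the manual list
    recall:    share of the manual list that the extractor found
    Also returns the manual items the extractor missed.
    """
    extracted, reference = set(extracted), set(reference)
    found = extracted & reference
    precision = len(found) / len(extracted) if extracted else 0.0
    recall = len(found) / len(reference) if reference else 0.0
    return precision, recall, sorted(reference - extracted)

# Hypothetical data: candidates copied from the Term Count sub-tab,
# and items you identified yourself by reading the TextBase.
extracted = ["health care", "world health organization", "of the", "public health"]
manual = ["public health", "world health organization", "health policy",
          "risk factor", "vaccine"]

precision, recall, missed = evaluate(extracted, manual)
print(precision, recall, missed)
```

A low precision suggests a lot of noise to sift through, while a low recall suggests that pertinent items are being overlooked (for example, items that occur only once in the TextBase).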

VI. Wrapping up


 

  1. Close your TextBase and TermBase.
    1. From the File menu, select Close Session.
  2. Close MultiTrans.
    1. From the File menu, select Exit.
  3. Make a copy of your files as a backup, or transfer them to another computer.
    1. In the Home (H:) drive, create a sub-directory called WHO_Bases_yyyymmdd (replacing the series of letters at the end with today's date), or use another name that you prefer. (For instructions, see Creating a sub-directory in Windows.)
    2. Copy your TermBase and paste it in the sub-directory you have just created.
      1. Locate the TermBase in the H:\MultiTrans\TermBases sub-directory.
      2. Copy and paste it by using Ctrl+C and Ctrl+V, or by choosing Copy and Paste from the contextual menu that appears when you right-click on a file or folder.
    3. Copy your TextBase folder and paste it in the sub-directory you have just created. (See Warning 1.)
      1. Locate this folder in the H:\MultiTrans\TextBases sub-directory. It will have the same name you gave your TextBase at the beginning of this tutorial. 
      2. Copy and paste it by using Ctrl+C and Ctrl+V, or by choosing Copy and Paste from the contextual menu that appears when you right-click on a file or folder.
    4. Make a compressed folder that contains the sub-directory created in the first sub-step of step 3. (For instructions, see Creating a compressed folder in Windows.)
    5. Copy this compressed folder to a USB key. Alternatively, if the compressed folder is smaller than 2 MB, email it to yourself as an attachment.
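
If you are comfortable with scripting, the backup procedure in step 3 can also be automated. The following Python sketch (Python 3.8 or later) is one possible way to do it, under some assumptions: the function name is hypothetical, the example paths in the comment are placeholders based on the folder names used in this tutorial, and the TermBase is handled whether it turns out to be a single file or a folder.

```python
import shutil
from datetime import date
from pathlib import Path

def back_up_bases(termbase, textbase_dir, dest_root, prefix="WHO_Bases"):
    """Copy a TermBase and a whole TextBase folder into a dated
    sub-directory of dest_root, then zip that sub-directory.

    Hypothetical helper: names and layout follow this tutorial's steps.
    """
    termbase, textbase_dir = Path(termbase), Path(textbase_dir)
    backup_dir = Path(dest_root) / f"{prefix}_{date.today():%Y%m%d}"
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Copy the TermBase (file or folder; which one it is is an assumption).
    if termbase.is_dir():
        shutil.copytree(termbase, backup_dir / termbase.name, dirs_exist_ok=True)
    else:
        shutil.copy2(termbase, backup_dir)

    # Copy the entire TextBase folder, including its sub-directories.
    shutil.copytree(textbase_dir, backup_dir / textbase_dir.name, dirs_exist_ok=True)

    # Create <backup_dir>.zip next to the backup sub-directory.
    archive = shutil.make_archive(str(backup_dir), "zip",
                                  root_dir=str(backup_dir.parent),
                                  base_dir=backup_dir.name)
    return Path(archive)

# Example call (placeholder file and folder names):
# back_up_bases(r"H:\MultiTrans\TermBases\MyTermBase",
#               r"H:\MultiTrans\TextBases\MyTextBase",
#               "H:\\")
```

The function returns the path of the compressed folder, so you can check its size (for example with `archive.stat().st_size`) before deciding whether to email it or copy it to a USB key.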

NOTE 1: You will not be able to access the Translation Agent, TextBase Agent, or TermBase Agent from the Start Screen. For information about how to access and use these tools, consult the corresponding tutorials: MultiTrans Translation Agent Tutorial, Level I; MultiTrans TextBase Agent Tutorial, Level I; MultiTrans TermBase Agent Tutorial, Level I.

NOTE 2: If you are unfamiliar with the TextBase Search view, see the Getting to know the TextBase interface section in the MultiTrans TextBase Builder Tutorial, Level I.

WARNING 1: It is very important that you copy everything in your TextBase folder. This includes the sub-directories Content and Indexes, as well as the TextBase file ending in .tcs. All of these files are needed to open a TextBase in MultiTrans.
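
To double-check a copied TextBase folder against the list in Warning 1, you could use a small Python sketch like the one below. It only checks for the three items named in the warning (the Content and Indexes sub-directories and a file ending in .tcs); the function name is hypothetical, and passing this check does not guarantee that every file inside those sub-directories was copied.

```python
from pathlib import Path

def missing_textbase_parts(textbase_dir):
    """Return the items named in Warning 1 that are absent from the folder.

    An empty list means the folder contains Content, Indexes and a .tcs file.
    Hypothetical helper for illustration only.
    """
    folder = Path(textbase_dir)
    missing = [name for name in ("Content", "Indexes")
               if not (folder / name).is_dir()]
    if not any(folder.glob("*.tcs")):
        missing.append("*.tcs file")
    return missing

# Example call (placeholder path):
# print(missing_textbase_parts(r"H:\WHO_Bases_yyyymmdd\MyTextBase"))
```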

VII. Questions for reflection


 

  • As you went through this tutorial, what were your first impressions of the functions and functioning of MultiTrans? 
  • What could the MultiTrans term extractor help you to do? In what kinds of situations?
  • What criteria can be used to evaluate term extractors? (Note that some of them are discussed in the questions you were asked above.)
  • What kinds of word forms were identified by the term extractor? Are these the forms that would usually be included on term records? Why or why not? If you wanted to create term records from the results of the extraction, would you change the forms that were found? Why or why not? (You may want to consider this question again later on, in light of the results of the MultiTrans TermBase Agent Tutorial, Level I.)
  • Did the MultiTrans term extractor identify any items other than terms that you thought would be interesting for a translator or terminologist to be aware of? If so, what kinds of items? How might they be useful?
  • How does MultiTrans compare to other extractors you have seen?
  • What do you think of the options MultiTrans offers a user to adjust the extraction process? Do you think they are useful? Why or why not? Try experimenting with these settings and comparing your results.
  • What are some of the advantages and disadvantages of using MultiTrans to extract single- and multi-word items (candidate terms), compared to a manual approach? Compared to using another tool?

 

Tutorial created and updated by the CERTT team. (2010-01-29)

Tutorial updated for Prism by Trish Van Bolderen. (2012-08-05)