MultiTrans 4.4 Term Extractor Tutorial, Level I
When you create a TextBase in MultiTrans (as described in the MultiTrans TextBase Builder Tutorial, Level I), you have the option of automatically creating a Terminology Extraction file. When this option is selected, MultiTrans automatically creates two lists for each language in the TextBase. The first lists the word forms in the TextBase and their frequencies (similar to the list created by the WordSmith Tools WordList function, as described in the WordSmith Tools WordList Tutorial, Level I). The second lists sequences of two or more word forms (candidate multi-word terms) that occur at least twice in the TextBase.
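To make the two kinds of lists more concrete, here is a minimal Python sketch of the general idea: counting word forms, and keeping sequences of two or more word forms that occur at least twice. It is only an illustration and does not reproduce how MultiTrans works internally. The function names, the simple tokenization rule, and the maximum sequence length of four are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a word form is any run of letters or digits, lowercased.
    # A real extractor may tokenize quite differently.
    return re.findall(r"\w+", text.lower())


def word_form_frequencies(tokens):
    # First list: each word form with its frequency.
    return Counter(tokens)


def repeated_sequences(tokens, min_len=2, max_len=4, min_count=2):
    # Second list: sequences of min_len..max_len word forms
    # that occur at least min_count times.
    # Note: this sketch lets sequences cross sentence boundaries.
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {seq: c for seq, c in counts.items() if c >= min_count}


sample = (
    "The term extractor builds a list. "
    "The term extractor counts word forms. "
    "Word forms are counted."
)
tokens = tokenize(sample)
frequencies = word_form_frequencies(tokens)
candidates = repeated_sequences(tokens)

for form, count in frequencies.most_common(5):
    print(form, count)
for sequence, count in sorted(candidates.items()):
    print(sequence, count)
```

Notice that a purely frequency-based approach like this one keeps sequences such as "the term" alongside "term extractor". This is one reason the output is a list of candidate terms that still needs to be evaluated by a person.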
To see how the term extractor fits into the translation process in MultiTrans, consult the MultiTrans Work Flow diagram.
You can find out more about MultiTrans by consulting the MultiCorpora website at http://www.multicorpora.com. When you open MultiTrans, you can also click on the MultiTrans Help icon to read the help files providing information on MultiTrans’s different functions.
II. Getting ready
III. Viewing the terminology extraction lists created by MultiTrans
IV. Evaluating the kinds of items that are extracted and analyzing the usefulness of each list
V. Observing the options available in MultiTrans for term extraction
VI. Wrapping up
NOTE: If you have not completed the MultiTrans TextBase Builder Tutorial, Level I, download the ready-made database, MultiTrans Bases – Extractor and TermBase Manager, and extract its contents to the sub-directory you created above. (For instructions, see Extracting files from a compressed folder.)
NOTE: If you do not see anything on the Terminology sub-tabs, you may have forgotten to check the Create Terminology Extractor File box when you created your TextBase, or to copy all of the files for the TextBase to your sub-directory. If this is the case, you can download the ready-made TextBase as described in the above note to do this exercise, and re-do your TextBase afterwards.
NOTE: If you cannot find these files, you can download the compressed folder MultiTrans TextBase Builder Level I files.
NOTE: To make it easier to compare the various versions of the extraction, you can save the list of candidate terms as a text-only file (.txt). To do this, right-click anywhere in the list of candidate terms (on the Term Count sub-tab of the Terminology tab) and choose the Enregistrer liste au format texte... option (roughly, "Save list in text format") from the contextual menu that appears. Choose the directory where you would like to save the new file, which you can then open in Word, another word processor, or even Excel.
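If you save several versions of the extraction as text files, as described in the note above, you could also compare them with a short script instead of by eye. The Python sketch below is one possible approach, not a MultiTrans feature. It assumes that each line of a saved list begins with one candidate term, optionally followed by a tab and other information such as a count. The actual layout of the exported file may differ, so check a saved file before relying on this. The function names and sample data are invented for the example.

```python
def read_candidates(lines):
    # Assumption: the candidate term is the text before the first tab
    # on each non-empty line. Adjust this if the exported file differs.
    terms = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        terms.add(line.split("\t")[0].strip().lower())
    return terms


def compare_extractions(first_lines, second_lines):
    # Report which candidates appear in only one list or in both.
    first = read_candidates(first_lines)
    second = read_candidates(second_lines)
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
        "in_both": sorted(first & second),
    }


# Invented sample data standing in for two saved lists.
# With real files, the lines could be read using open() and readlines().
first_version = ["term extractor\t4", "word form\t3", "of the\t9"]
second_version = ["term extractor\t4", "word form\t3", "candidate term\t2"]

result = compare_extractions(first_version, second_version)
for label, terms in result.items():
    print(label, terms)
```

A comparison like this only shows which candidates were gained or lost between two versions. Deciding whether the change improved the list remains a matter of judgment.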
VII. Questions for reflection
- As you went through this tutorial, what were your first impressions of the functions and functioning of MultiTrans?
- What could the MultiTrans term extractor help you to do? In what kinds of situations?
- What criteria can be used to evaluate term extractors? (Note that some of them are discussed in the questions you were asked above.)
- What kinds of word forms were identified by the term extractor? Are these the forms that would usually be included on term records? Why or why not? If you wanted to create term records from the results of the extraction, would you change the forms that were found? Why or why not? (You may want to consider this question again later on, in light of the results of the MultiTrans Tutorial: TermBase Agent, Level I.)
- Did the MultiTrans term extractor identify any items other than terms that you thought would be interesting for a translator or terminologist to be aware of? If so, what kinds of items? How might they be useful?
- How does MultiTrans compare to other extractors you have seen?
- What do you think of the options MultiTrans offers a user to adjust the extraction process? Do you think they are useful? Why or why not? Try experimenting with these settings and comparing your results.
- What are some of the advantages and disadvantages of using MultiTrans to extract single- and multi-word items (candidate terms)? Compared to a manual approach? Compared to using another tool?
Tutorial created and updated by the CERTT team. (2010-01-29)