MultiTrans 4.4 TextBase Builder Tutorial, Level I

I. Introduction


 

MultiTrans is a translation environment. Its TextBase Builder helps you align and manage large bilingual corpora and prepare them for searching with the TextBase Search module and for use by other modules such as the Translation Agent and TextBase Agent.

 

To see how the TextBase Builder fits into the translation process in MultiTrans, consult the MultiTrans Work Flow diagram.

 

You can find out more about MultiTrans by consulting the MultiCorpora website at http://www.multicorpora.com. When you open MultiTrans, you can also click on the MultiTrans Help icon to read the help files providing information on MultiTrans’s different functions.

 

II. Getting ready


 

  1. Save the files you will need for the exercises:
    1. Create a sub-directory of My Documents called MultiTrans_TextBase_Builder (or another name that you wish). (For instructions, see Creating a sub-directory in Windows.)
    2. Download the compressed folder MultiTrans TextBase Builder Level I files, which contains the files WHO-obesityEN.doc and WHO-obesityFR.doc.
    3. Extract the files to the sub-directory you created. (For instructions, see Extracting files from a compressed folder in Windows.)
  2. Open MultiTrans:
    1. MultiTrans opens and asks you to enter a username. Enter the name of your choice and click the Start button.

The main page you see on the screen is referred to as the General References Page and is used to navigate among the various modules of MultiTrans.

The pane along the left-hand side of the window is called the Process Bar. It displays the main functions of MultiTrans, gives access to the modules, and is available at any time. Four grey tabs appear on the Process Bar:

Projects: Used to create a new project or open an existing one. A project allows you to group together certain information and elements of MultiTrans, such as TextBases, TermBases, and client and subject information. This tutorial does not use a project.

TextBases: Contains shortcuts to the different functions used in the TextBase module.

TermBases: Contains shortcuts to the different functions used in the TermBase module.

Options: Allows you to change your preferences for various modules of MultiTrans 4. (See Note 1.)

 

III. Creating a TextBase and TermBase


 

  1. From the General References page, select TextBase Builder. (Alternatively, you can access this module by clicking on the TextBases tab on the Process Bar.)
  2. The TextBase Builder opens with the Step 1 - Language Selection dialogue box. Ensure that the boxes beside English and French are checked. (To add or choose a different language, click on the Modify the language list link at the bottom of the dialogue box, add or find your choice in the list of available languages using the Modify button, and click the OK button.) Then click the Next button.
  3. The Step 2 - Input File(s) Selection dialogue box is displayed.
    1. Click the Add button.
    2. Under English, click on the browse button (see Screenshot 1). The Select English Input File dialogue box opens.
    3. Find the sub-directory where you saved the files you downloaded at the beginning of the tutorial and select WHO-obesityEN.doc. Click the Open button.
    4. Click in the grey box just underneath French and click on the browse button. The Select French Input File dialogue box opens. Now select WHO-obesityFR.doc and click the Open button. (See Note 2.)
  4. Click the Next button.
  5. The Step 3 - Output Method Selection dialogue box is displayed. Select Create new TextBase and click on the Browse button.
  6. The Choose TextBase dialogue box is displayed. Select local TextBase and click the Browse button.
  7. Find the sub-directory you created at the beginning of the tutorial (in Step 1a of the Getting ready section) and double-click on it. Once in that directory, enter TEXTBase_TextBase_Builder_tutorial_yyyymmdd (replacing the series of letters at the end with today's date) as the name of your TextBase, or use another name that you prefer. (See Warning 1.)
  8. Click the Open button.
  9. The path for the file you created should appear in the TextBase Path field. Click the OK button on the Choose TextBase dialogue box.
  10. Click the Next button.
  11. The Step 4 - Segmentation dialogue box is displayed. This dialogue box is important because it is where you determine how your texts are segmented.
    1. Under Abbreviations, the Configure Abbreviations... button allows you to see which files are being used to avoid splitting segments at abbreviations. Because abbreviations differ from language to language, each language has its own file.
    2. Under Carriage return, ensure that the Use carriage returns as segment delimiters (recommended) box is checked.
    3. Click the Next button.
  12. The Step 5 - Alignment Agent dialogue box is displayed. Ensure that the English<->French box is checked and click the Next button.
  13. The Step 6 - Terminology Extractor dialogue box is displayed. Ensure that the box next to Create Terminology Extractor file is checked. This allows MultiTrans to automatically extract frequent terms of more than one word from your texts. Ensure that the boxes next to English and French are checked and click the Next button. (See Warning 2.)
  14. The Step 7 – Click “Build” to Start Process dialogue box is displayed. Click the Build button. After a moment, a dialogue box will appear indicating that the files were successfully imported to a TextBase. Click the OK button.
  15. Click the Open button on the Step 7 dialogue box to open your new TextBase.
  16. The TextBase Search Wizard automatically opens in the Step 1 - TextBase Selection dialogue box. Your new TextBase should be displayed under TextBase. Click the Next button.
  17. The Step 2 - Language Selection dialogue box is displayed. The languages you selected in step 2 are selected by default.
    1. Since you will be working from French into English, you will need to ensure that the language direction is appropriate.
    2. Choose French from the dropdown list beside Source Language and choose English from the dropdown list beside Target Language. Click the Next button.
  18. The Step 3 - TermBase Selection dialogue box is displayed. Click on the browse button. The Select TermBase(s) dialogue box is displayed. (See Note 3.)
    1. Click on the New button. The Create New TermBase dialogue box opens.
    2. In the Folder field, click the browse button, select the sub-directory you created at the beginning of this tutorial (in Step 1a of the Getting ready section) and click OK. The Folder field should now contain the location of your TermBase.
    3. In the Name field, give your TermBase a name, such as TERMBase.TextBase.Builder.tutorial_yyyymmdd (replacing the series of letters at the end with today's date). Click the Create button.
    4. Your new TermBase will appear in the list in the Local TermBases tab. Click the Select button to return to Step 3 of the Wizard, where your new TermBase name should appear. Click the Next button. (See Note 4.)
  19. The Step 4 - "Start" Loading TextBase(s) dialogue box is displayed. Click the Start button. When MultiTrans has finished loading the TextBase, some statistics about the number of words and segments in the texts will appear in the dialogue box. Click the OK button.

The TextBase search window is displayed and you have successfully created a new TextBase.
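The Step 4 - Segmentation dialogue box combines two ideas: a per-language abbreviation list that prevents false splits, and carriage returns used as segment delimiters. This tutorial does not describe how MultiTrans implements either, so the Python sketch below is only an illustration of the general principle. The function name segment, the abbreviation set and the sample text are all hypothetical.

```python
# Hypothetical abbreviation list; MultiTrans uses its own per-language files.
ABBREVIATIONS = {"dr.", "mr.", "e.g.", "i.e.", "etc."}


def segment(text, abbreviations=ABBREVIATIONS, use_carriage_returns=True):
    """Split text into segments at sentence-final punctuation.

    A token ending in '.', '!' or '?' closes a segment unless it is a
    known abbreviation. When use_carriage_returns is True, every line
    break also closes the current segment.
    """
    if use_carriage_returns:
        blocks = text.splitlines()
    else:
        # Treat line breaks as ordinary spaces.
        blocks = [" ".join(text.split())]

    segments = []
    for block in blocks:
        current = []
        for token in block.split():
            current.append(token)
            ends_sentence = token.endswith((".", "!", "?"))
            if ends_sentence and token.lower() not in abbreviations:
                segments.append(" ".join(current))
                current = []
        if current:  # text left over with no closing punctuation
            segments.append(" ".join(current))
    return segments
```

With carriage returns enabled, a heading such as "Key facts" that has no final punctuation still becomes its own segment. With the option disabled, it would be merged into the text that follows, which is one plausible reason the option is recommended for documents containing titles and list items.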

 

IV. Searching for words or phrases in the TextBase


 

MultiTrans includes a TextBase search function, MultiTrans's equivalent of a bilingual concordancer, which searches within your base of bitexts. Explore this function.  

  1. The TextBase window is divided into three sections (not including the Process Bar, which you can always see on the left unless you have hidden it as described above).
    1. The rightmost portion of the screen is divided vertically in two. The left-hand pane contains the source (French) text, and the right-hand pane contains the target (English) text. You will notice different colours of text; these colours are there simply to differentiate between segments in the texts.
    2. The pane to the left contains three tabs: Search, TextBases and Terminology. The TextBases tab is divided into two sub-tabs at the bottom: File List and Alignment. The Terminology tab is also divided into two sub-tabs at the bottom: Word Count and Term Count. The function of each is as follows:

Search: Enables you to search for terms or other units within the text by entering them in the search field and clicking on the Search button.

TextBases > File List: Shows you the files contained in the TextBase.

TextBases > Alignment: Shows you which segments are aligned. You can click on any aligned segment to view it in the Source and Target panes to the right. The active segments will be highlighted in yellow.

Terminology > Word Count: Shows you the list of word forms in the TextBase and their frequencies. A paperclip icon indicates that the form is found in the TermBase.

Terminology > Term Count: Shows you automatically identified candidate compound terms and their frequency in the TextBase. A paperclip icon indicates that the term can be found in the TermBase.
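To get a feel for what the Word Count and Term Count lists contain, the sketch below counts word forms and repeated two-word sequences in a list of segments. It is a naive frequency count written for illustration only; the method MultiTrans actually uses to identify candidate terms is not described in this tutorial. The function names and the two sample segments are hypothetical and are not taken from the WHO texts.

```python
import re
from collections import Counter


def tokenize(text):
    """Return lowercased word forms, keeping internal hyphens and apostrophes."""
    return re.findall(r"[^\W_]+(?:[-'’][^\W_]+)*", text.lower())


def word_count(segments):
    """Frequency of each word form across all segments."""
    counts = Counter()
    for seg in segments:
        counts.update(tokenize(seg))
    return counts


def term_count(segments, n=2, min_freq=2):
    """Frequency of each n-word sequence that occurs at least min_freq times.

    Sequences never cross segment boundaries.
    """
    counts = Counter()
    for seg in segments:
        tokens = tokenize(seg)
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


# Invented sample segments, for illustration only.
SAMPLE = [
    "Les maladies chroniques progressent.",
    "Prévenir les maladies chroniques.",
]
```

Calling term_count(SAMPLE) returns both "maladies chroniques" and "les maladies", each with a frequency of 2. The second is not a useful term, which shows why raw frequency lists are only candidates that a translator or terminologist must review.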

 

  2. In the Search tab, enter maladies chroniques into the text field and click the Search button.
    1. Under Search Results you will see the term along with its frequency in the TextBase. Below that you will see the file(s) and segment(s) in which the term was found. (See Screenshot 2.)
    2. Clicking on a filename will highlight the source and target language segments in which the occurrence appears. The segment is highlighted in yellow and the search term is shown in blue.
    3. Clicking on the Options... button allows you to change the search options. Change some of the options and run a new search: what results do you get, and how do they differ?
  3. In a MultiTrans TextBase, this search function only searches the source language. Because French is the source language in this tutorial, you must switch the language direction in order to search in English. (See Note 5.)
    1. Go to the TextBase Search menu and select Language Direction.
    2. Change the Source Language to English by selecting from the dropdown list, and then change the Target Language to French by selecting from the dropdown list.
    3. Click the OK button. You can now search in English.
    4. Try searching for the equivalent(s) you identified for maladies chroniques. Is this equivalent, or are these equivalents, always used to translate maladies chroniques in this text? (See Note 6.)
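The behaviour described in this section (search the source language, display the aligned target segment, switch direction to search the other language) is what any bilingual concordancer does. The Python sketch below shows the principle on a list of aligned segment pairs. It is not MultiTrans code: the function name concordance, the source_index parameter and the sample pairs are hypothetical, and the pairs are invented rather than taken from the WHO texts.

```python
# Invented aligned segments (French, English), for illustration only.
PAIRS = [
    ("Les maladies chroniques sont en hausse.", "Chronic diseases are on the rise."),
    ("L'obésité est évitable.", "Obesity is preventable."),
]


def concordance(pairs, query, source_index=0):
    """Return (source, target) pairs whose source segment contains query.

    pairs        -- list of 2-tuples of aligned segments
    query        -- string to look for (case-insensitive substring match)
    source_index -- 0 to search the first language, 1 to search the second;
                    changing it mimics switching the language direction
    """
    query = query.lower()
    hits = []
    for pair in pairs:
        source = pair[source_index]
        target = pair[1 - source_index]
        if query in source.lower():
            hits.append((source, target))
    return hits
```

For example, concordance(PAIRS, "maladies chroniques") searches the French side, while concordance(PAIRS, "obesity", source_index=1) searches the English side and returns the French segment as the target. The quality of the results depends entirely on the alignment: a misaligned pair would show the wrong target segment next to a correct hit.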

 

V. Viewing and correcting alignments in the TextBase   


 

  1. On the TextBases > Alignment tab, click on a few of the aligned segments to see how they are shown in the right side of the window. (See Screenshot 3.)
  2. If you find misaligned segments when doing a search, you can change the alignment as you go. Correcting these segments in order gives the best results.
    1. In some cases, you may want to delete one or more of the alignments proposed for a sentence, such as when a sentence appears in one text but not in the other and therefore has no equivalent, or when a complex section of the alignment is incorrect, and you find it easier to start fresh.
      1. To delete the alignment, first use your mouse to select this alignment from the Alignment tab, so that the segments appear highlighted in yellow in the Source and Target panes.
      2. Then choose the Delete Alignment option from the TextBase Search menu. You can also do this by clicking on the Delete alignment icon (see Screenshot 3) or by using the Delete key on your keyboard.
    2. To create a new alignment, select the source text segment you wish to align, so that it appears highlighted in yellow. You can do this by going to the Alignment tab and clicking on the segment you want, or you can hold down the Ctrl key and use the Up or Down arrows on the keyboard to move the yellow highlighting until you get to the desired segment in the source text.
    3. To move the yellow highlighting in the target text, hold down the Ctrl and Shift keys simultaneously and use the Up and Down arrows on the keyboard to get to the segment you wish to align with the yellow highlighted segment in the source text.
    4. If one segment in the source language corresponds to two segments in the target language, you can right-click on the target language segment and select Extend Selection from the menu that appears. That will extend the selection to the end of the next segment. This can also be done using Ctrl+Shift+E.
    5. If MultiTrans has incorrectly aligned two segments in the target text with one segment in the source text, you can right-click on the target language segment and select Shrink Selection from the menu that appears. That will exclude the second segment from the alignment. This can also be done using Ctrl+Shift+R.
    6. Once the segments you wish to align are highlighted in yellow in both the source and target texts, save the alignment by clicking on the Align segment icon (see Screenshot 3) or by pressing Ctrl+A on the keyboard.

 

VI. Saving a TextBase


 

  1. From the File menu, select Save TextBase Search. This will save all your progress up to this point.
  2. The next time you open MultiTrans, you can access your TextBase and TermBase by selecting TextBase Search from the General References page, which displays the Step 1 - TextBase Selection dialogue box.
    1. If you do not see your TextBase in the list, click on the browse button, select your TextBase from the Local TextBases tab (in the Select TextBase dialogue box), and click Open.
    2. Click the Add button to add your TextBase to the list and click the Next button. You can now follow steps 16 to 19 of Section III to open your TextBase and TermBase.

 

VII. Wrapping up


 

  1. To make a copy of your files as a backup or to transfer them to another computer:
    1. In My Computer, find the sub-directory you created to store the files for this exercise.
    2. Make a compressed folder that contains this sub-directory. (For instructions, see Creating a compressed folder in Windows.)

WARNING: In the folder you created at the beginning of this tutorial (in Step 1a of the Getting Ready section), you will see not only the files you downloaded (your original source and target texts, WHO-obesityEN.doc and WHO-obesityFR.doc), but also the files listed below, which together make up your TextBase and TermBase. You must keep all of these together if you copy or move your TextBase and TermBase.

 

WHO-obesityEN.inp: Your target text, converted into a format readable by MultiTrans.

WHO-obesityFR.inp: Your source text, converted into a format readable by MultiTrans.

[TextBase name].tcs: Your main TextBase file.

[TermBase name].TMB3: Your TermBase file.

[TextBase name].English, [TextBase name].English-French.aso, [TextBase name].English-French.aso.lgf, [TextBase name].French, [TermBase name].bk1 (if a backup has been created), Wordlist.English, Wordlist.French, English.ter, French.ter, bversioninfo: Some additional TextBase files, containing the extracted terminology and other data, as well as “administrative” information.

 

  2. Copy this compressed folder to a USB key or, if it is smaller than 2 MB, e-mail it to yourself as an attachment.
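If you prefer to script the backup rather than use Windows Explorer, the following Python sketch compresses the whole tutorial sub-directory into a single .zip file, so that all the TextBase and TermBase files stay together. It uses only the Python standard library. The function names and the example path are hypothetical, and the 2 MB threshold simply mirrors the e-mail suggestion in this section.

```python
import shutil
from pathlib import Path

EMAIL_LIMIT_BYTES = 2 * 1024 * 1024  # 2 MB, as suggested in this tutorial


def backup_textbase_folder(folder, destination=None):
    """Zip an entire folder and return the path of the archive.

    The archive is written next to the folder (or into destination, if
    given), never inside it, so the archive cannot include itself.
    """
    folder = Path(folder)
    if not folder.is_dir():
        raise NotADirectoryError(str(folder))
    destination = Path(destination) if destination else folder.parent
    archive_base = destination / folder.name  # ".zip" is appended automatically
    archive = shutil.make_archive(
        str(archive_base),
        "zip",
        root_dir=str(folder.parent),  # paths in the archive are relative to this
        base_dir=folder.name,         # so every entry starts with the folder name
    )
    return Path(archive)


def small_enough_to_email(archive, limit=EMAIL_LIMIT_BYTES):
    """True if the archive is smaller than the size limit."""
    return Path(archive).stat().st_size < limit
```

A typical call, assuming the sub-directory name suggested in the Getting ready section, would be backup_textbase_folder(Path.home() / "Documents" / "MultiTrans_TextBase_Builder"). Because the whole folder is archived, nothing needs to be selected file by file, which avoids the risk of leaving out one of the TextBase components.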

NOTE 1: To view or hide the Process Bar, go to the View menu and select Process Bar.

Screenshot 1 coming soon.

NOTE 2: You can add more than one text pair by repeating step 3 of Section III. However, for these exercises you will use just one pair of texts.

WARNING 1: Make sure that you save your TextBase in the sub-directory you created for it, and keep it separate from unrelated files. It will be difficult to reassemble all the parts of your TextBase if they become mixed up with other files.

WARNING 2: If you have any Word documents open, you MUST save and close them before clicking Build in the next step to avoid losing any of your work.

NOTE 3: MultiTrans works by linking a TextBase closely with one or more TermBases used to store terminology from the texts in the base (or on related subjects). Every time you create or use a TextBase, MultiTrans will ask you to choose an appropriate TermBase to use as well. For this tutorial, we will simply create an empty TermBase that you will later learn to add to and use in the MultiTrans TermBase Manager Tutorial, Level I.

NOTE 4: You can also create a new TermBase by selecting New TermBase from the General References page or from the TermBases tab on the Process Bar and following the procedure in step 18 of Section III. You can learn more about creating TermBases in the MultiTrans TermBase Manager Tutorial, Level I.

Screenshot 2 coming soon.

NOTE 5: You can also search in either the source or target text using the Find in function (TextBase Search > Find in). This function allows you to do searches similar to those available in word processors such as Word.

NOTE 6: Once you have tried an English search, do not forget to change your language direction back to French-English.

Screenshot 3 coming soon.

VIII. Questions for reflection


 

  • As you went through this tutorial, what were your first impressions of the functions and functioning of MultiTrans?
  • What could MultiTrans help you to do? In what kind of situation?
  • How does the MultiTrans alignment feature compare to the alignment feature in other tools you have used? Did you find it easier or more difficult to work with? Why?
  • What are some of the advantages and disadvantages of using MultiTrans to build and search a parallel (bitext) corpus? How does this compare to building a corpus using WinAlign or Fusion? How does this compare to building a corpus manually?
  • What aspects of the MultiTrans interface did you like? Which did you dislike, or did you find more difficult to use?
  • What kinds of problems did you find in the alignment of the texts? What effect would this have on the usefulness of the corpus and the amount of time and effort needed to use it? Were these problems easy to fix? Why or why not? Can you think of other kinds of problems that might be observed in aligning texts? Do you think MultiTrans would make it easy to correct these? Why or why not?
  • The MultiTrans philosophy involves correcting alignment “on the fly,” i.e., modifying only the parts of the aligned texts that you use, as you use them. How does this compare to approaches using more fixed alignment, in terms of investment of time, requirements in the interface and possible uses? Did you see the effect of these differences as you used MultiTrans in this exercise?

 

Tutorial created and updated by the CERTT team. (2010-01-26)