MultiTrans PRISM TextBase Builder Tutorial, Level I

 


Other text aligner tutorials

Other bilingual concordancer tutorials

Other MultiTrans tutorials


 

 

I. Introduction


 

MultiTrans is a translation environment. Its TextBase Builder helps you align and manage large bilingual corpora and prepare them for searching with the TextBase Search module and for use by other modules, such as the Translation Agent and TextBase Agent.

 

To see how the TextBase Builder fits into the translation process in MultiTrans, consult the MultiTrans Work Flow diagram.

 

You can find out more about MultiTrans by consulting the MultiCorpora website at http://www.multicorpora.com. You can also read information about MultiTrans's different functions by selecting MultiTrans Help from the Help menu, once you have opened MultiTrans.

II. Getting ready


 

  1. Save the files you will need for the exercises.
    1. In the Home (H:) drive, create a sub-directory called MultiTrans_TextBase_Builder or another name that you wish. (For instructions, see Creating a sub-directory in Windows.)
    2. Download the compressed folder MultiTrans TextBase Builder Level I files, which contains the files WHO-obesityEN.doc and WHO-obesityFR.doc.
      1. As their file names suggest, these English and French texts discuss obesity and were issued by the World Health Organization (WHO). One text is a translation of the other, though we do not know which is which.
    3. Extract the files to the sub-directory you created. (For instructions, see Extracting files from a compressed folder in Windows.)
  2. Open MultiTrans.
    1. MultiTrans opens and asks you to enter a username. Enter the name of your choice and click the Start button.
    2. The main page you see is referred to as the Start Screen. It displays icons for various MultiTrans functions (e.g. Start Session, Analysis) and allows you to navigate between them. (See Note 1.)
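
If you would rather script the file preparation in step 1 than do it by hand, the following Python sketch creates the sub-directory and extracts the compressed folder into it. The archive file name and both paths are placeholders (assumptions, not names given in this tutorial); substitute the location of the file you actually downloaded.

```python
from pathlib import Path
import zipfile


def prepare_exercise_files(archive: Path, target_dir: Path) -> list[str]:
    """Create target_dir if needed and extract every file in the ZIP archive into it."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


if __name__ == "__main__":
    # Hypothetical locations: adjust both to match your own download and H: drive.
    archive = Path("H:/MultiTrans_TextBase_Builder_files.zip")
    target = Path("H:/MultiTrans_TextBase_Builder")
    if archive.exists():
        for name in prepare_exercise_files(archive, target):
            print("extracted:", name)
    else:
        print("Archive not found; download it first:", archive)
```

After it runs, the target sub-directory should contain WHO-obesityEN.doc and WHO-obesityFR.doc.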

III. Creating a TextBase


 

  1. From the Start Screen, click the Build TextBase button.
  2. The TextBase Builder opens, displaying the Step 1 - Output Method Selection dialogue box.
    1. Under Build from:, select the radio button that corresponds to the type of files you will use to create the TextBase. Since you will use the Word documents containing the English and French versions of the WHO obesity text, select Unilingual documents.
    2. Ensure that Create new TextBase is selected and, in the field below it, enter TextBase_WHO_yyyymmdd (replacing the series of letters at the end with today's date) as the name of your TextBase. Or use another name that you prefer. The TextBase will be saved by default in the Home (H:) drive (Home (H:) > MultiTrans > TextBases).
    3. Click the Next button.
  3. The Step 2 - Language Selection dialogue box opens.
    1. Ensure that the boxes next to English and French are checked. (See Note 2.) Then click the Next button.
  4. The Step 3 - Input File(s) Selection dialogue box opens.
    1. Click on the browse icon (...) beside the Select source folder field. (See Screenshot 1.) 
    2. Find the sub-directory on the H: drive where you saved the files you downloaded at the beginning of the tutorial, and click the OK button. The left-hand panel now displays the files found in that sub-directory.
    3. Drag and drop the WHO-obesityEN.doc file to the English column. The file name now appears under the English column. (See Note 3.)
    4. Repeat these steps to add the French text to the French column.
    5. Click the Next button. (See Note 4.)
  5. The Step 4 - Validation dialogue box opens.
    1. If MultiTrans has discovered a problem with the format of one or more of the files you identified in Step 3, it will list them under Rejected Documents (Wrong Format). In this case, verify that you are using file formats that are compatible with MultiTrans.
    2. Otherwise, the file locations of each of the texts you have selected will be listed under Accepted Documents. Click the Next button.
  6. The Step 5 - Alignment Agent... dialogue box opens.
    1. Ensure the box next to the language pairing that corresponds to your TextBase texts (in this case, English <-> French) is checked.
    2. From the Source language dropdown list at the bottom of the window, select the desired source language for the TextBase. Since you will be working into English, select French.
    3. Click the Build button. A box appears with a message indicating that you have successfully scheduled your TextBase to be built. Click the OK button.
    4. The Scheduled Operations dialogue box opens, displaying the name of the new TextBase and information about the TextBase. One of several classifications will appear under the Status column: Failed, Pending, In Progress, Finished.
      1. If you see Failed, a problem has occurred, and you will need to try building your TextBase again.
      2. If you see Pending or In Progress, wait a few moments until MultiTrans finishes processing your TextBase and the status changes to Finished.
      3. When you see Finished, click on Close. You may have to click the Refresh button on the left to update the status. If you want to view this window again, click Scheduled Operations on the Start Screen.
  7. The TextBase has been saved, as a TCS file, on your Home (H:) drive (H:\MultiTrans\TextBases). The TextBase can be opened and used with MultiTrans only. 
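
The yyyymmdd suffix suggested for the TextBase name is simply today's date written as year, month and day. A minimal Python sketch for producing such a name follows; the helper name dated_name is hypothetical and has nothing to do with MultiTrans itself.

```python
from datetime import date
from typing import Optional


def dated_name(prefix: str, day: Optional[date] = None) -> str:
    """Return the prefix followed by an underscore and the date in yyyymmdd form."""
    day = day or date.today()
    return f"{prefix}_{day.strftime('%Y%m%d')}"


# Uses today's date when no date is given.
print(dated_name("TextBase_WHO"))
```

For example, dated_name("TextBase_WHO", date(2012, 8, 5)) returns TextBase_WHO_20120805. Because the year comes first, names built this way sort chronologically in a file listing.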

IV. Creating a TermBase


 

A TermBase stores terminology contained in the texts or related to the subject field. MultiTrans allows you to open a TextBase even if you have not also created an accompanying TermBase. However, if you do not create and open a TermBase, an empty, unnamed and unsaved TermBase will be opened by default when you open the TextBase. Since it is very likely that you will open and use the TextBase and TermBase together, it is advisable that, after you create a TextBase, you also create a TermBase. (See Notes 5 and 6.)

 

  1. From the Start Screen, click the Build TermBase button. The Build TermBase dialogue box opens.
  2. In the Name field, enter TermBase_WHO_yyyymmdd (replacing the series of letters at the end with today's date). Or use another name that you prefer. 
  3. From the dropdown menu in the Languages field, identify the languages that correspond to the texts you used to create the TextBase.
    1. Select English, so the word appears highlighted in blue in the Languages field. Then click on the plus sign (+) to add this language to the box below.
    2. Repeat for French. (See Note 7.)
  4. In the Folder field, make sure the H:\MultiTrans\TermBases folder is listed. If it is not, click the browse button, select the H:\MultiTrans\TermBases folder, and then click OK.
  5. Click the Create button. The empty TermBase may open automatically. If it does, close it by clicking on the X in the upper right-hand corner of the TermBase Editor window. 

V. Opening the TextBase and TermBase


 

  1. From the Start Screen, click the Start Session button. The Start project dialogue box opens.
  2. Identify the source and target languages for this session.
    1. From the Source language dropdown menu, select French. 
    2. From the Target language dropdown menu, select English. (See Note 8.)
  3. Take a look at the boxes displayed under Selected resources. For the moment, they will be empty because you have not yet selected the TextBase and TermBase that you want to open.
  4. In the lower half of the dialogue box, under TextBases, identify the TextBase you want to open.
    1. The name of the TextBase you created earlier in this tutorial should already be listed. Check the box to the left of it. (If you do not see your TextBase listed in the TextBases box, click the Add... button and find your TextBase. It should be located in the H:\MultiTrans\TextBases folder. Select the file ending in .tcs and click Open.)
    2. The name of the TextBase will now appear in the upper left-hand box, below Selected resources > TextBases.
  5. Under TermBases, in the lower half of the dialogue box, identify the TermBase you want to open.
    1. Click the Browse... button. 
    2. From the Open window that appears, find and select the TermBase you created in this tutorial. (It should be located in the H:\MultiTrans\TermBases folder.) Then click Open.
    3. The name of the TermBase now appears in both the upper and lower right-hand boxes of the Start project dialogue box.
  6. Click Start. The TextBase (labelled as TextBase Search) and TermBase both open, but only the TextBase is displayed.
    1. To view the TermBase, go to the View menu and select TermBase, or click the View TermBase button in the Process Bar (in the left-hand pane).
    2. To return to the TextBase, go to the View menu again and select TextBase, or click the View TextBase button in the Process Bar (in the left-hand pane).

VI. Getting to know the TextBase interface


 

The TextBase window is divided into three main sections.
  1. The left-hand pane, labelled TextBase, is called the Process Bar and displays icons for six MultiTrans functions, making them easily accessible to you at any time (see Notes 9 and 10): 

    1. Reverse Languages: allows you to switch the source and target languages.

    2. Export: allows you to export the current TextBase to a TMX (Translation Memory eXchange) file format. TMX files are compatible with many other translation environments (e.g. Fusion, LogiTrans, SDL Trados). 

    3. Align Segments: allows you to create an alignment between one or more source and target segments.

    4. Insert in TermBase: allows you to insert a word or series of words from the TextBase into the TermBase.

    5. Show Metadata: allows you to view certain information (e.g. when and by whom the TextBase was created) about the current TextBase and TermBase.

    6. View TermBase:  allows you to view the TermBase.

  2. The right-hand portion of the screen is divided vertically into two panes.
    1. The left-hand pane contains the source text. At the top of this pane, you will see the source language (French), followed by the name of the file: fra-WHO-obesity.
    2. The right-hand pane contains the target text. At the top of this pane, you will see the target language (English), followed by the name of the file: eng-WHO-obesity.
    3. You will notice that the text is displayed in different colours. These colours serve to differentiate between segments within each text and do not indicate a relationship between source and target segments. For example, a green segment in the target text will not necessarily be a translation of a green segment in the source text. 
  3. The pane located in between the Process Bar and the panes displaying the source and target texts contains three tabs.
    1. The Search tab allows you to search for terms or other units within the text by entering them into the search field and clicking the Search button.
    2. The TextBases tab is divided into two sub-tabs located at the bottom.
      1. The File List tab shows you the file pairs contained in the TextBase. 
      2. The Alignment tab shows you which source and target segments are aligned with each other. You can click on any aligned segment to view it in the Source and Target panes to the right. The active segments will be highlighted in yellow.
    3. The Terminology tab is also divided into two sub-tabs.
      1. The Word Count tab shows you a list of all word forms in the TextBase and their frequencies. A paperclip icon indicates that the word form is found in the TermBase. For the moment, since the TermBase is empty, none of the words will have a paperclip next to it.
      2. The Term Count tab shows you candidate compound terms (identified automatically by MultiTrans) and their frequency in the TextBase. A paperclip icon indicates that the term can be found in the TermBase. By default, this tab will be empty. You will learn how to activate this list in the MultiTrans Term Extractor Tutorial, Level I.
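
The Export function in the Process Bar writes TMX, an XML-based exchange format in which each translation unit (tu) holds one variant (tuv) per language. As a rough illustration of what such a file contains, the Python sketch below parses a small hand-written TMX fragment using only the standard library. The fragment and its two segments are invented for illustration; a file exported from MultiTrans will have its own header values and content.

```python
import xml.etree.ElementTree as ET

# The xml:lang attribute belongs to the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample for illustration only, not actual MultiTrans output.
SAMPLE_TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en" srclang="fr" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="fr"><seg>Bonjour le monde.</seg></tuv>
      <tuv xml:lang="en"><seg>Hello, world.</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_translation_units(tmx_text: str) -> list[dict[str, str]]:
    """Return one dict per <tu>, mapping each language code to its segment text."""
    root = ET.fromstring(tmx_text.encode("utf-8"))
    units = []
    for tu in root.iter("tu"):
        unit = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() also collects text that sits around inline markup in a segment.
            unit[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        units.append(unit)
    return units


for unit in read_translation_units(SAMPLE_TMX):
    print(unit)
```

To inspect a real export, read the file's contents and pass them to read_translation_units in place of SAMPLE_TMX.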

VII. Searching for words or phrases in the TextBase


 

MultiTrans's TextBase search function, which is the MultiTrans equivalent of a bilingual concordancer, searches within your base of bitexts. Explore this function.  

  1. In the Search tab, enter maladies chroniques into the text field, and click the Search button or hit Enter on your keyboard.
    1. In the upper portion of Search Results, you will see the character string you have searched and its frequency in the TextBase. Below that, you will see the file(s) and segment(s) in which the string was found in the TextBase. (See Screenshot 2.)
    2. Click on one of the results to highlight the source and target language segments in which the occurrence appears. The segments will be highlighted in yellow, and the searched string will be identified in blue font.
      1. Can you easily identify an equivalent for maladies chroniques in the target segment?
    3. Click on the Options … button. (See Screenshot 3.)
      1. Which default options are selected under Word matching?
      2. Do two searches for maladie chronique. In the first one, select the Exact search option. In the second, select the Stemmed search option. (After modifying the options, you will have to click the Search button again to update the results.) What are the results in each case? How do they compare with each other? How do they compare with your earlier search for maladies chroniques? Do you find these search options useful? Why or why not?
      3. Modify the search options by deselecting the All words option, and click OK. Do another search for maladies chroniques. How do the results compare with the initial search you did?
      4. Make other changes to the search options to see how they affect your search results.
  2. The search function within the MultiTrans TextBase allows you to search in either the source or target language. You can switch between the two languages by using the radio buttons to choose Search source or Search target. (See Note 11.)
    1. Try searching for the equivalent(s) you identified for maladies chroniques. Is this equivalent, or are these equivalents, always used to translate maladies chroniques in this text?
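
To make the difference between exact and stemmed matching more concrete, here is a toy bilingual concordancer in Python. It is only a sketch under several assumptions: the suffix-stripping rule is a deliberately crude stand-in and not MultiTrans's actual stemming, the all_words flag reflects one plausible reading of the All words option, word order is ignored, and the aligned segments are invented for illustration rather than taken from the WHO texts.

```python
import re

# Invented aligned (French, English) segment pairs, for illustration only.
BITEXT = [
    ("Les maladies chroniques progressent.", "Chronic diseases are on the rise."),
    ("Une maladie chronique dure longtemps.", "A chronic disease lasts a long time."),
    ("Le sport est bon pour la santé.", "Exercise is good for your health."),
]


def tokens(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def crude_stem(word: str) -> str:
    """Strip a final plural 's': a toy rule, far simpler than a real stemmer."""
    return word[:-1] if len(word) > 3 and word.endswith("s") else word


def search(query: str, stemmed: bool = False, all_words: bool = True):
    """Return the segment pairs whose source side matches the query words."""
    normalize = crude_stem if stemmed else (lambda w: w)
    wanted = {normalize(w) for w in tokens(query)}
    hits = []
    for source, target in BITEXT:
        found = {normalize(w) for w in tokens(source)}
        # all_words: every query word must occur; otherwise one is enough.
        matched = wanted <= found if all_words else bool(wanted & found)
        if matched:
            hits.append((source, target))
    return hits


for source, target in search("maladie chronique", stemmed=True):
    print(source, "|", target)
```

With this toy data, an exact search for maladie chronique finds only the segment containing the singular form, while the stemmed search also finds the segment containing the plural. Each hit is returned together with its aligned target segment, which is what lets you look for equivalents.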

VIII. Viewing and correcting alignments in the TextBase   



  1. From the Alignment tab (under the TextBases tab), click on a few of the aligned segments to see how they are shown on the right side of the TextBase window. (See Screenshot 4.)
  2. If you find misaligned segments when doing a search, you can change the alignment as you go. You will get the best results if you correct these problems in the order in which they appear.
    1. Use your mouse to select the problematic alignment from the Alignment tab (see Note 12), so that the misaligned segments appear highlighted in yellow in the Source and Target panes.
  3. Delete any problematic alignments. You may want to do this if a sentence appears in one text but not in the other and therefore has no equivalent, or if a complex section of the alignment is incorrect and you would prefer to start fresh.
    1. Highlight the misaligned segments in the Alignment tab.
    2. Choose the Delete Alignment option from the TextBase Search menu in the toolbar. You can also do this by clicking on the Delete alignment icon (see Screenshot 5) or by using the Delete key on your keyboard, or simply by right-clicking on the selected segment and choosing the Delete Alignment option.
  4. Create a new alignment.
    1. Select the source text segment(s) you wish to align, so that the text appears highlighted in yellow.
      1. To highlight only one source segment, go to the Alignment tab and click on the line just above or below the source segment you want.
      2. To select the source segment, hold down the Ctrl key and use the Up or Down arrows on the keyboard to move the yellow highlighting until it arrives at the desired segment in the source text. (You can also right-click on the highlighted segment in the source text and choose the Previous segment or Next segment options to navigate through the segments.) (See Note 13.)
      3. To highlight more than one segment (for instances where more than one source segment corresponds to one or more segments in the target text), right-click on the first segment of the source portion you want to align and select Extend Selection from the menu that appears, or press Ctrl+E. The selection will be extended to the end of the next segment.
      4. If MultiTrans has incorrectly aligned more than one source segment with one or more target segments, you can right-click on the source language segment and select Shrink Selection from the menu that appears, or press Ctrl+R. This will exclude the last source segment from the alignment.
    2. Select the target text segment(s) you wish to align with the source segment(s) you have just identified, so that this text also appears highlighted in yellow.
      1. Once you have highlighted the source segment(s) you want, one or more segments will automatically be highlighted in the target text. If this selection is correct, skip ahead to the Align the segments section.
      2. To move the highlighting to a different target segment, hold down the Ctrl+Shift keys and use the Up or Down arrows on the keyboard until the highlighting arrives at the desired segment in the target text. (You can also use the options in the contextual menu that appears when you right-click on the target segment.)
      3. To enlarge the target selection, right-click on this target segment and select Extend Selection from the menu that appears, or press Ctrl+Shift+E.
      4. To reduce this target selection, right-click on this target language segment and select Shrink Selection from the menu that appears, or press Ctrl+Shift+R.
    3. Align the segments.
      1. Once the segments you wish to align with each other are highlighted in the source and target texts, save the alignment by clicking on the Align Segments icon from the upper horizontal menu (see Screenshot 6), by pressing Ctrl+A on the keyboard, or by clicking the Align Segments icon in the Process Bar.
  5. Finish correcting the alignments in the TextBase.
    1. If you notice that an alignment problem has affected all subsequent alignments in the TextBase, you can fix the problem and then have MultiTrans realign the texts from the newly corrected alignment.
    2. To do this, right-click on the alignment in the TextBases pane and choose Realign until end from the contextual menu.
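
The selection and alignment operations in this section can be pictured as operations on ranges of segment numbers. The Python sketch below is a conceptual model only: the class, method and function names are hypothetical, the boundary behaviour (never shrinking below one segment, never extending past the end of the text) is a modelling choice, and none of it describes how MultiTrans stores alignments internally.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    """A run of consecutive segments in one text, by index (both ends inclusive)."""
    first: int
    last: int

    def extend(self, segment_count: int) -> None:
        """Add the next segment, as Extend Selection does, if one exists."""
        if self.last < segment_count - 1:
            self.last += 1

    def shrink(self) -> None:
        """Drop the last segment, as Shrink Selection does, keeping at least one."""
        if self.last > self.first:
            self.last -= 1


def align(alignments: list, source: Selection, target: Selection) -> tuple:
    """Record that the selected source segments correspond to the selected target segments."""
    pair = ((source.first, source.last), (target.first, target.last))
    alignments.append(pair)
    return pair


# Two source segments (4 and 5) that are translated by a single target segment (4).
alignments = []
source = Selection(first=4, last=4)
target = Selection(first=4, last=4)
source.extend(segment_count=10)
print(align(alignments, source, target))
```

Seen this way, a many-to-one alignment is just a pair of ranges of different lengths, which is why the source and target selections are adjusted independently before the alignment is saved.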

 

Reminder: keyboard shortcuts

 

The following table summarizes some of the keyboard shortcuts that allow you to navigate within the source and target texts and make changes to segment selections for alignment purposes.

 

Action                                     Source text               Target text
Key(s) generally associated with...        Ctrl                      Ctrl + Shift
Navigating within...                       Ctrl + Up/Down arrows     Ctrl + Shift + Up/Down arrows
Enlarging (Extending) segments within...   Ctrl + E                  Ctrl + Shift + E
Reducing (Shrinking) segments within...    Ctrl + R                  Ctrl + Shift + R

 

IX. Wrapping up


 

  1. Close your TextBase and TermBase.
    1. From the File menu, select Close Session.
  2. Close MultiTrans.
    1. From the File menu, select Exit.
  3. Make a copy of your files as a backup, or transfer them to another computer.
    1. In the Home (H:) drive, create a sub-directory called WHO_Bases_yyyymmdd (replacing the series of letters at the end with today's date), or use another name that you prefer. (For instructions, see Creating a sub-directory in Windows.)
    2. Copy your TermBase and paste it in the sub-directory you have just created.
      1. Locate the TermBase in the H:\MultiTrans\TermBases sub-directory.
      2. Copy and paste it by using Ctrl+C and Ctrl+V, or by choosing Copy and Paste from the contextual menu that appears when you right-click on a file or folder.
    3. Copy your TextBase folder and paste it in the sub-directory you have just created. (See Warning 1.)
      1. Locate this folder in the H:\MultiTrans\TextBases sub-directory. It will have the same name you gave your TextBase at the beginning of this tutorial. 
      2. Copy and paste it by using Ctrl+C and Ctrl+V, or by choosing Copy and Paste from the contextual menu that appears when you right-click on a file or folder.
  4. Make a compressed folder that contains the sub-directory you created at the beginning of step 3. (For instructions, see Creating a compressed folder in Windows.)
  5. Copy this compressed folder to a USB key. Or, if the folder is smaller than 2 MB, email it to yourself as an attachment.
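
The backup steps above can also be scripted. The Python sketch below copies the TextBase folder and the TermBase file into a backup sub-directory, checks that the copy contains the parts listed in Warning 1 (the Content and Indexes sub-directories and a .tcs file), and then creates a compressed folder. The function name and its parameters are hypothetical; in particular, pass whatever TermBase file you actually find in H:\MultiTrans\TermBases, since its file name is not assumed here.

```python
import shutil
from pathlib import Path


def back_up_bases(textbase_dir: Path, termbase_file: Path, backup_dir: Path) -> Path:
    """Copy the TextBase folder and TermBase file into backup_dir, then zip backup_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Copy the whole TextBase folder, including all of its sub-directories.
    copied = backup_dir / textbase_dir.name
    shutil.copytree(textbase_dir, copied, dirs_exist_ok=True)
    shutil.copy2(termbase_file, backup_dir / termbase_file.name)

    # Check for the parts that Warning 1 says are required to open a TextBase.
    missing = [name for name in ("Content", "Indexes") if not (copied / name).is_dir()]
    if not any(copied.glob("*.tcs")):
        missing.append("*.tcs")
    if missing:
        raise FileNotFoundError(f"TextBase copy is incomplete, missing: {missing}")

    # Writes <backup_dir>.zip next to backup_dir and returns the path to it.
    archive = shutil.make_archive(
        str(backup_dir), "zip", root_dir=backup_dir.parent, base_dir=backup_dir.name
    )
    return Path(archive)
```

The returned path points to the compressed folder, so you can check its size (for example with archive.stat().st_size) before deciding whether to email it or copy it to a USB key.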


NOTE 1: You will not be able to access the Translation Agent, TextBase Agent, or TermBase Agent from the Start Screen. For information about how to access and use these tools, consult the corresponding tutorials: MultiTrans Translation Agent Tutorial, Level I; MultiTrans TextBase Agent Tutorial, Level I; MultiTrans TermBase Agent Tutorial, Level I.

NOTE 2: If you are using texts in languages other than English and French, click Modify the language list... in the lower left-hand corner of this dialogue box. From the Preferences dialogue box that opens, click on Modify in order to select from the list of available languages.

Screenshot 1 coming soon.

NOTE 3: If you add the wrong file to one of the language columns, you can remove it by selecting it and then clicking the arrow pointing to the left.

NOTE 4: Although you will only use one pair of texts for this tutorial, MultiTrans allows you to add as many pairs of texts to your TextBase as you like.

NOTE 5: You can create a TermBase without a TextBase being automatically created.

NOTE 6: To learn how to add to and use the TermBase, see the MultiTrans TermBase Manager Tutorial, Level I.

NOTE 7: If you need to remove a language from the box below the Languages field, click on the language and then on the minus sign (-).

NOTE 8: If the default source and target languages are set in the opposite order of what you want, you can quickly switch them by clicking the icon located in between the language dropdown menus.

NOTE 9: To view or hide the Process Bar, go to the View menu and select Process Bar.

NOTE 10: The Process Bar displays a different set of MultiTrans functions when you are in the TermBase. To compare, click the View TermBase icon, in the Process Bar of the TextBase Search window, and observe the icons that are displayed. These functions are discussed in more detail in the MultiTrans TermBase Manager Tutorial, Level I.

Screenshot 2 coming soon.

Screenshot 3 coming soon.

NOTE 11: You can also search in either the source or target text using the Find in function. This function allows you to do searches similar to those that can be done in word processing tools such as Word. From the TextBase Search dropdown menu in the toolbar, select Find in and choose the appropriate language.

Screenshot 4 coming soon.

NOTE 12: You cannot navigate between segments by using your mouse to click within the source or target text.

Screenshot 5 coming soon.

NOTE 13: As you navigate within the source text, the target text highlighting will also move. For the time being, ignore what happens in the target text. You will adjust the target selection afterwards.

Screenshot 6 coming soon.

WARNING 1: It is very important that you copy everything in your TextBase folder. This includes the sub-directories Content and Indexes, as well as the TextBase file ending in .tcs. All of these files are needed to open a TextBase in MultiTrans.

X. Questions for reflection


 

  • As you went through this tutorial, what were your first impressions of the functions and functioning of MultiTrans?
  • What could MultiTrans help you to do? In what kind of situation?
  • How does the MultiTrans alignment feature compare to the alignment feature in other tools you have used? Did you find it easier or more difficult to work with? Why?
  • What are some of the advantages and disadvantages of using MultiTrans to build and search a parallel (bitext) corpus? How does this compare to building a corpus using WinAlign or Fusion? How does this compare to building a corpus manually?
  • What aspects of the MultiTrans interface did you like? Which did you dislike, or did you find more difficult to use?
  • What kinds of problems did you find in the alignment of the texts? What effect would this have on the usefulness of the corpus and the amount of time and effort needed to use it? Were these problems easy to fix? Why or why not? Can you think of other kinds of problems that might be observed in aligning texts? Do you think MultiTrans would make it easy to correct these? Why or why not?
  • The MultiTrans philosophy involves correcting alignment “on the fly,” i.e., modifying only the parts of the aligned texts that you use, as you use them. How does this compare to approaches using more fixed alignment, in terms of investment of time, requirements in the interface and possible uses? Did you see the effect of these differences as you used MultiTrans in this exercise?

 

Tutorial created and updated by the CERTT team. (2010-01-26)

Tutorial updated for Prism by Trish Van Bolderen. (2012-08-05)