 

WordSmith Tools Concord Tutorial, Level II

 




 

 

I. Introduction


 

WordSmith Tools is a corpus analysis tool suite that was developed by Mike Scott. It includes a monolingual concordancer, known as Concord, whose function is to display the occurrences of a user-specified search string (e.g. term, phrase, character string) in its immediate context in a display format known as KWIC (key word in context). Note that the WordSmith Tools suite offers a number of other text analysis tools which can be explored through the WordList and KeyWords tutorials and exercises.

 

You can learn more about WordSmith Tools by consulting www.lexically.net. From the WordSmith Tools page on that site, you can also download a demo version of WordSmith Tools, which offers the same options as the commercial version but displays no more than 25 occurrences.

 

Concord is compatible with the following file formats: .html, .xml, .txt. It offers a range of search functions, a variety of modes for organizing and displaying results, and statistical features for identifying collocations.
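
To make the KWIC display format concrete, here is a minimal Python sketch that lines up each occurrence of a search string with a fixed amount of context on either side. It is an illustration only, not how Concord is implemented: the function name `kwic`, the `width` parameter and the sample sentence are all assumptions made for this example.

```python
import re

def kwic(text, query, width=30):
    """Return one KWIC line per whole-word, case-insensitive match of `query`.

    `width` is the number of context characters kept on each side
    (an assumption of this sketch, not a Concord setting).
    """
    pattern = re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-justify the left context so every match starts in the same column.
        lines.append(f"{left:>{width}} {m.group(0)} {right:<{width}}")
    return lines

# Hypothetical sample text, not taken from the tutorial corpus.
sample = "The court heard the appeal. The Court of Appeal dismissed it."
for line in kwic(sample, "court", width=20):
    print(line)
```

Because the left context is padded to a fixed width, the search string always starts in the same column, which is what makes recurring patterns on either side easy to spot.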

 

II. Getting Ready



  1. Prepare the files you will need for the exercises:
    1. Create a sub-directory called WordSmith_Concord_II (or another name that you wish). (For instructions, see Creating a sub-directory in Windows.)
    2. Download the English .txt files of the Supreme Court of Canada Decisions.
    3. Extract the files in the compressed folder to the same folder. (For instructions, see the tutorial on Extracting files from a compressed folder in Windows.)
  2. Open WordSmith Tools.
  3. When WordSmith asks you if you’d like to enable basic functions only, choose No. By choosing No, you will have access to all functions.
    1. The program will open a main window from which the various functions can be accessed. The three principal functions — Concord (C), KeyWords (K) and WordList (W) — are available via buttons. Other functions can be accessed from the Utilities menu.

 

III. Choosing texts


 

  1. Choose the texts for processing:
    1. In the main WordSmith window, go to the File menu and select Choose Texts.
      1. In the Choose texts dialogue box, the files that have been selected for processing are displayed in the right-hand pane, and the directory structure appears in the left-hand pane.
    2. A demo file that comes with the software (a chapter from A Tale of Two Cities by Dickens) appears in the list on the right. Remove this file by selecting it and then pressing the Delete key. If you do not remove this file, your results will include occurrences coming from this text.
    3. From the drop-down menu in the upper left-hand corner of the Choose texts window, select the drive where you have stored the documents that you wish to analyze (U: or My Documents).
    4. You can also choose to see files of only a certain format (e.g. plain text or .txt files) by choosing the corresponding option from the drop-down menu located immediately to the right of the drive menu. (See note.)
    5. In the left-hand pane, browse through the directories on the U: drive to find the files that you wish to process. (Double-click on the yellow folder icons (see screenshot) to open the directories).
    6. Select all the files.
      1. To select all the files at once, select the first one in the list, hold down the Shift key, and click on the final file in the list.
    7. Click on the long vertical button with two arrows that appears between the left-hand pane (the directory structure, labelled Files available) and the right-hand pane (the list of files, labelled Files selected). The files that you have selected should now appear in the right-hand pane. (See note.)
  2. Once you have selected the files you want to use, exit the Choose texts window by clicking on the green check-mark button (see screenshot).
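
The file-type drop-down menu used in the steps above works like a file-name mask (`*.*` for everything, `*.txt` for plain text only). The sketch below imitates that filtering with Python's standard `fnmatch` module. The helper `filter_by_mask` and the file names are hypothetical, and the sketch says nothing about how WordSmith itself lists files.

```python
from fnmatch import fnmatch

def filter_by_mask(filenames, mask="*.*"):
    """Keep only the names that match `mask`, ignoring upper/lower case.

    Note: with fnmatch, "*.*" requires a dot somewhere in the name, so a file
    with no extension would be left out of this sketch's results.
    """
    return [name for name in filenames if fnmatch(name.lower(), mask.lower())]

# Hypothetical file names, used only for illustration.
files = ["decision_01.txt", "decision_02.TXT", "index.html", "notes.xml"]

print(filter_by_mask(files))           # default mask: every file listed above
print(filter_by_mask(files, "*.txt"))  # plain-text files only
```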

 

IV. Performing case-sensitive searches


 

  1. Open the Concord tool by clicking the Concord (C) button in the main WordSmith window.
  2. From the File menu, choose New.
  3. In the Getting Started dialogue box that pops up, click on the Search word tab to ensure that it is active. At the top of this tab, you will see a search field, and, in the lower portion of the tab, you will see examples of available search options. (See note.)
  4. Search for a specific character string, either with or without upper-case letters, by entering it into the first field in the Search Word tab.
    1. Try court.
      1. WordSmith will ask you if you would like to sort the data. Choose No. Certain sorting options are discussed in WordSmith Tools Concord Tutorial, Level I and later in this tutorial.
      2. WordSmith will explain the column layout to you. Click on OK.
    2. Try Court.
      1. When you choose New from the Concord File menu, WordSmith will ask you if you would like to start a new Concord window. Choose Yes. This will allow you to compare the results of both searches. The new window may automatically be minimized, so click on the minimized box at the bottom of your screen to begin your new search.
    3. What do you notice about the results of these searches?
    4. Search for ==court==.
    5. Observe the results.
    6. Search for ==Court==.
    7. Assess the results. What differences do you notice between the last two searches? What differences do you notice between these searches and the first two? What do the results suggest about the way units in these texts are processed?
    8. Search for Acts.
    9. Search for ==Acts==.
    10. Are all of the occurrences of Acts relevant to Act? Why or why not?
    11. Can you think of any difficulties associated with this search? Can you think of difficulties associated with similar searches: for example, acts versus ==acts==? Under what conditions?
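
The difference between a plain search and a search wrapped in double equals signs can be sketched in a few lines of Python. The sketch assumes, as the exercises above suggest, that a plain string is matched regardless of case and that `==string==` is matched exactly as typed. The function names and the sample sentence are invented for the example, and whole-word matching is a simplification.

```python
import re

def compile_query(query):
    """Plain query: case-insensitive. ==query==: case-sensitive (assumed convention)."""
    if len(query) > 4 and query.startswith("==") and query.endswith("=="):
        return re.compile(r"\b" + re.escape(query[2:-2]) + r"\b")
    return re.compile(r"\b" + re.escape(query) + r"\b", re.IGNORECASE)

def find_all(text, query):
    """Return every matched form, in order of appearance."""
    return [m.group(0) for m in compile_query(query).finditer(text)]

# Hypothetical sample sentence.
sample = "The Court held that the lower court erred. Both Acts apply; he acts alone."

print(find_all(sample, "court"))      # upper- and lower-case forms together
print(find_all(sample, "==Court=="))  # capitalized form only
print(find_all(sample, "==Acts=="))   # capitalized plural only
```

Note that a case-sensitive search for a capitalized form would still pick up an ordinary word that happens to start a sentence, which is one of the difficulties the questions above point to.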

 

V. Performing wildcard searches


 

  1. Search for a series of strings that have only some of the same characters by entering the common characters in the Search Word field and replacing the different characters with the appropriate wildcard symbol.
    1. Insert a question mark (?) to replace one character.
      1. Do a search for court and courts with only one query.
    2. Insert an asterisk (*) to replace one or more characters.
      1. Do a search to find words beginning with legisl.
      2. What words do you find?
      3. Do a single search that finds both appeal and appellant.
      4. What other words do you find? What proportion of the identified words is relevant to your search?
      5. How might this kind of function be useful for querying a corpus?
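
One way to picture what the wildcard symbols do is to translate them into a regular expression. The Python sketch below follows the descriptions given in this section literally: `?` stands for exactly one character and `*` for one or more characters. Concord's own matching rules may differ in detail (for example, in how the bare stem or punctuation is treated), so treat this as an assumption-laden illustration. The function names and sample text are hypothetical.

```python
import re

def wildcard_to_regex(query):
    """Translate a wildcard query into a whole-word, case-insensitive regex."""
    parts = []
    for ch in query:
        if ch == "?":
            parts.append(r"\w")    # exactly one word character
        elif ch == "*":
            parts.append(r"\w+")   # one or more word characters (tutorial's description)
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

def find_words(text, query):
    """Return the distinct matched words, lower-cased and sorted."""
    return sorted({m.group(0).lower() for m in wildcard_to_regex(query).finditer(text)})

# Hypothetical sample text.
sample = ("The legislature passed legislation. The appellant filed an appeal; "
          "the appeals were heard, and the appellate court ruled.")

print(find_words(sample, "legisl*"))
print(find_words(sample, "appe*"))
print(find_words(sample, "appeal?"))
```

Under this literal reading, `appeal?` matches appeals but not appeal, and a broad stem such as `appe*` also brings in words like appellate that may or may not be relevant to the search.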

 

VI. Performing complex searches


 

  1. Combine search criteria with the Boolean operator OR, which allows you to find contexts containing at least one of the searched strings. Enter two character strings in the Search Word field, separating them by a forward slash (/). Click OK to start the search.
    1. Do a search to find both attorney and counsel.
    2. Choose the Centre sort option to group together the respective occurrences of each string. (See note.)
    3. Re-sort the results in order to highlight recurring patterns among the words that precede or follow the searched strings.
    4. Are the co-occurrents and the structures associated with them the same for each string?
    5. How does this way of sorting highlight information that is useful to the translator or terminologist?
  2. Combine search criteria with a proximity operator, which allows you to find contexts containing two strings located within a certain distance of one another.
    1. Enter the first string into the Search Word field.
      1. Try court.
    2. Enter the second string into the context word(s) & context search horizons field, located in the Advanced tab.
      1. Try appeal. (See note.)
    3. Define the allowable distance between the two strings by specifying the number of words that can separate them, both to the left and to the right of the main string.
      1. Try 3 words to the left and 3 words to the right.
    4. Click OK to start the search. (See note.)
    5. What are the results? How might this type of search be useful to the translator or terminologist?
  3. Combine search criteria with the Boolean operator NOT, which allows you to find contexts containing one string but not another.
    1. Enter into the Search Word field the string that must appear in the results.
    2. Enter the string that must not appear in the results into the Exclude if context contains field in the Advanced tab.
    3. Click OK to start the search.
    4. Do a case-sensitive search for Act (that is, ==Act==).
    5. Observe the results.
    6. Repeat the search, this time excluding contexts in which Wildlife appears within a distance of one or more words to the left of Act.
    7. How do the results of the last two searches differ from one another?
    8. Sort the occurrences to highlight the different kinds of Acts.
    9. How can this kind of search be useful to a translator or terminologist?
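
The three kinds of complex search described above (OR, proximity, and NOT) can be approximated in a short Python sketch that works on a list of word tokens. This is a simplified model, not Concord's algorithm: the function `search`, its parameters, the shared left/right horizons for both the context word and the excluded word, and the sample text are all assumptions made for illustration.

```python
import re

def tokenize(text):
    """Split text into word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text)

def search(text, query, context=None, exclude=None, left=0, right=0,
           case_sensitive=False):
    """Return the token positions of hits for `query`.

    query   : one string, or several separated by "/" (Boolean OR)
    context : word that must occur within the horizons (proximity search)
    exclude : word that must not occur within the horizons (Boolean NOT)
    left, right : horizons, in words, on each side of the hit
    """
    fold = (lambda s: s) if case_sensitive else str.lower
    tokens = tokenize(text)
    targets = {fold(t) for t in query.split("/")}
    hits = []
    for i, tok in enumerate(tokens):
        if fold(tok) not in targets:
            continue
        neighbours = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        window = {fold(w) for w in neighbours}
        if context is not None and fold(context) not in window:
            continue
        if exclude is not None and fold(exclude) in window:
            continue
        hits.append(i)
    return hits

# Hypothetical sample text.
sample = ("The Wildlife Act applies. The Court of Appeal cited the Criminal Code Act. "
          "Counsel for the attorney general spoke before the court.")

print(search(sample, "attorney/counsel"))                        # OR
print(search(sample, "court", context="appeal", left=3, right=3))  # proximity
print(search(sample, "Act", exclude="Wildlife", left=1,
             case_sensitive=True))                               # NOT
```

Setting both horizons to 1 in the proximity search gives the adjacency behaviour: the context word must sit immediately beside the searched string.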

 

VII. Reminder


 

The table below summarizes the functions available within WordSmith Tools.

 

Function: WordSmith Tools

Exact matching: string1 string2

Case-sensitiveness: ==string1==

Wildcard (replacing one character): string?

Wildcard (replacing one or more characters): string*

Boolean operator OR: string1/string2

Boolean operator NOT:
  • Enter the main string into the Search Word field.
  • Enter the string you want excluded into the Exclude if context contains field.

Proximity operator:
  • Enter string1 into the Search Word field.
  • Enter string2 into the context word(s) & context search horizons field.
  • Specify the number of words to the left and/or to the right of the main string that can separate the two strings.

Adjacency operator:
  • Enter string1 into the Search Word field.
  • Enter string2 into the context word(s) & context search horizons field.
  • Specify 1 word to the left and to the right of string1.
 

VIII. Identifying co-occurrents of a searched string


 

  1. Do another search for the string court in the Concord window. (See note.)
  2. Click on the Clusters tab at the bottom of the Concord window to see recurring sequences of strings, also known as “clusters,” identified in the results. Clusters are listed in order of frequency, starting with the most frequent.
    1. What are some of the more useful or relevant units that have been identified by the tool?
    2. How might the Cluster function be useful to the translator or terminologist?
  3. Adjust the parameters defining how clusters are calculated.
    1. Choose the Clusters option from the Compute menu.
    2. Specify the length of the clusters you would like WordSmith Concord to identify. The recommended length for a cluster is between 2 and 4 words, but you can experiment with fewer or more words.
    3. Specify the minimum frequency with which a cluster must appear in the texts in order for it to be identified.
    4. Specify the horizons for your clusters. The horizons represent the distance separating the main searched string from the cluster.
    5. Click OK to recalculate the clusters for court.
    6. Look at the Related column on the right-hand side of the results window. It displays larger clusters that include the cluster identified in the Cluster column.
  4. Click on the Patterns tab to display the list of words that appear close to the searched string.
    1. How might this information be useful to the translator or terminologist?
  5. Click on the Collocates tab to identify strings that frequently appear close to the searched string.
    1. Collocates of the string are displayed in the left-hand column and are listed in order of frequency, starting with the most frequent. Their position relative to the searched string (in words to the left and to the right) is displayed to the right of each collocate. The most frequent position is indicated in red.
    2. What are some of the more relevant collocates for this string?
    3. Have the results highlighted co-occurrents that you had not yet noticed? (See note.)
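
The ideas behind the Clusters and Collocates tabs can be sketched in Python as simple counting over word tokens. The sketch below counts fixed-length word sequences that contain the searched string, and tallies neighbouring words by their position relative to it. It is a simplified illustration under stated assumptions: Concord's actual calculations, settings and statistics are not reproduced, and the function names, parameters (`size`, `min_freq`, `horizon`) and sample text are invented for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def clusters(tokens, node, size=3, min_freq=2):
    """Count `size`-word sequences that contain `node` and occur at least `min_freq` times."""
    counts = Counter(
        " ".join(tokens[i:i + size])
        for i in range(len(tokens) - size + 1)
        if node in tokens[i:i + size]
    )
    return [(seq, n) for seq, n in counts.most_common() if n >= min_freq]

def collocates(tokens, node, horizon=2):
    """Tally words by position relative to `node` (-1 = one word to the left)."""
    table = {}
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for offset in range(-horizon, horizon + 1):
            j = i + offset
            if offset != 0 and 0 <= j < len(tokens):
                table.setdefault(tokens[j], Counter())[offset] += 1
    return table

# Hypothetical sample text.
sample = ("The Supreme Court of Canada allowed the appeal. "
          "The Supreme Court of Canada dismissed the motion. "
          "The Federal Court of Appeal agreed.")
tokens = tokenize(sample)

for seq, n in clusters(tokens, "court"):
    print(n, seq)

for word, positions in collocates(tokens, "court").items():
    print(word, dict(positions))
```

The per-position tallies correspond to the columns of the Collocates display: for each collocate, the offset with the highest count is the position where it most often appears relative to the searched string.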

 

IX. Wrapping up


 

  1. Make a copy of your files as a backup, or transfer them to another computer:
    1. In My Computer or from the Start menu, find the sub-directory you created to store the files for this exercise.
    2. Make a compressed folder that contains this sub-directory. (For instructions, see Creating a compressed folder in Windows.)
    3. Copy this compressed folder to a USB key, or, if it is less than 2 MB, send a copy as an attachment to your e-mail.

NOTE: The default selection for this menu is the *.* option, which means that all the files located in the directory will be displayed, regardless of their name or extension. If you choose *.txt, for example, only the files with a .txt extension will be displayed, though the asterisk means that these files may have any name. 


 

 

NOTE: If you think that you will want to consult this group of files regularly, you can save the list by clicking on the Save favourites button (see screenshot). You can even add comments about these files in the lower pane of the window.
 
The files can then be added as a block the next time that you use Concord. This can be done by opening the list using the Get favourites button (see screenshot). Loading the files can take a few minutes.


 

 

 

 

 

 

NOTE: WordSmith also allows you to conduct a search using a previously prepared file containing a list of search words. However, this option will not be explored in this tutorial. For more information on this option, please consult the WordSmith Help files. 


NOTE: For a description of the sorting functions and their uses, see WordSmith Tools Concord Tutorial, Level I.

 

 

 

 

 

 

 

 

 

NOTE: If the Advanced tab is not visible, it’s possible that, when opening WordSmith, you selected Yes when you were asked if you wanted to see only the basic functions of the program. To access the Advanced tab, first click on the Cancel button. Then, from the Settings menu, choose the Specific to Concord options and click on the General tab. Uncheck the Keep things simple box and click OK. Restart your search. The Advanced tab should appear now. 

 

NOTE: This proximity search function replaces the Boolean operator AND, which is available through some other tools. However, the proximity search function is more precise since it allows the user to specify not only the presence of two strings in a given context but also their relative position. 


NOTE: Don’t forget to delete the context words from the Advanced tab in order to generate all occurrences of the searched string. 


NOTE: You can define the number of words analyzed on each side of the searched string. From the Settings menu, select Specific to Concord, then modify the horizons on the Concord tab. You can recalculate the collocates by clicking on Collocates from the Compute menu. 

 

 

 

X. Questions for Reflection


 

 
  • How did the search and advanced analysis functions compare to manual searches you could do using Concord? Can you accomplish more or less the same tasks manually? Which ones? How?
  • Which option strikes you as:
    • the fastest?
    • the simplest?
    • the most efficient?
  • Why?
  • What challenges can be overcome at least partly thanks to the advanced functions described above? How?
  • What challenges cannot be overcome by using these functions?

 

Tutorial translated by Trish Van Bolderen (2011-06-15).