WordSmith Tools Concord Tutorial, Level I




I. Introduction


WordSmith Tools is a suite of corpus analysis tools developed by Mike Scott. It includes a monolingual concordancer, known as Concord, which displays each occurrence of a user-specified search string (e.g. term, phrase, character string) in its immediate context, in a display format known as KWIC (key word in context). The WordSmith Tools suite also offers a number of other text analysis tools, which can be explored through the WordList and KeyWords tutorials and exercises.
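
To make the KWIC format concrete, here is a minimal Python sketch of the idea. It is not how Concord is implemented; the function name kwic, the width parameter and the sample sentence are all made up for illustration.

```python
import re

def kwic(text, search, width=30):
    """Return one KWIC-style line per occurrence of `search` in `text`."""
    lines = []
    # Match the search string as a whole word, ignoring case (an assumption;
    # Concord's own matching options are not modelled here).
    pattern = re.compile(r"\b" + re.escape(search) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the search string lines up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right}")
    return lines

# Made-up sample text, not taken from the Wind Power manual.
sample = ("Wind speed depends on the terrain. Rough terrain slows the wind, "
          "while flat terrain lets it flow freely.")

for line in kwic(sample, "terrain", width=20):
    print(line)
```

Each printed line shows the search string in the same column, with a fixed number of characters of context on either side.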


You can learn more about WordSmith Tools at www.lexically.net. From the WordSmith Tools page on that site, you can also download a demo version, which offers the same options as the commercial version but limits the number of occurrences displayed to 25.


Concord is compatible with the following file formats: .html, .xml, .txt. It offers a range of search functions, a variety of modes for organizing and displaying results, and statistical features for identifying collocations.


II. Getting Ready


  1. Prepare the files you will need for the exercises:
    1. Create a sub-directory called WordSmith_Concord (or another name of your choice). (For instructions, see Creating a sub-directory in Windows.)
    2. Download the English Wind Power manual.
    3. Extract the files from the compressed folder into the sub-directory you created. (For instructions, see the tutorial on Extracting files from a compressed folder in Windows.)
  2. Open WordSmith Tools.
  3. When WordSmith asks you if you’d like to enable basic functions only, choose No so that you have access to all functions.
    1. The program will open a main window from which the various functions can be accessed. The three principal functions — Concord (C), KeyWords (K) and WordList (W) — are available via buttons. Other functions can be accessed from the Utilities menu.


III. Choosing texts


  1. Choose the texts for processing:
    1. In the main WordSmith window, go to the File menu and select Choose Texts.
      1. In the Choose texts dialogue box, the files that have been selected for processing are displayed in the right-hand pane, and the directory structure appears in the left-hand pane.
    2. A demo file that comes with the software (a chapter from A Tale of Two Cities by Dickens) appears in the list on the right. Remove this file by selecting it and then pressing the Delete key. If you do not remove this file, your results will include occurrences coming from this text.
    3. From the drop-down menu in the upper left-hand corner of the Choose texts window, select the drive where you have stored the documents that you wish to analyze.
    4. You can also choose to see files of only a certain format (e.g. plain text or .txt files) by choosing the corresponding option from the drop-down menu located immediately to the right of the drive menu. (See note.)
    5. In the left-hand pane, browse through the directories to find the files that you wish to process. (Double-click on the yellow folder icons (see screenshot) to open the directories).
    6. Select all the files. (To select all the files at once, select the first one in the list, hold down the Shift key, and click on the final file in the list).
      1. Click on the long vertical button with two arrows that appears between the left-hand pane (the directory structure, labelled Files available) and the right-hand pane (the list of files, labelled Files selected). The files that you have selected should now appear in the right-hand pane. (See note.)
  2. Once you have selected the files you want to use, exit the Choose texts window by clicking on the green check-mark button (see screenshot).


IV. Generating a concordance


  1. Open the Concord tool by clicking on the same-named button in the main WordSmith window.
  2. From the File menu, choose New.
  3. In the Getting Started dialogue box that pops up, click on the Search Word tab to ensure that it is active. On this tab, you will see a search field at the top, and, in the lower portion of the tab, some examples of different search options that are available. (See note.)
  4. Enter the search string terrain in the search field and then click on OK (see screenshot) to begin the search.
  5. The results will appear in the Concord window. Each occurrence of terrain appears on a separate line with this search string centred and in colour in the middle of the line. The source of each occurrence is displayed to the right in the File column.
  6. Take a look at the statistics provided. WordSmith calculates the position of each occurrence in the sentence, paragraph, section and text, and expresses this information as both a raw value and a percentage. (For example, an occurrence at the very end of a text should be at position 100% in the sentence, paragraph, section and text, while an occurrence at the very beginning would be at position 0%. The raw values, however, will vary according to the number of words in the sentence, paragraph, section or text.)
    1. Can you think of any useful information that could be gleaned from these statistics?
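
The raw value and percentage described in step 6 can be illustrated with a short Python sketch. This tutorial does not document how WordSmith itself computes these figures, so the sketch simply assumes that the raw value is the word number within the text and that the percentage is that number divided by the total word count. The function name and sample text are hypothetical.

```python
def word_positions(text, search):
    """Return (word_number, percent) for each occurrence of `search` in `text`."""
    words = text.lower().split()
    total = len(words)
    results = []
    for index, word in enumerate(words, start=1):
        # Strip surrounding punctuation so that "terrain." still counts as a hit.
        if word.strip(".,;:!?()\"'") == search.lower():
            # Assumed formula: word number as a share of the total word count.
            results.append((index, round(100 * index / total)))
    return results

# Made-up ten-word sample text.
sample_text = "Terrain matters. Flat terrain helps, and so does open terrain."
print(word_positions(sample_text, "terrain"))
```

Under this assumption the last word of the text is always at 100%, whatever the length of the text, while its raw value depends on the word count.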


V. Displaying the results


  1. Adjust the width of the contexts by positioning the cursor on the dividing line between the columns Concordance and Set (the cursor will change shape to look like a vertical bar with an arrow on either side) and pulling to the right.
  2. Adjust the height of a line by placing the cursor on the dividing line between two lines (i.e. between the line numbers) and pulling down.
  3. Adjust the height of all the lines at once by opening the View menu and choosing the options Grow or Shrink (see screenshot). (Alternatively, you can press the F8 key to increase the line height or Ctrl+F8 to reduce it).
  4. If you prefer to see only the complete sentence in which the search string appears, choose the Sentence only option from the View menu.
  5. See the complete text from which the occurrence has been taken by double-clicking on the line in question to select it. The text is displayed on the source text tab with the occurrence of the search string underlined. Once you have finished consulting the text, click on the concordance tab (at the bottom of the window) to return to the results display.
  6. Classify your results according to categories that you can define. For example, mark those lines that show a potentially interesting collocation for the search string terrain. (See note.)
    1. In the Concord window, position the cursor on a line in the Set column.
    2. Type a letter to represent the category you wish to assign to a given occurrence. (For example, you could type N for cases where the collocate of an occurrence is a noun, an A for cases where it is an adjective, and so on). Once you have classified a given line, the cursor will automatically move on to the next line.
    3. Continue classifying the occurrences by assigning a letter to each line. (In step 7, you will learn how to sort the data according to these classes).
  7. Sort the results:
    1. Go to the Edit menu and choose Resort (see screenshot). (Alternatively, you can press the F6 key.) The occurrences will be automatically sorted in alphabetical order according to the information recorded in the Set column. (Those lines for which no information has been recorded in the Set column will appear first).
    2. Click on Resort again to see the occurrences in reverse alphabetical order.
    3. Could this type of sorting be helpful for studying the usage of a search string? How?
  8. Save the search results in the form of a concordance that can be opened in WordSmith Tools:
    1. From the File menu, choose Save (see screenshot).
    2. In the dialogue box that pops up, choose the drive where you want to save the results and then navigate to the desired folder.
    3. Enter a filename in the field.
    4. Click on the Save button to save the file.
  9. Save the search results in the form of a concordance in plain text format (.txt), which can be viewed and edited in a word processor or imported into a database:
    1. From the File menu, choose Save As… > Plain text (see screenshot).
    2. Click on the yellow folder icon (see screenshot) to choose the drive where you want to save the results and then navigate to the folder that you created for your exercises.
    3. Enter a filename and click on the Open button.
    4. When you are returned to the Save as Plain text dialogue box, click on OK to save the file. (See note.)
  10. Sort results according to other criteria (see note):
    1. From the Edit menu, choose Clear set column. Information previously entered in this column should now disappear.
    2. From the Edit menu, choose Resort. In the Concordance Sort dialogue box that pops up, you can specify up to three different sort criteria to be taken into account (a main or primary sort criterion, a secondary sort criterion and a tertiary sort criterion) by clicking on each of the three tabs. Within each tab, some of the sort criteria that you can choose include:
      1. File, which sorts the occurrences according to the source file in which they appear;
      2. R1, which sorts the occurrences in alphabetical order according to the word which appears one position to the right of the search string (i.e. immediately after the search string);
      3. R2, which sorts by the word two places to the right of the search string, etc.;
      4. L1, which sorts the occurrences in alphabetical order according to the word which appears one position to the left of the search string (i.e. immediately before the search string); 
      5. L2, which sorts by the word two places to the left of the search string, etc.
    3. Check the Case sensitive box to allow case to be taken into account during a sort if desired.
  11. Sort the results to highlight the different word combinations and collocations in which terrain appears (see note):
    1. Sort the occurrences according to the word that comes immediately before terrain by choosing L1 and clicking on OK.
      1. What collocate(s) can you identify in the contexts?
      2. To what part-of-speech categories do they belong?
      3. Do these correspond to the tags that you assigned in the Set column?
      4. In what ways could these collocations be useful to translators or terminologists?
    2. Sort the results according to the word appearing 3 places to the left of terrain. (See note.)
      1. What option do you need to choose to do this sort?
      2. What collocate(s) can you identify in these contexts?
      3. What is/are the part(s) of speech?
      4. Do these observations correspond to the tags that you assigned in the Set column?
      5. In what ways could these observations be useful to translators or terminologists?
    3. Did you identify other relevant collocates of terrain? Can you figure out a way to sort the results in order to highlight these collocates? (See note.)
  12. Remove any irrelevant results from the list:
    1. Click on the line in question in order to select it.
    2. Press the Delete key to “de-activate” the line. (If you change your mind, simply select the line again and press the Insert key to “re-activate” it.)
    3. Repeat as necessary to “de-activate” any additional irrelevant results.
    4. From the Edit menu, choose Zap (see screenshot) to delete the “de-activated” lines. (See note.)
  13. Close the concordance without saving it by going to the File menu and choosing Close, then clicking on the No button when the system asks you whether you want to save the file.
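
The positional sort criteria described in step 10 (L1, L2, R1, R2, etc.) can be sketched in Python as follows. This only illustrates the idea of sorting by a neighbouring word; it does not reproduce WordSmith's sort routine. The function names and concordance lines are made up, and the sketch assumes the search word occurs once per line.

```python
import re

def context_word(line, keyword, offset):
    """Return the word `offset` places from `keyword` (negative = left, positive = right)."""
    words = re.findall(r"\w+", line.lower())
    if keyword not in words:
        return ""
    # Uses the first occurrence of the keyword in the line (a simplification).
    target = words.index(keyword) + offset
    return words[target] if 0 <= target < len(words) else ""

def sort_concordance(lines, keyword, offset):
    # L1 corresponds to offset -1, L2 to -2, R1 to +1, R2 to +2, and so on.
    return sorted(lines, key=lambda line: context_word(line, keyword, offset))

# Made-up concordance lines, not taken from the Wind Power manual.
concordance = [
    "turbines sited on rough terrain produce less",
    "wind flows over flat terrain with little",
    "the effect of complex terrain on output",
]

for line in sort_concordance(concordance, "terrain", -1):  # sort on L1
    print(line)
```

Sorting on L1 groups the lines by the word immediately before terrain (complex, flat, rough), which is what brings recurring modifiers together in the display.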


VI. Wrapping up


  1. Make a copy of your files as a backup, or transfer them to another computer:
    1. In My Computer or from the Start menu, find the sub-directory you created to store the files for this exercise.
    2. Make a compressed folder that contains this sub-directory. (For instructions, see Creating a compressed folder in Windows.)
    3. Copy this compressed folder to a USB key, or, if it is smaller than 2 MB, e-mail it to yourself as an attachment.


















NOTE: The default selection for this menu is the *.* option, which means that all the files located in the directory will be displayed, regardless of their name or extension. If you choose *.txt, for example, only the files with a .txt extension will be displayed, though the asterisk means that these files may have any name. 
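
The effect of these wildcard patterns can be tried out in Python with the standard fnmatch module. The filenames below are invented for the example, and the helper filter_files is hypothetical.

```python
from fnmatch import fnmatch

# Made-up filenames for illustration only.
filenames = ["manual_ch1.txt", "manual_ch2.txt", "index.html", "notes.xml"]

def filter_files(names, pattern):
    """Keep only the names that match a wildcard pattern such as *.txt."""
    return [name for name in names if fnmatch(name, pattern)]

print(filter_files(filenames, "*.*"))    # every file, whatever its name or extension
print(filter_files(filenames, "*.txt"))  # only the .txt files
```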



NOTE: If you think that you will want to consult this group of files regularly, you can save the list by clicking on the Save favourites button (see screenshot). You can even add comments about these files in the lower pane of the window.
The files can then be added as a block the next time that you use Concord. This can be done by opening the list using the Get favourites button (see screenshot). Loading the files can take a few minutes.



NOTE: WordSmith also allows you to conduct a search using a previously prepared file containing a list of search words. However, this option will not be explored in this tutorial. For more information on this option, please consult the WordSmith Help files.



NOTE: It can be useful, for example, to indicate the part of speech of a character string that can belong to more than one category (e.g. power (n) and power (v)), to indicate the different senses of a word that has more than one meaning (e.g. pole (of the Earth), pole (of a battery) and pole (long, narrow cylindrical object)), or simply to distinguish occurrences that are interesting or problematic from those that are not.





NOTE: If you save the file in .txt format, you can specify the length of the context to be extracted. In the Concord window, go to Settings > Specific to Concord > Characters to save (per entry), then click on OK.


NOTE: As noted above, when information has been added to the Set column, WordSmith uses this information to sort the results. In order to sort using different criteria, it is necessary to delete the information in the Set column. (Since you have already saved a copy of the annotated concordance, you can always refer back to it.)


NOTE: Don’t forget to check the Activated box for criteria in the secondary and tertiary sorts if you want to use these additional criteria for sorting the concordance. 


NOTE: Take note of the collocates that you have identified for this search term. In the follow-up WordSmith Tools Concord Tutorial, Level II, you will learn another way to identify collocates – this time, automatically.



NOTE: You can also access the Zap function using the keyboard shortcut Ctrl + Z. You will probably recognize this shortcut, which is used by default in Microsoft applications to “undo” a previous action. It is important not to use Ctrl + Z in WordSmith unless you are zapping unwanted lines.
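
The two-stage removal in step 12 (de-activate first, then Zap) can be pictured with the following Python sketch. It is only a simplified model of the behaviour described in this tutorial, not WordSmith's internal logic, and the class and method names are hypothetical.

```python
class ConcordanceList:
    """Toy model of a concordance whose lines can be de-activated and then zapped."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.inactive = set()  # indexes of lines marked for removal

    def deactivate(self, index):
        # Corresponds to pressing the Delete key on a selected line.
        self.inactive.add(index)

    def reactivate(self, index):
        # Corresponds to pressing the Insert key on a de-activated line.
        self.inactive.discard(index)

    def zap(self):
        # Corresponds to Edit > Zap: the de-activated lines are removed for good.
        self.lines = [line for i, line in enumerate(self.lines)
                      if i not in self.inactive]
        self.inactive.clear()

results = ConcordanceList(["line A", "line B", "line C"])
results.deactivate(0)
results.deactivate(1)
results.reactivate(0)   # changed our mind about the first line
results.zap()
print(results.lines)
```

Until zap is called, a de-activated line can still be recovered, which is why the tutorial separates the two steps.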
















VII. Questions for Reflection


  • What are your first impressions of the functions and functioning of WordSmith Tools Concord?
  • What do you think of the interface? The search options? The display options? The processing options?
  • What could Concord help you to do? In what kind of situation?
  • What are some of the advantages and disadvantages of using Concord to search a corpus or text collection? Compared to a manual approach? Compared to using another tool?
  • What are some of the main challenges associated with using a concordancer? Do these challenges impact the way in which you perform searches or analyze results?
  • What criteria can be used to evaluate Concord?
  • How does Concord compare to other monolingual concordancers that you have used?


Tutorial created by the CERTT Team. (2009-07-06)