WordSmith Tools KeyWords Tutorial, Level I

I. Introduction


 

WordSmith Tools, developed by Mike Scott, is a corpus analysis suite that includes three main text analysis tools: a monolingual concordancer (Concord) and two wordlist tools (WordList and KeyWords). This tutorial focuses on the basic features of KeyWords.

 

You can learn more about WordSmith Tools at www.lexically.net. From the WordSmith Tools page on that site, you can also download a demo version of WordSmith Tools, which offers the same options as the commercial version but limits the number of occurrences displayed to 25. 

 

WordSmith Tools can process .html, .xml and .txt files. KeyWords is a program that identifies the "key" words in a text, i.e., the words that are unusually frequent. This type of information can be used to study a genre, a language for special purposes, a writer’s idiolect, etc.

 

In order to establish which frequencies are common and which are especially high, the program needs a reference corpus that will establish the norm against which word frequencies will be compared. It is important to know that KeyWords compares word lists and not raw texts.
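As a rough illustration of what "comparing word lists" means, the minimal Python sketch below treats a word list as a table of word frequencies and compares each word's share of the text with its share of the reference corpus. It is not WordSmith's code, and both word lists are made-up figures rather than counts from the tutorial files.

```python
from collections import Counter


def percentages(wordlist: Counter) -> dict[str, float]:
    """Express each word's frequency as a percentage of all tokens in the list."""
    total = sum(wordlist.values())
    return {word: 100 * freq / total for word, freq in wordlist.items()}


# Made-up word lists (word -> frequency), for illustration only.
text_list = Counter({"the": 50, "wind": 10, "turbine": 8})
reference_list = Counter({"the": 500, "wind": 2, "turbine": 1, "tax": 40})

text_pct = percentages(text_list)
ref_pct = percentages(reference_list)
for word in text_list:
    # A word absent from the reference corpus has a reference share of 0%.
    print(f"{word}: {text_pct[word]:.2f}% in text, "
          f"{ref_pct.get(word, 0.0):.2f}% in reference corpus")
```

Here "wind" and "turbine" take up a much larger share of the text than of the reference corpus, while "the" does not; a keyness statistic then tests whether such a difference is large enough to matter.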

 

II. Getting Ready


 

  1. Prepare the files you will need for the exercises:
    1. Create a sub-directory of the U: drive (also called My Documents).
    2. Download the files from the KeyWords Resource Package.
    3. Extract the files to the sub-directory you created.
  2. Open WordSmith Tools.
  3. When WordSmith asks you if you’d like to enable basic functions only, choose No. By choosing No, you will have access to all functions.
    1. The program will open a main window from which the various functions can be accessed. The three principal functions — Concord (C), KeyWords (K) and WordList (W) — are available via buttons. Other functions can be accessed from the Utilities menu.
  4. Open the files in the KeyWords Resource Package and read them.

 

III. Choosing texts


 

  1. Choose the texts for processing (see note):
    1. In the main WordSmith window, go to the File menu and select Choose Texts.
      1. In the Choose texts dialogue box, the files that have been selected for processing are displayed in the right-hand pane, and the directory structure appears in the left-hand pane.
    2. A demo file that comes with the software (a chapter from A Tale of Two Cities by Dickens) appears in the list on the right. Remove this file by selecting it and then pressing the Delete key. If you do not remove this file, your results will include occurrences coming from this text.
    3. From the drop-down menu in the upper left-hand corner of the Choose texts window, select the drive where you have stored the documents that you wish to analyze (U: or My Documents).
    4. You can also choose to see files of only a certain format (e.g. plain text or .txt files) by choosing the corresponding option from the drop-down menu located immediately to the right of the drive menu. (See note.)
    5. In the left-hand pane, browse through the directories on the U: drive to find the files that you wish to process. Double-click on the yellow folder icons (see screenshot) to open the directories.
    6. Select the file: Environment_EN.txt.
    7. Click on the long vertical button with two arrows that appears between the left-hand pane (the directory structure, labelled Files available) and the right-hand pane (the list of files, labelled Files selected). The files that you have selected should now appear in the right-hand pane. (See note.)
  2. Once you have selected the files you want to use, exit the Choose texts window by clicking on the green check-mark button (see screenshot).

 

IV. Generating and evaluating WordLists


 

  1. Generate a base WordList:
    1. In the WordSmith main window, click on the WordList button. The WordList window appears together with the Getting started dialogue box.
    2. In the Getting Started dialogue box, click on the Make a word list now button. The word list appears in the WordList window.
    3. Save this word list by opening the File menu in the WordList window and selecting the Save option (see screenshot). You will then have to select a sub-directory to store the file in and give it a name. You have now created a word list for the text Environment_EN.txt.
  2. Generate wordlists to compare with your base wordlist:
    1. Repeat all of the above steps (Choosing texts and Generating a base WordList) in order to generate a wordlist of all of the texts in the folder Reference Corpus 1. Remember to save the wordlist. (See warning and note.)
    2. Repeat the same steps in order to generate a wordlist of all of the texts in the folder Reference Corpus 2. Remember to save the wordlist. (See warning and note.)
    3. By the end of this section, you will have created and saved three wordlists: the base wordlist (Environment_EN.txt), and two comparison wordlists (Reference Corpus 1 and Reference Corpus 2).
  3. Evaluate the wordlist generated from the Environment_EN.txt text:
    1. Look carefully at this wordlist.
    2. What words have surfaced in the wordlist?
    3. If we set aside articles, prepositions and other grammatical words, what words (or terms) are the most frequent in the text?
    4. Do you think these are the most distinctive words in the text?
    5. If you had to choose the key words in the text, would you have selected these?
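For readers who want to see what the WordList step produces, here is a minimal Python sketch that builds a frequency word list from every .txt file in a folder. It is only an approximation: the tokenization rule (runs of letters, allowing internal apostrophes and hyphens, all lower-cased) and the UTF-8 file encoding are assumptions, and WordSmith's own word-definition settings may give different counts. The folder name mentioned in the usage comment is hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: a word is a run of letters, optionally joined by internal
# apostrophes or hyphens (e.g. "wind-farm's"). Digits are ignored.
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def tokenize(text: str) -> list[str]:
    """Split text into lower-cased word tokens."""
    return [match.group(0).lower() for match in WORD_PATTERN.finditer(text)]


def build_wordlist(folder: str, pattern: str = "*.txt") -> Counter:
    """Count word frequencies across every file in `folder` matching `pattern`.

    Assumes the files are UTF-8 encoded.
    """
    counts: Counter = Counter()
    for path in sorted(Path(folder).glob(pattern)):
        counts.update(tokenize(path.read_text(encoding="utf-8")))
    return counts


def most_frequent(wordlist: Counter, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent words, highest frequency first."""
    return wordlist.most_common(n)


# Usage with an in-memory sentence; for the tutorial files you would call,
# e.g., build_wordlist("Reference Corpus 1") (hypothetical folder path).
sample = Counter(tokenize("The wind turns the turbine, and the turbine's blades turn."))
print(most_frequent(sample, 3))
```

Grammatical words such as "the" tend to dominate a list like this one, which is what the questions above invite you to notice.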


V. Generating a KeyWord list


 

  1. On the main WordSmith Tools window, open the KeyWords tool by clicking on the same-named button. 
  2. From the KeyWords window, open the File menu, and select the New… option (see screenshot). 
  3. A dialogue box appears, asking you to load two wordlists. Click on the Browse button (see screenshot) of the first text field, locate the wordlist you generated from the text Environment_EN.txt, and double-click on it. The location of this file will now appear in the field.
  4. Click on the Browse button (see screenshot) of the field below, locate the wordlist you generated for the texts in the folder Reference Corpus 1, and double-click on it. The location of this file will now appear in the field. (See warning.)
  5. Once you have located and selected both wordlists, generate the keywords list by clicking on the Make a keyword list now button.
  6. The program automatically generates a keyword list by comparing the frequency of the words in each list and extracting the words that are proportionately more frequent in the text to be analyzed than in the reference corpus. This list appears in the KeyWords window.
    1. Observe this keyword list. Is it different from the wordlist you generated for the Environment_EN.txt text?
    2. Is the list just as long, or is it shorter?
    3. How do the words in this list differ from the words in the initial wordlist?
    4. Do you agree that they are the key words of this text?
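The comparison described in step 6 can be sketched in Python as follows. This is an illustration under stated assumptions, not WordSmith's implementation: keyness is computed with Dunning's log-likelihood statistic (one widely used keyness measure), p values come from a chi-square distribution with one degree of freedom, only words that are proportionately more frequent in the text are kept, and the `max_p` cut-off is a hypothetical, arbitrary setting.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) for a word seen a times in a text of c tokens
    and b times in a reference corpus of d tokens."""
    expected_text = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:  # a zero frequency contributes nothing to the sum
        g2 += a * math.log(a / expected_text)
    if b > 0:
        g2 += b * math.log(b / expected_ref)
    return 2 * g2


def p_value(g2: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(g2, 0.0) / 2))


def keywords(text: Counter, reference: Counter,
             max_p: float = 0.01) -> list[tuple[str, float, float]]:
    """Return (word, keyness, p) for positive keywords, highest keyness first."""
    c, d = sum(text.values()), sum(reference.values())
    if c == 0 or d == 0:
        return []
    results = []
    for word, a in text.items():
        b = reference.get(word, 0)
        if a / c <= b / d:  # not proportionately more frequent in the text
            continue
        g2 = log_likelihood(a, b, c, d)
        p = p_value(g2)
        if p <= max_p:
            results.append((word, g2, p))
    return sorted(results, key=lambda row: row[1], reverse=True)


# Made-up word lists for illustration.
text_list = Counter({"wind": 30, "the": 100, "tax": 1})
reference_list = Counter({"wind": 5, "the": 1000, "tax": 200})
for word, keyness, p in keywords(text_list, reference_list):
    print(f"{word}\t{keyness:.1f}\t{p:.3g}")
```

Swapping the two word lists would instead surface the words that are key in the reference corpus, which is why the order in which you load the lists matters.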

 

VI. Displaying Keywords results


 

  1. Take a closer look at the information displayed in the KeyWords window.
  2. Note that the bottom of the KeyWords window is divided into seven tabs: KWs, plot, links, clusters, filenames, notes, source text. To shift between tabs, click on the name of the tab you want to move to.
  3. Click on the KWs tab and observe its content. The information is divided into seven columns.

 

Column | Description
Keyword | list of keywords
Freq. | number of occurrences of each keyword in the source text(s) in which these words are key
% | frequency of the keyword in the source text, as a percentage of all the words in that text
R.C. Freq. | number of occurrences of each keyword in the reference corpus (R.C.)
R.C. % | frequency of the keyword in the reference corpus, as a percentage of all the words in the reference corpus
Keyness | statistic (such as log-likelihood or chi-square) calculated from the keyword's frequency in each wordlist and the size of each wordlist; the higher the value, the more distinctive the word
P | probability of observing a difference in frequency this large by chance alone; the lower the p value, the more reliable the result, while a high p value indicates that the word may well not be a key word
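The columns above can be related by a few formulas. The block below assumes log-likelihood as the keyness statistic; other statistics, such as chi-square, are also used for keyness, so treat this as one common choice rather than the program's exact procedure. Here a is the keyword's frequency in the source text (Freq.), c the total number of tokens in the source text, b its frequency in the reference corpus (R.C. Freq.) and d the total number of tokens in the reference corpus; a term whose frequency is zero contributes nothing to the sum.

```latex
\begin{align*}
\% &= 100\,\frac{a}{c}, & \text{R.C.\ \%} &= 100\,\frac{b}{d},\\
E_1 &= c\,\frac{a+b}{c+d}, & E_2 &= d\,\frac{a+b}{c+d},\\
\text{Keyness} = G^2 &= 2\left(a\ln\frac{a}{E_1} + b\ln\frac{b}{E_2}\right),\\
p &= P\!\left(\chi^2_1 \ge G^2\right) = \operatorname{erfc}\!\left(\sqrt{G^2/2}\right).
\end{align*}
```

Because p shrinks as G² grows, the words with the highest keyness are also the ones with the lowest p values.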

 

  4. Click on the plot tab and observe its contents. The information is divided into six columns.

Column | Description
Keyword | list of keywords
Dispersion | statistical measure of whether a keyword is spread evenly throughout the whole text or concentrated in specific sections; it ranges from 0 to 1, where 0 indicates very uneven use of the keyword and 1 a perfectly even distribution across the whole text
Keyness | statistic calculated from the keyword's frequency in each wordlist and the size of each wordlist (the same value as in the KWs tab)
Links | number of links, i.e., co-occurrences of the keyword with other keywords within a span of 5 words
Hits | number of occurrences of each keyword in the source text(s) in which these words are key
Plot | graphical representation of where the occurrences of each word appear in the text
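To make the Dispersion column more concrete, the Python sketch below computes Juilland's D, a standard dispersion measure that also runs from 0 (all occurrences in one section) to 1 (perfectly even). WordSmith's own dispersion formula may differ, and dividing the text into eight equal sections by default is an assumption made for illustration.

```python
import math


def juilland_d(positions: list[int], text_length: int, parts: int = 8) -> float:
    """Juilland's D for one word: 0 = concentrated in one part, 1 = perfectly even.

    positions: 0-based token indices where the word occurs.
    text_length: total number of tokens in the text.
    parts: number of equal-sized sections the text is split into (assumption).
    """
    if parts < 2 or text_length <= 0 or not positions:
        raise ValueError("need at least two parts, a non-empty text and one occurrence")
    counts = [0] * parts
    for pos in positions:
        # Map each occurrence to the section it falls in.
        counts[min(pos * parts // text_length, parts - 1)] += 1
    mean = sum(counts) / parts
    sd = math.sqrt(sum((x - mean) ** 2 for x in counts) / parts)  # population SD
    variation = sd / mean
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1 - variation / math.sqrt(parts - 1))


# A word occurring once in every tenth of an 80-token text vs. clustered at the start.
print(juilland_d(list(range(0, 80, 10)), 80))
print(juilland_d([0, 1, 2], 80))
```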

 

  5. Click on the links tab and observe its contents.
    1. This tab shows the number of links, followed by a column labelled "in" and a percentage. The percentage is the number of links divided by the total number of occurrences of the word in question (the number in the "in" column).
  6. Click on the clusters tab and observe its contents.
    1. This tab shows keywords that appear close to each other in the text. If two keywords are separated by brackets containing one or more dots, the words do not appear side by side.
  7. Click on the filenames tab and observe its contents.
    1. This tab shows the name of the file and its location on your computer.
  8. Click on the notes tab and observe its contents.
    1. In this tab, you can write notes about the keyword list.
  9. Click on the source text tab and observe its contents.
    1. This tab displays the full source text.
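The link counts on the links tab can be approximated with the Python sketch below. It assumes that a link is any other keyword found within 5 tokens to the left or right of an occurrence of the keyword, and that each such co-occurrence counts once; WordSmith's exact counting rules may differ. Under this assumption the percentage can exceed 100 when a keyword usually has several other keywords nearby.

```python
def count_links(tokens: list[str], keyword: str, keywords: set[str],
                span: int = 5) -> tuple[int, int, float]:
    """Count links for `keyword`: other keywords found within `span` tokens
    to the left or right of each of its occurrences.

    Returns (links, occurrences, links as a percentage of occurrences).
    """
    positions = [i for i, tok in enumerate(tokens) if tok == keyword]
    links = 0
    for i in positions:
        window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        links += sum(1 for tok in window if tok in keywords and tok != keyword)
    occurrences = len(positions)
    percent = 100 * links / occurrences if occurrences else 0.0
    return links, occurrences, percent


tokens = "wind turbines produce noise near homes wind farms".split()
print(count_links(tokens, "wind", {"wind", "turbines", "noise"}))
```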

 

VII. Viewing and manipulating the results 


 

  1. Clean the raw keyword list of any words that you do not deem key.
    1. Make sure you are on the KWs tab in the KeyWords window. If you are not, return to it by clicking on the KWs tab.
    2. Eliminate an entry by clicking on it and pressing the Del key on your keyboard. The selected word turns grey and is struck through.
    3. Browse through the list with the arrow keys on your keyboard, repeating the previous step for each word you want to eliminate from the list. (See note.)
    4. Once you have gone through your entire list (if you are analyzing a short text), or through the keywords at the top of the list (if you are working with a corpus or a very long text), and are sure you want to remove the deleted words permanently, go to the Edit menu and select the Zap option (see screenshot). Zapping cuts out the deleted words and re-organizes the list according to the frequency of the remaining words. (See warning and note.)
  2. Generate a concordance in order to examine the context of the words and evaluate whether the word is worth keeping.
    1. Make sure you are on the KWs tab in the KeyWords window. If you are not, return to it by clicking on the KWs tab.
    2. Select the word for which you want to generate a concordance by clicking on it.
    3. Open the Compute menu and select the Concordance option (see screenshot). The software automatically launches a simple search for this word in its Concord tool. For more information on how to use Concord, read the WordSmith Tools Concord Tutorial and Exercise: Level I. (You can also consult WordSmith Tools Concord Tutorial and Exercise: Level II.)
  3. Sort the results.
    1. Re-sort the results in increasing or decreasing order of the values in a column by clicking on the button at the top of that column.
      1. Resort the results alphabetically from A to Z (or, alternatively, from Z to A) by clicking on the Keyword button.
      2. Resort the results from highest to lowest keyness (or vice-versa) by clicking on the Keyness button.
    2. Resort the keyword list from the Edit menu. You can resort the results alphabetically, by word ending or by word length.
      1. Make sure you are on the KWs tab in the Keywords window. If you are not, return to it by clicking on the KWs tab.
      2. Reverse the alphabetical order (normally A-Z) so that the words are listed from Z to A: open the Edit menu and select the Resort option (see screenshot).
      3. Resort the wordlist alphabetically by word ending: open the Edit menu, select the Other sorts option, and, in the new menu, select the Reverse word option (see screenshot).
      4. Resort the wordlist by word length: open the Edit menu, select the Other sorts option, and, in the new menu, select the Word length option (see screenshot).
  4. Choose the right reference corpus.
    1. KeyWords calculates each word's keyness in relation to a reference corpus. Therefore, what this corpus contains has a direct effect on the keyword list that is presented to us. The KeyWords Resource Package that you downloaded at the beginning of this exercise contains a text on the environmental effects of wind turbines and two reference corpora. Reference Corpus 1, which you have been using until now, contains a series of texts on income taxes and capital gains taxes, while Reference Corpus 2 consists of a series of texts on wind power. In this second corpus, words related to wind turbines and wind power should be much more frequent than in the first one, and the keyword list should therefore differ greatly.

    2. Generate a keyword list with the Environment_EN.txt text wordlist and the Reference Corpus 2 wordlist:

      1. Repeat the steps in the section Generating a KeyWord list, replacing the Reference Corpus 1 wordlist with the Reference Corpus 2 wordlist.

      2. How do the results differ? Has the program retrieved the same keywords?

      3. Do keywords that appear in both keyword lists have the same keyness value? The same p value?
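The sorting and zapping operations in this section are simple list manipulations. The Python sketch below mimics them on a plain list of words or (keyword, frequency) pairs; it is not how WordSmith stores its lists, and breaking length ties alphabetically is an assumption.

```python
def alphabetical(words: list[str], reverse: bool = False) -> list[str]:
    """A-Z order, or Z-A when reverse=True."""
    return sorted(words, key=str.lower, reverse=reverse)


def by_word_ending(words: list[str]) -> list[str]:
    """Alphabetical by ending: compare the words spelled backwards."""
    return sorted(words, key=lambda w: w.lower()[::-1])


def by_length(words: list[str]) -> list[str]:
    """Shortest words first; words of equal length in A-Z order (assumption)."""
    return sorted(words, key=lambda w: (len(w), w.lower()))


def zap(entries: list[tuple[str, int]], deleted: set[str]) -> list[tuple[str, int]]:
    """Permanently drop entries marked as deleted, then re-sort by frequency."""
    kept = [entry for entry in entries if entry[0] not in deleted]
    return sorted(kept, key=lambda entry: entry[1], reverse=True)


words = ["turbine", "noise", "energy", "power"]
print(by_word_ending(words))
print(by_length(words))
print(zap([("wind", 30), ("the", 100), ("noise", 12)], {"the"}))
```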

 

VIII. Wrapping up


 

  1. Make a copy of your files as a backup, or transfer them to another computer:
    1. In My Computer or from the Start menu, find the sub-directory you created to store the files for this exercise.
    2. Make a compressed folder that contains this sub-directory.
    3. Copy this compressed folder to a USB key, or, if it is smaller than 2 MB, send it as an e-mail attachment.
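If you prefer to script the backup, the Python sketch below zips a folder with the standard library and checks the result against the 2 MB limit mentioned above. The limit is taken as 2 * 1024 * 1024 bytes, which is an assumption (your e-mail provider may count megabytes differently), and any folder name you pass in, such as a hypothetical "KeyWords_exercise", should be replaced with the sub-directory you actually created.

```python
import shutil
from pathlib import Path

# Assumption: "2 MB" taken as 2 * 1024 * 1024 bytes.
MAX_ATTACHMENT_BYTES = 2 * 1024 * 1024


def make_backup(folder: str) -> tuple[Path, bool]:
    """Zip `folder` into <folder>.zip beside it and report whether the
    archive is small enough to send as an e-mail attachment."""
    source = Path(folder).resolve()
    archive = shutil.make_archive(
        str(source),             # archive path without the .zip extension
        "zip",
        root_dir=source.parent,  # paths inside the zip are relative to this...
        base_dir=source.name,    # ...so the folder itself is the top-level entry
    )
    archive_path = Path(archive)
    return archive_path, archive_path.stat().st_size < MAX_ATTACHMENT_BYTES
```

Calling `make_backup("KeyWords_exercise")` would return the path of the new archive and `True` if it can be e-mailed, `False` if you should use a USB key instead.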

NOTE: You can use texts of your own. However, if this is your first time using KeyWords, we highly recommend that you use the sample texts.

NOTE: The default selection for this menu is the *.* option, which means that all the files located in the directory will be displayed, regardless of their name or extension. If you choose *.txt, for example, only the files with a .txt extension will be displayed, though the asterisk means that these files may have any name.  
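The wildcard behaviour described in this note can be tried out with Python's standard fnmatch module, as in the minimal sketch below. The file names are made up, and matching is made case-insensitive here to mimic Windows. Note that fnmatch treats *.* literally (it requires a dot in the name), so only the *.txt style of pattern is shown.

```python
from fnmatch import fnmatch


def filter_files(names: list[str], pattern: str) -> list[str]:
    """Keep only the file names that match a wildcard pattern such as '*.txt'."""
    return [name for name in names if fnmatch(name.lower(), pattern.lower())]


names = ["Environment_EN.txt", "Tax_Report.TXT", "notes.html", "corpus.xml"]
print(filter_files(names, "*.txt"))
```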

NOTE: If you think that you will want to consult this group of files regularly, you can save the list by clicking on the Save favourites button (see screenshot). You can even add comments about these files in the lower pane of the window.
 
The files can then be added as a block the next time that you use WordSmith. To do so, open the list using the Get favourites button (see screenshot). Loading the files can take a few minutes.

WARNING: Make sure to remove the Environment_EN.txt file from the Files selected list on the right. Otherwise, your new wordlist will also contain this text. To remove the file, select it and press the Del key on your keyboard. 

 

NOTE: To add multiple files to the Files selected list when you create these wordlists, click on the first file in the Files available list, then hold down the Shift key while you click on the last file. If you think you will work with these files often, you can save the selection by clicking on the Save favourites button (see screenshot) and saving the list on the U: drive or My Documents. To retrieve the selection the next time you use WordSmith, click on Get favourites (see screenshot). 

WARNING: Make sure that you enter the wordlist of the text you want to analyze in the first field and the wordlist of the reference corpus in the field below. Otherwise, the comparison will be reversed and the list generated by the program will be useless. 

NOTE: If at any time you delete a word by mistake or change your mind about a deleted word, you can restore it to the list by selecting it and pressing the Ins key on your keyboard. 

WARNING: Zapping will eliminate the previously deleted words from your keywords list permanently. There is no undo option. 

NOTE: This process can also be automated by means of a stop-list. To learn more about it, read the WordSmith WordList Tutorial, Level II.
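Conceptually, a stop-list is just a set of words to ignore. The Python sketch below filters such words out of a word list; the short stop-list is hypothetical and hard-coded for illustration, whereas WordSmith manages stop-lists through its own settings.

```python
from collections import Counter

# Hypothetical stop-list of grammatical words, for illustration only.
STOP_LIST = {"the", "of", "and", "a", "to", "in", "is", "that"}


def apply_stoplist(wordlist: Counter, stop_words: set[str]) -> Counter:
    """Return a copy of the word list without any word on the stop-list
    (compared case-insensitively)."""
    return Counter({word: freq for word, freq in wordlist.items()
                    if word.lower() not in stop_words})


print(apply_stoplist(Counter({"the": 10, "wind": 4, "Of": 2}), STOP_LIST))
```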

IX. Questions for Reflection


  • What are your first impressions of the functions and functioning of WordSmith Tools KeyWords?
  • How do KeyWords results differ from those obtained with WordList?
  • Compared to other corpus analysis tools or term extractors, what are some advantages and disadvantages of KeyWords?
  • Based on what you have seen in terms of how to choose the right corpus, how can the results of KeyWords be refined?

Tutorial created by the CERTT Team (2007-10-07).

Tutorial updated by Trish Van Bolderen (2011-09-06).