Where Mallet is useful for thematically searching and interrogating large corpora, Voyant Tools, especially the new standalone server, is more helpful for mid-sized, or what we might term "meso"-level, analysis. You can download the Voyant Stand-Alone server here. Downloading and getting started with Voyant is fairly simple, but it is important to remember that although the server is now running on your computer, the visualization and analysis are still done in the browser. A helpful tip: move your Voyant download out of your downloads folder and into its own permanent directory.

Once your server is up and running, it's time to upload texts for analysis. You can paste text directly into the blank box on the homepage, paste hyperlinks to texts into the box, or upload texts from your computer. For sustainability and ease of editing, I would suggest the last option. As for the number of texts, as I mentioned above, I think a mid-sized corpus is best for getting the most out of the Voyant analytical toolbox. Too many texts quickly become unwieldy, while analyzing only one text leaves untouched many of the features that make Voyant a useful research tool.

For my examples, I used five volumes of religious tracts, titled Tracts for the Times, published between 1833 and 1841. Another important tip (that I learned the hard way): if you're interested in comparing volumes across a timespan, as I was, upload those volumes in chronological order, because it is very difficult to change their order once you are in the analytics screen.

Once you've uploaded your texts and Voyant has processed them, the first thing to do is to click the small "options" button on the upper right-hand side of the window labeled "Cirrus." From here, you will most likely want to apply the stopword list for whichever language your corpus is in. Voyant's stopword list is also easy to edit: just click on the "edit stop words" button and add or remove whichever words you'd like. Finally, make sure you check the box for "Apply Stop Words Globally" so that the stopwords are taken out of every window in Voyant.
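To make the idea of a stopword list concrete, here is a minimal Python sketch of what filtering stopwords amounts to. It is not Voyant's implementation: the tiny stopword list, the sample sentence, and the function names `edit_stopwords` and `remove_stopwords` are all assumptions made for illustration.

```python
import re

# Hypothetical, deliberately tiny stopword list; real lists are far longer.
BASE_STOPWORDS = {"the", "of", "and", "a", "to", "in", "is"}


def edit_stopwords(base, add=(), remove=()):
    """Return a new stopword set with words added or removed,
    roughly what editing the list by hand accomplishes."""
    added = {word.lower() for word in add}
    removed = {word.lower() for word in remove}
    return (set(base) | added) - removed


def remove_stopwords(text, stopwords):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in stopwords]


# Add a corpus-specific word to the list and take one word back out of it.
stopwords = edit_stopwords(BASE_STOPWORDS, add=["church"], remove=["in"])
print(remove_stopwords("The Church is the body of the faithful in Christ", stopwords))
# → ['body', 'faithful', 'in', 'christ']
```

Applying the same set before every count is the equivalent of checking "Apply Stop Words Globally": each view then works from the same filtered tokens.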

Now you will probably be looking at a complex, possibly intimidating screen with seven different windows inside of it. We will go through each one.

The Corpus Reader

This is the large main panel in the center of your screen, and it is where you can read all of your uploaded text. The Corpus Reader is interactive and connected to all of your other windows, so an analysis done in, say, the "Word Trends" window will automatically highlight the word you are looking at on the scale on the left of the Corpus Reader window. That scale is color-coded by volume: your first volume might be blue, your second green, and so on. This allows you to bring up text from each volume by clicking on its color instead of paging through the whole corpus. At the bottom of the window is a search bar where you can type in a term to search for it within the corpus. Each instance of the word will be highlighted within the text, and the color scale again makes it easy to navigate to those instances. Sections of each volume that contain a high density of your word appear in a darker, heavier shade, while sections with few or no instances appear much lighter, sometimes almost white. In this way the scale lets you jump straight to the sections of the text you are most interested in.
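The density shading can be approximated with a short Python sketch: split a volume into segments and count the search term in each one, so that higher counts correspond to darker bands. How Voyant actually divides a text into sections is not described here, so the equal-sized segments, the function name `term_density`, and the sample text are assumptions.

```python
import re


def term_density(text, term, segments=5):
    """Split the text into roughly equal runs of tokens and count
    how often the term occurs in each run."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Ceiling division, so every token falls into one of the segments.
    size = max(1, -(-len(tokens) // segments))
    chunks = [tokens[i:i + size] for i in range(0, len(tokens), size)]
    return [chunk.count(term.lower()) for chunk in chunks]


# Invented sample text, only to show the shape of the result.
sample = "saint a b saint c d e f saint saint"
print(term_density(sample, "saint"))
# → [1, 1, 0, 0, 2]
```

In this toy result the last segment would be shaded darkest and the two middle segments would appear almost white.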

Cirrus


Word clouds generally do not offer much in the way of critical, analytical data visualization, but they can give a quick overview of what is in your corpus. Unlike topic modeling, which takes words, their co-occurrence, and their frequency into account, all Voyant does here is count your words. In the word cloud, then, the largest words are simply the words that appear most often, while the smaller words occur less often. That is not to say that the word cloud accounts for every word in your corpus: if a word makes it into the word cloud, even if it is small, you can still attribute significant importance to it. The usefulness of Voyant's word cloud lies not in its value as a data visualization, but in its being an interactive way to engage with your corpus as a whole. Clicking on a word in the word cloud will highlight it within your Corpus Reader, graph its trend within your Word Trends window, and bring up its frequency and trend within each volume in your Words in Documents window.
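Since the word cloud is at bottom a frequency count, the idea fits in a few lines of Python. This is a rough sketch rather than Voyant's own code; the function name `top_words`, the sample sentence, and the stopword set are assumptions chosen for illustration.

```python
import re
from collections import Counter


def top_words(text, stopwords=frozenset(), n=5):
    """Count every non-stopword token and return the n most frequent,
    the words a cloud would draw largest."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(token for token in tokens if token not in stopwords)
    return counts.most_common(n)


# Invented sample sentence, not taken from the tracts.
sample = "Saint and saint and baptism; the saint spoke of baptism and sacrifice."
print(top_words(sample, stopwords={"and", "the", "of"}, n=2))
# → [('saint', 3), ('baptism', 2)]
```

Only the top n words are returned, which mirrors the point above: a cloud shows the most frequent words, not every word in the corpus.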

Words in the Entire Corpus, Word Trends, Words in Documents


Although the word cloud gives a quick and easy way of interacting with your corpus at large, Words in the Entire Corpus is a more effective way to do so. This window gives you the count and the trend of each word across your entire corpus (not in any specific volume). Checking the box for one word (or two or three or however many you like) graphs each word's trend against the others in the Word Trends window and brings up its individual stats within each document or volume in the Words in Documents window. For instance, I mapped three words in my collection of religious tracts: saint, sacrifice, and baptism. The words are all next to each other in terms of overall frequency: saint had 2,711 occurrences, sacrifice had 2,138, and baptism had 1,997. Although the word saint does occur more frequently, the three are, in comparison to the rest of the corpus, relatively close in occurrences. But when you map the trends, you see an interesting pattern. Over the course of five volumes, the word "saint" has a frequency range of 243-850. Baptism has a range of 58-1,499 and sacrifice has a range of 35-1,925. On the Word Trends graph, it's easy to see that "saint" has a consistent presence in every volume, but sacrifice and baptism each appear strongly in only one volume, with minuscule representation throughout the rest of the corpus. What this reveals (granted, I am very familiar with this set of tracts) is that while the principles of the eucharistic sacrifice and of baptism were both important to the Oxford Movement writers (the authors of these tracts), they were but aspects of their greater theology and thrust. The constant presence of the word saint reveals the methodology the Oxford Movement employed to articulate its theology: a historical one, based on the lives and writings of the early church saints.
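The comparison between an overall count and a per-volume trend can be sketched in Python as follows. The three toy "volumes" are invented stand-ins, not the Tracts for the Times, and the function names `counts_by_volume` and `summarize` are assumptions; the point is only that similar totals can hide very different ranges.

```python
import re
from collections import Counter


def counts_by_volume(volumes, terms):
    """Return {term: [count in volume 1, count in volume 2, ...]},
    keeping the volumes in the order they were supplied."""
    per_volume = [Counter(re.findall(r"[a-z']+", v.lower())) for v in volumes]
    return {term: [counts[term] for counts in per_volume] for term in terms}


def summarize(trend):
    """Corpus-wide total plus the lowest and highest per-volume counts."""
    return {"total": sum(trend), "min": min(trend), "max": max(trend)}


# Invented toy volumes, listed in "chronological" order.
volumes = [
    "saint saint baptism",
    "saint saint baptism baptism baptism baptism",
    "saint saint",
]

trends = counts_by_volume(volumes, ["saint", "baptism"])
for term, trend in trends.items():
    print(term, trend, summarize(trend))
# → saint [2, 2, 2] {'total': 6, 'min': 2, 'max': 2}
# → baptism [1, 4, 0] {'total': 5, 'min': 0, 'max': 4}
```

The totals (6 and 5) look nearly equal, yet one word is steady across the volumes while the other is concentrated in a single volume, which is the same kind of pattern the Word Trends graph makes visible.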

 

My small experiment demonstrates some of the uses and some of the drawbacks of Voyant. What is great about Voyant is being able to see the same statistic (in this case, the frequency of saint, baptism, and sacrifice) through multiple analytical lenses at once, allowing for comparison across multiple volumes, tracking of frequency within each individual volume, and a view of the broader frequency within the corpus itself. Taking just one of these views, the word cloud or Words in the Entire Corpus, would suggest that these three words (more broadly, concepts) were relatively equal. It is only by looking at that same statistic spread out across five volumes that a different story emerges, and that is where Voyant's strengths lie.

As a corollary to the above, working with one volume of text in Voyant does not take advantage of all of its tools. Analyzing just one of the volumes in the Tracts for the Times, you could easily have missed either "Baptism" or "Sacrifice" or, if you had selected only Volume 2, you might believe that the Tractarians placed an extreme emphasis on Baptism.

 
