Network Analysis

History of Network Analysis in the Humanities

Network analysis has its roots mostly in the social and political sciences and in bibliometrics. Scott Weingart discusses this history in The Historian's Macroscope: http://www.themacroscope.org/?page_id=308

Notable examples

Erikson and Bearman tracing the rise of illicit shipping and global networks in the British East India Company: (2006) "Malfeasance and the Foundations for Global Trade: The Structure of English Trade in the East Indies, 1601-1833." American Journal of Sociology, 112(1), 195-230. doi:10.1086/502694

Padgett and Ansell using networks to discuss the rise of the Medici family in Renaissance Florence: (1993) "Robust Action and the Rise of the Medici, 1400-1434." American Journal of Sociology, 98(6), 1259-1319

For more recent work, here is Jonathan Goodwin's writeup of his process for building graphs of citation networks in various humanities journals (bibliometrics), along with links to the graphs themselves: http://www.jgoodwin.net/octopress/blog/2013/09/06/creating-a-chronological-slider/

Ryan Cordell, David Smith, and Elizabeth Maddock Dillon, "Infectious Texts: Modeling Text Reuse in Nineteenth-Century Newspapers," from Proceedings of the Workshop on Big Humanities. This project has already built an interactive network graph that depicts text sharing between antebellum American newspapers: http://www.viraltexts.org/gexf-js-master/index.html

Network Basics: What is a Network?

A network is made up of two kinds of things: nodes and edges. Nodes (also called vertices or agents): a type of entity that is connected to other entities.

Edges: connections between the nodes (sometimes called arcs)

The concept of a network gives us a framework for thinking about wider context and about connections, for example between historical actors or events.

Nodes can have particular attributes or categories attached to them, as can edges. In a visualization these are represented by varying thickness, color, size, etc.

Because there are no axes, the physical space between nodes is not meaningful; only the number of edges separating two nodes determines their “distance.” Network graphs often use a “force directed” layout, so the same network can be drawn in many different ways while its meaning stays constant. A “force directed” layout starts from random positions and then treats edges as springs and nodes as the points where the springs attach, moving the nodes until the springs are under the least possible tension.

A pitfall of the basic network graph is that it does not show change over time, which limits its usefulness for people working in the humanities. There are a few ways around this. The first is to build an “aggregate static network,” where all of the time information is collapsed into one static graph: “this network over 100 years looks like X.” A second is to trim the data down to particular years or periods, build a separate graph for each, and compare across the graphs. A third is to build time slicing into a single graph. One graph with a time slider is often referred to as a dynamic network. The slices can be cumulative, so that nodes that appear stay there (1800-1850, 1800-1900, 1800-1950, etc.), or they can be a sliding window that shows the network in increments of, say, every 5 or 10 years. At some point you can’t make the time slices any smaller, because the network ceases to exist: you have to aggregate the data over time at least somewhat.
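As a rough illustration of cumulative time slices, here is a minimal Python sketch. The edge list, node names, and years are made up for the example, and the helper names (cumulative_slice, nodes_in) are my own, not part of any tool.

```python
# Hypothetical dated edges: (source, target, year). All values are made up.
dated_edges = [
    ("A", "B", 1805),
    ("B", "C", 1840),
    ("C", "D", 1875),
    ("A", "D", 1910),
    ("D", "E", 1945),
]

def cumulative_slice(edges, start, end):
    """Return the (source, target) pairs whose year falls in [start, end)."""
    return [(s, t) for s, t, year in edges if start <= year < end]

def nodes_in(edge_list):
    """Collect every node that appears in a list of (source, target) pairs."""
    return {node for edge in edge_list for node in edge}

# Every slice starts in 1800, so nodes that have appeared never drop out.
slices = {end: cumulative_slice(dated_edges, 1800, end) for end in (1850, 1900, 1950)}

for end, edge_list in slices.items():
    print(f"1800-{end}: {len(nodes_in(edge_list))} nodes, {len(edge_list)} edges")
```

Changing the start year along with the end year would turn the same helper into a sliding window instead of a cumulative one.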

Basic principles of networks

Dyad: two nodes connected by an edge. A dyad can be reciprocated (two people talking or sharing with each other) or unreciprocated (a one-way relationship).

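Triad: three nodes and the edges among them; useful for modeling transitivity (if A is connected to B and B to C, is A connected to C?). The “global clustering coefficient” measures, out of all the connected triads in the network, how many are completed triangles.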
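To make the global clustering coefficient concrete, here is a small Python sketch on a made-up undirected network. It counts every connected triad (a node plus two of its neighbors) and checks whether those two neighbors are also connected. The function names are my own.

```python
from itertools import combinations

def build_neighbors(edges):
    """Map each node to the set of nodes it shares an undirected edge with."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors

def global_clustering(edges):
    """Fraction of connected triads that are closed into triangles."""
    neighbors = build_neighbors(edges)
    triads = 0
    closed = 0
    for node, nbrs in neighbors.items():
        # Each pair of neighbors of `node` forms one triad centered on it.
        for u, v in combinations(sorted(nbrs), 2):
            triads += 1
            if v in neighbors[u]:
                closed += 1
    return closed / triads if triads else 0.0

# Made-up example: one triangle (A, B, C) with D hanging off C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
coefficient = global_clustering(edges)
print(f"global clustering coefficient = {coefficient:.2f}")
```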

Edges: edges can be directed or undirected. A letter-writing network is a directed network: A can write to B, but B doesn’t necessarily write back to A. Directed networks can therefore show unreciprocated connections.

Undirected networks: all relationships are reciprocal. (Think Facebook: you can't be friends with someone unless they are also friends with you.)

Edge Weights: edges, whether directed or undirected, can carry weights. A weight might be the number of letters exchanged, the length of a phone call, etc.
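The difference between directed, undirected, and weighted edges can be sketched with a made-up list of letters. The names are hypothetical; the weight here is simply the number of letters.

```python
from collections import Counter

# Hypothetical letters as (sender, recipient) pairs.
letters = [
    ("Ann", "Ben"),
    ("Ann", "Ben"),
    ("Ben", "Ann"),
    ("Ann", "Cat"),
    ("Cat", "Dan"),
    ("Ann", "Ben"),
]

# Directed: (Ann, Ben) and (Ben, Ann) are different edges.
directed_weights = Counter(letters)

# Undirected: direction is thrown away, so both count toward one edge.
undirected_weights = Counter(frozenset(pair) for pair in letters)

# A directed edge is reciprocated if the reverse edge also exists.
reciprocated = {
    (sender, recipient)
    for sender, recipient in directed_weights
    if (recipient, sender) in directed_weights
}

for (sender, recipient), weight in directed_weights.items():
    status = "reciprocated" if (sender, recipient) in reciprocated else "one-way"
    print(f"{sender} -> {recipient}: {weight} letter(s), {status}")
```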

Bipartite Networks: a network with two types of nodes, e.g. books and authors (say, authors 1-4 and books A-K), where edges only connect a node of one type to a node of the other. Many network tools (like Gephi) are not really equipped to deal with these.

You CAN visualize these networks in basic network tools, but what you get out of it isn’t perfectly clear. Most often when using these tools you will want to build unipartite graphs, where all of the nodes represent the same kind of thing: Facebook friends, places, letter writers.
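One common way to get a unipartite graph out of bipartite data is to "project" it: connect two authors whenever they share a book. Here is a minimal Python sketch of that idea, using made-up books and authors. It is one possible approach, not something Gephi does for you.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bipartite data: each book lists its authors.
authorship = {
    "Book A": ["Author 1", "Author 2"],
    "Book B": ["Author 2", "Author 3"],
    "Book C": ["Author 1", "Author 2", "Author 4"],
}

# Project onto authors: one edge per pair of co-authors,
# weighted by the number of books they share.
coauthor_weights = Counter()
for book, authors in authorship.items():
    for pair in combinations(sorted(authors), 2):
        coauthor_weights[pair] += 1

for (first, second), shared in sorted(coauthor_weights.items()):
    print(f"{first} - {second}: {shared} shared book(s)")
```

The result is an ordinary weighted, undirected edge list in which every node is an author.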

What does basic data tell us? The number of edges and the number of nodes. Density: if you connected every node to every other node you would have X connections. The number of connections you actually have (the number of edges) is Y. Density = Y/X.
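The density calculation is short enough to write out. This sketch assumes a simple network with no self-loops and no duplicate edges. For an undirected network X is n(n-1)/2, and for a directed one it is n(n-1). The edge list is made up.

```python
def density(num_nodes, num_edges, directed=False):
    """Density = Y / X: actual edges divided by possible edges."""
    possible = num_nodes * (num_nodes - 1)  # every ordered pair of nodes
    if not directed:
        possible //= 2  # undirected: A-B and B-A are the same edge
    return num_edges / possible if possible else 0.0

# Made-up undirected example: 4 nodes, 4 of the 6 possible edges.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
nodes = {node for edge in edges for node in edge}

network_density = density(len(nodes), len(edges))
print(f"density = {network_density:.2f}")
```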

Network Diameter: the longest of all the shortest paths in the network, i.e. the distance between the two nodes that are farthest apart. Average path length: how many steps, on average, does it take to get from one random node to any other random node? (Think six degrees of Kevin Bacon.)
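Both measures come from the same table of shortest-path lengths. The sketch below uses a breadth-first search on a made-up undirected, unweighted network, and it assumes the network is connected (every node can reach every other node).

```python
from collections import deque
from itertools import combinations

# Made-up undirected network; assumed to be connected.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def bfs_distances(neighbors, start):
    """Shortest number of edges from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in neighbors[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

all_distances = {node: bfs_distances(neighbors, node) for node in neighbors}

# One shortest-path length per unordered pair of nodes.
path_lengths = [all_distances[u][v] for u, v in combinations(sorted(neighbors), 2)]

diameter = max(path_lengths)
average_path = sum(path_lengths) / len(path_lengths)
print(f"diameter = {diameter}, average path length = {average_path:.2f}")
```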

Degree Centrality: the number of edges connected to each node.

Betweenness Centrality: how many shortest paths does a particular node sit on?
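Counting shortest paths by hand gets tedious quickly, so here is a sketch of one standard way to do it (Brandes' algorithm) for an undirected, unweighted network. The example network is made up, and the scores are raw counts, not normalized. When two nodes are joined by several equally short paths, each path counts as a fraction.

```python
from collections import deque

def betweenness(neighbors):
    """Unnormalized betweenness for an undirected, unweighted network."""
    score = {v: 0.0 for v in neighbors}
    for s in neighbors:
        order = []                            # nodes in the order BFS reaches them
        preds = {v: [] for v in neighbors}    # predecessors on shortest paths
        sigma = {v: 0 for v in neighbors}     # number of shortest paths from s
        dist = {v: -1 for v in neighbors}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if dist[w] < 0:               # first time w is seen
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:    # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, passing credit to predecessors.
        delta = {v: 0.0 for v in neighbors}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each end.
    return {v: total / 2 for v, total in score.items()}

# Made-up network: A reaches C and D only through B.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

scores = betweenness(neighbors)
print(scores)
```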

Closeness Centrality: how close is a node, on average, to every other node? Think, “who do you have to tell to spread info the farthest the fastest?”
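Degree and closeness are the two easiest centrality measures to compute yourself. The sketch below uses a made-up undirected network and takes closeness to be the number of other reachable nodes divided by the sum of the distances to them, which is one common definition. Tools may normalize it differently.

```python
from collections import deque

# Made-up undirected network.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def bfs_distances(neighbors, start):
    """Shortest number of edges from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in neighbors[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

# Degree centrality: how many edges touch each node.
degree = {node: len(nbrs) for node, nbrs in neighbors.items()}

# Closeness centrality: reachable nodes divided by total distance to them.
closeness = {}
for node in neighbors:
    dist = bfs_distances(neighbors, node)
    total = sum(dist.values())
    closeness[node] = (len(dist) - 1) / total if total else 0.0

for node in sorted(neighbors):
    print(f"{node}: degree = {degree[node]}, closeness = {closeness[node]:.2f}")
```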

Local clustering: how many of my friends are friends (connected) with each other? This goes hand in hand with modularity; think about "communities" or "groups of friends."
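For a single node, local clustering can be written as the number of edges that exist among its neighbors divided by the number that could exist. A minimal sketch on a made-up undirected network follows. Nodes with fewer than two neighbors are given 0 here, which is a convention I am assuming.

```python
from itertools import combinations

# Made-up undirected network: a triangle (A, B, C) with D hanging off C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)

def local_clustering(neighbors, node):
    """Share of a node's neighbor pairs that are connected to each other."""
    nbrs = neighbors[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: no pairs of friends to compare
    links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
    return links / (k * (k - 1) / 2)

clustering = {node: local_clustering(neighbors, node) for node in neighbors}
for node in sorted(clustering):
    print(f"{node}: {clustering[node]:.2f}")
```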

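Modularity: a measure of how well the network divides into groups whose members share more edges with each other than with the rest of the network. It is the basis for automated, algorithmic ways to detect communities.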

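Degree Distribution: how the number of connections is spread across nodes, which shows what the network looks like as a whole (for example, whether most nodes have a similar degree or a few have far more than the rest).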

Preferential attachment: if you have a lot of connections, you’re more likely to get more connections. In most social networks (the Republic of Letters, for example) a few people with lots of connections hold the network together, while most people don’t have that many.
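The "few hubs, many small nodes" pattern shows up when you tabulate the degree distribution. Here is a sketch on a tiny made-up network with one hub. It only counts degrees and does not simulate preferential attachment itself.

```python
from collections import Counter

# Made-up undirected network: "Hub" is connected to everyone else.
edges = [
    ("Hub", "A"), ("Hub", "B"), ("Hub", "C"), ("Hub", "D"), ("Hub", "E"),
    ("A", "B"),
]

# Degree of each node.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# Degree distribution: how many nodes have each degree.
distribution = Counter(degree.values())

for k in sorted(distribution):
    print(f"degree {k}: {distribution[k]} node(s)")
```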

Information flow can vary: consider bibliometrics vs. history. In bibliometrics, information flows from the cited paper to the citing paper, so the edge would be directed that way. A historian may be more interested in the opposite direction.

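Hive plot: a network graph in which position is meaningful. Nodes are placed along a small number of axes according to their attributes, rather than by a force directed layout.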

Every network is representable by a matrix.
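As a small illustration, here is how a made-up undirected edge list turns into an adjacency matrix, with one row and one column per node. A 1 means the two nodes are connected. For a weighted network the 1s would be replaced by the weights, and for a directed network the matrix would not be mirrored.

```python
# Made-up undirected edge list.
edges = [("A", "B"), ("A", "C"), ("C", "D")]

nodes = sorted({node for edge in edges for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# Start with all zeros, then mark each edge in both directions.
matrix = [[0] * len(nodes) for _ in nodes]
for a, b in edges:
    matrix[index[a]][index[b]] = 1
    matrix[index[b]][index[a]] = 1  # undirected: mirror across the diagonal

print("  " + " ".join(nodes))
for node, row in zip(nodes, matrix):
    print(node + " " + " ".join(str(cell) for cell in row))
```

Each row sum is that node's degree.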

Building a data set: make a list of nodes and a list of edges, then add attributes to the nodes (e.g. age) and to the edges (e.g. weight).
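A node list and an edge list are just two small tables. The sketch below builds them as CSV text with Python's csv module. The people, ages, and weights are made up, and the column names ("Id", "Source", "Target", "Weight") are the headers I am assuming your tool expects, so adjust them to match whatever you import into.

```python
import csv
import io

# Hypothetical node list with one attribute (age).
nodes = [
    {"Id": "Ann", "Age": 34},
    {"Id": "Ben", "Age": 51},
    {"Id": "Cat", "Age": 27},
]

# Hypothetical edge list with one attribute (weight).
edges = [
    {"Source": "Ann", "Target": "Ben", "Weight": 3},
    {"Source": "Ann", "Target": "Cat", "Weight": 1},
]

def to_csv(rows):
    """Render a list of dicts as CSV text, using the first row's keys as headers."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

nodes_csv = to_csv(nodes)
edges_csv = to_csv(edges)
print(nodes_csv)
print(edges_csv)
```

To save the tables to disk, write each string to its own .csv file.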

Network Analysis Tools

NodeXL: Developed by a team of information scientists, including scholars from Cornell University, NodeXL is a simple, free, and easy-to-use network analysis tool that can use data directly from an Excel spreadsheet. However, its analytical and visualization capabilities do not quite match those of other tools.

Gephi: arguably the best all-around network analysis tool for humanists. Gephi works mainly by importing CSV files through its "data tables" tab, but there is a readily available plugin for importing Excel spreadsheets as well (although this plugin can be a bit buggy and temperamental). Gephi has excellent visualization capabilities along with many different algorithmic analysis tools. Gephi is downloadable for free.

Other more advanced tools include: iGraph, Cytoscape, NWB, Pajek, UCINet

Using Gephi

When jumping into Gephi, I've found the most productive approach is to start with easily accessible, Gephi-readable data that you are already familiar with. Network graphs are not exactly the easiest things to make sense of; more often than not they are big amorphous blobs of "spaghetti and meatballs" that don't tell a meaningful story at first sight. Thus, it's important to be at least somewhat familiar with the data you're using. So I recommend, if you have a Facebook account, downloading the data of your Facebook friend network. First go to the Facebook search bar and type in "netviz." Agree to the terms and then click "personal network." This should start a download of your Facebook network as a GDF file. Then open up Gephi, go to File>Open and select your newly downloaded file. It should appear as a large blob of nodes and edges in your Overview screen.

The first thing we'll do to make sense out of this graph is adjust the layout. Go to the bottom left hand corner of your screen where there is a dropdown layout menu and select "Force Atlas 2" and then click "Run." The graph will continue to expand until you stop it, so once it spreads out enough to become manageable, click "stop."

Next, you'll want to go to the right hand pane of the screen, under Statistics>Network Overview. Here are various algorithms you can run on your data to make more sense of it. The first one we'll want to run is "Modularity." This algorithmically detects communities in your data. For your Facebook friends, you might have a community of family, i.e. various family members who are all friends with each other but aren't connected to your other friends. After you run the modularity algorithm, head over to the upper left hand corner of your graph and select "Partition." Hit the refresh button on the side of the dropdown menu and then, in the dropdown menu, select "Modularity Class" and click "Apply." Your graph should now be color coded according to the various communities within your Facebook friends, e.g. college friends might be red, family might be blue, etc.
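To get a feel for what a modularity score rewards, here is a sketch that computes the standard modularity value Q for a partition you supply by hand. It does not reproduce Gephi's community search; it only scores a grouping that is already given. For each community it takes the share of edges that fall inside it and subtracts the share you would expect if edges were placed at random with the same degrees. The network and the group labels are made up.

```python
def modularity(edges, community):
    """Modularity Q of a given partition of an undirected, unweighted network."""
    m = len(edges)
    internal = {}    # number of edges with both ends in the community
    degree_sum = {}  # total degree of the community's nodes
    for a, b in edges:
        ca, cb = community[a], community[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Made-up network: two triangles joined by a single bridge (C-D).
edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
]

# Hypothetical partition: one group per triangle.
two_groups = {"A": "family", "B": "family", "C": "family",
              "D": "college", "E": "college", "F": "college"}
one_group = {node: "everyone" for node in two_groups}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
print(f"two groups: Q = {q_two:.3f}; one group: Q = {q_one:.3f}")
```

Splitting the network along the bridge scores higher than lumping everyone together, which is the kind of grouping a community detection algorithm looks for.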

The next helpful algorithm you will want to run is "Average Degree." This measures how many connections a node in your network has on average. After you run it, the pop up screen will show all of your nodes and their various numbers of connections. Next, go back to the upper left corner and click on "Ranking." Immediately under "Ranking," click on the half red, half white diamond and then select "In Degree" or "Out Degree" in the dropdown menu below (because Facebook is an undirected network, it doesn't matter whether you choose in or out degree) and click "Apply." Now the nodes on the graph should be sized according to the number of connections they have, i.e. the friend of yours who has the most friends in common with you will be the largest node.

Lastly, in the main graph overview window, go to the bottom and click the capital "T" icon. Now, when you hover over your graph, the names of your friends will be displayed over their corresponding nodes.

To share your graph, go to the Preview tab and play with the various layouts it allows you. I prefer "Default curved" for edges and nodes. When you're ready, hit the "export SVG/PDF/PNG" button to create an exportable file. We will cover how to share your graphs interactively online in a later section.

Building Data Sets

Basic data sets: when you're first using Gephi, it's helpful to start small, i.e. with a data set that does not have much in the way of extra attributes on your nodes or edges. A simple Excel spreadsheet with two columns, representing the connections you want to graph, can be helpful. To import this, save your Excel sheet as a CSV file and then go to File>Import Spigot and select "Data Importer." Proceed through the walkthrough and eventually you will get to a screen that says "Select Agents," where you should identify which columns in your Excel sheet you want to graph the connections between. The rest is fairly straightforward. On the "Options" page you will want to check the box "create links between X and Y," or whatever the categories you are connecting are. Finally, after you have clicked "Finish," make sure to un-check the "Create missing nodes" box and select whether your network is directed or undirected.

Complex Data Sets: As I'm still figuring much of this out myself as well, this section will continue to be updated as I expand my own Gephi skills.

First things first, there are a number of very useful plugins that I've found helpful. Go to Tools>Plugins to browse and download. The ones I've found most useful and will reference in this tutorial are: GeoLayout, GeoTools, ExportToEarth, Force Atlas 3D, and SigmaExporter.

GeoCoding Data

Importing data to be visualized as a network and overlaid onto a map can be tricky, especially because it is a substantially different method from the one covered first.

When setting up your data, the first thing you need to do is create a spreadsheet for your "Nodes." For this example, we'll use data collected from the Transatlantic Slave Voyages Database for the years 1789-1801. Your first column should be labeled "ID" and should contain the names of ALL of your nodes. For the data set in question, our nodes are the places where enslaved people were picked up and the places where they were delivered. Ultimately we will want to create a directed network showing the movement of enslaved people from where a voyage began to where it ended, but for now it's important that we list every node (regardless of whether it was a starting point or an end point) in the ID column. The next two columns should be the Latitude and Longitude of each individual node.

Once this is completed, go into Gephi and head to the middle tab, called "Data Laboratory." In the top left hand corner of your workspace, make sure the "Nodes" tab is selected and then move to your right and click "Import Spreadsheet." Find your "nodes" CSV file and upload it. Make sure that you are uploading it as a "nodes table" and not an "edges table," and click Finish. Once your data is in Gephi, you will probably also want to fill in the "Label" column of the Data Laboratory with the names of your places. To do this, simply go to the bottom of the page, select "Copy Data to other Column," select your "ID" column, and then select the Label column in the pop up box.
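If your records are one row per voyage, the nodes table described above can be assembled with a short script instead of by hand. This is only a sketch: the place names and coordinates below are placeholders I made up (they are not taken from the database), and the column headers follow the ID, Latitude, Longitude layout described above.

```python
import csv
import io

# Hypothetical voyage records: (place of departure, place of arrival).
voyages = [
    ("Port A", "Port C"),
    ("Port B", "Port D"),
    ("Port A", "Port D"),
    ("Port A", "Port C"),
]

# Placeholder coordinates (latitude, longitude); look up real values yourself.
coordinates = {
    "Port A": (5.0, -1.0),
    "Port B": (-8.0, 13.0),
    "Port C": (18.0, -77.0),
    "Port D": (-13.0, -38.0),
}

# Every place appears once, whether it was a starting point or an end point.
places = sorted({place for voyage in voyages for place in voyage})

nodes_out = io.StringIO()
writer = csv.writer(nodes_out, lineterminator="\n")
writer.writerow(["ID", "Latitude", "Longitude"])
for place in places:
    latitude, longitude = coordinates[place]
    writer.writerow([place, latitude, longitude])

nodes_csv = nodes_out.getvalue()
print(nodes_csv)
```

Writing nodes_csv to a file gives you the "nodes" CSV to import in the Data Laboratory.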

Next, you will need to create your edges spreadsheet. The first column of this spreadsheet should be exactly the same as in your nodes sheet: title the column "ID" and include the complete list of nodes. Then, since we are building a directed graph, make two columns titled "Source" and "Target" (if you are building an undirected graph, the titles aren't necessary). Our "Source" column will be filled with all of the original departure locations and the "Target" column with the corresponding final destinations. Once you've finished your spreadsheet, go back to the upper left corner of the Data Laboratory and select "Edges" (the table should go blank at this point because you haven't yet uploaded any edges), then again select "Import Spreadsheet" and import your edges file.
