Network Analysis

History of Network Analysis in the Humanities

Network analysis in the humanities rests mostly on methods from social/political science and bibliometrics. Scott Weingart discusses this history in The Historian's Macroscope: http://www.themacroscope.org/?page_id=308

Notable examples

Erikson and Bearman tracing the rise of illicit shipping and global networks in the British East India Company: (2006) "Malfeasance and the Foundations for Global Trade: The Structure of English Trade in the East Indies, 1601-1833." American Journal of Sociology, 112(1), 195-230. doi:10.1086/502694

Padgett and Ansell using networks to discuss the rise of the Medici family in Renaissance Florence: (1993) "Robust Action and the Rise of the Medici, 1400-1434." American Journal of Sociology, 98(6), 1259-1319

For more recent work, here is Jonathan Goodwin's writeup of his process for building graphs of citation networks in various humanities journals (bibliometrics), along with links to the graphs themselves: http://www.jgoodwin.net/octopress/blog/2013/09/06/creating-a-chronological-slider/

Ryan Cordell, David Smith, and Elizabeth Maddock Dillon, "Infectious Texts: Modeling Text Reuse in Nineteenth-Century Newspapers," from Proceedings of the Workshop on Big Humanities. This project has already built an interactive network graph that depicts text sharing between antebellum American newspapers: http://www.viraltexts.org/gexf-js-master/index.html

Network Basics: What is a Network?

A network has two parts: nodes and edges. Nodes (also called vertices or agents): entities of some type that are connected to other entities.

Edges: connections between the nodes (sometimes called arcs)

The concept of networks gives us a framework for thinking about wider context, connections in history, etc.

Nodes can have particular attributes or categories attached to them, as can edges, and these are represented by varying thickness, colors, sizes, etc.

Because there is no axis, the space between nodes is not meaningful; only the number of edges separating nodes determines “space.” Network graphs often use a “force directed” layout, so the same network can be drawn in many different forms while its meaning stays constant. A “force directed” layout starts from random positions and then treats edges as springs and nodes as the points where springs are attached, moving the nodes until the springs have the least amount of tension.

A pitfall of the basic network graph is that it does not show change over time, which limits its usefulness for people working in the humanities. There are a few ways to get around this. The first is to build an “aggregate static network,” where all time information is collapsed into one static graph: “this network over 100 years looks like X.” One can also trim the data down to particular years or periods, build a separate graph for each, and compare across the graphs. Finally, time slicing can be built into a single graph. One graph with a time slider is often referred to as a Dynamic Network. In a cumulative version, nodes that are there stay there: time slices 1800-1850, 1800-1900, 1800-1950, etc. A sliding scale instead shows increments: the network every 10 or 5 years. At some point you can’t make the time window any smaller because the network ceases to exist. You have to aggregate data/time at least somewhat.
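
As a rough illustration of the two kinds of time slicing described above, here is a minimal Python sketch. The dated edge list, the name time_slice, and the 50-year windows are all made up for the example; real data would need its own decisions about how to date an edge.

```python
# Hypothetical dated edges: (sender, recipient, year of the letter).
dated_edges = [
    ("A", "B", 1805),
    ("B", "C", 1848),
    ("C", "D", 1872),
    ("A", "D", 1931),
]

def time_slice(edges, start, end):
    """Return the edges (without dates) whose year falls in [start, end)."""
    return [(u, v) for u, v, year in edges if start <= year < end]

# Cumulative slices: every slice starts in 1800, so edges that appear stay.
cumulative = {end: time_slice(dated_edges, 1800, end) for end in (1850, 1900, 1950)}

# Sliding windows: each 50-year window is its own, smaller network.
sliding = {start: time_slice(dated_edges, start, start + 50)
           for start in range(1800, 1950, 50)}
```

Shrinking the window width far enough leaves most windows empty, which is the point at which the network "ceases to exist."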

Basic principles of networks

Dyad: two nodes connected by an edge. A dyad can be reciprocated (people talking or sharing with each other) or unreciprocated (a one-way relationship of sharing).

Triad: three nodes and the edges among them. Useful for modeling transitivity, or the “global clustering coefficient”: out of all the connected triads in the network, how many are completed triangles?

Edges: can be directed or undirected. A directed network would be something like a letter-writing network: A can write to B, but B doesn’t necessarily write to A. Directed networks can show unreciprocated connections.
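
To make the triad count concrete, here is a small Python sketch of the global clustering coefficient on a made-up four-node network. The edge list and function names are illustrative only.

```python
from itertools import combinations

# Hypothetical undirected network: a triangle A-B-C plus a tail C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

def build_adjacency(edge_list):
    """Map each node to the set of nodes it shares an edge with."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def transitivity(adjacency):
    """Share of connected triads (two edges meeting at a node) that are closed."""
    triads = 0
    closed = 0
    for neighbors in adjacency.values():
        for a, b in combinations(sorted(neighbors), 2):
            triads += 1
            if b in adjacency[a]:
                closed += 1
    return closed / triads if triads else 0.0

score = transitivity(build_adjacency(edges))
```

In this toy network there are five connected triads and three of them are closed (the one triangle, seen from each of its corners), so the score is 3/5.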

Undirected networks: all relationships are reciprocal. (Think Facebook, you can't be friends with someone unless they are also friends with you)
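
A short Python sketch of the difference, using a made-up set of letters. Storing a directed edge as an ordered pair and an undirected edge as an unordered pair is just one convenient choice for the example.

```python
# Hypothetical directed edges: (writer, recipient).
letters = {("A", "B"), ("B", "A"), ("A", "C")}

# Reciprocated dyads: both directions are present.
reciprocated = {frozenset((u, v)) for u, v in letters if (v, u) in letters}

# Unreciprocated edges: only one direction is present.
unreciprocated = {(u, v) for u, v in letters if (v, u) not in letters}

# Throwing away direction turns the network into an undirected one.
undirected = {frozenset(pair) for pair in letters}
```

Converting to undirected loses the information that C never wrote back to A.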

Edge Weights: both directed and undirected edges can carry weights. A weight can be the number of letters, the length of a phone call, etc.

Bipartite Networks: a network with 2 types of nodes, e.g. books and authors (authors 1-4, books A-K), where edges only connect a node of one type to a node of the other. Many network tools (like Gephi) are not really equipped to deal with these.

You CAN visualize these networks in basic network tools, but what you get out of it isn’t perfectly clear. Most often when using these tools you will want to build unipartite graphs, where all of the nodes represent the same kind of thing: Facebook friends, places, letter writers.

What does basic data tell us? Numbers of edges and nodes. Density: if you connected every node to every other node, you would have X connections. The number of connections you actually have (number of edges) is Y. Density = Y/X.
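
One common way to get a unipartite graph out of bipartite data is to "project" it onto one node type. The sketch below, with invented books and authors, connects two authors whenever they share a book and uses the number of shared books as the edge weight. This is one possible projection, not the only one.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bipartite data: each book points to its authors.
authorship = {
    "Book A": ["Author 1", "Author 2"],
    "Book B": ["Author 2", "Author 3"],
    "Book C": ["Author 1", "Author 2"],
}

def project_onto_authors(book_to_authors):
    """Return co-author pairs weighted by the number of books they share."""
    weights = Counter()
    for authors in book_to_authors.values():
        for pair in combinations(sorted(set(authors)), 2):
            weights[pair] += 1
    return weights

coauthors = project_onto_authors(authorship)
```

The result is an author-to-author network that tools like Gephi can handle directly, at the cost of no longer showing which book created each tie.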
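
The density calculation is simple enough to write out. In this sketch X is the number of possible connections, which depends on whether the network is directed; the function name and arguments are chosen for the example.

```python
def density(node_count, edge_count, directed=False):
    """Actual edges (Y) divided by possible edges (X), ignoring self-loops."""
    if node_count < 2:
        return 0.0
    possible = node_count * (node_count - 1)  # every ordered pair of nodes
    if not directed:
        possible //= 2  # A-B and B-A are the same undirected edge
    return edge_count / possible

undirected_density = density(4, 4)               # 4 of 6 possible edges
directed_density = density(4, 4, directed=True)  # 4 of 12 possible edges
```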

Network Diameter: how far apart are the two most distant nodes? That is, of all the shortest paths between pairs of nodes, which is the longest? Average path length: on average, how many steps does it take to get from one random node to any other random node? (Think 6 degrees of Kevin Bacon.)

Degree Centrality: the number of edges connected to each node.

Betweenness Centrality: How many shortest paths does a particular node sit on?

Closeness Centrality: How close is a node, on average, to every other node? Think, “who do you have to tell to spread info the farthest the fastest?”

Local clustering: how many of my friends are friends (connected) with each other? This goes hand in hand with modularity; think about "communities" or "groups of friends."

Modularity: groups of nodes that share many edges with each other and few with the rest of the network. Used in automated, algorithmic ways to detect communities.

Degree Distribution: what does the network look like as a whole, based on how many connections each node has? How many nodes have 1 edge, how many have 2, and so on.

Preferential attachment: if you have a lot of connections, you’re more able/likely to get more connections. In most social networks (e.g. the Republic of Letters) a few people with lots of connections hold the network together, while most people don’t have that many.

Information Flow can vary: consider bibliometrics vs. history. In bibliometrics, information flows from the cited paper to the citing paper, hence the edge would be directed that way. A historian may be more interested in the opposite direction.

A hive plot is a network graph that is spatially oriented: nodes are placed along axes according to their attributes, so position carries meaning.
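
Both measures come from shortest paths, which can be found with a breadth-first search when edges are unweighted. The sketch below assumes a small, connected, undirected network; the adjacency dictionary is made up.

```python
from collections import deque

# Hypothetical connected network: triangle A-B-C plus a tail C-D.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def shortest_path_lengths(graph, source):
    """Breadth-first search: number of steps from source to each reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances

def diameter_and_average_path(graph):
    """Longest and mean shortest-path length over all pairs (connected graphs only)."""
    lengths = [
        steps
        for source in graph
        for target, steps in shortest_path_lengths(graph, source).items()
        if target != source
    ]
    return max(lengths), sum(lengths) / len(lengths)

diameter, average_path = diameter_and_average_path(adjacency)
```

Here the most distant pairs (A and D, B and D) are two steps apart, so the diameter is 2.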
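
For readers who want to see the counting, here is a compact Python sketch of betweenness for a small unweighted, undirected network. It follows the general approach of Brandes' algorithm (count shortest paths with a breadth-first search, then pass credit back along them); the network and names are invented, and the scores are left unnormalized.

```python
from collections import deque

# Hypothetical network: triangle A-B-C plus a tail C-D.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def betweenness(graph):
    """For each node, the number of shortest paths between other pairs that pass through it."""
    scores = {node: 0.0 for node in graph}
    for source in graph:
        order = []                                  # nodes in the order they are reached
        predecessors = {node: [] for node in graph}
        path_count = {node: 0 for node in graph}    # shortest paths from source
        path_count[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)
        credit = {node: 0.0 for node in graph}
        for w in reversed(order):                   # farthest nodes first
            for v in predecessors[w]:
                credit[v] += path_count[v] / path_count[w] * (1 + credit[w])
            if w != source:
                scores[w] += credit[w]
    # Each undirected pair was counted once from each end.
    return {node: score / 2 for node, score in scores.items()}

scores = betweenness(adjacency)
```

Only C scores above zero here, because C is the only way to reach D from A or B.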
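
One common way to put a number on this is to divide the number of other nodes by the total distance to them, so a node one step from everyone scores 1. The sketch below uses that definition on a made-up connected network; other variants of closeness exist.

```python
from collections import deque

# Hypothetical connected network: triangle A-B-C plus a tail C-D.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def closeness(graph, node):
    """(number of other nodes) / (sum of shortest-path distances to them)."""
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbor in graph[current]:
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    total = sum(distances.values())
    return (len(distances) - 1) / total if total else 0.0

closeness_scores = {node: closeness(adjacency, node) for node in adjacency}
```

C is the person to tell: everyone else is one step away, while news starting from D needs two steps to reach A or B.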
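
In code, "how many of my friends are friends with each other" becomes a ratio: the edges that exist among a node's neighbors over the edges that could exist among them. A minimal sketch on an invented network:

```python
from itertools import combinations

# Hypothetical network: triangle A-B-C plus a tail C-D.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def local_clustering(graph, node):
    """Share of pairs of this node's neighbors that are themselves connected."""
    neighbors = sorted(graph[node])
    if len(neighbors) < 2:
        return 0.0  # fewer than two friends: no pairs to check
    pairs = list(combinations(neighbors, 2))
    linked = sum(1 for a, b in pairs if b in graph[a])
    return linked / len(pairs)

clustering = {node: local_clustering(adjacency, node) for node in adjacency}
```

A's two friends know each other, so A scores 1; only one of the three pairs of C's friends is connected, so C scores 1/3.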
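
Community detection algorithms need a way to score how good a proposed grouping is. The sketch below computes the standard modularity score for a grouping that is supplied by hand; it does not search for the best grouping, which is the harder part that tools automate. The network and the two communities are made up.

```python
# Hypothetical network: two triangles joined by a single edge C-D.
edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
]

def modularity(edge_list, communities):
    """Sum over communities of (share of edges inside) - (expected share by degree)."""
    edge_total = len(edge_list)
    community_of = {node: i for i, group in enumerate(communities) for node in group}
    internal = [0] * len(communities)
    degree_sum = [0] * len(communities)
    for u, v in edge_list:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / edge_total - (degree_sum[c] / (2 * edge_total)) ** 2
        for c in range(len(communities))
    )

two_groups = modularity(edges, [{"A", "B", "C"}, {"D", "E", "F"}])
one_group = modularity(edges, [{"A", "B", "C", "D", "E", "F"}])
```

Splitting the network into its two triangles scores higher than lumping everything together, which scores zero.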
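
A degree distribution is just a tally. This minimal sketch counts each node's edges in a made-up undirected network and then counts how many nodes have each degree.

```python
from collections import Counter

# Hypothetical undirected network: triangle A-B-C plus a tail C-D.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

# Degree of each node: how many edges touch it.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree distribution: how many nodes have each degree.
distribution = Counter(degree.values())
```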
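
A toy simulation can show how this "rich get richer" process works. In the sketch below each new node attaches to one existing node, chosen with probability proportional to that node's current number of edges. The function name, the one-edge-per-newcomer rule, and the seed are assumptions made for the example, not a model of any real correspondence network.

```python
import random
from collections import Counter

def grow_network(node_count, seed=0):
    """Grow a network one node at a time; newcomers prefer well-connected nodes."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in this list once per edge it has, so a uniform
    # choice from the list picks nodes in proportion to their degree.
    endpoints = [0, 1]
    for new_node in range(2, node_count):
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend([new_node, target])
    return edges

network = grow_network(200, seed=42)
degree = Counter(node for edge in network for node in edge)
```

Tallying the degrees of a network grown this way typically shows many nodes with a single edge and a handful with many.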

Every network is representable by a matrix.
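
For example, an undirected network can be written as a square table with one row and one column per node, where a 1 means "these two nodes share an edge." A minimal sketch with invented nodes:

```python
# Hypothetical undirected network: triangle A-B-C plus a tail C-D.
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

def adjacency_matrix(node_list, edge_list):
    """Build a symmetric 0/1 matrix; row and column order follow node_list."""
    index = {name: i for i, name in enumerate(node_list)}
    matrix = [[0] * len(node_list) for _ in node_list]
    for u, v in edge_list:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1  # undirected: mirror the entry
    return matrix

matrix = adjacency_matrix(nodes, edges)
```

For a directed network the mirrored line would be dropped, and for a weighted one the 1 would be replaced by the weight.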

Building a data set: make a list of nodes and a list of edges, then add attributes to the nodes (e.g. age) and to the edges (e.g. weight).
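
As a sketch of what those two lists can look like on disk, the Python below writes a node table and an edge table as CSV. The column headers (Id, Label, Source, Target, Type, Weight) follow the names Gephi's spreadsheet import looks for, as far as I know; the people, ages, and weights are invented.

```python
import csv
import io

# Hypothetical data: node attributes (age) and edge attributes (weight).
nodes = {"n1": {"label": "Ada", "age": 34}, "n2": {"label": "Ben", "age": 51}}
edges = [("n1", "n2", 12), ("n2", "n1", 3)]  # (source, target, letters sent)

def write_node_table(node_data, handle):
    """Write one row per node: its id, display label, and age attribute."""
    writer = csv.writer(handle)
    writer.writerow(["Id", "Label", "Age"])
    for node_id, attrs in node_data.items():
        writer.writerow([node_id, attrs["label"], attrs["age"]])

def write_edge_table(edge_data, handle):
    """Write one row per directed edge with its weight."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for source, target, weight in edge_data:
        writer.writerow([source, target, "Directed", weight])

# In-memory buffers keep the example self-contained.
node_buffer = io.StringIO()
edge_buffer = io.StringIO()
write_node_table(nodes, node_buffer)
write_edge_table(edges, edge_buffer)
```

To produce real files, the buffers can be swapped for open("nodes.csv", "w", newline="") and open("edges.csv", "w", newline="").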

Network Analysis Tools

NodeXL: Developed by a team of information scientists, including scholars from Cornell University, NodeXL is a simple, free, and easy-to-use network analysis tool that can use data directly from an Excel spreadsheet. However, its analytical and visualization capabilities do not quite match those of other tools.

Gephi: arguably the best all-around network analysis tool for humanists. Gephi works mainly by importing CSV files through its "data tables" tab, but there is a readily available plugin for importing Excel spreadsheets as well (although this plugin can be a bit buggy and temperamental). Gephi has excellent visualization capabilities along with many different algorithmic analysis tools. Gephi is downloadable for free.

Other more advanced tools include: iGraph, Cytoscape, NWB, Pajek, UCINet

Using Gephi

When jumping into Gephi, I've found the most productive thing is to start with easily accessible, Gephi-readable data that you are already familiar with. Network graphs are not exactly the easiest things to make sense of; more often than not they are big amorphous blobs of "spaghetti and meatballs" that don't tell a meaningful story at first sight. Thus, it's important to be familiar (at least somewhat) with the data you're using. So I recommend, if you have a Facebook account, downloading the data of your Facebook friend network. First go to the Facebook search bar and type in "netviz." Agree to the terms and then click "personal network." This should start a download of your Facebook network as a gdf file. Then open up Gephi, go to File>Open, and select your newly downloaded file. It should appear as a large blob of nodes and edges in your overview screen.

The first thing we'll do to make sense of this graph is adjust the layout. Go to the bottom left-hand corner of your screen, where there is a dropdown layout menu, select "Force Atlas 2," and then click "Run." The graph will continue to expand until you stop it, so once it spreads out enough to become manageable, click "Stop."

Next, you'll want to go to the right-hand pane of the screen, under Statistics>Network Overview. Here are various algorithms you can run on your data to make more sense of it. The first one we'll want to run is "modularity." This algorithmically detects communities in your data. For your Facebook friends, you might have a community of family, i.e. various family members who are all friends with each other but aren't connected to your other friends. After you run the modularity algorithm, head over to the upper left-hand corner of your graph and select "Partition." Hit the refresh button on the side of the dropdown menu and then, in the dropdown menu, select "Modularity Class" and click "Apply." Your graph should now be color coded according to the various communities within your Facebook friends, e.g. college friends might be red, family might be blue, etc.
