History of Network Analysis in the Humanities

The history of network analysis rests mostly in the social and political sciences and in bibliometrics. Scott Weingart discusses this history in The Historian's Macroscope: http://www.themacroscope.org/?page_id=308

Notable examples

Erikson and Bearman tracing the rise of illicit shipping and global networks in the British East India Company: (2006) "Malfeasance and the Foundations for Global Trade: The Structure of English Trade in the East Indies, 1601-1833." American Journal of Sociology, 112(1), 195-230, doi:10.1086/502694

Padgett and Ansell using networks to discuss the rise of the Medici family in Renaissance Florence: (1993) "Robust Action and the Rise of the Medici, 1400-1434." American Journal of Sociology, 98(6), 1259-1319

For more recent work, see Jonathan Goodwin's writeup of his process for building graphs of citation networks in various humanities journals (bibliometrics), along with links to the graphs themselves: http://www.jgoodwin.net/octopress/blog/2013/09/06/creating-a-chronological-slider/

Ryan Cordell, David Smith, and Elizabeth Maddock Dillon, "Infectious Texts: Modeling Text Reuse in Nineteenth-Century Newspapers," from Proceedings of the Workshop on Big Humanities. This project has already built an interactive network graph that depicts text sharing between antebellum American newspapers: http://www.viraltexts.org/networks/1836-1860-wMagazines/index.html

Network Basics: What is a Network?

A network is made of two kinds of things: nodes and edges. Nodes (also called vertices or agents): entities of some type that are connected to other entities.

Edges: connections between the nodes (sometimes called arcs)

The concept of a network gives us a framework for thinking about wider context, historical connections, etc.

Nodes and edges can both have particular attributes or categories attached to them, and these are represented by varying thickness, color, size, etc.

Because there are no axes, the space between nodes is not meaningful; only the number of edges separating nodes determines “space.” Network graphs often use a “force directed” layout, so the same network can take many different visual forms while its meaning stays constant. A “force directed” layout starts from random positions, then treats edges as springs and nodes as the points where the springs attach, and arranges everything so that the springs are under the least tension.
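As a rough illustration of the spring idea, here is a minimal sketch of a force-directed layout in plain Python. It is not the algorithm any particular tool uses; the function name spring_layout, the step sizes, and the example nodes are all assumptions made for illustration. Every pair of nodes pushes apart, every edge pulls its two ends together, and the starting positions are random.

```python
import math
import random

def spring_layout(nodes, edges, iterations=200, seed=0):
    """Return {node: [x, y]} after a simple spring simulation (illustrative only)."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    k = 1.0      # assumed preferred edge length
    step = 0.1   # assumed maximum movement per round; shrinks over time
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes pushes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = k * k / dist
                force[a][0] += dx / dist * push
                force[a][1] += dy / dist * push
                force[b][0] -= dx / dist * push
                force[b][1] -= dy / dist * push
        # Every edge pulls its two endpoints together, like a spring.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9
            pull = dist * dist / k
            force[a][0] -= dx / dist * pull
            force[a][1] -= dy / dist * pull
            force[b][0] += dx / dist * pull
            force[b][1] += dy / dist * pull
        # Move each node a limited distance in the direction of its net force.
        for n in nodes:
            fx, fy = force[n]
            size = math.hypot(fx, fy) or 1e-9
            move = min(size, step)
            pos[n][0] += fx / size * move
            pos[n][1] += fy / size * move
        step *= 0.99
    return pos

nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D")]
layout_one = spring_layout(nodes, edges, seed=0)
layout_two = spring_layout(nodes, edges, seed=1)
```

Running it with two different seeds gives two different pictures of the same network, which is why the coordinates themselves should not be read as data.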

A pitfall of the basic network graph is that it shows no change over time, which limits its usefulness for people working in the humanities. There are a few ways around this. The first is to build an “aggregate static network,” where all time information is collapsed into one static graph: “this network over 100 years looks like X.” One can also trim the data down to particular years or periods, build a separate graph for each, and compare across the graphs. Finally, time slicing can be built into a single graph. One graph with a time slider is often referred to as a Dynamic Network. With cumulative time slices (1800-1850, 1800-1900, 1800-1950, etc.), nodes that are there stay there. With a sliding scale, the graph shows increments: the network every 5 or 10 years. At some point you can’t make the time slices any smaller, because the network ceases to exist; you have to aggregate data over time at least somewhat.
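The two kinds of time slice can be sketched in a few lines of Python. The dated letters and the helper name edges_between are hypothetical, made up for illustration.

```python
# Hypothetical dated edges: (sender, recipient, year).
letters = [
    ("Ann", "Ben", 1805),
    ("Ben", "Cat", 1840),
    ("Cat", "Dan", 1875),
    ("Ann", "Dan", 1930),
]

def edges_between(dated_edges, start, end):
    """Return the (source, target) pairs dated from start up to, but not including, end."""
    return [(s, t) for s, t, year in dated_edges if start <= year < end]

# Cumulative slices: every slice starts in 1800, so edges only accumulate.
cumulative = {end: edges_between(letters, 1800, end) for end in (1850, 1900, 1950)}

# Sliding windows: each slice covers only its own 50 years.
sliding = {start: edges_between(letters, start, start + 50) for start in (1800, 1850, 1900)}

for end, edges in cumulative.items():
    print(f"1800-{end}: {len(edges)} edges")
for start, edges in sliding.items():
    print(f"{start}-{start + 50}: {len(edges)} edges")
```

Shrinking the window to a single year in this toy data would leave most slices empty, which is the "network ceases to exist" problem in miniature.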

Basic principles of networks

Dyad: two nodes connected by an edge. A dyad can be reciprocated (two people talking or sharing with each other) or unreciprocated (a one-way relationship of sharing).

Triad: three nodes connected by edges; useful for modeling transitivity. The “global clustering coefficient” measures how many of the connected triads in the network are completed triangles.
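A minimal sketch of the global clustering coefficient, assuming an undirected network stored as a dictionary of neighbor sets. The example edges and the function name are made up for illustration. It counts every connected triad (a node plus two of its neighbors) and checks whether those two neighbors are themselves connected.

```python
from itertools import combinations

# Hypothetical undirected network: a triangle A-B-C with D hanging off C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def global_clustering(neighbors):
    """Share of connected triads whose two outer nodes are also connected."""
    triads = 0
    closed = 0
    for node, friends in neighbors.items():
        for a, b in combinations(sorted(friends), 2):
            triads += 1
            if b in neighbors[a]:
                closed += 1
    return closed / triads if triads else 0.0

print(global_clustering(neighbors))  # 3 of the 5 triads are closed
```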

Edges: can be directed or undirected. A letter-writing network is a directed network: A can write to B, but B doesn’t necessarily write to A. Directed edges can show unreciprocated connections.

Undirected networks: all relationships are reciprocal. (Think Facebook: you can't be friends with someone unless they are also friends with you.)
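To make the directed/undirected difference concrete, here is a small sketch with a hypothetical letter-writing network. It finds which dyads are reciprocated, then collapses the directed edges into undirected ones, which throws the direction information away.

```python
# Hypothetical directed edges: (writer, recipient).
letters = {("Ann", "Ben"), ("Ben", "Ann"), ("Ann", "Cat"), ("Dan", "Ann")}

# A dyad is reciprocated when the edge exists in both directions.
reciprocated = {frozenset((s, t)) for s, t in letters if (t, s) in letters}

# Treating the network as undirected keeps one edge per connected pair.
undirected = {frozenset((s, t)) for s, t in letters}

print(len(reciprocated), "of", len(undirected), "dyads are reciprocated")
```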

Edge Weights: edges can carry a weight, whether they are directed or undirected. Weights can be the number of letters exchanged, the length of a phone call, etc.
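One common way to get weights is to count repeated connections. A minimal sketch, assuming a hypothetical list with one row per letter:

```python
from collections import Counter

# Hypothetical raw data: one (writer, recipient) row per letter.
letters = [("Ann", "Ben"), ("Ann", "Ben"), ("Ben", "Ann"), ("Ann", "Cat")]

# Directed weights: letters sent in each direction are counted separately.
directed_weights = Counter(letters)

# Undirected weights: sort each pair so both directions count toward one edge.
undirected_weights = Counter(tuple(sorted(pair)) for pair in letters)

print(directed_weights[("Ann", "Ben")])    # letters from Ann to Ben
print(undirected_weights[("Ann", "Ben")])  # letters between Ann and Ben
```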

Bipartite Networks: a network with two types of nodes, e.g. books and authors (authors 1-4, books A-K), where edges only connect nodes of different types. Many network tools (like Gephi) are not really equipped to deal with these.

You CAN visualize these networks in basic network tools, but what you get out of it isn’t perfectly clear. Most often when using these tools you will want to build unipartite graphs, where all of the nodes represent the same kind of thing: Facebook friends, places, letter writers.
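One way to get from a bipartite network to a unipartite one is to connect two nodes of one type whenever they share a node of the other type. The sketch below does this for a hypothetical author/book list, linking authors who appear on the same book; the data and variable names are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bipartite edges: (author, book).
authorship = [
    ("Author 1", "Book A"),
    ("Author 2", "Book A"),
    ("Author 2", "Book B"),
    ("Author 3", "Book B"),
    ("Author 4", "Book C"),
]

authors_by_book = {}
for author, book in authorship:
    authors_by_book.setdefault(book, set()).add(author)

# Unipartite author network: weight = number of books two authors share.
coauthor_weights = Counter()
for authors in authors_by_book.values():
    for pair in combinations(sorted(authors), 2):
        coauthor_weights[pair] += 1

print(dict(coauthor_weights))
```

Note that Author 4, who shares no book with anyone, has no edges at all in the author-only network.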

What does basic data tell us? The number of edges and nodes, and from these the density. If you connected every node to every other node, you would have X connections. The number of connections you actually have (the number of edges) is Y. Density = Y/X.
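The density calculation as a small sketch. For an undirected network with n nodes, the number of possible connections X is n(n-1)/2; for a directed network it is n(n-1). The function name and the example numbers are illustrative.

```python
def density(node_count, edge_count, directed=False):
    """Actual edges (Y) divided by possible edges (X)."""
    possible = node_count * (node_count - 1)
    if not directed:
        possible //= 2  # A-B and B-A are the same edge
    return edge_count / possible if possible else 0.0

print(density(5, 4))                 # 4 of 10 possible undirected edges
print(density(5, 4, directed=True))  # 4 of 20 possible directed edges
```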

Network Diameter: what is the longest of the shortest paths between any two nodes in this network? Average path length: how many steps does it take, on average, to get from one random node to any other random node? (Think six degrees of Kevin Bacon.)
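A minimal sketch of both measures, taking diameter to mean the longest shortest path between any two nodes and assuming a small, connected, undirected network. Shortest paths are found with a breadth-first search; the example edges and function name are hypothetical.

```python
from collections import deque

edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")]

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def distances_from(source, neighbors):
    """Number of steps from source to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for friend in neighbors[node]:
            if friend not in dist:
                dist[friend] = dist[node] + 1
                queue.append(friend)
    return dist

# Shortest-path length for every ordered pair of distinct nodes.
path_lengths = [
    steps
    for source in neighbors
    for target, steps in distances_from(source, neighbors).items()
    if target != source
]

diameter = max(path_lengths)
average_path_length = sum(path_lengths) / len(path_lengths)
print(diameter, average_path_length)
```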

Degree Centrality: the number of edges attached to each node.

Betweenness Centrality: how many shortest paths between other nodes does a particular node sit on?
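A sketch of the shortest-path counting for a small undirected network, following the general approach of Brandes' algorithm: run a breadth-first search from every node, count the shortest paths, then work backwards to credit the nodes those paths pass through. The example network is hypothetical, and the scores are raw counts, not normalized the way a tool might report them.

```python
from collections import deque

edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def betweenness(neighbors):
    score = {node: 0.0 for node in neighbors}
    for source in neighbors:
        order = []                                  # nodes in the order reached
        before = {node: [] for node in neighbors}   # previous nodes on shortest paths
        paths = {node: 0 for node in neighbors}     # number of shortest paths from source
        paths[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in neighbors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    before[w].append(v)
        # Walk back from the farthest nodes, passing credit to the nodes in between.
        credit = {node: 0.0 for node in neighbors}
        for w in reversed(order):
            for v in before[w]:
                credit[v] += paths[v] / paths[w] * (1 + credit[w])
            if w != source:
                score[w] += credit[w]
    # In an undirected network every pair was counted in both directions.
    return {node: s / 2 for node, s in score.items()}

print(betweenness(neighbors))
```

In this example B sits on five of the shortest paths between other nodes and D on three, while A, C and E sit on none.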

Closeness Centrality: how close is a node, on average, to every other node? Think, “who do you have to tell to spread info the farthest the fastest?”
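One common way to put a number on this is to divide the number of other nodes by the total distance to them, so that a higher score means a node is closer to everyone. The sketch below assumes a connected, undirected network; the edges and function names are hypothetical.

```python
from collections import deque

edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def distances_from(source, neighbors):
    """Number of steps from source to every node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for friend in neighbors[node]:
            if friend not in dist:
                dist[friend] = dist[node] + 1
                queue.append(friend)
    return dist

def closeness(node, neighbors):
    """(number of other reachable nodes) / (total steps needed to reach them)."""
    dist = distances_from(node, neighbors)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

scores = {node: closeness(node, neighbors) for node in neighbors}
best = max(scores, key=scores.get)
print(best, scores[best])
```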

Local clustering: how many of my friends are friends (connected) with each other? This goes hand in hand with modularity; think about "communities" or "groups of friends."
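A minimal sketch of the local clustering coefficient for one node in an undirected network: of all the possible pairs of my friends, what share are connected to each other? The example data and function name are made up for illustration.

```python
from itertools import combinations

edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]

neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, set()).add(v)
    neighbors.setdefault(v, set()).add(u)

def local_clustering(node, neighbors):
    """Share of a node's pairs of friends who are friends with each other."""
    friends = neighbors[node]
    if len(friends) < 2:
        return 0.0  # no pairs of friends to compare
    pairs = list(combinations(sorted(friends), 2))
    linked = sum(1 for a, b in pairs if b in neighbors[a])
    return linked / len(pairs)

for node in sorted(neighbors):
    print(node, local_clustering(node, neighbors))
```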

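Modularity: a measure of how well a network divides into groups whose members share more edges with each other than with outsiders; it is used in automated, algorithmic ways of detecting communities.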
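The sketch below scores a given grouping using the standard modularity formula for an undirected, unweighted network: for each group, the share of edges that fall inside it, minus the share expected from the degrees of its members. It only scores a grouping that is handed to it; community detection algorithms search for groupings that score high. The example network and function name are hypothetical.

```python
# Hypothetical network: two triangles joined by a single bridge (C-D).
edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
]

def modularity(edges, communities):
    """Modularity of a partition, given as a list of sets of nodes."""
    m = len(edges)
    group_of = {node: i for i, group in enumerate(communities) for node in group}
    inside = [0] * len(communities)      # edges with both ends in the group
    degree_sum = [0] * len(communities)  # total degree of the group's nodes
    for u, v in edges:
        degree_sum[group_of[u]] += 1
        degree_sum[group_of[v]] += 1
        if group_of[u] == group_of[v]:
            inside[group_of[u]] += 1
    return sum(
        inside[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )

good_split = [{"A", "B", "C"}, {"D", "E", "F"}]
bad_split = [{"A", "D"}, {"B", "C", "E", "F"}]
print(modularity(edges, good_split), modularity(edges, bad_split))
```

Splitting the network along the bridge scores well above zero, while the arbitrary split scores below zero.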

Degree Distribution: how many nodes have each number of connections; it shows what the network looks like as a whole.
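A short sketch, assuming an undirected edge list: first count each node's degree, then count how many nodes have each degree. The example network (one hub, H, and a few less connected nodes) is made up.

```python
from collections import Counter

edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B")]

# Degree: how many edges touch each node.
degree = Counter(node for edge in edges for node in edge)

# Degree distribution: how many nodes have each degree.
distribution = Counter(degree.values())

for k in sorted(distribution):
    print(f"{distribution[k]} node(s) with degree {k}")
```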

Preferential attachment: if you have a lot of connections, you’re more able/likely to get more connections. In most social networks (e.g. the Republic of Letters) a few people with lots of connections hold the network together, while most people don’t have that many.

Information Flow can vary: consider bibliometrics vs. history. In bibliometrics, information flows from the cited paper to the citing paper, hence the edge would be directed that way. A historian may be more interested in the opposite direction.

Hive plot: a network graph that is spatially oriented. Nodes are placed along axes according to their attributes, so position carries meaning.
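Preferential attachment can be simulated in a few lines. This is a toy sketch, not a model of any real correspondence network: each new node makes one connection, and picks its partner with probability proportional to how many connections that partner already has. The function name, network size, and seed are arbitrary assumptions.

```python
import random
from collections import Counter

def grow_network(node_count, seed=0):
    """Grow a network one node at a time; well-connected nodes attract more edges."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in this list once per edge it has, so picking a
    # random entry favors nodes in proportion to their degree.
    endpoints = [0, 1]
    for new_node in range(2, node_count):
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend([new_node, target])
    return edges

edges = grow_network(200)
degree = Counter(node for edge in edges for node in edge)
print("five best-connected nodes:", degree.most_common(5))
print("nodes with only one connection:", sum(1 for d in degree.values() if d == 1))
```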

Every network is representable by a matrix.
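For example, a small undirected network can be written as an adjacency matrix: one row and one column per node, with a 1 wherever two nodes are connected. The node names here are hypothetical.

```python
nodes = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("C", "D")]

index = {node: i for i, node in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # undirected: mirror across the diagonal

for node, row in zip(nodes, matrix):
    print(node, row)
```

In an undirected network the matrix is symmetric and each row sums to that node's degree. A directed network would skip the mirrored entry, and a weighted one would store the weight in place of the 1.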

Building a data set: make a list of nodes and a list of edges, then add attributes to the nodes (e.g. age) and to the edges (e.g. weight).
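A minimal sketch of such a data set written out as two CSV tables, one for nodes and one for edges. The people, ages, and weights are invented, and the column names (Id, Label, Source, Target, Weight) are an assumption modeled on headers commonly used for spreadsheet import into network tools; check them against the tool you are using.

```python
import csv
import io

# Hypothetical node list with one attribute (Age).
nodes = [
    {"Id": "ann", "Label": "Ann", "Age": 34},
    {"Id": "ben", "Label": "Ben", "Age": 51},
    {"Id": "cat", "Label": "Cat", "Age": 27},
]

# Hypothetical edge list with one attribute (Weight = number of letters).
edges = [
    {"Source": "ann", "Target": "ben", "Weight": 3},
    {"Source": "ben", "Target": "cat", "Weight": 1},
]

def to_csv(rows):
    """Turn a list of dictionaries into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(to_csv(nodes))
print(to_csv(edges))
```

To save the tables as files for import, the same text can be written out with open("nodes.csv", "w") and open("edges.csv", "w").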

Network Analysis Tools

NodeXL: Developed by a team of information scientists, including scholars from Cornell University, NodeXL is a simple, free, and easy-to-use network analysis tool that can use data directly from an Excel spreadsheet. However, its analytical and visualization capabilities do not quite match those of other tools.

Gephi: arguably the best all-around network analysis tool for humanists. Gephi works mainly by importing CSV files through its "data tables" tab, but there is a readily available plugin for importing Excel spreadsheets as well (although this plugin can be a bit buggy and temperamental). Gephi has excellent visualization capabilities along with many different algorithmic analysis tools. Gephi is downloadable for free.

Other more advanced tools include: igraph, Cytoscape, NWB (Network Workbench), Pajek, UCINET

 
