Family Ties and Migration: Surnames as Social Networks

Large regional wage gaps that persist over decades have prompted a large literature documenting potential barriers to the reallocation of labor. The persistence of these gaps is especially striking within a national economy as integrated as that of the U.S. One much-discussed friction is the role of social networks in enabling migration toward higher-wage regions.

However, without a measure of these networks, quantitative analysis of their importance has been limited. We document the use of same surname-race-birthplace groups as proxies for social networks, which we believe constitute the first easily available, population-wide measure of networks. We show that the geographic distribution of males who share a given surname predicts both the choice of destination state and the probability of migration itself. We propose that such patterns are consistent with surnames serving as a proxy for kin-based networks, which reduce the cost of migration.
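As a minimal sketch of how such a network measure might be tabulated, the Python below groups individuals by surname, race, and birthplace and computes the share of each group residing in each state. The record layout, the sample names, and the function name `network_shares` are illustrative assumptions, not the project's actual data or code.

```python
from collections import Counter, defaultdict

# Hypothetical records: (surname, race, birthplace_state, current_state).
# These rows are made up for illustration only.
records = [
    ("Carter", "Black", "GA", "GA"),
    ("Carter", "Black", "GA", "IL"),
    ("Carter", "Black", "GA", "IL"),
    ("Carter", "Black", "GA", "NY"),
    ("Carter", "White", "GA", "GA"),
    ("Hayes", "Black", "MS", "MS"),
]


def network_shares(rows):
    """Return, for each surname-race-birthplace group, the share of its
    members residing in each state."""
    counts = defaultdict(Counter)
    for surname, race, birthplace, state in rows:
        counts[(surname, race, birthplace)][state] += 1

    shares = {}
    for group, by_state in counts.items():
        total = sum(by_state.values())
        shares[group] = {state: n / total for state, n in by_state.items()}
    return shares


shares = network_shares(records)
```

Under these assumptions, `shares[("Carter", "Black", "GA")]` gives the distribution across states of one network, which could then be compared with where migrants from that network later settle.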

Beyond explaining overall migration patterns and wage dispersion across space, social networks may also play an important mediating role when specific populations or groups are subjected to oppressive regimes that can harm long-term economic and well-being outcomes. For example, Carrington et al. (1996) argue that moving costs which decrease with the stock of migrants help explain the Great Migration, the movement of Black Americans from the U.S. South to the North in the early to mid 20th century. As an application of our method of constructing social networks from surname-race-birthplace combinations, we will examine whether the presence of individuals from one's own network outside of Jim Crow states (in the pre-Jim Crow era) facilitated the out-migration of Black individuals exposed to the Jim Crow regime.

While the potential importance of social networks has long been appreciated, empirical analysis of their effect on migration has been limited to proxies that are either broadly available but coarse (race, place of birth) or limited to a specific setting (soldiers from the same unit). Using sample census data, we have already established correlational patterns showing that surnames can fill this gap in population-scale datasets. While networks based on surnames alone significantly explain historical migration patterns, using a gravity model we find that networks based on surname-race-birthplace combinations have more than an order of magnitude greater explanatory power, as much as the distance between two states. We believe that carefully constructing such network sizes across multiple full-count censuses, and then linking individuals across these censuses, will offer the opportunity to explore several interesting questions in intergenerational dynamics, as well as the effects of economic and health shocks (such as the Jim Crow regime and the 1918 influenza pandemic) on individual and group-level outcomes.
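For readers unfamiliar with gravity models of migration, one generic log-linear form is sketched below. The specification and every symbol in it are assumptions introduced for illustration; the text above does not state the exact model estimated.

```latex
\ln M_{odg} = \alpha + \beta \ln N_{dg} + \gamma \ln D_{od}
              + \mu_o + \nu_d + \varepsilon_{odg}
```

Here $M_{odg}$ would denote the number of migrants from origin state $o$ to destination state $d$ in network group $g$, $N_{dg}$ the number of group members already residing in $d$, $D_{od}$ the distance between the two states, and $\mu_o$ and $\nu_d$ origin and destination fixed effects. In a specification of this kind, the explanatory power of the network term $\beta \ln N_{dg}$ can be compared with that of the distance term $\gamma \ln D_{od}$.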

This grant will help us bring this project to scale by performing this analysis on multiple full-count censuses from 1850-1940. We will require a powerful server that can handle multiple full-count U.S. censuses, and may also require a computer science graduate student for data cleaning and construction, as well as an economics doctoral student to help with the empirical analysis. We believe this grant will be able to cover the entire data construction, as well as the specific question we are interested in regarding Black migration and economic progress during the Jim Crow era. The analysis will then be used to apply for larger NIH and NSF grants.

Academic Year
2023-2024
Duke Principal Investigator(s)
Primary Funding Agency
NICHD/DRPC Pilot
Award Year