Book Review: The Model Thinker – A new way to look at Data Analysis

In this special guest feature, Carol Wells reviews the new book by Scott E. Page entitled “The Model Thinker.”

A hands-on reference for the working data scientist, “The Model Thinker” challenges us to consider that the historical methods we have used for data analysis are no longer adequate given the complexity of today’s world. The book opens by making the case for a new way of using mathematical models to solve problems, offers a close look at a number of the models, then closes with a pair of demonstrations of the method.

Author Scott Page asserts that we are still evaluating today’s far-reaching and abundant data the same way we did twenty-five years ago: we look through our data, look through our models, find the single best fit, and apply it.

The problem is that applying one model to a problem gets us only part of the story. A one-model solution has told us, for instance, that our country’s poor health is due to sugar consumption, or that Trump voters in 2016 were those who had been left behind economically. These explanations are valid, but far from complete.

Page proposes a “many-model paradigm,” where we apply several mathematical models to a single problem. The idea is to replicate “the wisdom of the crowd” which, in groups like juries, has shown us that input from many sources tends to be more accurate, complete, and nuanced than input from a single source.
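The arithmetic behind the wisdom of the crowd can be sketched in a few lines of Python. This is an illustration only, not code from the book: the three "model predictions" and the true value are made-up numbers, and the function name is hypothetical. It relies on a standard identity for averaged predictions: the squared error of the average equals the average squared error of the individual predictions minus the variance of those predictions around their average.

```python
from statistics import mean


def crowd_vs_individual(predictions, truth):
    """Compare the averaged prediction with the individual ones.

    Returns (crowd_error, average_individual_error, diversity), all as
    squared errors. The identity crowd_error = average_individual_error
    - diversity holds for any list of numbers.
    """
    crowd = mean(predictions)
    crowd_error = (crowd - truth) ** 2
    average_individual_error = mean((p - truth) ** 2 for p in predictions)
    # Diversity: how far the individual predictions spread around their mean.
    diversity = mean((p - crowd) ** 2 for p in predictions)
    return crowd_error, average_individual_error, diversity


# Made-up outputs from three hypothetical models of the same quantity.
predictions = [42.0, 55.0, 61.0]
truth = 50.0

crowd_error, avg_error, diversity = crowd_vs_individual(predictions, truth)
print(f"crowd error:              {crowd_error:.2f}")
print(f"average individual error: {avg_error:.2f}")
print(f"diversity of predictions: {diversity:.2f}")
```

Because diversity can never be negative, the averaged prediction is never worse than the typical individual one, and the more the models disagree, the larger the gain from combining them.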

The book emphasizes social data, because, as the author notes, people are a special challenge. You can count on, say, carbon atoms to never violate the laws of physics. People are not so reliable. We have irrational biases. Sometimes we learn from opportunities or mistakes and change our behavior. Sometimes not so much.

Page is a professor of complex systems and quantitative social science at the University of Michigan. He writes in a straightforward fashion, punctuated with bursts of unusual metaphors, such as in the following:

“Confronted with a complex system we cannot, to paraphrase Plato, carve the world at its joints. We can partially isolate the major causal trends and then explore how they are interwoven. In doing so, we will find that the data produced by our economic, political, and social systems exhibits coherence. Social data is more than sequences of incomprehensible hairballs that might have been spit up by the family cat.”

In the final chapter, Page demonstrates his method by tackling two real-world issues: the opioid epidemic and economic inequality. By applying the many-model paradigm to income inequality he illuminates many interlocking causes including economic development, sociological trends, political power, and the weight of history.

A single model can track the flow of money among the generations. Another can examine the factors involved in the disparity of the pay of educated and uneducated workers. Yet another can survey the rise of CEO pay relative to the pay of the average worker. This endeavor brings us gradually nearer to a complete picture.

In my own work I have looked at data relating to homelessness, and the many-model paradigm strikes me as potentially very useful. We already know that its many causes include trauma, domestic violence, and local increases in rent. Shedding even more light on this difficult issue would be a help.

What has given this book a place in my permanent library is its deep dives into dozens of models. The equations and diagrams are here, but so are the applications. Chapter 11 tells us that broadcast, diffusion, and contagion models are used in communication, marketing, and epidemiology. These models are equally useful for describing how people learn new information and how they catch a disease.

As an example, I might think wearing tight jeans is uncomfortable, but as more people wear tight jeans, I may become more likely to wear them as well. Similar logic applies to my chances of becoming involved in a social movement, adopting a new technology, or getting a tattoo.

Finally, Page reminds us in each chapter to stay alert to the dangers inherent in our work. The tight-jeans example had to allow that the probability of adoption per exposure increased with more exposures. A model that had simply used data on past behavior to estimate future behavior would not have worked because people can learn and respond to changes in their environment. At least for me, caveats like these are welcome and grounding.
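To make the tight-jeans caveat concrete, here is a minimal Python sketch. It is not the model from the book: the function name, the base probability of 0.05, and the linear boost of 0.03 per prior exposure are all assumptions chosen for illustration. It compares a model in which every exposure has the same chance of converting me with one in which each additional exposure is a little more persuasive than the last.

```python
def adoption_probability(exposures, base=0.05, boost=0.0):
    """Chance of having adopted after a given number of exposures.

    Assumption for illustration: the k-th exposure converts with
    probability min(1, base + boost * (k - 1)). With boost=0 every
    exposure is equally persuasive; with boost>0 each one counts more.
    """
    not_adopted = 1.0
    for k in range(1, exposures + 1):
        p_k = min(1.0, base + boost * (k - 1))
        not_adopted *= 1.0 - p_k
    return 1.0 - not_adopted


# Compare the two assumptions for a few exposure counts.
for n in (1, 5, 10):
    constant = adoption_probability(n)
    reinforced = adoption_probability(n, boost=0.03)
    print(f"{n:2d} exposures: constant={constant:.3f} reinforced={reinforced:.3f}")
```

The two versions agree after a single exposure and then diverge, which is the point of the caveat: a model fitted only to early behavior would understate later adoption.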

The book includes the following chapters:

Chapter 1 – The Many-Model Thinker
Chapter 2 – Why Model?
Chapter 3 – The Science of Many Models
Chapter 4 – Modeling Human Actors
Chapter 5 – Normal Distributions: The Bell Curve
Chapter 6 – Power-Law Distributions: Long Tails
Chapter 7 – Linear Models
Chapter 8 – Concavity and Convexity
Chapter 9 – Models of Value and Power
Chapter 10 – Network Models
Chapter 11 – Broadcast, Diffusion, and Contagion
Chapter 12 – Entropy: Modeling Uncertainty
Chapter 13 – Random Walks
Chapter 14 – Path Dependence
Chapter 15 – Local Interaction Models
Chapter 16 – Lyapunov Functions and Equilibria
Chapter 17 – Markov Models
Chapter 18 – Systems Dynamics Models
Chapter 19 – Threshold Models with Feedbacks
Chapter 20 – Spatial and Hedonic Choice
Chapter 21 – Game Theory Models Times Three
Chapter 22 – Models of Cooperation
Chapter 23 – Collective Action Problems
Chapter 24 – Mechanism Design
Chapter 25 – Signaling Models
Chapter 26 – Models of Learning
Chapter 27 – Multi-Armed Bandit Problems
Chapter 28 – Rugged-Landscape Models
Chapter 29 – Opioids, Inequality, and Humility

Carol Wells is a Data Scientist and freelance writer. She lives in Portland, Oregon.
