03. Bayesian Models & Probabilistic Thinking

There are two general perspectives on probability in use in statistics today. The most common is the frequentist model; the other is the older but less familiar Bayesian model. Most statistical questions can be pursued with either model, but there are significant differences when they are applied to AI or machine learning. Frequentist statistics tends to analyze data from the past to determine relationships of correlation or causality in the observed data. Bayesian methods combine your prior beliefs about these relationships (held in advance of seeing the data or running the experiment) with the data actually observed to produce an “updated” set of posterior beliefs, expressed as probabilities. To refine or continually update the model, the latest posterior beliefs become the new prior beliefs. The frequentist model attempts to produce relatively conclusive findings from empirical data; the Bayesian model attempts to update beliefs and probabilities from the empirical data. Bayesian methods are particularly valuable in an AI or machine learning system that is continually fed new sensor data and continually refines its predictions, such as a weather-forecasting system. Here’s a one-sentence explanation of Bayes’ model: “By updating our initial beliefs with objective new information, we get a new and improved belief.”
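The update cycle described above (prior belief, new data, posterior belief, and the posterior becoming the next prior) can be sketched in a few lines of Python. This is only an illustration: the rain scenario, the function name bayes_update, and every number below are hypothetical assumptions chosen to make the arithmetic easy to follow, not values from any real forecasting system.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Apply Bayes' rule once and return P(hypothesis | evidence)."""
    numerator = p_evidence_if_true * prior
    # Total probability of seeing the evidence under both possibilities.
    evidence = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / evidence


# Hypothetical setup: we start out 30% sure that it will rain today.
belief = 0.30

# Assumed sensor behavior (made-up numbers): a "humid" reading shows up
# 80% of the time before rain and 20% of the time before a dry day.
P_HUMID_IF_RAIN = 0.80
P_HUMID_IF_DRY = 0.20

history = [belief]
for _ in range(3):  # three "humid" readings arrive, one after another
    # The posterior from the previous reading is the prior for this one.
    belief = bayes_update(belief, P_HUMID_IF_RAIN, P_HUMID_IF_DRY)
    history.append(belief)

for step, value in enumerate(history):
    print(f"after {step} readings: P(rain) = {value:.3f}")
```

With these assumed numbers, each reading pushes the belief in rain higher, from 0.30 to roughly 0.63, 0.87, and 0.96. The starting belief matters less and less as evidence accumulates, which is the practical appeal of the approach for systems that receive a steady stream of data.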

History of Bayesian Modeling

In 1740s England, Reverend Thomas Bayes’ scientific interests led him to explore the mathematics of cause and effect. He was a defender of Isaac Newton’s new calculus, and he was invited to join the Royal Society (Britain’s scientific elite!). Bayes was intrigued by the uproar over a non-mathematical essay by David Hume that dared to explore cause and effect. It was scandalous because the church taught that God was the cause of everything, PERIOD. Bayes explored problems like this: if a gambler holds a hand of 4 aces, what is the probability that the deck of cards was rigged? As he explored these ideas of probability, he realized that they conflicted with his own religious ideas, so he never published them. Fortunately, his paper on “an imperfect solution of one of the most difficult problems in the doctrine of chances” was published after his death by his friend Richard Price. Americans should know of Richard Price because he avidly supported the American Revolution and civil liberties in general. He corresponded with Thomas Jefferson, John Adams, and other American “founding fathers.” When Yale University awarded two honorary doctorates in 1781, it chose George Washington and Richard Price! But did anyone read and understand Bayes’ masterpiece?

French scientist and mathematician Pierre-Simon Laplace can be regarded as the Western world’s “Einstein” in the age of Jefferson and Napoleon. In 1774 (the year the American Continental Congress first met, on the road to American independence), at around age 25, he was already publishing his independently discovered work on chance and probability. Richard Price introduced the Bayes paper to French mathematics, and Laplace quickly incorporated Bayes’ ideas. Over 40 years, Laplace perfected this work into the Bayesian model we know today, applying it to sciences from demographics to astronomy. But he found that his model required too many calculations to be practical in the days before calculators and computers.

In the 1900s, Bayes’ ideas were adopted, sometimes secretly, to solve real problems, but the competing “frequentist” model championed by Fisher (of F-test fame) became the dominant kind of statistics for the 20th century. Frequentist statistics is the model that continues to be taught in schools. But there were very interesting Bayesian applications during this period, roughly from 1908 to 2008 (examples below drawn from McGrayne, S. B. (2011). The Theory That Would Not Die.):

  • Just before women won the right to vote, Bell Telephone invented a relay technology that would make human telephone operators obsolete. But with women’s new political power, the company was afraid to eliminate all of those jobs. It eliminated half of the operators, using Bayes’ methods to calculate call volume probabilities. AT&T regarded its use of Bayes’ work as a proprietary secret!
  • Bayesian models helped build America’s first employers’ insurance industry around 1915. They were the only way insurers and government regulators could set fair premiums without the decades of prior data they lacked. They needed to forecast future probabilities! During that time, the French military used Bayes in its own forecasting.
  • During World War II, Alan Turing and the mathematicians in Britain’s Bletchley Park used Bayesian models in primitive computers to crack the German Enigma code (which had been regarded as impossible to crack). After the war, these mathematicians were forbidden to discuss how Bayes’ rule helped save Britain and defeat Nazi Germany!
  • A 1951 medical paper by Jerome Cornfield used Bayes’ methods to finally demonstrate the high probability that smoking caused lung cancer. Frequentist correlation studies had explored this for years, but that model was powerless to show causation.
  • During the Cold War, from the 1960s to the 1980s, Bayesian models were used to dramatically improve the safety of nuclear weapons and to reduce the probability of nuclear war. They were the only models that could explore the probabilities inherent in different policy choices. All of this work was highly classified, so again, the power of Bayes’ model was hidden.
  • The U.S. Navy has used Bayesian models to recover a nuclear bomb and to locate the wreck of the submarine Scorpion in the depths of the Atlantic Ocean. Bayes’ methods succeeded where traditional search strategies failed.
  • As business competition has become more global and fast-paced, executives need to make rapid decisions to stay competitive, always with incomplete information! Bayesian models, called “business analytics,” have gained broad credibility in the corporate world because they can give executives an array of future options along with their probabilities.
  • In the 2008 U.S. presidential election, won by Barack Obama, the forecaster Nate Silver (an independent analyst rather than a campaign pollster) used Bayesian modeling along with other techniques to correctly predict the winning candidate in 49 of the 50 states, a strikingly accurate result! Political campaigns now use Bayesian probability along with other techniques to plan their strategy more accurately and to estimate future outcomes.

The 1900s were the century of the frequentists, and by the end of the century, frequentist models were taught almost exclusively at nearly every university around the world. What changed? Three major changes have launched Bayesian probabilistic models into the forefront in the 21st century (even though most university statistics departments are still frequentist-focused). The first is that Bayesian successes such as those listed above have become known and are recognized for addressing problems that frequentist models couldn’t address or at which they actually failed. The second is that ubiquitous, powerful computing overcame the calculation bottleneck that kept some theoretical Bayesian models from practical application. The third, and LARGEST, is the refocusing of statistical analysis from analyzing the data of past studies to powering the probabilities and predictions that AI and machine learning produce across many disciplines, from military to business planning. This is NOT to say that Bayesian systems are BETTER than frequentist systems; they are different. The two models have different strengths for analyzing different problems. But in our fast-changing culture, Bayes has a strong role and a bright future!

Bayesian Systems Today

Bayesian models and networks are in use today in fields including computer science (AI and machine learning), neuroscience, biotech and DNA studies, legal questions of evidence, and many more. The model has been used in the field of pattern recognition since the 1950s, and it is used wherever predictions must be made from data. Three major examples are predictive analytics in business, predictions of weather and climate change, and predictions of likely objects and paths in self-driving cars.

Bayesian Models are Built Into Many AI and Machine Learning Systems

  • Now you’ve seen some specific Bayesian approaches. On the assumption that many AI and machine learning systems rely on Bayesian methods, here’s a look ahead to the near-term potential of AI and ML. Watch the Google talk by Peter Diamandis (16-min talk and 30-min conversation) on the ways that AI-empowered technologies are changing, and are expected to change, our lives dramatically (for example in overall health, the elimination of cancer, and the rapid growth of literacy). His talk focuses on “exponential technologies” that are built on or somehow involve AI and machine learning. NOTE: he’s speaking to an educated Google audience, so expect that some of his ideas may be unfamiliar or referenced in ways that are new to you (you can Google the mysteries): https://www.youtube.com/watch?v=HJpKxnZ2JeY