07. ML with Python, Jupyter Notebook, COLAB & scikit-learn

The two most popular machine learning languages are Python and R. Both can be used in a bare-bones terminal window or in a supportive environment built for productivity and learning. RStudio provides this service for R; we will use Jupyter Notebook or the Google Colaboratory (COLAB) environment to support our introduction to Python. While the R language is specialized for data and statistical analysis and prediction, Python is a general-purpose language. In fact, by many estimates, it’s the most widely used programming language in the world! And like R, it relies on a broad variety of specialized “libraries” customized to support specific programming needs in math, graphics, machine learning, and many more.

Installing Python, Jupyter, scikit-learn & common libraries

NOTE: If you are a software/AI/ML developer or a serious student of AI and ML, you’ll pursue this path to install the AI/ML tools on your computer. If you are just interested, beginning to explore, or using a school computer, iPad, or Chromebook, you should move to the section below on Google COLAB.

One of the world’s most popular systems for using Python and R is the Anaconda Distribution. Anaconda will place current versions of Python, Jupyter, scikit-learn, NumPy, and other needed libraries into an organized software development installation on your Windows, Mac, or Linux computer. Additional libraries such as TensorFlow can be added to the installation later. It is too complex to install on tablet systems like iPads. All of the software is open-source (free to use). Note that Anaconda will take about 5 GB of disk space and will take time to install, so make sure you have a good Internet connection when you install. It is an amazing learning and development system! Here’s the link (it may take you to the macOS install page; the Windows installer is also available on the site): https://www.anaconda.com/distribution/

Most of the Anaconda installation is buried in your computer; you rarely have to launch the individual components. The common way to launch Python and the other elements is through the Anaconda “launcher.” The Anaconda icon should be easily accessible, and the others should be hidden (for now, at least). The first video below shows how to install Anaconda, launch Jupyter Notebook, and open a new Notebook. You’ll enter your Python code in the Notebook, which can display directions, images, graphs, error messages, and more in an orderly “page.” Below we will use the YouTube video series “Introduction to machine learning with scikit-learn.” Note: the narrator has a relaxed, slow pace of explanation and demonstration. Speed up the video if you like.
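Once a new Notebook is open, a quick first cell can confirm that the libraries are in place. The following is a minimal sketch, assuming Anaconda (or another installation) has already provided NumPy and scikit-learn; the `versions` dictionary is just an illustrative name, not something from the videos.

```python
# Quick sanity check: can Python find the core ML libraries?
import sys

import numpy
import sklearn  # scikit-learn is imported under the name "sklearn"

# Collect version strings in one place (illustrative helper variable).
versions = {
    "python": sys.version.split()[0],
    "numpy": numpy.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If the cell prints three version numbers without an error message, the installation is ready for the videos below.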

Coding with Python in Jupyter Notebook (plan on a LOT of time here)

  • The first video in the series is a 10-min review of concepts introduced and practiced in the previous six sections of Mentaledge/AI/ML. Skip it if you don’t feel you need a review: What is Machine Learning and How Does it Work? https://www.youtube.com/watch?v=elojMnjn4kk
  • The second video shows you how to install the Anaconda framework and begin working with Python in a Jupyter Notebook. You will find that the narrator uses several terms and some code that have changed since his 2015 video. For example, the “IPython” notebook he refers to has been renamed Jupyter Notebook (but it works the same). You will launch Jupyter Notebook from the Anaconda Navigator opening screen when you launch Anaconda on your own computer. To solve this issue for you, the text and images that accompany his videos have been updated to current terms and code. You can view the updated files from the narrator’s GitHub site through the nbviewer system shown in the video. Here’s the link to “Setting up Python for Machine Learning …”: https://www.youtube.com/watch?v=IsXXlYVBt1M
  • The GitHub site for all of the Jupyter scripts is: https://github.com/justmarkham/scikit-learn-videos . These scripts have correct, up-to-date Python code for each video.
  • The third video is a clear introduction to working with a real dataset (our old friend, the Iris dataset) in Python and scikit-learn: https://www.youtube.com/watch?v=hd1W4CyPX58 . Use the video for the explanations and the GitHub script for updated code.
  • The fourth video TRAINS the ML model and makes predictions from new, unclassified (unlabeled) data. The tutorial compares the results of three different ML models (two k-Nearest-Neighbors models and a regression model). Should they give the same answers? Note how easy it is to switch models in scikit-learn! Remember that scikit-learn has four simple-but-strict requirements for datasets. If your data is set up for one scikit-learn model, you can switch to any other model seamlessly! Here’s the link: https://www.youtube.com/watch?v=RlQuVL6-qe8
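To give a feel for what the third and fourth videos build toward, here is a minimal sketch of loading the Iris dataset, training three models, and predicting on new data. It is not the narrator's script (use the GitHub notebooks for that). The choice of K=1 and K=5, the use of logistic regression as the regression model, the `max_iter` setting, and the two made-up flowers in `X_new` are all assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Load the Iris data: features and labels are separate numeric NumPy arrays.
iris = load_iris()
X = iris.data    # 150 flowers x 4 measurements
y = iris.target  # 150 species labels, coded 0, 1, 2

# Two hypothetical, unlabeled flowers (four measurements each, in cm).
X_new = [[3, 5, 4, 2], [5, 4, 3, 2]]

# Three models with the same fit/predict interface (assumed settings).
models = {
    "KNN (K=1)": KNeighborsClassifier(n_neighbors=1),
    "KNN (K=5)": KNeighborsClassifier(n_neighbors=5),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

predictions = {}
for name, model in models.items():
    model.fit(X, y)                          # train on the labeled data
    predictions[name] = model.predict(X_new)  # predict the unlabeled flowers
    print(name, "->", list(iris.target_names[predictions[name]]))
```

Only the line that creates each model differs; the `fit` and `predict` calls are identical, which is why switching models is so easy once the data is in the expected form. Run it and see whether the three models agree.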

Introducing: “Great Cautions” of AI/ML: How Randomness Misleads Us

Our study of AI/ML so far has explored a bit of optimism, of data, of coding, of theory, and of education. But AI and Machine Learning are tools (yes, very powerful tools), and they can produce misleading results in several common ways. The first way we will explore is how we can mistake unexpected random events for evidence of relationships. A simple explanation is that in any large collection of data or events, we should expect to see rare patterns that are the normal result of randomness. Leonard Mlodinow is a theoretical physicist, but he has also co-authored two books with Stephen Hawking and one with Deepak Chopra. At least three of his books were New York Times bestsellers, and he’s been a screenwriter for both MacGyver and Star Trek. One of his bestselling books explains how commonly we mistake rare-but-random events for evidence of real trends or relationships. Enjoy his enlightening 2008 talk at Google (about 30 min): https://www.youtube.com/watch?v=F0sLuRsu1Do
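The idea that large collections of data contain impressive-looking patterns by pure chance can be tried out in a Notebook. The sketch below is not taken from the talk; it is a small simulation under assumed settings (20 samples, 1,000 features, a 0.5 cutoff, all chosen for illustration) in which every number is random noise, so any "relationship" it finds is meaningless.

```python
import numpy as np

# Fixed seed so the experiment is repeatable.
rng = np.random.default_rng(seed=0)

n_samples = 20     # e.g. 20 students (assumed)
n_features = 1000  # 1,000 unrelated random measurements per student (assumed)

X = rng.normal(size=(n_samples, n_features))  # pure-noise features
y = rng.normal(size=n_samples)                # pure-noise "outcome"

# Pearson correlation between each feature column and the outcome.
X_centered = X - X.mean(axis=0)
y_centered = y - y.mean()
correlations = (X_centered.T @ y_centered) / (
    np.linalg.norm(X_centered, axis=0) * np.linalg.norm(y_centered)
)

strongest = np.abs(correlations).max()
n_strong = int((np.abs(correlations) > 0.5).sum())

print(f"Strongest correlation found in pure noise: {strongest:.2f}")
print(f"Features with |correlation| above 0.5: {n_strong}")
```

None of the features has any real connection to the outcome, yet some of them appear strongly correlated with it. Search enough random data and a "pattern" will turn up, which is exactly the trap to keep in mind when an ML model reports a surprising relationship.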

More AI/ML in Education: warning – fasten seat belt!