In an earlier section, you may have explored the Iris dataset with two different machine learning algorithms: first Linear Regression, then k-Nearest-Neighbors. But there are many models and algorithms, and even more ways to combine them, so how do you know which ML model to choose? A more important question may be: how do the models think? How can we humans understand the method underlying a particular algorithm?
How can we humans understand non-human ways of thinking? A first step is to better understand our own human thought processes. Most of our thinking and planning is unconscious, so we often cannot really know WHY we believe or desire something. Some machine algorithms, like nearest neighbors, are simple for us to understand: the machine recommends a book or movie to us because that’s what people like us enjoyed. The recommendations or predictions of other algorithms, like neural networks (the basis of deep learning), are much harder to understand because they pass information through hidden layers, somewhat as our own unconscious mind does. But how intelligent is a machine algorithm?
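To make the "people like us" idea concrete, here is a minimal sketch of a nearest-neighbor recommender in plain Python. The reader names, book titles, and ratings are all invented for illustration, and the distance measure and the "rated 4 or higher" rule are assumptions chosen for simplicity, not the method of any particular recommendation service.

```python
from collections import Counter
from math import sqrt

# Hypothetical ratings (1-5) that four readers gave to books.
# Every name and number here is made up for illustration.
ratings = {
    "ana":  {"Dune": 5, "Emma": 1, "Neuromancer": 5},
    "ben":  {"Dune": 4, "Emma": 2, "Neuromancer": 4, "Foundation": 5},
    "cara": {"Dune": 1, "Emma": 5, "Persuasion": 5},
    "dev":  {"Dune": 5, "Emma": 1, "Foundation": 4},
}


def distance(a, b):
    """Euclidean distance over the books that both readers rated."""
    shared = set(a) & set(b)
    if not shared:
        return float("inf")  # nothing in common, so treat as very far apart
    return sqrt(sum((a[title] - b[title]) ** 2 for title in shared))


def recommend(user, k=2):
    """Suggest unread books that the k most similar readers rated 4 or higher."""
    mine = ratings[user]
    others = [name for name in ratings if name != user]
    # The k "nearest neighbors" are the readers whose ratings are closest to ours.
    neighbors = sorted(others, key=lambda name: distance(mine, ratings[name]))[:k]
    votes = Counter()
    for name in neighbors:
        for title, score in ratings[name].items():
            if title not in mine and score >= 4:
                votes[title] += 1
    return [title for title, _ in votes.most_common()]


print(recommend("ana"))
```

Every step can be traced by hand: which readers counted as "like us", and which of their favorites we have not read yet. That traceability is what makes this kind of algorithm easy for a human to follow.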
Comparing human and machine understanding. The famous neural network demonstration of machine learning of handwritten digits using the MNIST dataset helps us compare human to machine. The machine learns faster than a human but requires many more examples. A human knows what a number is and knows how a handwritten digit is created. The machine knows none of that, so what has the machine actually learned? A human child knows when she learns that “5” means a certain quantity of things like the fingers on your hand. Does the machine even know that it is learning? There’s a missing idea in our discussion of machine intelligence: consciousness.
Evaluating and Expanding Machine Learning Models
We’ve explored two languages widely used in data science and in introductions to machine learning: R and Python. R is more focused on data analysis and predictive statistics, and it’s a bit easier to learn than Python. Python is a general-purpose programming language with many libraries that adapt it to specific needs, and part of Python’s complexity lies in the sheer number of these add-on libraries. Python is also good for developing an entire machine learning system, even one that runs continuously to process streams of data (think of the sensor data streams in a self-driving car). The next tutorial reviews the regression model, this time using Python and scikit-learn. It continues the work of learning how to evaluate your ML model: how well does it actually work on out-of-sample data that it did not see during training? The video also introduces two new libraries, pandas and seaborn. The pandas library is frequently used for Python data analysis and machine learning. Here’s the video link (as before, use the Jupyter Notebook script from GitHub): Data science pipeline: pandas, seaborn, scikit-learn: https://www.youtube.com/watch?v=3ZWuPVWq7p4
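As a rough illustration of what evaluating a regression model on out-of-sample data can look like, here is a minimal sketch using pandas and scikit-learn. It is not the notebook from the video: the dataset is synthetic (generated in the code, with the made-up column names x1, x2, and y), and the 25% test split and the RMSE metric are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data, invented for illustration: y = 3*x1 + 2*x2 + 5 + noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.uniform(0, 10, 200),
    "x2": rng.uniform(0, 10, 200),
})
df["y"] = 3 * df["x1"] + 2 * df["x2"] + 5 + rng.normal(0, 1, 200)

X = df[["x1", "x2"]]  # feature columns
y = df["y"]           # target column

# Hold out 25% of the rows; the model never sees them during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

model = LinearRegression()
model.fit(X_train, y_train)

# Compare predictions on the held-out rows with their known answers.
y_pred = model.predict(X_test)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
print("RMSE on held-out data:", rmse)
```

The held-out rows stand in for data the model will meet later. Because their true values are known, the error on them gives an estimate of how the model might perform on new data whose answers are not known.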
Case Studies: AI and ML Development by Middle and High School Kids
- 16-year-old Kavya Kopparapu developed the “Eyeagnosis” system to diagnose diabetic retinopathy: https://spectrum.ieee.org/the-human-os/biomedical/diagnostics/teenage-whiz-kid-invents-an-ai-system-to-diagnose-her-grandfathers-eye-disease
- 17-year-old Brittany Wenger created the “Cloud4Cancer” machine learning system to diagnose breast cancer in a less invasive, more accurate way. Here’s an interview with Brittany explaining her start in AI/ML (her TED Talk is in the next link): https://ideas.ted.com/brittany-wenger-cancer-research/
- Check out Brittany’s 8-min TED talk on How to Make a Neural Network in Your Bedroom: https://www.youtube.com/watch?v=n-YbJi4EPxc. Here’s Brittany’s other 9-min TED Talk: https://www.youtube.com/watch?v=FiUmjmOKlto
- Here’s her Cloud4Cancer app, reported to be 99.11% sensitive to malignancy and described as more accurate than any previous diagnostic system: https://cloud4cancer.appspot.com/