Mental Models of Artificial Intelligence Part 3

Let's dive into some of the inner workings of how these models are built!

Jessica Ezemba

12/7/2023 · 6 min read

a cartoon character with a thought bubble

If you haven’t been living under a rock, you have probably heard of the internet. Another issue arises when you take raw information from the internet to build models.

Can you think of it?

In the early days of the internet, people tended to believe that anything posted online was true and credible. Today, we would almost never accept that at face value. Anyone can upload anything to the internet. ANYTHING. It can range from illegal activity to false statements and even bigoted propaganda. At the same time, the internet has real advantages: it helps people stay connected and makes information easier to access. Data taken from the internet to train models can include both the good and the bad, and the bad might not be obvious to the machine learning engineers either.

How you know something is good or bad is based on your personal experiences and varies from individual to individual. It is hard enough to vet a single new blog for factual accuracy; imagine having to vet billions of articles. It is almost impossible! This is why things slip through the cracks in the form of biased model outputs that favor one person's traits over another's. The machine learning engineers did not program the computer to discriminate against women in the hiring process [6]; the model learned, from many sources across the internet, the gender inequalities that already exist in hiring.

Data is what defines a model, and data is also what defines the world, with all its advantages, disadvantages, biases, and inequalities. The major difference between a machine learning algorithm making biased choices and a human doing the same is that the algorithm will never know it is biased, while a human might be aware of their personal biases. This is because the algorithm has no agency or capacity for thought, only the information it was trained on. To understand how Amazon's algorithm learned that male candidates were preferable, let's dive a little deeper into how an AI algorithm can even “see” data.

What’s wrong with web scrapers?

What is data to computers?

Data types commonly used in machine learning algorithms include text, voice/audio, images, and 3D models. These data types are easy for human intelligence to work with because the human brain is good at recognizing things like images and filling in the blanks in data. The human mind does not need data broken down for it; it automatically digests data into understandable pieces. For example, reading a document is broken down into words and overall sentence structure, and looking at an image is translated into patterns and lines that the brain connects, based on past experience, to recognize the image, and all of this happens in a split second. That is why it is easy for humans to recognize a blurred-out photo of Marilyn Monroe, yet only a very small subset of humans can work out the solution to an equation as fast as a computer can.

Comparison between computer and human

Computers cannot “see”. They do not have vision receptors that work the way our eyes do, and they only understand numbers, so machine learning scientists have come up with ways to parse these data types into a form computers can work with. All forms of data are cleverly converted into numbers the computer can use to find patterns.

Different modalities for AI

As you can see from the image above, computers “see” in numbers, so any data format typically goes through some conversion to numbers. Text can be broken down into tokens, where each word, syllable, or character is mapped to a number. Images can be broken down into pixel values, where each pixel value represents a tiny section of the picture. For speech/voice, audio waves are parsed into frequencies that are converted to numbers, and videos are broken down into sequences of images that are in turn converted to numbers. 3D models are broken down into lines and nodes, which can also be represented by numbers.

As you can imagine, there are multiple ways to do this conversion for each data type, and many people specialize in them. But to understand how computers “see”, you only need to know that computers understand nothing but numbers. This will be important when learning about weights.
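To make this concrete, here is a minimal Python sketch of the idea. The word and the tiny “image” below are made up for illustration; real systems use far more sophisticated encodings, but the principle is the same: everything becomes numbers.

```python
# A toy look at how data becomes numbers before a model ever sees it.

# Text: one simple scheme maps each character to its numeric code point.
text = "sheep"
text_as_numbers = [ord(c) for c in text]
print(text_as_numbers)  # [115, 104, 101, 101, 112]

# Images: a tiny 3x3 grayscale "image" is just a grid of brightness
# values from 0 (black) to 255 (white). To the computer, this IS the image.
image = [
    [255, 255, 200],
    [255, 180,  90],
    [120,  60,   0],
]
flat_pixels = [p for row in image for p in row]
print(flat_pixels)  # the list of numbers a model would actually receive
```

Either way, what reaches the algorithm is a list of numbers, nothing more.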

Weights? What are weights?

Up until this point, I have referred to the process of getting an output from an input as inferring a pattern or connection. In neural networks, a popular way of mapping inputs to outputs, this connection is determined by the weights and biases. Weights and biases are numbers the algorithm learns that tell it how strongly each part of the input affects the output. Okay, let's break that down.

Weights are the deciding factors in how the algorithm turns an input into an output. A weight stands between the input and the output and controls which parts of the input should be weighed heavily and which should be ignored. It is similar to picking up a product from the grocery shelf and deciding whether or not to buy it. Some factors will be very important, some good to have, and some unnecessary. The price may matter a lot while the color of the cereal box may not, so you “weigh” the price as more important than the color. A machine learning algorithm does the same thing, but with numbers (again, a lot of them), assigning each input a weight based on how much it matters to the output.

A large positive weight means the input pushes the output strongly in one direction, and a large negative weight pushes it strongly in the opposite direction; a weight near zero means the input barely matters to the output. Bias is a tweaking term: a number added on top of the weighted inputs that makes the connection stronger or weaker regardless of the input. In our grocery shopping case, the bias could be your favorite brand: even though a competitor's brand is cheaper, you still pick your favorite brand because you have a “bias” towards it.
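The cereal analogy can be sketched as a single weighted sum. All the feature names, weight values, and the decision threshold below are invented purely for illustration:

```python
# One toy "neuron" deciding whether to buy a cereal.

# Inputs: how well this product scores on each feature (0 to 1).
features = {"low_price": 0.9, "box_color": 0.3, "sugar_free": 0.0}

# Weights: how much each feature matters. Near zero = barely considered;
# a negative weight pushes the decision in the opposite direction.
weights = {"low_price": 2.0, "box_color": 0.1, "sugar_free": -0.5}

# Bias: brand loyalty, a nudge applied no matter what the features say.
bias = 1.5

score = sum(weights[k] * features[k] for k in features) + bias
buy = score > 2.0  # arbitrary decision threshold for this sketch
print(round(score, 2), buy)  # 3.33 True
```

A neural network is, at heart, huge stacks of sums like this one, with the weights and biases learned from data instead of written by hand.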

Weights and Bias explained as Cereal

It is all about the weights in neural networks. Weights are the numbers a computer learns in order to associate an input with an output. The issue with neural networks is that there are so many weights that it is hard to understand which parts of the input matter more than others. Human minds cannot scan through billions of numbers and tell which is higher than another (remember, that's where computers win).

Computers can tell which weights are more important, but they can't always tell why. Recall that computers see in numbers, so, given an image, the computer sees many numbers for that one image. The numbers from many images are “weighted” against each other to form patterns.

For example, multiple images labeled “sheep” look to a computer like collections of pixel values. Pixel values change with color, so white areas in an image may be heavily weighted towards “sheep” because many sheep images contained white areas associated with that label.

What this means is that if the AI algorithm saw a new (unlabeled) image with a lot of similar white pixels, it might classify the new image as a sheep.
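As a toy illustration of that failure mode, here is a deliberately naive “classifier” that has learned only one rule. The threshold and the two tiny images are invented, not drawn from any real model:

```python
# A naive "classifier" whose only learned rule is:
# images that are mostly white pixels get labeled "sheep".
def fraction_white(image, white_threshold=200):
    pixels = [p for row in image for p in row]
    return sum(p >= white_threshold for p in pixels) / len(pixels)

def classify(image):
    return "sheep" if fraction_white(image) > 0.5 else "not sheep"

# Tiny 3x3 grayscale images: 0 = black, 255 = white.
woolly_sheep = [[255, 250, 245], [240, 230, 30], [255, 245, 250]]
brown_sheep  = [[120, 110, 100], [115, 105, 30], [125, 118, 112]]

print(classify(woolly_sheep))  # sheep
print(classify(brown_sheep))   # not sheep -- a real sheep, misclassified
```

The brown sheep is a perfectly good sheep; the rule learned from the data just never covered it.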

Is there a better way to visualize this?

Glad you asked! To understand what a neural network considers important, tools such as saliency maps have been introduced to show pictorially which parts of an image an algorithm weighs more heavily than others. In the figure below, the bright red areas show where in the image the algorithm looked (the highly weighted regions) when determining the output.
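A rough sketch of the idea behind a saliency map: nudge each pixel a little and measure how much the output moves. The “model” here is a made-up linear weighted sum, and real saliency methods typically use gradients computed inside the network rather than this brute-force probe:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "model": just a weighted sum of pixel values.
weights = rng.normal(size=(4, 4))

def model_score(image):
    return float((image * weights).sum())

image = rng.random((4, 4))
base = model_score(image)

# Saliency: bump each pixel slightly and record how much the score changes.
eps = 1e-4
saliency = np.zeros_like(image)
for i in range(4):
    for j in range(4):
        bumped = image.copy()
        bumped[i, j] += eps
        saliency[i, j] = (model_score(bumped) - base) / eps

# For this linear model, the saliency map simply recovers the weights:
# the pixels with the largest |saliency| are the ones the model "cares" about.
```

Plotting `saliency` as a heat map over the image gives exactly the kind of red-highlighted picture described above.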

Image Parsing

As you can see in the figure above, the algorithm was trying to find the sheep in the image. It determined that the white wool was the most important part of the image for deciding that the animal on the right was a sheep. This case makes sense to us as humans, but there are other cases where an algorithm learns the wrong association. For example, if the algorithm has only been trained on images of white sheep, a new image of a brown-coated sheep could be predicted incorrectly, because the training data only ever matched white fur with the label “sheep”.

The same comparison can be made for words, using word-matching and Natural Language Explanation techniques that show which words are highly weighted in determining the outcome. This is why, if you have ever googled something and not found your answer, you may simply be missing the highly weighted word in your query.