Mental Models of Artificial Intelligence Part 2

Let's continue diving into what artificial intelligence is!

Jessica Ezemba

12/7/2023 · 4 min read

a cartoon of a woman with a phone

What does Artificial Intelligence mean?

Artificial intelligence got its name because it was a way to go beyond manually telling a computer what to do. Instead, computers infer decisions and tasks without being explicitly told how. In reality, it has little in common with the intelligence of humans, or even animals for that matter, because computers cannot reason, see, or make decisions by themselves (no matter what Hollywood says). Artificial intelligence is limited by several factors, which you should be able to identify by the end of this article. Before going into more detail about what artificial intelligence is, though, you need to understand some terminology that is often confusing. Artificial intelligence is popularly called AI, and it is only a subset of all computer programming.

There are still traditional algorithms that use explicitly programmed rules to get an output when given an input. A good example is a program that calculates your body mass index (BMI). There is a known equation that determines your BMI from your weight and height. Given an input of weight and height, this program will always use the same algorithm (equation) to determine the output, which is your BMI. This is not artificial intelligence because the program can never produce an outcome other than what the equation defines.
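
As a minimal sketch of such a rule-based program, the snippet below uses the standard metric BMI equation (weight in kilograms divided by height in metres squared). The function name `bmi` and the example values are assumptions made for illustration only.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Return body mass index: weight (kg) divided by height (m) squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# The rule is fixed: the same inputs always give the same output.
print(round(bmi(70.0, 1.75), 1))
```

Nothing here is learned from data. The equation is written by hand, which is exactly what separates this kind of program from machine learning.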

BMI Calculations

The subset of artificial intelligence most commonly heard about in the media is machine learning. Machine learning is commonly defined as “learning” from data and experience. It is also a multifaceted term, with many techniques that fall underneath it. It is important to know that machine learning is just one subset of artificial intelligence, but both deal with making computers “smarter”.

Another common term is algorithm. An algorithm is a process that follows certain predefined rules and calculations, typically carried out by a computer. Given an input, the algorithm is what determines the output, so it is used not only in artificial intelligence but in all forms of computing. There are many types of algorithms, most of them not commonly known outside the computer science community. For the purpose of this article, when most people speak about artificial intelligence and machine learning, they are talking about the algorithms (or calculations) used in AI and machine learning.

a diagram of the artificial intelligence of machine learning

People who work with artificial intelligence algorithms must determine four major components for the system to produce an output. These components are data source/type, model architecture, objective function, and hyperparameters. The data source/type is one of the most important aspects of machine learning algorithms such as deep learning, so listen up: it's time to dive into the data.

The data source or type is how the algorithm is determined. Recall that machine learning differs from other types of computation because it does not require explicit rules to be written to determine an output from an input, as we saw in the case of Body Mass Index (BMI). Instead, the algorithm (or calculation) is determined automatically from data, and loads of it. The number of data points can be on the order of a billion, which is a really big number to imagine. Think of it this way: 1 million seconds ago was about a week and a half ago, while 1 billion seconds ago was about 32 years ago!
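
The million-versus-billion comparison can be checked with a few lines of arithmetic. This is a small sketch. The helper names are made up for illustration, and a year is assumed to be 365.25 days.

```python
SECONDS_PER_DAY = 60 * 60 * 24  # 86,400 seconds in a day
DAYS_PER_YEAR = 365.25          # assumption: average year length


def seconds_to_days(seconds: float) -> float:
    """Convert a number of seconds into days."""
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds: float) -> float:
    """Convert a number of seconds into years."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


print(seconds_to_days(1_000_000))       # roughly 11.6 days
print(seconds_to_years(1_000_000_000))  # roughly 31.7 years
```

A billion is a thousand times larger than a million, which is why the jump is from days to decades.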

Giving data to the algorithm so that it can automatically determine the answer without written rules or equations is a process commonly termed training the algorithm. Training works because computers are very good at detecting patterns in a large dataset and doing so very efficiently. The more data we have, the more patterns computers can find between the given input and the desired output, and they can then use those connections on any new data that comes into the algorithm.
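
To make the idea of training concrete, here is a toy sketch in which the rule is never written down. The program only sees pairs of inputs and outputs and estimates a straight line through them using ordinary least squares. The data, the function name `fit_line`, and the choice of a straight line are all assumptions for illustration. Real systems use far more data and far more flexible models.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept of a line from example (x, y) pairs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least squares for a single input variable.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy training data: inputs and the desired outputs (hypothetical values).
inputs = [1.0, 2.0, 3.0, 4.0, 5.0]
outputs = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(inputs, outputs)


def predict(x):
    """Apply the learned pattern to an input that was not in the training data."""
    return slope * x + intercept


print(slope, intercept, predict(10.0))
```

The pattern connecting input to output was recovered from the examples alone, and it can then be applied to new inputs.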

The amount and quality of data define how good an algorithm is going to be. Depending on the data type, you may be able to find large amounts of data available on the internet. There are many data types currently being used by machine learning algorithms, such as text, speech, audio, images, videos, 3D geometries, etc. Most algorithms work with labeled data for training purposes. Labeled data is data where you know both the input and the output. For example, you could have a picture of a dog with the task of predicting what the image is, but you already have the tag “dog” associated with that image. This is necessary for training an algorithm because, with enough pairs of pictures and “dog” tags, the computer can infer that images that “look” like X are most likely a dog.
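
The role of labeled data can be sketched with a very simple technique called nearest neighbor: a new input gets the label of the most similar labeled example. In this sketch, each "image" is reduced to two made-up numbers standing in for real image features, so the values, the names `labeled_examples` and `predict_label`, and the choice of technique are all assumptions for illustration.

```python
import math

# Labeled data: each entry pairs an input (hypothetical features) with its known tag.
labeled_examples = [
    ((0.9, 0.2), "dog"),
    ((0.8, 0.3), "dog"),
    ((0.2, 0.9), "cat"),
    ((0.3, 0.8), "cat"),
]


def predict_label(features, examples):
    """Return the tag of the labeled example closest to the given features."""
    nearest = min(examples, key=lambda pair: math.dist(features, pair[0]))
    return nearest[1]


# A new, unlabeled input that "looks" like the dog examples.
print(predict_label((0.85, 0.25), labeled_examples))
```

Without the tags there would be nothing to copy from, which is why labeled pairs matter so much for training.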

All About Data

a dog and a cat are shown in this picture

This data is obtained from multiple sources across the internet and from private entities (as is the case with medical records obtained from hospital databases). Most web scrapers are built to extract all the necessary data types from a specific source or from multiple sources. Most text, images, video, and audio are sourced from YouTube, online discussion forums (and social media), and images published on websites across the internet. Much of this data is accessible because of the terms and conditions one blindly agrees to on any media platform or forum.

As a rule of thumb, if you upload content to a “free” service, or even to some paid ones, your data can be transferred to third parties.

It is important to note that if a regular person with access to the internet wants to build a web scraper, some platforms restrict this access to people who pay for the data (the third parties). This means that the data you provide freely (and sometimes pay to provide) goes to those interests who pay a fee to access it, and not necessarily to an average person who builds models. This raises issues of accessibility: you provide data to companies, but you as an individual cannot access that data to build algorithms. Accessibility is one of the four problems highlighted in this article, but before we visit that, let's understand more about what could be wrong with algorithms that rely on web scrapers.

a cartoon character with a laptop and a man in a suit