The Curious Mind #1: What’s even AI? Where Did It All Start?
Overview of Post
Everyone says AI is everywhere. But what is it really? A brain for machines, a calculator on steroids, or just a buzzword that stuck around for too long? To answer that, let’s go back to the brain, and then walk through how machines tried to mimic it.
The Brain’s Way of Solving Problems
Your brain is a prediction engine. When you touch fire as a child, your neurons wire together to remember pain. When you see a dog, your brain does not check a rulebook. Instead, millions of neurons fire in patterns that encode your past experiences of “dogness.”
This is intelligence: the ability to generalize from experience. Machines wanted to do the same. The first attempt was not to build neurons but to write rules. If fire = hot, do not touch. If dog = four legs, tail, barks, then classify as dog. This was called symbolic AI, a world of handcrafted logic.
It worked for narrow problems but collapsed the moment reality got messy. The brain thrives in messy situations; rules do not.
In the summer of 1956, John McCarthy, Marvin Minsky, Claude Shannon, and Nathaniel Rochester gathered at Dartmouth College for the Dartmouth Summer Research Project on Artificial Intelligence.
It was here that the term “Artificial Intelligence” was first coined. The proposal stated:
“Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”
This wasn’t a coding hackathon. It was a blueprint for a field, pointing to neural nets, search, symbolic reasoning, and language. The dream was set.
In 1957, Frank Rosenblatt asked: what if machines could learn like neurons? He introduced the perceptron, the first mathematical model of a neuron.
The perceptron takes inputs, multiplies them by weights, adds a bias, and runs them through a step function:
f(x) = h(w ⋅ x + b)
Inputs (xᵢ) = features, like pixel values
Weights (wᵢ) = importance of each feature
Bias (b) = adjusts the decision boundary
Step function (h) = binary output (1 or 0)
This made the perceptron a linear classifier, able to draw a straight-line boundary between classes.
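To make that concrete, here is a minimal sketch of a perceptron in Python (NumPy assumed); the learning rate, epoch count, and toy AND dataset are illustrative choices, not part of Rosenblatt’s original setup.

```python
import numpy as np

def step(z):
    # h in the formula above: output 1 if the weighted sum clears zero, else 0
    return 1 if z >= 0 else 0

def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])   # one weight per input feature
    b = 0.0                    # bias shifts the decision boundary
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = step(np.dot(w, xi) + b)   # f(x) = h(w . x + b)
            error = target - pred            # 0 if correct, +1 or -1 if wrong
            w += lr * error * xi             # nudge weights toward the correct side
            b += lr * error
    return w, b

# Toy example: AND is linearly separable, so the perceptron can learn it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([step(np.dot(w, x) + b) for x in X])   # expected: [0, 0, 0, 1]
```

Swap the AND labels for XOR labels ([0, 1, 1, 0]) and the loop never finds weights that classify all four points, which is exactly the limitation noted below.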
Rosenblatt also built hardware: the Mark I Perceptron (1960). It had a 20×20 grid of photocells acting like a retina, connected randomly to association units, with adjustable weights implemented by potentiometers. Motors updated these weights during learning.
It was able to classify simple patterns and created massive excitement. The New York Times even claimed it could one day walk, talk, and be conscious (NYT Archive, 1958).
But it had limits: it could not solve problems like XOR, which are not linearly separable.
In parallel, a very different idea was brewing. Could machines predict text instead of reasoning with logic?
Claude Shannon (1948–1951): Measured the entropy of English by asking humans to guess the next letter. This proved language is statistically predictable.
N-grams (1960s–1970s): Instead of full reasoning, approximate by looking at the last few words. A trigram model predicts P(wₜ | wₜ₋₂, wₜ₋₁).
Corpora: The Brown Corpus (1961) provided 1M words of text, enabling statistical models to be tested.
Applications: Early speech recognition experiments at IBM and Bell Labs in the 1970s used n-gram models with smoothing methods like Good-Turing and later Kneser-Ney.
This is important because modern LLMs still use the same objective: predict the next token. The difference is scale and neural architectures, not the goal.
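As a toy illustration of that next-token objective, here is a minimal trigram counter in Python; the one-sentence corpus is made up for the example, and the model skips the smoothing methods mentioned above.

```python
from collections import defaultdict, Counter

# Toy corpus; real n-gram models were trained on corpora like Brown (1M words)
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Count how often each word follows a given pair of words: P(w_t | w_{t-2}, w_{t-1})
counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1

def predict_next(w1, w2):
    """Return the next-word distribution given the two preceding words."""
    following = counts[(w1, w2)]
    total = sum(following.values())
    return {word: c / total for word, c in following.items()}

print(predict_next("the", "cat"))  # {'sat': 0.5, 'ate': 0.5}
```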
After Dartmouth and the perceptron, the early years were dominated by symbolic AI. Researchers built expert systems: programs that encoded domain-specific knowledge as logical rules.
Example: MYCIN (1972) at Stanford. It used ~600 rules to recommend antibiotics for infections. In narrow cases, it performed as well as doctors.
But symbolic AI faced the knowledge acquisition bottleneck. Writing and maintaining rules for messy, real-world domains became impossible, which pushed researchers to search for alternatives.
Prolog: Programming in Logic
In 1972, Alain Colmerauer and Philippe Roussel introduced Prolog (“Programming in Logic”). Unlike imperative programming, Prolog was declarative. You wrote facts and rules, and the system inferred answers.
Example:
cat(tom).                           % fact: tom is a cat
mouse(jerry).                       % fact: jerry is a mouse
hunts(X, Y) :- cat(X), mouse(Y).    % rule: X hunts Y if X is a cat and Y is a mouse
Query: ?- hunts(tom, jerry). → true
Prolog fueled symbolic AI and was central to Japan’s Fifth Generation Computer Project (1982–1992), which invested $400M in building intelligent reasoning machines.
Machine Learning: Data Becomes the Teacher
📖 Further reading: Statistical Learning Theory – Vapnik, Foundations of Machine Learning – Mohri, Rostamizadeh, Talwalkar
By the 1980s, symbolic AI was stuck. Rules could not capture the endless messiness of the real world. The new idea was radical: instead of writing rules by hand, feed the system data and let the algorithm discover the rules on its own.
This marked the birth of machine learning. The transition was not just philosophical but deeply mathematical. Vladimir Vapnik and Alexey Chervonenkis formalized the idea through Statistical Learning Theory.
The central problem was generalization: given a finite set of training data, how can a model make accurate predictions on unseen cases? Vapnik and Chervonenkis introduced key ideas:
VC Dimension: a measure of the capacity of a model class
Empirical Risk Minimization (ERM): minimize training error
Structural Risk Minimization (SRM): balance training error with model complexity to avoid overfitting
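A quick way to see the ERM-versus-SRM trade-off in code: the sketch below (NumPy assumed, with a synthetic noisy dataset chosen purely for illustration) fits polynomials of increasing degree and compares training error against error on held-out data. Training error keeps shrinking as capacity grows, while held-out error typically climbs, which is the overfitting that complexity control is meant to prevent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a smooth function plus noise, split into training and held-out sets
x = rng.uniform(0, 1, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)
x_tr, y_tr, x_te, y_te = x[:20], y[:20], x[20:], y[20:]

for degree in [1, 3, 9, 15]:
    coeffs = np.polyfit(x_tr, y_tr, degree)          # ERM: minimize training error
    train_err = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    # Higher degree = more capacity: training error shrinks, held-out error can blow up
    print(f"degree={degree:2d}  train MSE={train_err:.3f}  held-out MSE={test_err:.3f}")
```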
This made machine learning a science instead of guesswork.
Early Algorithms: Trees, Bayes, and Margins
Once the theory was in place, practical algorithms began to shape industries.
Decision Trees
Ross Quinlan introduced ID3 in 1986. Decision trees split data step by step, creating if-then rules directly from examples. They were interpretable and useful in fraud detection, medical diagnosis, and customer segmentation.
Naive Bayes
Rooted in Bayes’ theorem, Naive Bayes assumes features are independent. Despite this simplification, it worked well for text classification. In the 1990s, it powered spam filters and document classification at scale.
Support Vector Machines (SVMs)
Introduced by Vapnik in the 1990s, SVMs aimed to find the hyperplane that best separated classes by maximizing the margin. They excelled in handwriting recognition, face detection, and bioinformatics, showing strong generalization power in high-dimensional spaces.
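To get a feel for how these three classics behave side by side, here is a small sketch assuming scikit-learn is available; the built-in breast cancer dataset and the default hyperparameters are arbitrary choices for illustration, not a benchmark of the algorithms.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

models = {
    # criterion="entropy" uses information gain, the same splitting idea as ID3
    "Decision tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
    # Assumes features are conditionally independent given the class
    "Naive Bayes": GaussianNB(),
    # Finds a maximum-margin separating boundary (here with an RBF kernel)
    "SVM": SVC(),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation accuracy
    print(f"{name:13s} mean accuracy = {scores.mean():.3f}")
```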
The human brain builds understanding in layers: from edges to shapes to objects. A single perceptron could not do that, but multilayer perceptrons (MLPs) could.
In 1986, Rumelhart, Hinton, and Williams popularized backpropagation, a method to train these multilayer networks. Errors from the output layer were propagated backward, adjusting weights in earlier layers step by step.
Backpropagation used gradient descent, nudging weights toward values that reduced error. With hidden layers, MLPs gained the capacity to approximate almost any continuous function, a fact later formalized in the Universal Approximation Theorem.
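Here is a minimal NumPy sketch of that idea: a two-layer MLP trained with backpropagation on XOR, the very problem a single perceptron could not solve. The hidden-layer size, learning rate, loss, and iteration count are illustrative choices, not the 1986 setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR is not linearly separable: a lone perceptron fails, an MLP with a hidden layer succeeds
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 4 units, one sigmoid output unit
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)
lr = 0.5

for _ in range(5000):
    # Forward pass: build the prediction layer by layer
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the output error back to the earlier layer
    # (cross-entropy loss with a sigmoid output gives the simple gradient out - y)
    d_out = out - y
    d_h = (d_out @ W2.T) * h * (1 - h)

    # Gradient descent: nudge every weight against its gradient
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * (X.T @ d_h)
    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())  # typically close to [0, 1, 1, 0]
```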
While limited by the compute power and small datasets of the time, backprop laid the foundation for the neural networks that would later dominate AI.
By the 1990s, AI stood on two strong legs. On one side, machine learning algorithms like decision trees, Naive Bayes, and SVMs were powering applications in finance, healthcare, and telecom. On the other side, neural networks with backpropagation had the theoretical power to approximate almost anything, but they were held back by the limits of data and compute.
Running alongside these was a quieter but equally important thread in language modeling. From Claude Shannon’s early experiments with predictability in English text to n-gram models and speech recognition research, the idea of predicting the next word became a practical way to capture patterns in language.
When large datasets appeared in the 2000s and GPUs unlocked scale, these three currents began to converge. Data-driven algorithms, neural networks with backpropagation, and the tradition of next-word prediction merged into what we now call deep learning.
The perceptron’s humble beginnings, the rigor of statistical learning theory, the breakthrough of backpropagation, and the persistence of language modeling all came together to create the foundations of modern AI.
In the next blog, we will explore how neural networks evolved into CNNs, RNNs, and deep learning, and how compute and data bottlenecks set the stage for the birth of transformers.