AI Is Nothing New
6 min read
tl;dr: AI didn’t suddenly appear—it was built step by step over many years. Today, tools like ChatGPT are just the latest chapter in a long journey of progress. So, get ready for the ride. Now, it’s way more public, and it’s exponential from here ;)
How We Got Here: A Full Timeline
1950s: The Dream Began
- 1950 – Alan Turing’s big question: “Can machines think?” sets the stage.
- 1956 – The Dartmouth Conference: The term artificial intelligence is born. Early pioneers predict rapid progress.
➡️ But the promised rapid progress failed to arrive, and expectations soon came to be seen as unrealistic.
1970s: The First “AI Winter”
- 1973 – The Lighthill Report (U.K.): A government review says AI has failed to deliver useful results. Funding is withdrawn in Britain.
- Interest in neural networks collapses; the math needed to train them effectively (backpropagation) hasn’t yet been worked out and popularized.
- Symbolic AI (hand-coded rules) dominates, but critics say it can’t scale to real-world intelligence.
➡️ Result: Researchers left the field, calling AI a dead end.
1990s: Quiet but Productive
- AI was largely rebranded as “machine learning” or “data mining.”
- Skepticism remained, but under the surface useful tools like support vector machines, Bayesian networks, and early recommender systems quietly flourished.
- Public view: AI was considered science fiction, not serious science.
2000–2010: Laying the Groundwork
- Early 2000s – Smarter math: Researchers refined algorithms like support vector machines and random forests. These were good at spotting patterns in data but still needed humans to hand-craft features (the important parts of the data).
- 2006 – The comeback of neural nets: Geoffrey Hinton and his students showed that “deep learning” (stacking many layers of artificial neurons) could actually work when trained in stages. This revived interest in neural networks, which had been considered outdated.
➡️ Many people dismissed Hinton’s deep-learning work as impractical, until big data + GPUs proved otherwise.
- 2007 – Speech recognition leaps: Google and other companies began using statistical models at internet scale to improve voice search and speech recognition.
- 2009 – ImageNet launched: Fei-Fei Li and her team created a huge labeled dataset of millions of images. This gave researchers the “fuel” needed to train powerful vision systems and paved the way for the 2012 AlexNet breakthrough.
- 2009 – Big data era: Tools like Hadoop and MapReduce spread, letting companies store and process massive datasets cheaply. This availability of data and compute was essential for the next decade of AI.
2010–Today: Breakthroughs
- 2010 – Online ML/AI competitions: Kaggle opened, letting people all over the world compete to build the most accurate prediction models on shared datasets.
- 2011 – Jeopardy! champion: IBM’s Watson computer played the TV quiz show Jeopardy! and beat two human champions. It could pull up answers quickly by searching huge amounts of text.
- 2012 – Teaching computers to see: Researchers taught a program to recognize cats in YouTube videos. In the same year, a model called AlexNet learned to spot and name objects in pictures much better than older systems.
🔑 Why it worked: AlexNet was trained on GPUs (graphics cards), which could process thousands of operations in parallel. This made it possible to train deep neural networks that were previously too slow to be practical.
➡️ Still, many researchers argued neural nets were a “fad” that wouldn’t generalize.
- 2013 – Meaningful word numbers: Google scientists released word2vec, a way to turn words into lists of numbers so that words with related meanings end up close together. This helped machines get better at reading text.
- 2014 – Recognizing faces: Facebook’s DeepFace system learned to identify faces with about 97% accuracy, close to what humans can do. Again, GPUs made training these large image models feasible.
- 2016 – Mastering Go: Google DeepMind’s AlphaGo program beat a world champion at the ancient board game Go. It combined smart pattern learning (neural networks) with careful searching (tree search).
🔑 Behind the scenes: Specialized TPUs (Google’s Tensor Processing Units) and GPU clusters powered the massive computations required.
- 2017 – A new blueprint: Researchers at Google introduced a design called the “Transformer”. It lets programs pay attention to all parts of a sentence at once, which is great for language tasks (a tiny code sketch of this “attention” idea follows the end of this timeline).
- 2018 – BERT & GPT-1:
- BERT (Google) – Read sentences in both directions and filled in missing words. Great for understanding language but not for writing it.
- GPT-1 (OpenAI) – Trained left to right, predicting the next word. This made it capable of generating fluent text.
➡️ The race began: Google, OpenAI, and others quietly pushed to see how far scaling could go.
- 2020–2021 – Scaling laws discovered: Researchers showed that bigger really is better: as you scale parameters, data, and compute together, model performance improves predictably along smooth power-law curves (see the scaling-law sketch after the timeline).
🔑 NVIDIA’s A100 GPUs and Google’s TPUv3/v4 made training models with hundreds of billions of parameters possible.
💸 Example: Training GPT-3 (175B parameters, 2020) required thousands of GPUs running for weeks and is estimated to have cost $10–20 million in compute alone.
- 2022 – ChatGPT arrives: OpenAI released ChatGPT, trained on clusters of thousands of GPUs. For the first time, anyone could sit down and chat with a computer that could write stories, answer questions, and hold a natural conversation.
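
To ground the 2017 “Transformer” entry, here is a minimal sketch of scaled dot-product self-attention, the core trick that lets a model look at every word in a sentence at once. It is a toy illustration in plain NumPy with random vectors standing in for words; real Transformers add learned projection matrices, multiple attention heads, and many stacked layers.

```python
# Toy sketch of scaled dot-product self-attention (the heart of the Transformer).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Every position looks at every other position at once."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)  # how similar each word is to every word
    weights = softmax(scores)                       # turn similarities into attention weights
    return weights @ V                              # blend the value vectors accordingly

# A toy "sentence" of 5 tokens, each represented by a 16-dimensional vector.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((5, 16))
contextualized = attention(tokens, tokens, tokens)  # self-attention: Q = K = V
print(contextualized.shape)  # (5, 16): each token now carries context from the whole sentence
```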
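
And to illustrate what “improves predictably” means in the scaling-laws entry: the 2020 scaling-laws work fit language-model loss to a smooth power law in model size. The constants below are rough illustrative placeholders, not the published fits, so treat the output as a sketch of the shape of the curve rather than real predictions.

```python
# Hedged sketch of a scaling-law curve: loss falls as a smooth power law in
# parameter count N. The constants are illustrative, not the exact published fits.

def predicted_loss(n_params: float,
                   n_c: float = 8.8e13,    # assumed "critical" scale (illustrative)
                   alpha: float = 0.076):  # assumed exponent (illustrative)
    """Cross-entropy loss predicted purely from model size."""
    return (n_c / n_params) ** alpha

# Each 10x jump in parameters buys a small but predictable drop in loss:
for n in (1e8, 1e9, 1e10, 1e11, 1.75e11):
    print(f"{n:9.0e} params -> predicted loss ≈ {predicted_loss(n):.2f}")
```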
Why It Feels Like AI Is Everywhere Now
For decades, computers had quietly been learning to recognize patterns: reading text, spotting faces, winning games. But they weren’t part of everyday conversation. Four big things changed:
- Smarter designs: The Transformer and similar ideas let computers handle long sentences and complex tasks.
- Better text handling: New tokenizers like WordPiece and tiktoken (OpenAI’s byte pair encoding library) split unfamiliar words into familiar parts, so the model isn’t stumped by new terms (see the small example after this list).
- Generative training: Models like GPT learned to guess the next word, turning them into storytellers that can write emails, summarize articles, and chat naturally.
- More data and better processors: Big data became widely available, and NVIDIA built GPUs that could power models growing from millions of parameters, to billions, to the hundreds of billions in the LLMs we see today.
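
Here is the “splitting unfamiliar words into familiar parts” idea in practice, using OpenAI’s open-source tiktoken library. The encoding name and the exact splits shown are illustrative; they can differ between tokenizer versions.

```python
# Small example of byte pair encoding with the open-source `tiktoken` library
# (pip install tiktoken). "cl100k_base" is one of its built-in encodings.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for word in ["hello", "transformers", "hyperparameterization"]:
    token_ids = enc.encode(word)
    pieces = [enc.decode([i]) for i in token_ids]
    print(f"{word!r} -> {pieces}")

# Common words map to a single token; rare or invented words get split into
# familiar sub-word pieces, so the model is never stuck on unseen terms.
```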