It's been one year since I wrote Generative AI. AI was in its nascent stages back then, its datasets untainted by agentic use cases and chain-of-thought examples. Simple input, simple output. In about one year, it went from that to having control of your computer. Tools like Claude Code, Manus and Comet browser have taken shape, letting users hand the AI any task they could do themselves. It's obviously not there yet; tasks take way too long to finish, but as they always say, this is the worst it'll ever be.
This past year was honestly a very confusing period for me with regard to this AI thing. At first, clearly, I was an AI maximalist, willing to throw AI at anything and everything to see how it fit in, to see how it made my life better. I wanted to give this new tech every opportunity to improve my life in every facet possible. But as time progressed, I gained new insights into this technology and its deep, intricate workings, from philosophical conversations, blogs, talks from experts, videos on the internet and other places. I even read the book AI Engineering end to end to deepen my understanding (thanks Chip Huyen!). I believe my perspective is a bit more nuanced now.
Now before I go ahead, I have a small confession to make and an apology for the people I have been talking to about this topic. I had always been playing devil's advocate with you guys (LMAO). It was honestly a combination of my past AI maximalism and a sheer desire to ragebait you guys (yes Abinav, Andy and Farhan, I am sorry), but most importantly, I wanted to really battle-test the anti-AI maximalists. I wanted to really understand what they had to say. The way I understand things is by adversarially trying to break them down. You guys just happened to be my unfortunate victims (sorry again).
The Picasso of Bullshitting
Generative AI is, in its essence, a very beefed-up autocomplete. Yeah, that's it. Have you ever used the Google keyboard's autosuggestions to finish a sentence? Then you know how sometimes it feels like the keyboard kinda knows you. As if it has a very, very primitive, blurry model of one facet of your life (your keyboard usage), and it was just barely sufficient for it to finish that sentence. Yeah. Researchers in Silicon Valley (and China) took this shit really seriously and dialed it up to 100. More accurately, to 100 billion parameters.
This simple keyboard autocomplete is a simple tree-based, Levenshtein-optimized next-word finder. Generative AI is built from transformer blocks, attention and feed-forward mechanisms, and token embeddings. But the principle is exactly the same. It predicts the next word. In the keyboard, the primitive model predicting the next word in your sentence is not a dude inside your phone empathizing with you, but a model which has managed to encode the frequency of the most common words tied to the words around them in your vocabulary, and to spit them out when you need them.
You could say that its understanding of you is very much a (fortunate) side effect of the statistical distribution of your vocabulary.
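To make this concrete, here is a toy sketch of that kind of frequency-based predictor: count which word follows which in a (hypothetical, made-up) personal corpus, then suggest the most frequent follower. Real keyboard engines are far more sophisticated, but the principle is this.

```python
from collections import Counter, defaultdict

# A hypothetical sliver of someone's typing history.
corpus = "i love coffee . i love coffee , but i drink tea in the morning".split()

# For each word, count every word that has ever followed it.
followers = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    followers[prev][nxt] += 1

def suggest(word):
    # Suggest the most frequent follower of `word`.
    return followers[word].most_common(1)[0][0]

print(suggest("i"))     # "love" (seen twice, vs "drink" once)
print(suggest("love"))  # "coffee"
```

No empathy, no understanding: just counts of what tends to come next. Scale the corpus and the model up, and the suggestions start to feel eerie.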
What happens when you dial it up to the max? Instead of this primitive model being exposed just to your text corpus, what if we managed to feed it the text corpus of the entire world? What if we upgraded the model to somehow find patterns in this text, far beyond just the frequency of one word appearing after another? What if we could come up with some algorithm which managed to not only represent a word as a vector in some dimensional space, but also adjust this vector according to the meaning of the sentence that it is in? Well, a large language model is what you get at the end.
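That "adjust the vector according to the sentence" step is, at its core, what attention does. Here is a bare-bones sketch with tiny hypothetical 4-d vectors; real transformers use learned query/key/value projections and far higher dimensions, but the shape of the idea is the same: each token's vector becomes a similarity-weighted blend of every vector in the sentence.

```python
import numpy as np

def softmax(x):
    # Turn raw similarity scores into weights that sum to 1.
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical static embeddings for the tokens of one sentence.
tokens = ["i", "ate", "an", "apple"]
E = np.array([
    [1.0, 0.2, 0.0, 0.1],   # "i"
    [0.3, 1.0, 0.1, 0.0],   # "ate"
    [0.1, 0.0, 1.0, 0.2],   # "an"
    [0.2, 0.1, 0.3, 1.0],   # "apple"
])

# Each token scores its similarity against every token (dot products),
# then mixes all the vectors according to those scores.
scores = E @ E.T
context_aware = np.array([softmax(row) @ E for row in scores])
```

After this step, "apple" no longer has one fixed vector; its representation has been nudged by "ate", which is how the model could tell a fruit from a phone maker.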
You now have this magical thing which has the understanding of the entire world as a whole. Ask it any question about anything in the world, and you get back a coherent answer. Code, art, music, skills, TV, philosophy, everything is on the table. It is insane, how does it know so much?
But the truth is, it doesn't know anything. Its understanding of the world is a surprising side effect of it predicting the next word (or the next token, in this case). It is just extremely good at picking up the patterns behind the input set of tokens and finding the next token matching those patterns.
We make the patterns, it just finds them
LLMs are collective humanity's surprise at how tall this local maximum of true artificial intelligence can actually be. Pretending a keyboard autocomplete is true intelligence is hilarious. But what if we could actually push it to the maximum? How far can we actually get it to mimic this intelligence we are looking for? Suddenly no one is laughing. This was the problem statement for the people at these AI labs. But they also know exactly what this is: a local maximum. This was never intended to be AGI.
For the longest time after the advent of AI, I had essentially modelled the human brain after LLMs. I mean, take an impressionable function generator, dump a megaton of data into it, and you get coherence and an understanding of the world out of it. This is true and possible in everything from linear regression to convolutional neural networks to transformers. I even wrote about it in Zero Knowledge Specialized Intelligence. I thought the brain was essentially just that: an undefined, biological, organic, editable function with a billion parameters, where the five modalities of our senses are the inputs and the outputs are what we can do. This is how I imagined the brain in my head:
def brain(*params):  # up to a billion parameters or more, who knows
    response = combine(params)  # some combo of these params
    return action(response)
But I know now how extremely reductionist this was. I am not going to pretend to know the nature of consciousness and what makes natural intelligence the one true intelligence1, but I do know that our world model is not a side effect of us predicting our next stimulus; rather, it is the MAIN effect of the stimuli in our lives. We don't match our stimuli to a hard set of "patterns" as vectors in an n-dimensional space; rather, we naturally and organically add and remove axes in the set of patterns in our head. I guess that's what makes a human, a human. The word apple is mapped not to the web of other tokens it commonly co-occurs with, but to the experience of tasting the apple itself.
Our uniqueness stems from this weird and unique set of patterns we have built up over our lives. Our strength in teams comes from the sum of all of these patterns. Sometimes the world doesn’t fit our set of patterns, so we add a new pattern.2 Sometimes we forget a pattern because it hasn’t been used in a long time. War and clashes happen due to a stark difference in these patterns. Peace happens when we decide to tolerate said difference in these patterns. Understanding begins when we break things down to these fundamental patterns. Success is achieved when we find that one missing pattern that fits the world model in our head to the actual world. Failure, when that missing pattern is the wrong pattern.
LLMs do not have this privilege of altering the dimensionality of their latent space (yet, at least). They have the ability to predict what comes next in an analogy like man:woman::king:?
only because the difference vector from man to woman points to queen when placed at the end of king. There exists an axis, a dimension in the latent space along which this difference vector is aligned, and this is the pattern. The model has inferred that this specific axis is "gender" through its training. It feels reductionist to understand this concept dimensionally, because I am trying to reduce a million-dimensional space to a 3-dimensional one for simplicity. But in math, it is okay.
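You can see the arithmetic with toy 3-d vectors (these values are made up for illustration; real embeddings are learned and have hundreds or thousands of dimensions, with the "gender" direction emerging from data rather than being hand-placed on axis 2):

```python
import numpy as np

# Hypothetical embeddings: axis 1 is hand-crafted to act as "gender".
vocab = {
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([1.0, 1.0, 0.0]),
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([1.0, 1.0, 1.0]),
    "boy":   np.array([0.5, 0.0, 0.2]),
    "girl":  np.array([0.5, 1.0, 0.2]),
}

def analogy(a, b, c):
    """Solve a:b::c:? by placing the (b - a) difference vector at c."""
    target = vocab[c] + (vocab[b] - vocab[a])
    # Nearest word to the target, excluding the inputs themselves.
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return min(candidates, key=lambda w: np.linalg.norm(candidates[w] - target))

print(analogy("man", "woman", "king"))  # queen
```

The model never "knows" what gender is; a direction in the space just happens to encode it, because the training text is shaped that way.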
I think you are starting to see the problem here. These patterns exist within an LLM, but obviously, this set of patterns is finite, and largely dependent on the input data. By definition, it is ungrounded and static.3
God of Triviality
It isn't true intelligence. It's an imitation of it, confined to a finite set of patterns, a finite-dimensional latent space. This model will eventually top out at some point. No amount of parameters, training or synthetic data can ever spawn general intelligence. This approach is fundamentally flawed. Don't let anyone, ANYONE, tell you otherwise. Can we reach AGI through LLMs? Can you map a sphere onto a sheet of paper? The answer is no, but we can get close. Same here. That, however, has never stopped me from being eternally surprised by every new thing coming out of this space. If anything, this whole ordeal has only brought me newfound admiration for human ingenuity, ironically. How far can these nerds in Silicon Valley actually push this? The answer seems to be, quite fucking far.
Researchers in China and the USA are working hard to expand the set of these patterns in LLMs. They're trying to encode every pattern a human has ever had within the confines of a model. But I can assure you, they'll never get to 100%. It is like trying to reach the boundary of an infinite universe. We constantly make new patterns, influence the world under this new understanding, thereby forcing people around us to update their set of patterns to better understand this new world, and so on and so forth. Heck, even old people struggle to update their patternset (ergo, their world model), and they're human! What hope does this thing have of mimicking and encapsulating these sets of patterns at the speed at which the world is changing? And don't even get me started on the fact that it is fundamentally restricted to the world of text, a world onto which we have only managed to project a subset of the true set of patterns we possess.
But an Intern for the Novel
It may not be capable of curating its own patterns yet, but it has managed to grasp, I would dare say, around 70-80% of the most common patterns required for normal human functioning. That in itself is a great thing. Why? Because now it's useful. To do basic, everyday tasks, you only need to grok the most basic patterns of life. Only when you are actually creating something do you tap into the unmapped 20-30% to bring forth something absolutely, mind-bogglingly unique.4
But don't count AI out. Like I said, 70% is pretty damn useful. You can effectively outsource the monotony of life to this thing now, and focus on the more non-trivial things in life. One amazing benefit of AI is that it's replicable, recursable and multiplicable at inhuman speeds. It is very difficult for us to summon 50 new human beings at a remote location to solve a trivial but lengthy issue, but it is possible for us to deliver that same amount of meh-level intelligence through the internet. We can have 50 inference instances of the same AI at the click of a button. I'd like to see you do that with fleshy humans.
So to summarize, if you want truly intelligent entities for ubermensch-ey non-trivial tasks, your best shot is a living, breathing human being. Even if this person isn't fit for the task, the person will eventually rise up to it. You can count on that. An AI, however, will not. Simply because this high-level task doesn't exist in its set of patterns. It is out of its depth, out of its distribution.
But if you need pretty meh levels of intelligence to solve a trivial problem, obviously the answer is AI. It's a no-brainer! Who in their right mind would hire a human being to sit and summarize swathes of text anymore? With AI you can do it faster and cheaper, and have hundreds of instances doing this in parallel at any given moment. Employing a human for this task would be a costly and slow endeavour, and the person employed would probably get depressed by the monotony of this simplicity after a point.
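That fan-out is trivially cheap in code. A minimal sketch, with the actual LLM call stubbed out (swap in any provider's client; the `summarize` body here is a hypothetical stand-in):

```python
from concurrent.futures import ThreadPoolExecutor

def summarize(text):
    # Stand-in for a real LLM API call; replace with your provider's
    # client. Network-bound calls are exactly what thread pools fit.
    return text[:40] + "..."

documents = [f"Document {i}: " + "lorem ipsum " * 50 for i in range(100)]

# Fifty "instances" of the same meh-level intelligence, on demand.
with ThreadPoolExecutor(max_workers=50) as pool:
    summaries = list(pool.map(summarize, documents))

print(len(summaries))  # 100
```

Try hiring, onboarding and scheduling fifty humans for the same afternoon of drudgery.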
Man:woman::king:queen is a trivial pattern. Analyzing the financial landscape using the umpteen things going on in the world and using them to stay profitable like Warren Buffett requires non-trivial patterns. You don't employ an AI to run Berkshire Hathaway. It will fail. You don't employ Buffett to go through a 1,000-line-long Excel sheet of analogy questions like this. He will probably be bored out of his mind. Get my gist?5
Outsource the boring, own the weird.
Of course, LLMs have a lot of kinks and cracks that need to be worked out. Hallucination, context rot, biases, sycophancy and a whole laundry list of other issues need to be fixed. And we will fix them. Because the world needs trivial problem solvers. We have a looooot of meh problems to solve. They keep coming up all the time too. To be able to outsource this monotony to AI and worry about bigger problems is the privilege we all want. Solving trivial problems may be profitable, but it really tires out the soul. This is why you find a lot of depressed and burnt-out people in corporate jobs. And this is precisely where we should have AI.
Automate away the monotony, bureaucracy and the utterly meaningless bits of our current systems, and we will be left with a large unemployed population willing to upgrade their patternset for bigger tasks. To ask bigger questions. To think about the bigger picture. This era will see an increase in people starting their own things. We will come closer to what Naval prophesied: every person will have a business of their own.
The only way to achieve something like this is to understand that you need to be extremely dynamic about your own patternset. Absorb the world around you. Find new patterns in how the world works. Put them to the test. Frequently. Remove them if they fail. Because the only way you can lose in this new world is when you are stagnant about your worldview. Eventually a unique problem will appear. You will be the only person able to solve it. And you’ll be able to find your place in the world.
Your edge over AI is how quickly you can edit your latent space, and how much control you have over it. And also the fact that you can act quicker than it does, thanks to your four limb attachments, of course (this is changing; robots are coming for us soon). Use this to your greatest advantage. You don't have to be better than AI, you just need to be faster than the AI engineers feeding it data.6
This doesn’t stop the researchers from working towards AGI. And when they do, the world will change again. A system which has the best of both worlds, the adaptability of a human combined with the edge deployability of a program, now that is an interesting future to think about. But we will cross that bridge when we get to it, okay?
Footnotes
-
My biological chauvinism being put on full view here (sorry, Basilisk, spare me). Functionalism (yes, the philosophy) states that intelligence is what the system does, not what it is made of. If AI can perform tasks that previously needed intelligence, calling it a mere imitation of human intelligence is just straight up clankerist. ↩
-
(not me playing devil's advocate on my own post lmao) Here's a question: when we add a new pattern, are we adding a dimension out of thin air, or are we simply performing a very complex remix of our existing knowledge and experience? An LLM asked to write a sonnet about a lonely data server in the style of Shakespeare also combines both concepts in a way that has never been seen before, an interpolation in the latent space between Shakespeare's work and data servers. Is our own creativity mechanistically different from this, or are we just a more sophisticated and grounded version of this same synthesis process? ↩
-
A fair challenge here would be to ask, "isn't this line too clean?" An LLM's knowledge base isn't technically static. RAG methodologies let it pull in live data, and fine-tuning lets us alter its behaviour and, in some sense, edit the dimensionality of its latent space. At the same time, humans are not all that dynamic either. Biases and dogma can make human world models hellishly rigid for years on end before they actually adapt to reality. The line separating the staticity of an LLM and the dynamicity of a human is probably blurrier than I am letting on. ↩
-
This is, in all honesty, a moving target. The unmapped, out-of-distribution ideas of today could get mapped tomorrow, and a model trained on them 5-10 months later. What then? And that's not to mention that some "weird" problems (like making a car aerodynamic) might be best solved by finding analogies between distant fields (in this case, ornithology, from observing birds). That is a task an LLM can handle much better than a human, thanks to its latent space. The human edge might actually be more about the grounding we have than the novelty of the abstract. ↩
-
I'd like to point out that there's a bit of survivorship bias in here. Tasks trivial to an AI may actually be non-trivial to humans, and vice versa. Triviality may not always correlate with how much something is talked about on the internet, which is the main input to LLMs. The other day I was trying to customize my Lazyvim setup and needed a formatted init.lua file for it. Gemini failed at this task quite spectacularly. It is a trivial task, but there aren't many sources on it on the internet, and hence, Gemini sucked at it. ↩
-
Reminds me of “You don’t have to outrun the tiger, you just have to outrun the other guy.” ↩