Open Questions On LLMs

Alex is looking at four fundamental questions about Large Language Models:

  • Are they intelligent?
  • Are they creative?
  • What are their limitations?
  • Does prompt structure actually matter?

Transcript

Are LLMs intelligent? Can they create something new? What are their limitations? And do prompts matter? These are a few things that I want to discuss in this video. I don’t believe I will get to the bottom of all these issues, but I think these are important questions to ask, so this is what we’ll do today.

Welcome back to Think.Design.Work Smart. I’m Alex Bolboaca, I’m coming at you from the Mosaic Works Studios, and what I have for you today is a deep dive into a few core questions related to large language models, or GenAI, or what people today call artificial intelligence.

So let’s look at the first question: are large language models actually intelligent?

This is a very difficult question to answer, and the problem is that we don’t really know how to define intelligence. There are various definitions, and some of them worked for a while but then stopped working. Basically, you could think about intelligence in two ways. I’m going to simplify this a lot.
On one hand you can look at functional aspects: in the case of people, things like the speed of their neurons, how many connections there are between neurons in the brain, and so on. For us programmers, that would be similar to things like CPU speed, memory capacity, and cache levels, the functional aspects of a computer, or, in the case of a brain, how fast things trigger, how many connections there are, and so on.

The issue with this is that it’s very hard to monitor a brain, and I think this is something shared with artificial intelligence, although a very interesting thing about large language models is that you can identify a few nodes, figure out which nodes do a certain thing, and try to replace their output with something else. I’ve seen some studies where researchers realized that certain nodes in the neural network correspond to specific words or notions, let’s say the city of New York, and when they replaced New York with Chicago they saw Chicago appearing in various places instead of New York. A very simplified example, once again. So in a way it’s easier to do a kind of reverse engineering on neural networks than on a brain, where you cannot just pick a neuron, replace its function, and see what happens. It’s not possible and it’s not ethical, so it’s obviously not the same thing.
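
To make that concrete, here is a minimal sketch of the idea, using a tiny toy network rather than a real LLM: record the hidden activation produced for one input, then patch it in while processing another, so the downstream layers see the swapped-in "concept". The layer choice and the New York/Chicago framing are purely illustrative assumptions, not a reproduction of any particular study.

```python
# Minimal sketch of the "replace a node's output" idea (activation patching),
# using a toy network instead of a real LLM. Everything here is illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "model": two linear layers standing in for transformer blocks.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 4))

# Pretend these are the inputs the model sees for two concepts,
# e.g. "New York" vs. "Chicago" (purely hypothetical here).
x_new_york = torch.randn(1, 8)
x_chicago = torch.randn(1, 8)

# First, record the hidden activation produced for the "Chicago" input.
captured = {}
def capture_hook(module, inputs, output):
    captured["chicago"] = output.detach().clone()

handle = model[0].register_forward_hook(capture_hook)
model(x_chicago)
handle.remove()

# Then patch that activation in while running the "New York" input:
# the downstream layers now see "Chicago-like" internals.
def patch_hook(module, inputs, output):
    return captured["chicago"]

handle = model[0].register_forward_hook(patch_hook)
patched_output = model(x_new_york)
handle.remove()

original_output = model(x_new_york)
print("original:", original_output)
print("patched: ", patched_output)
```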

The second type of definition is behavioral, which is a black-box approach. We give the human or the large language model a test, we see what the results are, and based on those results we assess intelligence. But the problem with this is that neuroscientists have figured out that there are multiple components to intelligence. Generally speaking, there’s growing recognition that intelligence involves multiple components working together: cognitive abilities like reasoning, memory, and processing speed; adaptive behavior, which is practical problem solving in real-world contexts; and learning capacity, which is acquiring new knowledge and skills efficiently.

So the problem here is that we don’t really understand human intelligence, so how can we claim to understand non-human intelligence?

It’s completely different, so there are a few questions that I’m wondering about.

First: people say that large language models are basically glorified autocomplete, and that is correct, but the thing is that autocomplete in a large language model works on different levels and on a statistical basis. On different levels means that it looks at the next word, at the next group of words, at the next sentence, at the next phrase, and it does this based on statistical pattern matching: certain things are more likely to follow other things. The interesting thing is that if we look at the black-box style responses, I would argue that large language models are much smarter than we would have expected. I mean, the only thing they did was take a lot of data, basically the whole internet, and encode it into a model that splits it into pieces and creates these statistical connections, but the result is that we get answers that often sound logical, that often sound intelligent, even though there’s no notion of logic, no notion of mathematics, no notion of computation in a large language model, no notion of what a thing means. There’s just a node somewhere that is connected to another node through weights derived from statistical analysis.
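
As a toy illustration of “autocomplete on a statistical basis”, here is a deliberately tiny bigram model: it just counts which word followed which in a small text and samples the next word from those counts. Real LLMs learn vastly richer patterns over tokens, but the “what is likely to come next” idea is the same.

```python
# A deliberately simplified sketch of statistical autocomplete:
# a bigram model that picks the next word by counting what followed it
# in the "training" text. The corpus here is made up.
import random
from collections import defaultdict, Counter

corpus = (
    "the model predicts the next word the model predicts the next sentence "
    "the brain matches patterns the model matches patterns"
).split()

# Count which word follows which.
follows = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    follows[current_word][next_word] += 1

def complete(word, length=6):
    """Generate a continuation by repeatedly sampling a likely next word."""
    out = [word]
    for _ in range(length):
        counts = follows.get(out[-1])
        if not counts:
            break
        words, weights = zip(*counts.items())
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

print(complete("the"))
```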

But then this raises an interesting problem: is language more important for intelligence than we thought? We often associate intelligence with things like the cognitive abilities we talked about here, adaptive behavior, learning capacity, but could language be fundamental to the whole construction of intelligence?

And if you look at the evolution of the human species, I think there might be an argument for this, because language was always part of this behavior, part of the brain. Then we have this interesting thing: Noam Chomsky’s universal grammar. What Chomsky showed, well, it was a theory, a hypothesis, but I think it looks more and more probable now, is that we don’t start with zero knowledge of language as babies; we are born with structures that allow us to learn any language. That is fascinating because it means the functional structures are somehow already there, and it looks to me as if large language models take the language and kind of reverse engineer those functional structures one way or another.

Now of course what’s missing is probably other functional parts: the parts related to logic, the parts that allow you to do reasoning, mathematical reasoning or logical reasoning, maybe critical thinking, but a lot of it is language. I’ve seen some people say that one argument against large language models being intelligent is that we merely perceive language as intelligence, but maybe it’s the other way around: maybe language is the foundation of intelligence and we build on top of that in our brains.

So the question I’m pondering is: could intelligence actually be a mix of pattern matching and emergent properties? What I mean by that is that large language models are doing pattern matching in a sense, but this is what the brain is doing as well. Our brains are very good pattern matching machines, and we know that because of logical fallacies: logical fallacies are basically pattern matching gone too far. So pattern matching is a fundamental part of this, and in a way large language models are doing a form of pattern matching because it’s based on those statistical models. They see a question, they see a prompt, it’s likely that certain things follow after that prompt, and they generate that answer; once they start a sentence, other things follow, and so on.

This is a very intriguing question for me, and I think it’s a bit unfair to say large language models are not intelligent at all because they don’t do logical reasoning, they don’t have memory, and so on. I think the interesting question is: could language be at the center?

Because in that case you could take a large language model (well, in theory; as far as I know we don’t know how), combine it with some modules that do logical thinking, and somehow make this work. Of course the big problem is how to make it work, but then you’d get reasoning, memory, and language, and perhaps with this combination you get intelligence.

But honestly I wouldn’t expect that intelligence to be higher than human; I would expect it to be very similar to human intelligence, but that’s just speculation.

This is already very interesting, but now we can move to the next question: can large language models actually create something new? Are LLMs creative? People who say that large language models cannot create something new argue that they don’t live through experiences, they don’t go through what people go through, they don’t have the insights that people have in a specific topic. So if you take a creator, a painter, a musician, a writer, they go through certain experiences and based on those they create something that is uniquely theirs.

And I think there’s a valid argument here, but let me ask you a different question and then I’ll come back to the initial argument. The question is: where does creativity come from?

And I think, well, I really hope, that part of it is this insight, this experience; it’s something that you have to go through, something that we don’t quite know how to define. But I would argue that part of creativity is combining things that have not been combined before, or combining things in new ways.
I’ve noticed that one thing that really helps me be more creative is to be exposed to randomness. I am more creative when I am walking around looking at random things. One thing, by the way, that I like doing with Claude, the AI assistant, is to ask him “give me five random things from the internet”, and then I look at those, and those are the things that spark creativity.
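
For what it’s worth, here is a minimal sketch of that “five random things” prompt using the Anthropic Python SDK. The model id is a placeholder of my own choosing, and the ANTHROPIC_API_KEY environment variable needs to be set before running.

```python
# Sketch of the "give me five random things" prompt via the Anthropic SDK.
# The model id below is a placeholder; substitute whatever model you use.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model id
    max_tokens=300,
    messages=[
        {
            "role": "user",
            "content": "Give me five random, unrelated things from the internet. "
                       "Just a short list, no explanations.",
        }
    ],
)

# The reply arrives as a list of content blocks; print the text of the first one.
print(response.content[0].text)
```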

Is it possible that creativity is just a complex anagram engine that happens on multiple levels?

Allow me to take a little detour here. One of my favorite books is “Foucault’s Pendulum”, written by Umberto Eco. Umberto Eco was a genius author, partly because he was also a genius in the science of semiotics, which basically studies how we transform symbols into meaning: what is the connection between, let’s say, a few smudges on a paper, and words, and something that you actually understand, that you can visualize in your mind.

All of his books have something connected to semiotics, but I think “Foucault’s Pendulum” is one of the greatest books ever written. The reason is that it actually happens on multiple planes, and it’s very difficult to grasp; I had to read it about three times to figure that out. One of the core themes in the book is anagrams. The book starts with a group of people playing around with a computer, making anagrams of particular sacred words from sacred traditions and discussing how the anagrams of sacred words influence the world they live in, and then they actually start making anagrams of ideas. They start grouping ideas and combining them, and they end up creating an uber conspiracy theory that is not actually true, of course; they just mixed and matched things from various other conspiracy theories.
And they create a very believable conspiracy theory that is adopted by a secret society, and at the same time they discuss the biology of anagrams and how, for example, cancer is about anagrams in genes, in cells.

What I learned from it is that by doing anagrams, by combining ideas in various ways, you can create new meaning. And if you follow the internet on any specific topic, you know that whenever there’s a very popular TV series, as it was with Game of Thrones, people throw out all possible theories about the ending, and eventually somebody, through this process of combination, ends up with one theory that is actually correct.

This is a very interesting thing, because OK, large language models are autocomplete on steroids, they are a statistical engine, so they go and combine things. Can you actually make them travel through less statistically relevant paths? Because by statistically relevant what we mean is that they got the whole internet and they saw which things fit together. But if you could give them a specific prompt that allows them to travel through lower statistical paths, then that might create new things.
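
One simple knob in that direction, separate from the prompt itself, is the sampling temperature: raising it flattens the next-token distribution, so less likely continuations get chosen more often. Here is a small self-contained sketch with made-up scores for four candidate continuations.

```python
# Sketch of how temperature reshapes a next-token distribution.
# The toy logits below are made up; higher temperature spreads probability
# toward the less likely ("unusual") continuations.
import math

def softmax_with_temperature(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for four candidate continuations of a prompt.
logits = [4.0, 2.0, 1.0, 0.5]

for t in (0.5, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
# Low temperature concentrates on the most likely continuation;
# high temperature gives the rarer ones a real chance of being picked.
```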

Now, is this equal to people’s creativity, to human creativity? My personal hope is that it’s not. I hope that creativity is more than that, that you have to put something specific to you into the work, and that something specific is not visible to a large language model because it’s not included in the statistics; it’s inside you, it’s something that you have. It’s very likely that each of us has something particular that allows us to create things that only we can create.

On the other hand, saying that large language models can never create something new, I don’t think that’s completely true. I think they can mix and match things that already exist, and when you have basically all the internet, mixing and matching things at different levels will create something new eventually.

Now, will that be good? I’m not sure; that’s a completely different question. But creating something new, I think, is possible with large language models.

All right, now let’s talk about the next one, which is: what are the limitations of large language models?

We now know from a study published recently by Apple that models collapse after a high enough complexity. What they did was give the models logical problems, things like the Towers of Hanoi, and increase the number of pieces, and once the number of pieces got high enough, the models couldn’t do anything with it anymore.
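
For a sense of why adding pieces blows up so quickly: the Towers of Hanoi needs 2^n - 1 moves for n disks, so the solution the model has to track grows exponentially. The recursive solver below just illustrates that growth; it is not the Apple study’s setup.

```python
# Why "just add more pieces" explodes: the optimal Towers of Hanoi solution
# has 2**n - 1 moves, so the sequence to track grows exponentially with n.
def hanoi(n, source="A", target="C", spare="B", moves=None):
    """Return the list of moves that transfers n disks from source to target."""
    if moves is None:
        moves = []
    if n == 1:
        moves.append((source, target))
        return moves
    hanoi(n - 1, source, spare, target, moves)
    moves.append((source, target))
    hanoi(n - 1, spare, target, source, moves)
    return moves

for n in (3, 10, 20):
    print(f"{n} disks -> {len(hanoi(n))} moves")   # 7, 1023, 1048575
```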

Now, it’s still a bit difficult to call this definitive proof, but I’ve definitely noticed that with large language models, the wider the context, the less good the information they give you, and that they are very useful for very specific things you can verify quickly. Code is one of them: we generate some code, we can test it, we can see it works. Others are texts in a specific domain where you can verify that they are correct. There are a number of these things we can do, but once you go to general domains it’s much harder, and my feeling is that if you give them complex problems they will just not work.

But I’m curious what your experiences are here, because I’m still experimenting in this area. I know another topic where I’ve seen large language models fail: I tried prompting them for domain modeling, and that was an interesting experience. I gave them the description of a relatively simple problem and asked for a domain model for it. I got a domain model that was so generic it was not fit for purpose, so it was not useful.

I know other people experimented with large code bases. They gave the model large code bases and started asking questions and didn’t get very good answers, or the changes that the assistants made in large code bases didn’t fit and they had to redo them.

But then this is an interesting question, because if the context and the complexity of the context are a problem, the solution to using large language models in a specific context is to reduce that, and you reduce the complexity by doing divide and conquer, by splitting larger things into smaller things.

This is a bit of a problem when you’re dealing with complex adaptive systems, because then you have a lot of connections between various elements, but if you can split a larger context into a smaller one, they will probably tend to be more accurate. One example here would be: instead of giving them a large code base, you just work on a module, and if you have specific guidelines for the module, you include them.
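
Here is a rough sketch of that module-level approach: build a prompt from one module’s files plus that module’s guidelines instead of the whole repository. The `ask_llm` function is a stand-in for whatever client you actually use; the paths and guidelines are hypothetical.

```python
# Sketch of "divide and conquer" for code-base questions: scope the prompt to
# one module and its guidelines instead of the whole repository.
from pathlib import Path

def ask_llm(prompt: str) -> str:
    # Placeholder: call your LLM client of choice here.
    return f"[answer for a prompt of {len(prompt)} characters]"

def review_module(module_dir: Path, guidelines: str, question: str) -> str:
    """Build a small, module-scoped prompt instead of a repo-wide one."""
    sources = "\n\n".join(
        f"# {path.name}\n{path.read_text()}"
        for path in sorted(module_dir.glob("*.py"))
    )
    prompt = (
        f"Module guidelines:\n{guidelines}\n\n"
        f"Module source:\n{sources}\n\n"
        f"Question: {question}"
    )
    return ask_llm(prompt)

# Example usage (path and guideline are hypothetical):
# print(review_module(Path("src/billing"), "Keep invoices immutable.",
#                     "Where is rounding handled?"))
```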

So I’m curious: what are your experiences with the limits of large language models?

Of course, we already know a few, and I mentioned some. Reasoning, for one: I don’t expect them to be logical about things. They might generate a chain of reasoning, which is an interesting thing, and by the way it was an emergent property, so it comes out of the model under specific constraints. But the generated chain of reasoning is not precisely how the model actually reasons, which is another thing: they are actually lying about how they reason.

So: logical reasoning; memory, which is limited; and we know there’s a limit in computation, they don’t know how to do math properly. These are a few, but I’m curious: if you tried anything else, what did you notice? What things were missing, or not working as expected?

And the last one, to which I’m not sure I have a good answer, is: do prompts actually matter?

And by “do prompts actually matter” what I mean is that there are all kinds of prompting strategies out there, and people swear that you really have to tell the assistant “act as a marketing specialist, this is the context, and these are the constraints”, and that’s the question.

And I’m really curious whether these are imagined benefits or real benefits. I don’t know. I mean, the thing you could do, I imagine, in this scenario would be to look inside the model, see which nodes are triggered, and see whether they go down a different path depending on the different prompts.

I tend to like simpler things, and I use simple prompts most of the time and get quite good results, and I couldn’t see a difference on the few occasions when I tried a simple prompt versus a more complex prompt for the same problem.
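
If you want to try this yourself, a rough way to compare is to send the same problem once as a plain prompt and once wrapped in the “act as X, context, constraints” structure, and look at the answers side by side. The `ask_llm` function below is a placeholder, and the hard part, scoring the answers blind over many problems, is only hinted at.

```python
# Sketch of comparing a simple prompt with a structured prompt on one problem.
# ask_llm is a placeholder for a real client; the example texts are made up.
def ask_llm(prompt: str) -> str:
    # Placeholder: call your LLM of choice here.
    return f"[answer to: {prompt[:40]}...]"

problem = "Summarise the trade-offs of splitting this service into two."

simple_prompt = problem

structured_prompt = (
    "Act as a senior software architect.\n"
    "Context: a mid-sized e-commerce backend.\n"
    "Constraints: keep the answer under 200 words, list concrete trade-offs.\n"
    f"Task: {problem}"
)

answers = {
    "simple": ask_llm(simple_prompt),
    "structured": ask_llm(structured_prompt),
}

for style, answer in answers.items():
    print(f"--- {style} ---\n{answer}\n")
# To turn this into evidence rather than anecdote, you would still need a way
# to score the answers blind, ideally across many problems.
```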

So I don’t know: do prompts actually matter? Do we have studies that prove that if you structure the prompt in a certain way you get a better answer, better expertise, more precision?

I don’t know, so what is your experience with prompts?

That’s another question that I would like to answer.

So these are a few questions that I wanted to talk about. I think they are fundamental to large language models, and they also show where I am with these tools. I don’t see them as a replacement for people; I see them as tools that can help people be more effective when used properly. And this “when used properly” is the big question mark right now: how do we use them properly? Where do they apply most effectively?

This is what I’m trying to figure out, and I’m trying to figure it out in a context where there’s a lot of noise. People are pushing for products, for startups, for financing; there is a lot of money involved, of course, there is a lot of noise around this, and it’s probably a bubble as well, so we’ll see. But I’m always open to pragmatically using tools that help me, so I’m going to look at how these tools help me. As I said in last week’s video, I still believe that more and more companies will ask engineers and software developers to know these tools, so I’m curious how to use them effectively.

I plan to do a series of experiments with large language models in various contexts, and if you have any ideas for where we might try them, let me know. Some ideas: domain modeling, or, once you have a specification for your microservices written as an OpenAPI specification, using that to generate the microservices. There is a study about that which I’ve read, and it was quite interesting; they hit some limitations, but the question is: does it help, or does it just not work? Of course, I’m far from doing scientific experiments; I will just try things out and see what happens, and maybe we’ll get to a collective wisdom around these topics. That’s what I would like to see.

All right, that’s it from me today. What do you think about all this? Let me know in the comments. Leave us a like, give us a share, and subscribe to the channel because there’s more of this coming. Or if you want me to talk about topics other than large language models, that’s fine as well, just let me know; it’s great to hear from you.

Thank you kindly for the view and until next time remember to Think, Design and Work Smart! See you next Saturday!
