Read all disclaimers at beave.rs/disclaimer
Last year, I developed a five-year plan for an organization. I successfully developed both a vision for the future and a roadmap to get there. The output was arguably very creative. Yet at no step did I make any "purely creative" decisions.
See, I am not a creative person by nature. I am very analytical and pattern-oriented. I have also managed a team of software engineers, who tend to be analytical and pattern-oriented. Yet we needed creative and innovative output.
The answer was what I call systematic creativity. We developed workflows to break a goal down into first principles, make projections based on past patterns, and reassemble the pieces into an end result. This is how most inventions happen. True creativity, especially in the world of engineering, is difficult to find.
You Can Emulate Creative Output Through Logic and Patterns
How did I develop the five-year plan?
- Use data to identify five core areas of improvement
- Analyze the current state in each area, and look at past experience for organizations in similar states
- Use those growth patterns, and the problems faced along the way, to approximate how specific initiatives would change the state of our organization
- Identify the initiatives that offer the greatest potential risk-adjusted ROI
- Assemble each individual initiative into a comprehensive plan by identifying commonalities
- Time segment the steps required to achieve each objective over the available time, using past data as an approximation of duration
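The steps above can be sketched as a small pipeline. This is a toy illustration, not the actual planning tool: the initiative data, the `risk_adjusted_roi` scoring, and the sequential scheduling are all hypothetical stand-ins for the real analysis.

```python
# A toy sketch of the planning workflow above, with hypothetical data.
# Each initiative carries an estimated ROI, a risk factor, and a duration
# approximated from past data on organizations in similar states.

initiatives = [
    {"name": "migrate billing",   "roi": 0.40, "risk": 0.30, "months": 9},
    {"name": "automate QA",       "roi": 0.25, "risk": 0.10, "months": 4},
    {"name": "new data platform", "roi": 0.60, "risk": 0.50, "months": 18},
]

def risk_adjusted_roi(item):
    # Discount expected ROI by the chance the initiative fails.
    return item["roi"] * (1 - item["risk"])

# Identify the initiatives with the greatest risk-adjusted ROI...
ranked = sorted(initiatives, key=risk_adjusted_roi, reverse=True)

# ...then time-segment them across the available window, using past
# durations as an approximation of how long each step takes.
timeline, start = [], 0
for item in ranked:
    timeline.append((item["name"], start, start + item["months"]))
    start += item["months"]

for name, begin, end in timeline:
    print(f"months {begin:>2}-{end:<2}: {name}")
```

In a real planning exercise the durations and scores come from historical data, but the shape of the process is the same: score, rank, and lay out over time.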
The outcome, a comprehensive five-year plan with deadlines attached, was entirely the result of pattern recognition on first principles. I have since used systematic creativity to drive radical restructurings of codebases and the development of new mechanical mechanisms.
Generative AI has shown that something as abstract as creativity can be emulated through pattern recognition. This is a far departure from the typical applications of data analysis, yet some of the most popular models for text and image generation have successfully simulated conversation and creativity. This is a positive indicator that pattern recognition is viable in the real world. At the same time, it has caused people to overestimate these models' capabilities and to treat convincing communication as an indicator of truth.
The Problem of Size
The problem with LLMs is that they are so large. This makes them:
- Very expensive to run (and infeasible to run on edge devices)
- Not very good at specific tasks
The rapid mainstreaming of LLMs has led to concerning overuse. Common examples I have seen are data analysis and filling out security questionnaires. Once again, LLMs are only based on predicting the next word. They can't analyze data or understand your security posture.
The models are growing in size. GPT-4 is several times larger than GPT-3 for what Mashable called "subtle improvement." These models are already ridiculously expensive to run, far beyond what could ever conceivably run on an edge device.
To their credit, GPT-3 itself is close to the asymptote of what is possible for LLMs. It was designed as a research project rather than a commercial product, and OpenAI is trying to adapt to its misuse. I don't know anyone at OpenAI, but I imagine some of them are concerned at how the project is being used far beyond its limitations.
The problem going forward is that LLMs will continue to grow exponentially with marginal returns. These larger models are designed to reduce hallucinations. Yet a larger model that hallucinates less often does not change the inherent design choices that led to hallucinations in the first place.
People talk about hallucination as incorrect statements made by an LLM. However, I think something can be a hallucination even if it is correct. Language models are designed to have a deep understanding of language patterns. When one says something correct, it does so accidentally.
Typically, an LLM will produce a true statement because it has been trained to see words in a particular order. The same fact tends to have similar word patterns across different pieces of content, which tips pattern recognition towards the truth. However, the model does not understand anything as truth itself.
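A toy next-word model makes this mechanism concrete. The tiny corpus below is invented for illustration, and a bigram counter stands in for a real language model, but the point carries over: the output reproduces a fact only because the word pattern repeats, not because the model knows it is true.

```python
from collections import Counter, defaultdict

# A tiny invented corpus in which a fact appears in a repeated word order.
corpus = (
    "water boils at 100 degrees . "
    "water boils at 100 degrees . "
    "water freezes at 0 degrees ."
).split()

# Count which word follows each word (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict(word):
    # Return the most frequent next word: pure pattern recognition.
    return following[word].most_common(1)[0][0]

# "boils" is usually followed by "at", and "at" most often by "100".
# The "fact" emerges from frequency alone.
print(predict("boils"), predict("at"))
```

Swap the corpus so that a falsehood repeats more often than the truth, and the same code will confidently "state" the falsehood. That is the sense in which even a correct output is accidental.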
People are increasingly using LLMs like GPT-4 for logical reasoning. I have seen people say that you should use ChatGPT to analyze spreadsheets. When they do that, they are depending on hallucinations. Any time an LLM generates something that was not in the prompt, even if it is true, it is a hallucination. There are tools designed specifically for statistical analysis. LLMs cannot do that natively; they will instead return a well-written hallucination.
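The contrast is easy to see with a purpose-built tool. A minimal sketch using Python's standard `statistics` module on made-up spreadsheet numbers: the answer is computed deterministically, where an LLM asked the same question would only predict plausible-sounding words.

```python
import statistics

# Hypothetical spreadsheet column: monthly revenue figures.
revenue = [12_500, 13_200, 11_800, 14_100, 13_900, 12_700]

# A purpose-built tool computes the answer from the actual data.
mean = statistics.mean(revenue)
stdev = statistics.stdev(revenue)

print(f"mean={mean:.2f} stdev={stdev:.2f}")
```

Run this twice and you get the same numbers twice. Ask a language model to "analyze" the same column and you get a well-worded guess.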
Hallucinations are not a design fault of the model; they are a fault of how it is used. If LLMs were used as nothing more than a way to generate text, we would not have hallucinations. Instead, people engage with LLMs as a way to ask questions and get answers. Or as a way to analyze data in spreadsheets. Or countless other tasks besides generating text itself.
Data Flows and Specialized AI
Hallucination cannot be solved through ever-larger language models. Any facts a model may have will always be incidental, based on language patterns rather than analysis of actual data. Thus, the solution to hallucination is specialization.
It doesn't have to be this way. There are AI models that are optimized for recognizing patterns in data. And the smaller the domain of the training data, the more efficient and accurate they can be.
This is why, I believe, we are heading in a direction of smaller, more specialized models that pass data between each other. If we can break a task down into its core rules and processes, then we can have smaller and cheaper models that deliver building blocks. These can then be recombined with models (like GPT-4) in order to make a complete product.
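One way to picture that direction is a data flow in which a small specialized component produces structured building blocks and a generative step only words them. Everything below is a hypothetical stub: `detect_trend` stands in for a narrow, purpose-trained model, and `generate_summary` stands in for the LLM layer.

```python
# Sketch of specialized models passing data between each other.
# The "models" here are stubs; the point is the data flow, not the ML.

def detect_trend(series):
    """Specialized component: classifies a numeric series.

    A real system might use a small model trained on a narrow domain;
    a simple slope check stands in for it here.
    """
    slope = (series[-1] - series[0]) / (len(series) - 1)
    if slope > 0.5:
        return "rising"
    if slope < -0.5:
        return "falling"
    return "flat"

def generate_summary(metric, trend):
    """Generative step (stubbed): turns structured facts into prose.

    In the architecture described above, this is where an LLM belongs --
    wording the result, not producing the facts.
    """
    return f"{metric} is {trend} over the reporting period."

# The specialized component produces the building block...
trend = detect_trend([10, 12, 15, 19, 24])
# ...and the generative step only phrases it.
print(generate_summary("Monthly revenue", trend))
```

The fact ("rising") comes from computation on the data; the language model never gets the chance to hallucinate it.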
You can reduce the dependence on hallucination and instead leave generative models to do what they do best: generate content. The massive models used for language and image generation could be scaled back. Theoretically, we could even reach the day when generative AI runs on the edge, if the models can be scaled back enough.
The LLM Mistake
Many companies are trying to adapt LLMs into their workflows. Tools like Microsoft Copilot are powerful. Investors, driven by AI hype, are going to emphasize time-to-adoption for "AI." That said, they are going to be over-confident in the power of tools like Copilot.
The models will continue to grow, and the improvements will only be marginal. At some point, organizations will realize that Copilot is helpful but insufficient for AI-assisted workflows. At that point, they will start collecting data to train their own models or integrate open-source models into their workflows.
Time to adoption is important, and I am excited to see many enterprises investing in AI-assisted workflows. That being said, the company that adopts the small, specialized model first will be much more efficient 3-5 years in the future.
Companies should leverage tools like Copilot for now. They should understand how AI may be able to help different workflows (for both analytical and creative positions). They should also collect training data to support their workflows three years in the future. It takes time to collect data, which means that when Copilot becomes insufficient, the company with the largest specialized data set will have the fastest time to market.
No matter how large an LLM is, it will always be an LLM. There seems to be widespread misunderstanding of what LLMs can do and what they can't. Reliance on hallucination, even if the hallucinations are sometimes true, will mean that the results will always be unpredictable. Let's leave LLMs to language patterns and build relevant models that better achieve our goals.