Aerial view of a crowd connected by lines
One thing this year is teaching us is that innovators can transcend the boundaries of traditional scaling laws and turn to new types of systems that increase productivity and performance.
Last November we were hearing a lot about scaling laws running into diminishing returns, and about what the labs would do in response. Microsoft's Satya Nadella (among others) pointed to test-time compute research, saying: "We are seeing the emergence of a new scaling law."
In fact, you can go back to 2023, when Sam Altman was already suggesting that the age of giant models was drawing to a close. Researchers began looking for a better modeling architecture than the transformer, whose attention mechanism was driving the performance of model results.
Are new models emerging?
Some vanguard companies are looking to change how LLMs are built in order to pursue greater gains.
For example, there is Symbolica, a company that wants to address the scaling riddle by building models based on collections of symbols.
Symbolica's founder, George Morgan, says we must stop thinking about building bigger models and turn to other ways of improving systems.
"The overwhelming thesis that everyone is buying into now is that we have to build bigger models," Morgan said in a January interview with Forbes' Randall Lane at an AI event in Davos. "So you see the big players in AI raising billions of dollars, and then immediately turning around and handing that money to Nvidia or other compute providers to train their models. And all of these models are based on exactly the same architecture."
This homogeneous approach, he argued, leads to a somewhat predictable result.
"There is little differentiation between them in terms of their capability," he said. "And all of these models are kind of flattening and converging toward exactly the same thing across the board, and that is not very good for the consumer and not very good for business, because everyone is constantly hungry for more capability, and these model providers do not look like they will be able to provide it. To get higher performance, we will have to build a different architecture to power these models. We have designed that path forward, and we are executing on it."
To support this goal, Symbolica has some big names behind it, including Vinod Khosla, and $33 million in financial runway.
Looking across the pond
Another thing that came out during Morgan's interview is that the company is moving to Europe after raising its funding in the U.S.
"We decided to move to Europe because a lot of the talent in the mathematical disciplines that we need to do this work comes from various universities there," he said.
Explaining the company's system, he spoke about how the name, Symbolica, evokes the idea of a collection of symbols.
"This is somewhat of a return to how AI was done before neural networks and deep learning, … The company's name symbolizes the fact that we are taking these symbols and bringing them back into play, rather than simply building giant matrices that you multiply on GPUs in ways you don't understand."
Energy and landscape
To close out the interview, Lane and Morgan spoke about the energy people assume it will take to build models and maintain their performance.
Throughout the history of technology, Morgan pointed out, tools have only become cheaper and more accessible as they spread; these models are taking exactly the opposite trajectory.
"In the history of technology, technology has only become more accessible, it has become cheaper, and it has become more widespread," he said. "And this is precisely the opposite trend that these models are on right now. They are becoming less accessible. There are only, like, five players in the space that can afford to train state-of-the-art models. They cost billions of dollars. There would be a lot more innovation if we could reduce that cost by, you know, one, two, three orders of magnitude."
In short, he suggested that we scale things down and move toward what many see as the decentralization of smaller models at the edge.
More new strategies
I feel I would be remiss not to mention the emergence of liquid networks, which boost the performance of smaller models by refining the way the LLM is built. I will include the usual disclaimer that I have consulted on some of these projects undertaken by people at Liquid AI and MIT's CSAIL lab.
And then there are agents
Model cooperation
This is really a basic point in any new analysis of LLM systems. The idea is that you do not need one large monolithic engine, because you can have many smaller models working together. A small model can handle one or a few tasks, and the result is a detailed and powerful digital intelligence that can handle a wide range of projects.
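As a rough illustration of this kind of cooperation, here is a minimal sketch in plain Python. The specialist functions, the SPECIALISTS registry, and the route_task helper are all hypothetical stand-ins of my own, not anything from Symbolica or any particular framework; real systems would call small fine-tuned models where these functions return canned strings.

```python
# A minimal sketch of model cooperation: several small "specialist" models,
# each handling one kind of task, coordinated by a simple router.
# The specialists below are stand-in functions, not real LLM calls.

def summarizer(text: str) -> str:
    """Stand-in for a small model tuned for summarization."""
    return text[:60] + "..." if len(text) > 60 else text

def classifier(text: str) -> str:
    """Stand-in for a small model tuned for classification."""
    return "question" if text.strip().endswith("?") else "statement"

def translator(text: str) -> str:
    """Stand-in for a small model tuned for translation."""
    return f"[translated] {text}"

# The "society of models": each task type maps to one specialist.
SPECIALISTS = {
    "summarize": summarizer,
    "classify": classifier,
    "translate": translator,
}

def route_task(task_type: str, payload: str) -> str:
    """Dispatch a task to the appropriate small model instead of one big one."""
    specialist = SPECIALISTS.get(task_type)
    if specialist is None:
        raise ValueError(f"No specialist registered for task: {task_type}")
    return specialist(payload)

if __name__ == "__main__":
    # Several small models cooperating on different pieces of a larger job.
    print(route_task("classify", "Can smaller models work together?"))
    print(route_task("summarize", "A long report about decentralized, cooperating models at the edge."))
    print(route_task("translate", "small models"))
```

The design point is simply that the intelligence lives in the composition: each piece stays small and cheap, and the router, not any single model, is what makes the whole system broadly capable.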
I always go back to Marvin Minsky's book The Society of Mind, where he describes the human brain as a similar kind of collection of dozens of agents, working at the same time, or more precisely, in concert, to produce everything we can do with our extraordinary biological machines.
Again, a single AI system or agent does not need to be large. It simply has to be positioned properly to contribute what it can to the greater collective good.
And that may be a lesson we can learn as people, too.