The modern #AI revolution is compute driven. Neural networks use many different learning mechanisms, but all of them are fundamentally built on heavy computation.
Convolutional Neural Networks (#cnn) use convolution kernels, which in my mental model amount to spatial summarization. Since image data is spatial in nature, CNNs do really well on it.
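To make the spatial-summarization intuition concrete, here is a minimal NumPy sketch of a single convolution pass; the image size and kernel values are illustrative, not taken from any particular network:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide a small kernel over a 2D image, summarizing each spatial
    neighborhood into a single output value (no padding, stride 1)."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # each output pixel is a weighted sum over one local patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(8, 8)                 # illustrative 8x8 "image"
kernel = np.array([[-1., -1., -1.],          # an edge-detector-style kernel
                   [-1.,  8., -1.],
                   [-1., -1., -1.]])
print(conv2d(image, kernel).shape)           # (6, 6)
```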
Next up is the #attention mechanism used in #Transformers, which uses vector dot products (closely related to cosine similarity) to measure the degree of similarity, and by extension interaction, between vectors (token embeddings or intermediate representations). This similarity mechanism is especially well suited to language.
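A minimal sketch of scaled dot-product attention in NumPy, with the simplifying assumption that queries, keys, and values are the raw token embeddings rather than learned projections of them:

```python
import numpy as np

def attention(Q, K, V):
    """Score every query against every key with a dot product, softmax the
    scores into weights, and return a similarity-weighted blend of V."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])           # pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V

tokens = np.random.rand(4, 8)             # 4 token embeddings of dimension 8
out = attention(tokens, tokens, tokens)   # self-attention over the tokens
print(out.shape)                          # (4, 8)
```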
Then we have the various non-linearities such as #ReLU, which mimic a logic gate with an activation potential.
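A trivial sketch of that gating behavior, assuming plain ReLU:

```python
import numpy as np

def relu(x):
    """Pass the signal through only above the zero threshold,
    a soft analogue of a gate with an activation potential."""
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 0.5, 2.0])))  # [0. 0. 0. 0.5 2.]
```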
Finally, we have the fully connected (or, in #MoE designs, routed) feed-forward layers, which map inputs into arbitrary, learned output spaces useful for the particular task at hand.
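A toy sketch of a feed-forward block and a top-1 routed, MoE-style variant; the dimensions, the gating scheme, and the helper names (ffn, moe_ffn) are illustrative assumptions, not any specific model's design:

```python
import numpy as np

def ffn(x, W1, b1, W2, b2):
    """A fully connected feed-forward block: project up, apply the
    non-linearity, project back down, i.e. a learned mapping."""
    return np.maximum(0.0, x @ W1 + b1) @ W2 + b2

def moe_ffn(x, experts, gate_W):
    """Toy top-1 routing: a gate scores the experts and the input is
    mapped by whichever expert wins, instead of by one shared FFN."""
    idx = int(np.argmax(x @ gate_W))   # routing decision
    return ffn(x, *experts[idx])

d, hidden, n_experts = 8, 32, 4        # illustrative sizes
rng = np.random.default_rng(0)
experts = [(rng.normal(size=(d, hidden)), np.zeros(hidden),
            rng.normal(size=(hidden, d)), np.zeros(d))
           for _ in range(n_experts)]
gate_W = rng.normal(size=(d, n_experts))
x = rng.normal(size=d)
print(moe_ffn(x, experts, gate_W).shape)   # (8,)
```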
All of these operations are compute heavy, and increasing the density of a model (i.e. its number of parameters) tends to blow up the computational cost.
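To put a rough number on that, a common rule of thumb is that a dense transformer forward pass costs about 2N FLOPs per token, where N is the parameter count; the model size below is purely illustrative:

```python
# Rule of thumb: a dense transformer forward pass costs roughly
# 2 * N FLOPs per token, where N is the parameter count.
params = 70e9                    # illustrative 70B-parameter model
tokens = 1_000                   # tokens processed
flops = 2 * params * tokens
print(f"{flops:.1e} FLOPs")      # ~1.4e+14 FLOPs for 1k tokens
```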
That is why the new #Blackwell #GPU from NVIDIA has been so well received. The insight in its design is that the power cost of optical networking makes #scaleup a more power-efficient approach than #scaleout. As Jensen Huang said at #GTC24, performance per watt is what matters at the scale of current #llm models.
https://lnkd.in/gKztTEDd