

KOLMOGOROV-ARNOLD NETWORK
(KAN)
Silas Liu - May 11, 2024
Neural Networks, Topology
Kolmogorov-Arnold Network (KAN) is a new neural network topology proposed by researchers from four institutions, including the Massachusetts Institute of Technology and the California Institute of Technology, in a paper submitted recently, on April 30th, 2024.
​
This article caught my attention by bringing such innovative content with immense revolutionary potential. Its topology differs entirely from traditional Multi-Layer Perceptrons (MLPs) and brings a series of gains: higher accuracy with fewer parameters and, above all, the potential for high visual explainability. There are opportunities for KAN models to become foundation models for future scientific discoveries.
I can't recall the last time I was so amazed by an article. Kolmogorov-Arnold Network (KAN) is a new neural network topology, completely different from the traditional Multi-Layer Perceptrons (MLPs), recently submitted (April 30th, 2024) by researchers from four institutions, including MIT. I believe this new topology has revolutionary potential, bringing a new perspective to neural network applications.
​
The main differentiators of this new topology are models with higher accuracy and fewer parameters compared to MLPs, and, most importantly, their high explainability. In an example of Partial Differential Equation (PDE) solving, a 2-layer width-10 KAN achieved an accuracy 100 times better than a 4-layer width-100 MLP (MSE of 10^-7 vs 10^-5) while being 100 times more parameter-efficient (10^2 vs 10^4 parameters). KAN models open doors for new use cases of neural networks, such as foundation models for scientific discovery, referred to in the article as AI+Science.
​
The article is comprehensive and presents a series of mathematical analyses, always comparing KAN with MLPs. Among the analyses performed, considerations were made regarding: network hyperparameters (structure and number of layers, nodes, grid points, total parameters), training and testing metrics (accuracy, RMSE, R²), and implementation difficulties (scaling laws, the curse of dimensionality, catastrophic forgetting in continual learning), in both supervised learning (regression and classification) and unsupervised learning.
​
Fundamentally, the KAN structure is based on learnable activation functions on the edges between nodes, unlike the traditional MLP, where activation functions are fixed on the nodes and the weights are learnable. The activation functions considered in this study are compositions of splines, but there is the possibility of employing more complex functions, as well as kernels.
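To make this concrete: in the paper, each edge carries a learnable univariate function parametrized by B-spline coefficients, combined with a fixed residual basis function (SiLU), and each node simply sums its incoming edge functions. Schematically (my simplified rendering of the paper's parametrization):

\phi(x) = w \left( b(x) + \mathrm{spline}(x) \right), \qquad b(x) = \mathrm{silu}(x) = \frac{x}{1 + e^{-x}}, \qquad \mathrm{spline}(x) = \sum_i c_i B_i(x)

where the c_i are the trainable spline coefficients and the B_i are B-spline basis functions defined over a grid of points.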

Image from the original paper.
Several techniques are analyzed for constructing the structure and functions of the KAN. Techniques such as regularization and pruning are proposed for the macro structure of the network, while techniques such as grid extension are applied to the micro structure of the functions. This gives the network flexibility, helping it avoid classical problems such as the curse of dimensionality and catastrophic forgetting in continual learning. All the related mathematics is presented throughout the paper.
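As an example of the micro-structure side, grid extension refines a learned spline from a coarse grid to a finer one without restarting training: the fine-grid coefficients c'_j are initialized by a least-squares fit to the coarse-grid spline (my paraphrase of the procedure described in the paper):

\{c'_j\} = \arg\min_{\{c'_j\}} \; \mathbb{E}_x \left[ \left( \sum_j c'_j B'_j(x) - \sum_i c_i B_i(x) \right)^2 \right]

where B_i and B'_j denote the B-spline bases on the coarse and fine grids, respectively.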
​
An example case is presented in detail and illustrated as follows:

Image from the original paper.
The initial structure follows the Kolmogorov-Arnold representation theorem, which sets the number of layers and nodes according to the problem input: [n_in, 2*n_in+1, 1]. Thus, in this example, with two inputs, the network has [2, 5, 1] nodes.
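For reference, the Kolmogorov-Arnold representation theorem states that any continuous multivariate function on a bounded domain can be written as a finite composition of univariate functions and addition:

f(x_1, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q \left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)

which is exactly where the [n_in, 2*n_in+1, 1] shape comes from: n inputs, 2n+1 inner nodes, and 1 output.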
​
- Sparsification is applied, using adapted regularization methods.
- Pruning is applied, and the resulting network is much simpler than the initial structure, corresponding to a [2, 1, 1] KAN.
- Symbolic formulas are applied, and known mathematical functions are assigned to each of the functions found by the network.
- Affine parameters are trained using techniques such as iterative grid search and linear regression (a code sketch of this full workflow follows below).
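A minimal sketch of what this workflow looks like in code with the authors' pykan library. I am assuming the quick-start API of the initial release (create_dataset, train, prune, auto_symbolic, symbolic_formula); names and signatures may differ in later versions, and the target function and hyperparameters below are just illustrative:

import torch
from kan import KAN
from kan.utils import create_dataset

# Toy target used in the paper's walk-through: f(x, y) = exp(sin(pi*x) + y^2)
f = lambda x: torch.exp(torch.sin(torch.pi * x[:, [0]]) + x[:, [1]] ** 2)
dataset = create_dataset(f, n_var=2)

# Initial [2, 5, 1] KAN: 2 inputs, 2*2+1 hidden nodes, 1 output; cubic splines (k=3) on a coarse grid
model = KAN(width=[2, 5, 1], grid=5, k=3)

# Train with sparsification (lamb sets the regularization strength)
model.train(dataset, opt="LBFGS", steps=20, lamb=0.01)

# Prune inactive edges and nodes, then fine-tune the smaller network
model = model.prune()
model.train(dataset, opt="LBFGS", steps=20)

# Snap the learned splines to symbolic functions and read off the formula
model.auto_symbolic()
print(model.symbolic_formula())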
​​
The KAN structure is capable of extracting intrinsic mathematical relationships. Several types of problems were tested: PDE solving, high-dimensional examples (100 input dimensions), multivariate special functions (Bessel, Jacobian elliptic, Legendre), and the Feynman dataset of physical equations. The paper also points to broader applications such as solving Navier-Stokes equations, density functional theory, and other regression tasks.
​
The following images visually compare the errors achieved by KAN and MLP networks, versus the number of model parameters, on classic mathematical problems. In all of them, KAN outperformed MLP.


Images from the original paper.
One of the main uses of KAN is to assign symbolic formulas to the functions found and thereby extract a visual explanation of what the neural network is doing. As a simple example, consider the product of two numbers:
​
​
​
In a KAN, it is represented by linear and quadratic functions:
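A reconstruction of the identity the paper points to (the learned edge functions may differ by affine rescalings): a [2, 2, 1] KAN can realize the product with one hidden node fed by linear edges and another fed by quadratic edges,

h_1 = x + y, \qquad h_2 = x^2 + y^2, \qquad x y = \tfrac{1}{2} h_1^2 - \tfrac{1}{2} h_2 = \frac{(x+y)^2 - (x^2 + y^2)}{2}

so the output node only needs a quadratic function of h_1 and a linear function of h_2.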
​
​
​
In this case, the network discovered the functions, and its final structure has 3 layers of nodes and 2 layers of functions, entirely explainable. Several other functions are explored.
​
​



Image from the paper with my own annotations on example (a).
Extrapolating this ability of the KAN topology to complex problems, the paper analyzes its application in two current cutting-edge research areas. In both case studies, KANs were able to detect correlated variables and model formulas and mathematical relationships previously deduced by scientists.
​
- Knot Theory: a research area of Mathematics, which has applications in several different areas: the structure and behavior of DNA molecules in Biology, algorithms and cryptography protocols in quantum computing, string theory and superstrings in Theoretical Physics.
- Anderson localization: a fundamental phenomenon in Quantum Physics, with several applications such as material conductivity in materials science, optics and propagation, and electrical properties in nanotechnology.
​​
In my opinion, the KAN topology brings fresh concepts and a new shine to neural network applications and can contribute greatly to future scientific research. One point is that the paper thoroughly analyzes the use of KAN for science-related tasks, but there is still huge room for development, analysis, and testing, mainly in applications to Machine Learning related tasks, such as transformers and CNNs. At the moment, the biggest limitations encountered with the KAN topology were its slow training time compared to MLPs and its sensitivity to the choice of network structure, which may vary significantly and be non-trivial to determine. I am looking forward to the next steps of the Data Science community.