This post is a response to Andrej Karpathy’s Software 2.0. It is brilliant and you should definitely read it, but I thought the critical takeaway was lost among the many great points, and I wanted to make it more explicit.


Traditional methods for solving a software problem involve people coming up with the required algorithmic steps to tackle it.

For example, consider the problem of estimating the depth of an object from images. A traditional algorithm would contain the following steps (a rough code sketch follows the list):

  • Gather two frames with enough baseline so that the scene changes between them, but not too much

  • Detect features (e.g. SIFT) on one frame, assuming the same features are also visible in the other frame

  • Find correspondences between these features, filter them, and estimate a 2D flow

  • Derive relative depth from the flow
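To make these steps concrete, here is a minimal sketch of such a pipeline using OpenCV. The function name and the final depth heuristic are illustrative assumptions made for this post, not a production implementation.

```python
# A minimal sketch of the traditional pipeline described above, using OpenCV.
# Assumes two frames from a mostly translational camera motion; names are illustrative.
import cv2
import numpy as np

def relative_depth_from_two_frames(frame1, frame2):
    gray1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    gray2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # Handcrafted features (SIFT) on both frames
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(gray1, None)
    kp2, des2 = sift.detectAndCompute(gray2, None)

    # Match features and filter with Lowe's ratio test
    matcher = cv2.BFMatcher()
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m[0] for m in matches
            if len(m) == 2 and m[0].distance < 0.75 * m[1].distance]

    # Sparse 2D flow between the matched keypoints
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    flow = pts2 - pts1

    # Under the (strong) assumption of translational motion, relative depth
    # is roughly inversely proportional to the flow magnitude (disparity)
    disparity = np.linalg.norm(flow, axis=1) + 1e-6
    relative_depth = 1.0 / disparity
    return pts1, relative_depth
```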

Scaling with data

Such an approach works well enough in general and can be deployed in any scenario that satisfies its approximations/assumptions, but its performance does not depend on the amount of available data.

Performance of a traditional algorithm does not depend on the amount of available data

How can we do better?

We can try to improve each of the approach’s steps and gain overall performance, and decades of research in Computer Vision does exactly that. But by changing our fundamental thinking towards learning-based approaches, unprecedented improvements were realized.

Era of learning

Learning-based approaches replaced the handcrafted features (SIFT in the above example) by deriving those features from the data. Neural networks replaced the rest of the pipeline.
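As an illustration, here is a minimal PyTorch sketch. The architecture and names are made up for this post, not a real depth network: learned convolutional features take the place of SIFT, and the decoder takes the place of matching, flow, and depth estimation.

```python
# A minimal sketch (PyTorch) of replacing the handcrafted pipeline with a network
# that maps an image pair directly to a relative-depth map. Names are illustrative.
import torch
import torch.nn as nn

class DepthNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Learned features replace SIFT
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # The decoder replaces matching + flow + depth derivation
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, frame1, frame2):
        x = torch.cat([frame1, frame2], dim=1)  # stack the two frames channel-wise
        return self.decoder(self.encoder(x))    # dense relative-depth map
```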

Check out my blog post on self-supervised learning for foundation models.

Neural networks are function approximators

The traditional approach is to come up with a model/algorithm that converts an input to an output.

$ \text{input} \rightarrow \text{model} \rightarrow \text{output} $

Deep learning replaces this with 2 steps:

  1. Training: Given many examples of input and output data, the model is learnt.

  2. Deployment: The learnt model is used on new inputs to infer their outputs.

This allows the performance of the model to be dependent on the amount of training data available.
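Here is a minimal sketch of the two steps, reusing the DepthNet sketch above, with dummy tensors standing in for a real labeled dataset.

```python
# A minimal sketch of training and deployment, assuming the DepthNet class above.
# Dummy tensors stand in for a real dataset; shapes and names are illustrative.
import torch

model = DepthNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.L1Loss()

# Stand-in for a labeled dataset of (frame pair, ground-truth depth) examples
train_examples = [
    (torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64), torch.rand(1, 1, 64, 64))
    for _ in range(8)
]

# 1. Training: the model is learnt from many input/output examples
for frame1, frame2, depth_gt in train_examples:
    optimizer.zero_grad()
    loss = loss_fn(model(frame1, frame2), depth_gt)
    loss.backward()
    optimizer.step()

# 2. Deployment: a single inference call on new inputs
model.eval()
with torch.no_grad():
    depth_pred = model(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
```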

As the training data increases, the performance of a DL model increases, while traditional methods do not depend on the amount of training data

Production time

Traditional algorithms almost always have to be completely rewritten from their prototyping phase for production, while keeping computational power and memory in check. At test time, deep learning replaces all of this with a single model inference call. So we go from re-implementing production-level code to passing the input through a bunch of linear algebra. A neural network costs the same amount of memory in production as in prototyping, often with even better computational efficiency at inference time.
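For example, deployment can amount to exporting the trained model once and calling it on new inputs. The sketch below uses TorchScript as one possible option and reuses the model from the training sketch above; the file name is illustrative.

```python
# A minimal sketch: the "production rewrite" collapses into exporting the trained
# model once and making one inference call per input (TorchScript as an example).
import torch

example_inputs = (torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64))
traced = torch.jit.trace(model.eval(), example_inputs)  # model from the sketch above
traced.save("depthnet.pt")

# In production: load once, then each new input is a single forward pass
deployed = torch.jit.load("depthnet.pt")
with torch.no_grad():
    depth = deployed(*example_inputs)
```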

Algorithm 2.0ic thinking

  1. First, to gain performance, it is not enough to copy-paste one training example into multiple copies. The model needs to be intelligently scaled by feeding in huge amounts of new data with much variation, representing the true population as faithfully as possible.

  2. Secondly, new data doesn’t come for free. Labeling data is labor-intensive and expensive, so smart gathering is required. One cannot just copy-paste existing data and assume the dataset has grown; it has to grow meaningfully.

  3. Thirdly, to be able to feed in that much data, the necessary infrastructure needs to be in place.

  4. As with traditional algorithms, one cannot just engineer the algorithm once and use it in deployment forever. One needs to continuously improve/retrain the learnt model and continuously deploy it. This is also where data can be selected smartly for the next iteration of improvement (a toy sketch of this selection step follows the list).
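As a toy illustration of smart data selection: pick the unlabeled examples the current model is least confident about, so that the next training round adds meaningful variation rather than near-duplicates. The confidence function and data here are stand-ins, not a real model or pipeline.

```python
# A runnable toy sketch of smart data selection for the next improvement iteration.
# The confidence function and the random "unlabeled pool" are illustrative stand-ins.
import numpy as np

rng = np.random.default_rng(0)

def model_confidence(x):
    # Stand-in for a deployed model's confidence score on input x
    return float(np.clip(1.0 - np.abs(x).mean(), 0.0, 1.0))

unlabeled_pool = [rng.normal(size=8) for _ in range(1000)]

# Select the k least-confident samples to send for labeling
k = 50
selected = sorted(unlabeled_pool, key=model_confidence)[:k]
print(f"Selected {len(selected)} hard examples for labeling")
```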

The holy grail? Or just a fad?

For many applications, the performance gain is really not that critical. Other factors such as explainability, fail-safety, etc. are equally important, and current deep learning approaches lack them. The availability of data is another critical factor when deciding whether to choose learning-based approaches.

If data is limited (green part), traditional algorithms provide better performance

Continuous improvement

But, if the performance gains are to be realized, scaling with data is the way to go.

Amount of data used to train notable AI systems. Source: Our World in Data

And the way to execute this smartly is through continuous improvement.

Continuous Improvement loop