Engineering

The essential guide to training and inference

Unless you have been living under a rock, you have almost certainly heard about Generative AI (GenAI). It is the power behind ChatGPT, which can turn your emotions into a song, and behind text-to-image models such as Stable Diffusion, which can render your family photo as a Renaissance portrait.

Although it seems simple enough to type in a text prompt and admire the output, what happens behind the scenes is far less simple to understand.

This blog pulls back the curtain on the two essential processes that allow generative AI to produce sensible results: training and inference.

Training: What, why and how?

It is no coincidence that machine learning borrows so much of its terminology from neuroscience: studying how the human brain works has long guided the development of AI. Our brains contain billions of neurons that constantly fire signals to communicate with one another. Artificial neural networks (ANNs) mimic this structure, comprising layers of nodes that play the role of biological neurons.

As data passes from one layer to the next, each connection applies a learnable weight to it. Training consists of repeated iterations of prediction, known as forward propagation, and feedback, known as backward propagation, in which the weights are adjusted to reduce the prediction error. Over many iterations, the weights become accurate enough that the network reliably makes the right connections.
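To make that loop concrete, here is a minimal training sketch using PyTorch; the tiny network, random data, and hyperparameters are purely illustrative, not how a production GenAI model is built:

```python
import torch
import torch.nn as nn

# A tiny feed-forward network: layers of nodes connected by learnable weights.
model = nn.Sequential(
    nn.Linear(4, 8),   # input layer -> hidden layer
    nn.ReLU(),
    nn.Linear(8, 1),   # hidden layer -> output layer
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Illustrative data: 32 samples with 4 features each and one target value.
inputs = torch.randn(32, 4)
targets = torch.randn(32, 1)

for epoch in range(100):
    predictions = model(inputs)           # forward propagation: make a prediction
    loss = loss_fn(predictions, targets)  # measure how far off the prediction is
    optimizer.zero_grad()
    loss.backward()                       # backward propagation: compute gradients
    optimizer.step()                      # nudge the weights to reduce the error
```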

Training process

An AI model does not understand your command out of the box; it has to practise making guesses on data, learning from its mistakes, before it can generate an educated response.

However, it wasn’t always this simple. Earlier methods relied on labelled data curated by human programmers, which became tedious and expensive at scale. The advent of big data gives AI models a wealth of information, enabling them to learn in self-supervised and semi-supervised fashion from unlabelled inputs.

Leveraging big data to train AI models requires extensive computing resources. Take GPT-1, for example: released in 2018, it was trained on 8 GPUs and used approximately 0.96 petaflop/s-days (pfs-days) of compute. GPT-3, released in 2020, consumed around 3,630 pfs-days. Although the figures for GPT-4 have not been published, it is safe to assume they exceed those of GPT-3.
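To put the pfs-day unit into perspective, here is a quick back-of-the-envelope conversion using only the figures above (one pfs-day is 10^15 floating-point operations per second sustained for a day):

```python
# One petaflop/s-day = 10^15 floating-point operations per second,
# sustained for one day (86,400 seconds).
PFS_DAY_IN_FLOPS = 1e15 * 86_400         # = 8.64e19 FLOPs

gpt1_flops = 0.96 * PFS_DAY_IN_FLOPS     # roughly 8.3e19 FLOPs
gpt3_flops = 3_630 * PFS_DAY_IN_FLOPS    # roughly 3.1e23 FLOPs

print(f"GPT-3 used roughly {gpt3_flops / gpt1_flops:,.0f}x "
      "the training compute of GPT-1")
```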

Therefore, AI training requires powerful GPUs that can process large amounts of data in parallel. Compute performance and efficiency go hand in hand, so choosing the right GPU for your workload is imperative to the success of your AI model deployment.

Inference: What, why and how?

You’ve trained and tested your model, and you’re wondering whether it will sink or swim when exposed to new, unfamiliar data. This is where inference comes into the picture: for example, a user asking ChatGPT to write an essay about life on Mars, or asking it to produce a painting depicting what that life would look like.

In everyday logic, an inference is a conclusion drawn from evidence and reasoning. It is the process of combining the many things you already know, learned from experience, to reach a decision about something new.

During inference, the model applies the parameters it learned during the training and testing phases to the new input in order to generate an accurate output.
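Here is a minimal sketch of that step, again using PyTorch and assuming a model like the one trained in the earlier sketch; the architecture and file name are illustrative:

```python
import torch
import torch.nn as nn

# Rebuild the same small architecture; in practice you would load trained weights.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
# model.load_state_dict(torch.load("model.pt"))  # hypothetical saved checkpoint

model.eval()                       # inference mode: no training-specific behaviour
new_input = torch.randn(1, 4)      # data the model has never seen before
with torch.no_grad():              # gradients off: the learned weights stay fixed
    prediction = model(new_input)  # the forward pass produces the output
print(prediction)
```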

Inference process

During inference, human feedback on the model's responses can be collected to create labelled data for future training sessions. This feedback loop helps the model improve by reinforcing correct outputs and correcting mistakes, ultimately enhancing accuracy.
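One simple way such feedback could be captured is sketched below; the record fields and file name are hypothetical, not a specific production pipeline:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class FeedbackRecord:
    prompt: str     # what the user asked
    response: str   # what the model answered
    rating: int     # human judgement, e.g. 1 (poor) to 5 (excellent)

# Each rated interaction becomes a labelled example for a later training run.
record = FeedbackRecord(
    prompt="Write an essay about life on Mars",
    response="Life on Mars would revolve around...",
    rating=4,
)

with open("feedback.jsonl", "a") as f:
    f.write(json.dumps(asdict(record)) + "\n")
```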

This continuous loop of training and inferencing makes artificial intelligence smarter and more lifelike every day.

Computing resources and GPU acceleration remain important factors for inference, but there is another element to take into consideration. Users want fast replies from AI models, so latency matters: the time delay between when an AI system receives an input and when it generates an output, typically measured in milliseconds.

Beyond latency, other common inference metrics include throughput, the number of predictions a model can produce within a given period, and accuracy, which measures not the inference process itself but the quality of the model's outputs.
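As a rough illustration of how latency and throughput could be measured, here is a sketch in which `run_inference` is a hypothetical stand-in for a real model or endpoint call:

```python
import time

def run_inference(prompt: str) -> str:
    """Stand-in for a real model call (e.g. a request to an inference endpoint)."""
    time.sleep(0.05)  # simulate ~50 ms of model work
    return "generated text"

prompts = ["example prompt"] * 20

start = time.perf_counter()
latencies = []
for p in prompts:
    t0 = time.perf_counter()
    run_inference(p)
    latencies.append((time.perf_counter() - t0) * 1000)  # per-request latency in ms
elapsed = time.perf_counter() - start

print(f"Average latency: {sum(latencies) / len(latencies):.1f} ms")
print(f"Throughput: {len(prompts) / elapsed:.1f} requests/second")
```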

Training and Inference with Nscale

As a fully integrated platform, Nscale provides not only bare metal and virtualised GPU nodes but also Kubernetes-native services (NKS) and SLURM-powered AI workload scheduling for training and inference of GenAI models, as well as advanced serverless inference services.

If you are interested in trying out our inference platform, join the waitlist by clicking here.

By choosing Nscale, you'll benefit from a robust, scalable AI infrastructure that grows with your needs, all while reducing costs and meeting sustainability goals. 

Nisha Arya
Head of Content

Data scientist turned marketer with 6+ years of experience in the AI and machine learning industry.
