Maximising Gen AI Performance: The GPU & Beyond

Karl Havard

November 5, 2024

•

2 minutes

Organisations eager to enter the Generative AI market still have significant gaps and hurdles to overcome. Enterprises need to understand the tech stack and how the puzzle pieces fit together to fully benefit from Generative AI.

The tech stack

Starting from the bottom

At the bottom of the stack, we have GPUs (Graphics Processing Units), which are known as the powerhouse of Generative AI. Many of us know of ChatGPT, the famous chatbot built on large language models and reportedly powered by NVIDIA’s A100s. Although GPUs fuel the future of Generative AI, they are only part of the equation.

As we can see in the tech stack image above, many components need to be optimised to ensure you get the most out of your model and achieve maximum performance. Each component, interconnected within the system, presents potential bottlenecks or failure points. To ensure optimal efficiency and resiliency, it is imperative to optimise both intra- and inter-component connectivity.

But it doesn’t stop there. Above the hardware sits the orchestration layer, for example Kubernetes and/or Slonk (Slurm on Kubernetes).

As enterprise organisations adopt GenAI more readily, how you optimise the elements above will depend on the use case at hand: training, which is GPU intensive; tuning, where you bring your own data to the party; or inference, where you put your trained models to use.

At the top of the stack is the user interface, a component of the tech stack that typically gets neglected. However, a well-designed interface with the right connectivity and ease of use can accelerate the entire process and take an enterprise from development to deployment much faster.

Each of these layers offers enterprises the ability to achieve maximum optimisation so that their models reach peak performance.

Beneath the surface

However, outside of choosing the right GPU for your workload or fine-tuning your model, various external factors often get overlooked.

Last year, there was significant concern around GPU supply chains. But the question of where we house this infrastructure isn’t raised enough. The answer? You need a purpose-built data centre.

I’m not talking about the data centres of the past, which are no longer fit for this purpose. The location of your data centre is critical. You will need to consider how the location you choose will affect the community, sovereignty and compliance.

And all of this requires power, right? Lower power costs translate to more affordable services.

But does this come at the expense of ESG (Environmental, Social and Governance) goals?
Have you selected a cooling solution that aligns with your ESG goals?
What type of cooling have you chosen to go ahead with?
How much does it cost?
Does your choice of cooling offer a by-product that could benefit the community?
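
To make the link between power, cooling and cost concrete, here is a minimal sketch rather than a pricing model. It assumes cooling overhead is expressed as PUE (Power Usage Effectiveness, total facility power divided by IT power), a standard industry metric that this article does not itself use. The function name and every input figure are hypothetical and purely illustrative.

```python
HOURS_PER_YEAR = 8760  # 24 hours x 365 days


def annual_energy_cost(it_load_kw, pue, price_per_kwh):
    """Estimate yearly electricity cost for a data centre.

    it_load_kw    -- power drawn by the IT equipment (GPUs, servers, network)
    pue           -- total facility power / IT power; cooling is a large part
                     of the overhead above 1.0
    price_per_kwh -- electricity price in your currency per kWh
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    total_load_kw = it_load_kw * pue
    return total_load_kw * HOURS_PER_YEAR * price_per_kwh


# Hypothetical inputs: a 1 MW IT load at the same electricity price,
# compared under a less efficient and a more efficient cooling design.
less_efficient = annual_energy_cost(it_load_kw=1000, pue=1.5, price_per_kwh=0.10)
more_efficient = annual_energy_cost(it_load_kw=1000, pue=1.2, price_per_kwh=0.10)

print(f"Less efficient cooling: {less_efficient:,.0f} per year")
print(f"More efficient cooling: {more_efficient:,.0f} per year")
print(f"Difference:             {less_efficient - more_efficient:,.0f} per year")
```

Under these assumptions the electricity price and the cooling overhead multiply together, so a cheaper power source and a more efficient cooling design compound rather than simply add.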

Once you have addressed these elements, the next critical element is networking. If you locate your data centre close to natural resources, which are typically in remote locations, you may have to compromise on network latency.
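
As a rough illustration of the latency trade-off, the sketch below estimates the minimum round-trip time that distance alone adds over optical fibre. It assumes light travels through fibre at the vacuum speed of light divided by a refractive index of about 1.47, a typical value for silica fibre. The distances are hypothetical, and real latency will be higher once routing, switching and non-direct cable paths are included.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in a vacuum
FIBRE_REFRACTIVE_INDEX = 1.47          # assumed typical value for silica fibre


def min_round_trip_ms(distance_km):
    """Lower bound on round-trip time over fibre, from propagation delay only."""
    if distance_km < 0:
        raise ValueError("distance must not be negative")
    speed_in_fibre = SPEED_OF_LIGHT_KM_PER_S / FIBRE_REFRACTIVE_INDEX
    one_way_seconds = distance_km / speed_in_fibre
    return 2 * one_way_seconds * 1000


# Hypothetical distances between end users and a data centre.
for km in (50, 500, 2000):
    print(f"{km:>5} km: at least {min_round_trip_ms(km):.2f} ms round trip")
```

The result is a floor rather than a forecast: it scales linearly with distance, so a site ten times further from its users starts with ten times the propagation delay before any other overhead is counted.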

Last but not least, you want to look into the expertise on offer alongside the infrastructure.

Maturity: Values, priority and focus

Looking at the current market, some organisations have the luxury of affording dedicated hardware to achieve their goals. Many others, whom I call the ‘Model Makers’, pose questions such as:

How many GPUs do you have?
When are they available?
How much does it cost?

The Model Maker's main focus is gaining access to GPUs at scale and quickly.

If we look at it through an enterprise lens, the questions focus more on outcomes and business value. The intricate details of the technology matter less to enterprises than what they want to achieve and how they can achieve it in a way that is secure, compliant, economical and aligned with their ESG goals. Enterprise leadership are asking questions such as:

Can you help us use GenAI to create new products and services?
Can we create our own GenAI private cloud with Tier 3 capability?

Correlation: Gen AI and business performance

As an enterprise, if you’re looking to integrate Generative AI into your current processes or build a GenAI service, you want to focus on reducing your training time, which leads to faster deployment and faster outcomes.

To remain competitive, you also want to ensure that you have high-performance inferencing to provide your end users with a better customer experience.

Another business performance component on many organisations' radars is hitting their ESG goals. Generative AI consumes a lot of energy. For example, ChatGPT uses ten times the electricity of a Google search to provide an output, according to the World Economic Forum.

Key components to take into consideration are delivering secure, compliant and renewable energy for the future of AI. When you’re thinking about the future and sustainability of AI, you want to think about evergreen technology and how this can help you scale and grow.

If you’re interested in how you can better understand your tech stack, reach out to Nscale here.

Nscale delivers high-performance AI infrastructure with unmatched scalability and sustainability. Our vertically integrated platform, powered by 100% renewable energy, is designed to support demanding AI workloads with ease. With a GW+ pipeline of greenfield sites across Europe and North America, Nscale ensures we can deliver bespoke GPU clusters at any scale.

Karl Havard

CCO

Bio

Karl Havard has 25+ years of experience in the IT, Cloud and AI industry. He has previously held senior leadership roles at AWS and Google, and has built and led start-up businesses in the HPC and Generative AI cloud service provider industry.