
Introducing Nscale Serverless Inference: Scalable AI without infrastructure hassles

Deploying AI models should not be complex. Yet many organisations and teams face high infrastructure costs, difficult integrations, and unpredictable scaling needs, all of which slow down innovation.

Today, we're announcing the launch of Nscale's Serverless Inference platform, the first public on-demand offering in our broader AI infrastructure suite. It gives developers and enterprises instant access to leading AI models at scale, without the complexity of managing the underlying infrastructure, and complements Nscale’s established private cloud solutions tailored for large-scale enterprise AI workloads.

A simple path to production

Our platform provides a library of pre-trained models for various tasks, including models from leading labs such as Meta (Llama), Alibaba (Qwen), and DeepSeek. Developers can invoke these models on a pay-as-you-go basis through simple APIs or via the Nscale web interface. Over time, Nscale will expand this selection based on your feedback and continue curating our endpoints to include new leading models as they emerge.

OpenAI API and SDK compatibility ensures teams can leverage existing code and tools, making it straightforward to build new AI-powered features or migrate existing ones. This significantly reduces development overhead and allows teams to focus on their core application logic.

Cost-effective serverless architecture

Nscale’s Serverless Inference platform follows a simple pay-per-request model. For text and vision models, pricing is based on input and output tokens; for image generation, it is based on output image dimensions. Either way, you’re billed only for the resources your workload requires. Our vertically integrated stack is optimised at each layer, meaning lower compute costs and savings that we pass directly to our customers.
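As a rough illustration of per-token billing, a request's cost can be estimated from its token counts and the published per-million-token rates. The rates in the example below are hypothetical, not Nscale's actual prices; see the pricing page for real figures:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate the cost of one request under per-token pricing.

    Rates are expressed in dollars per million tokens.
    """
    return (input_tokens / 1_000_000) * input_price_per_m \
         + (output_tokens / 1_000_000) * output_price_per_m

# Hypothetical rates: $0.20 per 1M input tokens, $0.60 per 1M output tokens.
cost = estimate_cost(input_tokens=1_500, output_tokens=500,
                     input_price_per_m=0.20, output_price_per_m=0.60)
print(f"${cost:.6f}")  # prints $0.000600
```

Because billing is per request, idle time costs nothing: a workload that makes no calls accrues no charges.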

To help you start building, every new user can claim $5 of free credit to explore our models. When you’re ready to scale, simply add a payment card to purchase additional credit.

This approach solves a major challenge in AI deployment: managing complex infrastructure across multiple models and serving frameworks. While traditional deployments often struggle with capacity planning and resource allocation, Nscale Serverless eliminates these concerns by adding capacity automatically and scaling up with your traffic.

Production-ready infrastructure

Our platform handles all scaling, monitoring, and operational aspects, allowing your engineering teams to focus on delivering business value.

Designed for security, availability and reliability, the platform ensures protection without added complexity. All endpoints are served over encrypted connections protected by your API credentials. The Nscale Serverless Inference platform does not log or train on request or response content, ensuring the privacy and security of your data.
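As a sketch of what an authenticated request looks like on the wire, the snippet below builds an OpenAI-style chat completion request with the API key sent as a bearer token over HTTPS. The endpoint URL and model id are illustrative placeholders, and only Python's standard library is used:

```python
import json
import os
import urllib.request

# Placeholder endpoint for illustration -- see the docs for the real URL.
API_URL = "https://inference.example.nscale.com/v1/chat/completions"

def build_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request.

    The API key travels in the Authorization header as a bearer token,
    protected in transit by the encrypted (HTTPS) connection.
    """
    body = json.dumps({
        "model": "meta-llama/Llama-3.3-70B-Instruct",  # placeholder model id
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        API_URL,
        data=body,  # a request with a body defaults to POST
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Hello!", os.environ.get("NSCALE_API_KEY", "demo-key"))
print(req.get_header("Authorization"))
```

Passing the built request to `urllib.request.urlopen(req)` would send it and return the JSON response; rotating a key is then just a matter of swapping the environment variable.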

Getting started

To start using the Serverless Inference service:

  1. Sign up here
  2. Claim your free credit and create your API key
  3. Browse the available models
  4. Begin making inference requests via the OpenAI-compatible API

The service launches with detailed documentation covering everything from basic setup to advanced usage patterns. This includes implementation guides, API references, and example applications. Our team of experts is available to answer any questions about the service and help with onboarding.

We're committed to making the AI user experience simple and cost-effective. For full pricing information, visit our pricing page.

Steven Crake
Principal Software Engineer
