serverless compute

Modal

Serverless GPUs for inference, training, batch jobs and sandboxes — from a few lines of Python.

Website Hugging Face X GitHub

01What they offer

Not a model — the workshop floor your models run on. Modal gives you on-demand GPUs without managing infrastructure: define a function, attach a GPU, and it scales from zero to many and back. Ideal for fast inference, fine-tuning scripts in ~300 lines, parallel hyperparameter sweeps, and running agents in sandboxes.

02Capabilities

What you can run

Inference

Deploy and scale inference for LLMs, audio and image/video generation.

Training

Fine-tune open-source models on single or multi-node clusters instantly.

Batch

Parallelize massive jobs with hyper-elastic compute infrastructure.

Sandboxes

Programmatically scale ephemeral environments for running untrusted code.

03Build this with us

If you want to build…

Serve your model as an API

Inference endpointsAutoscaling

Fine-tune a small model

TrainingPay-per-second GPUs

Run an agent’s code safely

SandboxesIsolated containers

Massive parallel jobs

BatchHyper-elastic

04Getting started

05Eligible prize

MODAL · SPONSOR PRIZE

Best Use of Modal

To qualify ·Use Modal for the development or runtime of your app and note it in your Space README. 1st 10,000 · 2nd 7,000 · 3rd 3,000 credits.

20,000 credits

06How to get help

Stuck? Holler.

Community Slack

modal.com/slack