serverless compute

Modal

Serverless GPUs for inference, training, batch jobs and sandboxes — from a few lines of Python.

Not a model, but the workshop floor your models run on. Modal gives you on-demand GPUs without managing infrastructure: define a function, attach a GPU, and it scales from zero to many containers and back. It suits fast inference, fine-tuning scripts of roughly 300 lines, parallel hyperparameter sweeps, and running agents in sandboxes.

What you can run

Inference

Deploy and scale inference for LLMs, audio and image/video generation.

Training

Fine-tune open-source models on single or multi-node clusters instantly.

Batch

Parallelize massive jobs with hyper-elastic compute infrastructure.

Sandboxes

Programmatically scale ephemeral environments for running untrusted code.

If you want to build…

Serve your model as an API
Inference endpoints · Autoscaling
Fine-tune a small model
Training · Pay-per-second GPUs
Run an agent’s code safely
Sandboxes · Isolated containers
Massive parallel jobs
Batch · Hyper-elastic
MODAL · SPONSOR PRIZE

Best Use of Modal

To qualify · Use Modal for the development or runtime of your app and note it in your Space README. 1st 10,000 · 2nd 7,000 · 3rd 3,000 credits.

20,000 credits