Snowpark Container Services - SnowPro Gen AI C02 study notes

SPCS is Snowflake's managed container orchestration service. For the Gen AI exam, the relevant angles are GPU-backed model serving and custom Python runtimes (arbitrary dependencies, long-running servers) that a virtual warehouse cannot host.

Core concepts

Concept	What it is
Compute pool	Collection of VM nodes where services and jobs run. You specify the instance family (CPU or GPU machine type) plus MIN_NODES and MAX_NODES for scaling.
Image registry	OCIv2-compliant registry inside your Snowflake account. Push with the Docker CLI; images live in image repositories (schema-level objects).
Service	Long-running containerized workload. Defined by a YAML service spec. Can expose endpoints.
Job	Run-once container (training, batch inference), started with EXECUTE JOB SERVICE.
Service function	A SQL-callable function backed by a container service endpoint. Lets SQL queries invoke containerized code.
Service spec (YAML)	Defines containers, endpoints, volumes, resource requests, secrets.
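An image is referenced in two forms: the full registry URL you docker push to, and the account-relative path the service spec uses (like /db/schema/repo/my-image:latest). The helpers below are a hypothetical sketch of how the two relate. The hostname format is an assumption to check against the repository_url column that SHOW IMAGE REPOSITORIES returns.

```python
# Hypothetical helpers for the two ways an SPCS image is referenced.
# ASSUMPTION: the registry hostname follows <org>-<account>.registry.snowflakecomputing.com,
# lowercased with underscores turned into hyphens. Confirm against the repository_url
# column of SHOW IMAGE REPOSITORIES before relying on it.


def registry_host(org: str, account: str) -> str:
    """Hostname for docker login / docker push (assumed format)."""
    return f"{org}-{account}".lower().replace("_", "-") + ".registry.snowflakecomputing.com"


def spec_image_path(db: str, schema: str, repo: str, image: str, tag: str = "latest") -> str:
    """Account-relative path used in the service spec's image field."""
    # Docker repository names must be lowercase; the tag is left as given.
    return f"/{db}/{schema}/{repo}/{image}".lower() + f":{tag}"


def push_reference(org: str, account: str, db: str, schema: str, repo: str,
                   image: str, tag: str = "latest") -> str:
    """Full reference for docker tag / docker push: registry host + spec path."""
    return registry_host(org, account) + spec_image_path(db, schema, repo, image, tag)
```

For example, push_reference("myorg", "myacct", "db", "schema", "repo", "my-image") yields myorg-myacct.registry.snowflakecomputing.com/db/schema/repo/my-image:latest, while the spec only needs the /db/schema/repo/my-image:latest suffix.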

When to use SPCS vs warehouses vs Cortex

Workload	Use
Standard SQL + Snowpark Python	Virtual warehouse
Call a Snowflake-hosted LLM (e.g., Anthropic Claude, Meta Llama, Mistral)	Cortex AI functions
Open-source model you bring yourself, GPU-bound inference	SPCS, via Model Registry deployment to SPCS
Fine-tuning an open-source model (e.g., fine-tuning Llama)	SPCS job on a GPU compute pool
Custom Python service (Streamlit, FastAPI, vLLM, TGI)	SPCS service
Self-hosted supporting infrastructure (e.g., a vector store) running alongside your app	SPCS
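As a study aid, the decision table can be encoded as ordered rules. This is a hypothetical sketch: the Workload fields are invented to mirror the table's rows, and the rule order reflects one reading of the table, not official Snowflake guidance.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload description mirroring the table's rows."""
    hosted_llm_call: bool = False   # calling a model Snowflake already hosts (Cortex)
    custom_model: bool = False      # an open-source model you bring yourself
    needs_gpu: bool = False         # GPU-bound training or inference
    long_running_app: bool = False  # Streamlit, FastAPI, vLLM, TGI, a vector store, ...
    run_once: bool = False          # training or batch job that exits when done


def choose_runtime(w: Workload) -> str:
    """Apply the table's rows in order; the first matching rule wins."""
    if w.hosted_llm_call and not w.custom_model:
        return "Cortex AI functions"
    if w.custom_model and w.needs_gpu and w.run_once:
        return "SPCS job on a GPU compute pool"
    if w.custom_model and w.needs_gpu:
        return "SPCS via Model Registry deployment"
    if w.long_running_app or w.custom_model or w.needs_gpu:
        return "SPCS service"
    return "Virtual warehouse"
```

The ordering matters: the hosted-LLM check comes first because Cortex is the lowest-effort option whenever it covers the model you need.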

GPU support

GPU instance classes available in compute pools for training and high-throughput inference
Used by Model Registry's SPCS deployment target to expose GPU-backed REST endpoints
GPU compute pools are what make it practical to serve large open-source models (Llama 70B etc.) yourself
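A quick worked estimate shows why GPUs gate self-hosting. Weight memory is roughly parameter count times bytes per parameter. This sketch counts weights only and ignores KV cache, activations, and runtime overhead, so real requirements are higher.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Memory for model weights alone, in GiB (no KV cache, activations, or overhead)."""
    return num_params * bytes_per_param / 2**30


# fp16/bf16 use 2 bytes per parameter; 4-bit quantization is about 0.5 bytes per parameter.
for name, params in [("Llama 8B", 8e9), ("Llama 70B", 70e9)]:
    fp16 = weight_memory_gib(params, 2)
    int4 = weight_memory_gib(params, 0.5)
    print(f"{name}: ~{fp16:.0f} GiB at fp16, ~{int4:.0f} GiB at 4-bit")
```

At fp16, a 70B model needs roughly 130 GiB for its weights alone. That is why it typically has to be sharded across multiple GPUs or quantized, while an 8B model (about 15 GiB) can fit on a single GPU.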

Integration with Snowflake

Services can:

Connect back to Snowflake and run SQL (OAuth token injected)
Read/write stage files
Use role-based access control for service-to-service calls
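The "OAuth token injected" point above can be sketched as follows: inside a service container, code reads the token Snowflake provides and passes it to the Python connector with authenticator="oauth". The token path /snowflake/session/token and the SNOWFLAKE_HOST / SNOWFLAKE_ACCOUNT / SNOWFLAKE_DATABASE / SNOWFLAKE_SCHEMA variables come from Snowflake's SPCS documentation as I understand it. Verify them for your release. The helper name is hypothetical.

```python
import os
from pathlib import Path

# Where SPCS exposes the service's OAuth token (verify against current docs).
TOKEN_PATH = "/snowflake/session/token"


def spcs_connection_params(token_path: str = TOKEN_PATH, env=None) -> dict:
    """Hypothetical helper: kwargs for snowflake.connector.connect() inside a service.

    Re-read the token for every new connection instead of caching it at startup,
    because the token file is refreshed while the service runs.
    """
    env = os.environ if env is None else env
    token = Path(token_path).read_text().strip()
    return {
        "host": env["SNOWFLAKE_HOST"],
        "account": env["SNOWFLAKE_ACCOUNT"],
        "database": env.get("SNOWFLAKE_DATABASE"),
        "schema": env.get("SNOWFLAKE_SCHEMA"),
        "authenticator": "oauth",
        "token": token,
    }

# Inside the container (requires snowflake-connector-python):
#   import snowflake.connector
#   conn = snowflake.connector.connect(**spcs_connection_params())
```

Queries on this connection run with the service's role, which is why that role needs grants on every object the service touches.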

Typical Gen AI SPCS workflow

Build a Docker image (e.g., vLLM server + your fine-tuned model weights)
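docker push the image to an image repository in the SPCS image registry (after docker login; PUT is the SQL command for stage files, not images)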
Create or pick a compute pool with GPU instance type
Define a service spec YAML declaring container, resources, endpoint
CREATE SERVICE ... FROM SPECIFICATION (or use Model Registry to handle this for you)
Either:
- Call the service's public HTTPS endpoint (requires public: true on the endpoint and an authenticated Snowflake caller), OR
- Wrap as a service function and call from SQL
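For the service-function route, the container receives HTTP POSTs that use the same batch format as Snowflake external functions. Input rows arrive as {"data": [[row_index, arg1, ...], ...]}, and the response must return {"data": [[row_index, result], ...]} so Snowflake can match each result to its row. This framework-free sketch shows only the payload handling. fake_generate is a hypothetical stand-in for real inference such as a call into a vLLM server.

```python
import json


def handle_service_function_batch(body: str) -> str:
    """Process one POST body from a service function and return the response body."""
    rows = json.loads(body)["data"]
    out = []
    for row in rows:
        row_index, *args = row  # first element is always the row index
        prompt = args[0]
        out.append([row_index, fake_generate(prompt)])
    return json.dumps({"data": out})


def fake_generate(prompt: str) -> str:
    """Hypothetical placeholder for model inference: upper-cases the prompt."""
    return prompt.upper()
```

In a real service, this function sits behind a route in a web framework such as FastAPI or Flask, at the path named in the SQL definition: CREATE FUNCTION my_fn (prompt VARCHAR) RETURNS VARCHAR SERVICE = my_service ENDPOINT = api AS '/generate'. Here my_fn, my_service, and /generate are placeholders.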

Service spec YAML key fields

(Memorize the shape, not the exact keys, for the exam.)

spec:
  containers:
    - name: app
      image: /db/schema/repo/my-image:latest
      env:
        MODEL_NAME: llama-3-8b
      resources:
        requests:
          memory: 16Gi
          nvidia.com/gpu: 1
        limits:
          memory: 16Gi
          nvidia.com/gpu: 1
      volumeMounts:
        - name: stage-mount
          mountPath: /mnt/stage
  endpoints:
    - name: api
      port: 8080
      public: true
  volumes:
    - name: stage-mount
      source: "@my_stage"
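To practice the spec's shape, here is a hypothetical checker that runs over a spec already parsed into a dict (for example with PyYAML's yaml.safe_load). It flags a few common mistakes and does not cover the full schema. It assumes that containers attach volumes via volumeMounts (name + mountPath), that spec.volumes declares those names, and that a GPU request should equal the GPU limit, a rule that mirrors Kubernetes and should be checked against the SPCS spec reference.

```python
import re

# Assumed image path shape: /<db>/<schema>/<repo>/<image>[:tag]
IMAGE_PATH = re.compile(r"^/[^/]+/[^/]+/[^/]+/[^/:]+(:[^/]+)?$")


def check_service_spec(spec: dict) -> list[str]:
    """Return problems found in a parsed service spec (empty list = none found)."""
    problems = []
    body = spec.get("spec", {})
    volume_names = {v.get("name") for v in body.get("volumes", [])}
    containers = body.get("containers", [])
    if not containers:
        problems.append("spec.containers is empty")
    for c in containers:
        name = c.get("name", "<unnamed>")
        if not IMAGE_PATH.match(c.get("image", "")):
            problems.append(f"{name}: image should look like /db/schema/repo/image:tag")
        res = c.get("resources", {})
        gpu_req = res.get("requests", {}).get("nvidia.com/gpu")
        gpu_lim = res.get("limits", {}).get("nvidia.com/gpu")
        if gpu_req is not None and gpu_req != gpu_lim:  # assumed rule, see lead-in
            problems.append(f"{name}: GPU request should match the GPU limit")
        for m in c.get("volumeMounts", []):
            if m.get("name") not in volume_names:
                problems.append(f"{name}: volumeMount {m.get('name')!r} has no matching volume")
    for e in body.get("endpoints", []):
        if not isinstance(e.get("port"), int):
            problems.append(f"endpoint {e.get('name')}: port must be an integer")
    return problems
```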

Permissions checklist

USAGE on compute pool
READ on image repository (image repositories use READ/WRITE rather than USAGE)
BIND SERVICE ENDPOINT on the account (for public endpoints)
CREATE SERVICE on schema
A service runs with its owner role's privileges, so that role must have access to the Snowflake objects the service queries
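The checklist translates into a handful of GRANT statements. This hypothetical generator uses placeholder object names. It grants READ on the image repository, which is the privilege Snowflake documents for pulling images as I understand it (verify for your account). It does not cover the USAGE grants on the parent database and schema that a role also typically needs.

```python
def spcs_grants(role: str, pool: str, repo: str, schema: str,
                public_endpoint: bool = True) -> list[str]:
    """Hypothetical helper: GRANT statements for the permissions checklist."""
    stmts = [
        f"GRANT USAGE ON COMPUTE POOL {pool} TO ROLE {role};",
        f"GRANT READ ON IMAGE REPOSITORY {repo} TO ROLE {role};",
        f"GRANT CREATE SERVICE ON SCHEMA {schema} TO ROLE {role};",
    ]
    if public_endpoint:
        # Account-level privilege, needed only when an endpoint sets public: true.
        stmts.append(f"GRANT BIND SERVICE ENDPOINT ON ACCOUNT TO ROLE {role};")
    return stmts


for stmt in spcs_grants("ml_role", "gpu_pool", "ml_db.serving.images", "ml_db.serving"):
    print(stmt)
```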

Pitfalls

Compute pools incur cost while their nodes are running, even if no service is using them; set AUTO_SUSPEND_SECS so idle pools suspend
GPU compute pools cost meaningfully more, so cost-aware infrastructure design matters
Network egress to the public internet requires external access integrations (network rules + secrets)
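Image push uses the Docker CLI; authenticate with docker login against the registry hostname, or via the Snowflake CLI (snow spcs image-registry login, or snow spcs image-registry token piped into docker login)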