Snowpark Container Services - SnowPro Gen AI C02 study notes
SPCS is Snowflake's managed container orchestration. For the Gen AI exam, the relevant angle is GPU-backed model serving and custom Python runtimes that exceed warehouse limits.
Core concepts
| Concept | What it is |
|---|---|
| Compute pool | Collection of VM nodes where services run. You specify the machine type (CPU or GPU instance class) and the minimum and maximum node counts. |
| Image registry | OCIv2-compliant registry inside your Snowflake account. Push with Docker CLI; images stored per-repository. |
| Service | Long-running containerized workload. Defined by a YAML service spec. Can expose endpoints. |
| Job | Run-once container (training, batch inference). |
| Service function | A SQL-callable function backed by a container service endpoint. Lets SQL queries invoke containerized code. |
| Service spec (YAML) | Defines containers, endpoints, volumes, resource requests, secrets. |
When to use SPCS vs warehouses vs Cortex
| Workload | Use |
|---|---|
| Stock SQL + Snowpark Python | Virtual warehouse |
| Call hosted LLM (Anthropic, Llama, Mistral) | Cortex AI functions |
| Open-source model you brought, GPU-bound inference | SPCS + Model Registry SPCS deployment |
| Fine-tuning open-source model (e.g., training Llama) | SPCS job with GPU compute pool |
| Custom Python service (Streamlit, FastAPI, vLLM, TGI) | SPCS service |
| Sidecar to vector store or other infrastructure | SPCS |
GPU support
- GPU instance classes available in compute pools for training and high-throughput inference
- Used by Model Registry's SPCS deployment target to expose GPU-backed REST endpoints
- A GPU compute pool is the prerequisite for serving large open-source models (Llama 70B, etc.) yourself
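As a rough illustration of what "a compute pool with a GPU instance class" looks like in practice, the sketch below renders a CREATE COMPUTE POOL statement as a string. The helper name and the pool name are hypothetical, and GPU_NV_S is assumed to be an instance family available in your region; check the current list before relying on it.

```python
def compute_pool_ddl(name: str, instance_family: str, min_nodes: int = 1,
                     max_nodes: int = 1, auto_suspend_secs: int = 300) -> str:
    """Render a CREATE COMPUTE POOL statement as a SQL string (illustrative helper)."""
    if not 1 <= min_nodes <= max_nodes:
        raise ValueError("need 1 <= min_nodes <= max_nodes")
    return (
        f"CREATE COMPUTE POOL IF NOT EXISTS {name}\n"
        f"  MIN_NODES = {min_nodes}\n"
        f"  MAX_NODES = {max_nodes}\n"
        f"  INSTANCE_FAMILY = {instance_family}\n"
        f"  AUTO_SUSPEND_SECS = {auto_suspend_secs};"
    )


# Hypothetical pool name; GPU_NV_S is an assumed GPU instance family.
print(compute_pool_ddl("llm_gpu_pool", "GPU_NV_S", max_nodes=2))
```

Setting AUTO_SUSPEND_SECS explicitly is a deliberate choice here, since an idle GPU pool keeps billing until it suspends.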
Integration with Snowflake
Services can:
- Connect back to Snowflake and run SQL (OAuth token injected)
- Read/write stage files
- Use role-based access control for service-to-service calls
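The "OAuth token injected" point can be made concrete with a small sketch. It assumes the token is mounted at /snowflake/session/token and that SNOWFLAKE_HOST and SNOWFLAKE_ACCOUNT are set in the container environment; the function name is hypothetical, and the returned dict is meant to be passed to the Snowflake Python connector.

```python
import os
from pathlib import Path

# Assumed location of the OAuth token that SPCS injects into the container.
TOKEN_PATH = "/snowflake/session/token"


def spcs_connection_params(token_path: str = TOKEN_PATH, env=os.environ) -> dict:
    """Build connector keyword arguments from the injected token and environment."""
    token = Path(token_path).read_text().strip()
    return {
        "host": env["SNOWFLAKE_HOST"],
        "account": env["SNOWFLAKE_ACCOUNT"],
        "authenticator": "oauth",
        "token": token,
    }


# Only meaningful inside a running service container, so guard the demo.
if os.path.exists(TOKEN_PATH):
    print(sorted(spcs_connection_params()))
```

Inside a service, something like `snowflake.connector.connect(**spcs_connection_params())` would then open a session. Re-reading the file on each connection, rather than caching the token, is the safer assumption because the token is short-lived.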
Typical Gen AI SPCS workflow
- Build a Docker image (e.g., vLLM server + your fine-tuned model weights)
- Push the image to the SPCS image registry (docker push)
- Create or pick a compute pool with GPU instance type
- Define a service spec YAML declaring container, resources, endpoint
- CREATE SERVICE ... FROM SPECIFICATION (or use Model Registry to handle this for you)
- Either:
  - Hit the service's HTTPS endpoint from the network, OR
  - Wrap as a service function and call from SQL
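For the service-function route, the container has to speak Snowflake's batch format: the request body carries rows as `{"data": [[row_index, arg1, ...], ...]}` and the response returns one result per row index in the same envelope. A minimal, framework-free sketch of that transformation follows; `handle_batch` and `fake_predict` are hypothetical names, and a real service would call the model instead of upper-casing text.

```python
import json


def handle_batch(body: str, predict) -> str:
    """Apply `predict` to each row of a service-function request body."""
    rows = json.loads(body)["data"]
    # row[0] is the row index; the remaining items are the SQL arguments.
    results = [[row[0], predict(*row[1:])] for row in rows]
    return json.dumps({"data": results})


def fake_predict(prompt: str) -> str:
    """Stand-in for model inference (an assumption for this example)."""
    return prompt.upper()


print(handle_batch('{"data": [[0, "hello"], [1, "spcs"]]}', fake_predict))
# prints {"data": [[0, "HELLO"], [1, "SPCS"]]}
```

In a FastAPI or Flask app this function would sit behind the POST route that the service function's AS '/path' clause points at.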
Service spec YAML key fields
(Memorize the shape, not the exact keys, for the exam.)
spec:
  containers:
  - name: app
    image: /db/schema/repo/my-image:latest
    env:
      MODEL_NAME: llama-3-8b
    resources:
      requests:
        memory: 16Gi
        nvidia.com/gpu: 1
      limits:
        memory: 16Gi
        nvidia.com/gpu: 1
    volumeMounts:
    - name: stage-mount
      mountPath: /mnt/stage
  endpoints:
  - name: api
    port: 8080
    public: true
  volumes:
  - name: stage-mount
    source: "@my_stage"
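A spec of this shape is usually passed inline to CREATE SERVICE between `$$` delimiters. The sketch below assembles that statement as a string; the helper, the service name, and the pool name are hypothetical, and the spec is trimmed to keep the example short.

```python
def create_service_sql(service: str, pool: str, spec_yaml: str) -> str:
    """Wrap a YAML service spec in a CREATE SERVICE statement (illustrative helper)."""
    if "$$" in spec_yaml:
        # The spec is embedded in a $$-delimited string, so it cannot contain $$.
        raise ValueError("spec must not contain '$$'")
    return (
        f"CREATE SERVICE {service}\n"
        f"  IN COMPUTE POOL {pool}\n"
        f"  FROM SPECIFICATION $$\n{spec_yaml.strip()}\n$$;"
    )


# Trimmed, hypothetical spec used only to show the wrapping.
SPEC = """
spec:
  containers:
  - name: app
    image: /db/schema/repo/my-image:latest
  endpoints:
  - name: api
    port: 8080
    public: true
"""

print(create_service_sql("llm_service", "llm_gpu_pool", SPEC))
```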
Permissions checklist
- USAGE on compute pool
- READ on image repository (WRITE to push images)
- BIND SERVICE ENDPOINT on account (for public endpoints)
- CREATE SERVICE on schema
- Service runs as a role (the WITH OWNER role), which must have access to the underlying Snowflake objects it queries
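The checklist maps onto a handful of GRANT statements. The sketch below generates them for a given role; it assumes READ is the privilege needed on the image repository to run its images, and every object name in the example call is hypothetical.

```python
def service_grants(role: str, pool: str, repo: str, schema: str,
                   public_endpoint: bool = False) -> list[str]:
    """Return the GRANT statements a role needs to create a service (sketch)."""
    stmts = [
        f"GRANT USAGE ON COMPUTE POOL {pool} TO ROLE {role};",
        f"GRANT READ ON IMAGE REPOSITORY {repo} TO ROLE {role};",
        f"GRANT CREATE SERVICE ON SCHEMA {schema} TO ROLE {role};",
    ]
    if public_endpoint:
        # Account-level privilege, needed only when the spec has a public endpoint.
        stmts.append(f"GRANT BIND SERVICE ENDPOINT ON ACCOUNT TO ROLE {role};")
    return stmts


# Hypothetical role and object names.
for stmt in service_grants("ml_role", "llm_gpu_pool", "db.schema.repo",
                           "db.schema", public_endpoint=True):
    print(stmt)
```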
Pitfalls
- Compute pools incur cost while active even if no service is using them (configurable auto-suspend)
- GPU compute pools cost meaningfully more, so quota-aware infra design matters
- Network egress to the public internet requires external access integrations (network rules + secrets)
- Image push uses Docker; auth via snow spcs image-registry token or snowsql