ETH Zurich · Agentic AI Education Platform

Build. Collaborate.
Stay at the frontier.

A professional intelligence hub for building agentic AI workflows, evaluating foundation models, and connecting with a community of industry leaders.

Workspace

Three Ways to Work

Choose your interface. The platform adapts to your expertise — from code-free guided analysis to full Jupyter-compatible notebooks. All modes share the same models, tools, and evidence generation.

Guided Analysis

Step-by-step wizard for structured workflows. Choose a goal, select data, configure methods — the platform handles the orchestration. No code required.

  • Goal-driven workflow
  • Template library
  • Auto-method selection
  • Built-in validation

AI Copilot

Natural language interface backed by specialized agents. Ask questions, get grounded answers with citations. Agents route to the right tools and models.

  • Multi-agent routing
  • Source attribution
  • Tool use visible
  • Cost tracking per query

Expert Notebook

Interactive code environment for full control. Write Python, build pipelines, run experiments. Jupyter-compatible with integrated model access and MCP tools.

  • Python execution
  • Jupyter-compatible cells
  • Integrated model API
  • MCP tool imports
  • Export as .ipynb

Foundation Models

Google

Gemini 2.5 Flash

Fast, efficient reasoning model for high-throughput agentic tasks. Excellent cost-performance ratio.

1M tokens · $

Google

Gemini 2.5 Pro

Most capable reasoning model with deep thinking. Best for complex multi-step workflows.

1M tokens · $$$

Anthropic

Claude Sonnet 4

Balanced intelligence and speed. Strong at code generation, analysis, and nuanced reasoning.

200K tokens · $$

Anthropic

Claude 3.5 Haiku

Fastest Anthropic model. Ideal for real-time agent routing and lightweight tasks.

200K tokens · $

Learn

Tutorials

Beginner · CAS GenAI · 30 min

Your First Agentic Workflow

Build a simple agent that uses tools to answer questions. Learn agent loops, tool calling, and structured responses using LangGraph.

6 of 6 steps

Intermediate · CAS GenAI · 60 min

RAG Pipeline from Scratch

Ingest documents, chunk intelligently, embed with multiple models, and build a retrieval pipeline. Compare embedding strategies and measure retrieval quality.

3 of 8 steps

Advanced · CAS GenAI · 90 min

Multi-Agent Orchestration

Design agent teams — router agents, specialist agents, critic agents. Learn delegation patterns, consensus mechanisms, and graceful failure handling.

0 of 10 steps

Intermediate · CAS GenAI · 45 min

Tool Use with MCP

Connect agents to external tools via Model Context Protocol. Build custom MCP servers, integrate databases, APIs, and enterprise systems.

0 of 7 steps

Intermediate · Both · 45 min

Model Comparison & Evaluation

Systematically compare foundation models on your tasks. Build evaluation harnesses, measure quality/cost/latency trade-offs, avoid sycophantic agreement.

0 of 6 steps

Advanced · Both · 60 min

Structured Outputs for Production

Guarantee JSON schema compliance. Use constrained decoding, Pydantic models, and validation pipelines for reliable, auditable production systems.

0 of 8 steps
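
The validation stage of such a pipeline could look roughly like the sketch below. It uses only the Python standard library as a stand-in for the Pydantic models the tutorial names, and it does not show constrained decoding. The `Finding` schema, its field names, and the allowed risk levels are hypothetical, chosen only for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical schema for one model output; not taken from the platform.
ALLOWED_RISK = {"low", "medium", "high"}
EXPECTED_FIELDS = {"clause", "risk", "confidence"}


@dataclass
class Finding:
    clause: str
    risk: str
    confidence: float


def parse_finding(raw: str) -> Finding:
    """Parse raw model output and reject anything that violates the schema."""
    data = json.loads(raw)  # invalid JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != EXPECTED_FIELDS:
        raise ValueError(f"field mismatch: {sorted(set(data) ^ EXPECTED_FIELDS)}")
    if not isinstance(data["clause"], str):
        raise ValueError("clause must be a string")
    if not isinstance(data["risk"], str) or data["risk"] not in ALLOWED_RISK:
        raise ValueError("risk must be one of low, medium, high")
    conf = data["confidence"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= conf <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return Finding(data["clause"], data["risk"], float(conf))
```

A caller would typically catch `ValueError`, log the rejected output for the audit trail, and retry the model call or escalate.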

Community

Discussions

Which model handles regulatory text best in production?

Dr. Sarah Meier · 24 replies · 2 hours ago

MCP tools we've built — share yours

Marco Bernasconi · 18 replies · 5 hours ago

Paper discussion: Causal Foundation Models reliability concerns (April 2026)

Prof. Anna Kovács · 31 replies · 1 day ago

Cost optimization tricks for multi-agent workflows

Lucas Tran · 12 replies · 2 days ago

Active Challenges

Expert · 34 participants

Build an explainability agent for automated decisions

Create an agent that can explain any ML model's decision in natural language, with causal reasoning and counterfactual explanations. Must comply with EU AI Act Article 86 requirements.

Deadline: June 30, 2026

Advanced · 21 participants

RAG pipeline under CHF 0.10 per query

Design a retrieval-augmented generation pipeline that maintains quality while keeping per-query costs below CHF 0.10. Evaluate on the community benchmark dataset.

Deadline: July 15, 2026

Evaluate

Rigor & Governance

Model Arena

Side-by-side model comparison with structured evaluation

Compliance Sandbox

Test workflows against regulatory requirements before deployment

Evidence Packages

Signed, auditable compliance artifacts with full methodology trail

Cost & Carbon

Real-time sustainability and efficiency metrics per workflow

Evidence Packages

Every analysis generates a signed, tamper-evident artifact, exportable as PDF, executable notebook, structured JSON, or LaTeX.

Methodology

Which method was used and why. Identification strategy and assumptions.

Data Profile

Dataset description, feature distributions, missing values, quality checks.

Results

Point estimates with confidence intervals, effect sizes, statistical significance.

Validation

Refutation tests, sensitivity analysis, robustness checks. What could invalidate the finding.

Limitations

Where causal claims break down, known confounders, generalizability bounds.

Decision Trace

Full log of agent decisions, tool calls, model invocations. Reproducibility hash.

Export as: PDF · Jupyter Notebook (.ipynb) · JSON Metadata · LaTeX

For Instructors

Instructor Console

Full control over content, cohorts, budgets, and assessment. The platform adapts to any CAS programme structure. Integrates with ETH Moodle via LTI.

Cohort Management

Create cohorts, assign students, set programme dates. Integrates with Moodle via LTI.

Content Management

Create and organise tutorials, scenarios, and exercises. Align with any CAS programme structure.

Budget Configuration

Set per-user and per-cohort budgets for model API usage. Monitor spending in real time.
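
How a two-level budget check might behave can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `BudgetLedger` class, its field names, and the rule that a charge is rejected when it would exceed either limit are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class BudgetLedger:
    """Hypothetical ledger enforcing a per-user and a per-cohort spending cap."""
    cohort_limit: float  # total allowed for the cohort, e.g. in CHF
    user_limit: float    # allowed per individual user, same currency
    spent_by_user: dict = field(default_factory=dict)

    @property
    def cohort_spent(self) -> float:
        return sum(self.spent_by_user.values())

    def charge(self, user: str, cost: float) -> bool:
        """Record the cost and return True, or return False if a limit would be exceeded."""
        user_total = self.spent_by_user.get(user, 0.0) + cost
        if user_total > self.user_limit:
            return False
        if self.cohort_spent + cost > self.cohort_limit:
            return False
        self.spent_by_user[user] = user_total
        return True


ledger = BudgetLedger(cohort_limit=10.0, user_limit=6.0)
ledger.charge("student_a", 5.0)  # accepted
ledger.charge("student_a", 2.0)  # rejected: would exceed the per-user limit
```

Checking before the model call, rather than after, keeps a single expensive request from overshooting the cap.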

Scenario Configuration

Configure evaluation scenarios with custom rubrics, datasets, and scoring dimensions.

Live Monitoring

See student activity in real time: who is working, which models they're using, where they're stuck.

Review & Assessment

Review student workflows, evidence packages, and notebook submissions. Export grades to Moodle.

Learn

Research Feed

Causal Foundation Models: Promise and Production-Readiness

Zhang et al. · ICML 2026

Systematic evaluation of CausalPFN and Do-PFN reliability. Found poor uncertainty coverage in out-of-distribution settings.

Causal AI · Foundation Models

LLM Agents as Causal Orchestrators, Not Causal Reasoners

Kiciman et al. · NeurIPS 2025

LLMs perform poorly at causal reasoning but excel at routing to formal causal methods. The orchestration pattern outperforms end-to-end approaches.

Agents · Causal AI

Structured Outputs at Scale: Constrained Decoding in Production

OpenAI Research · arXiv 2026

How to guarantee JSON schema compliance at inference time without quality degradation. Benchmarks across 7 models.

Structured Output · Production

The MCP Standard: Universal Tool Integration for AI Agents

Anthropic · Anthropic Technical Report 2025

Model Context Protocol specification and adoption patterns. How tool ecosystems scale beyond single-vendor APIs.

MCP · Tools

Industry

Expert Network

SM

Dr. Sarah Meier

Head of AI, Swiss Re

12 workflows

MB

Marco Bernasconi

Principal Engineer, PostFinance

8 workflows

LT

Lucas Tran

VP Analytics, Zurich Insurance

15 workflows

Portfolio Showcase

Multi-Model Regulatory Review Agent

by Dr. Elena Rossi

An agent pipeline that reviews regulatory documents across 3 models, synthesizes findings, and generates a compliance report with citations.

47 · 12 · 3 models

Agentic Document Q&A with Evidence Trail

by Thomas Gruber

RAG pipeline that answers questions from uploaded PDFs, with full evidence trail showing which chunks were retrieved and why.

38 · 9 · 2 models

Cost-Optimized Routing Agent

by Marco Bernasconi

Smart router that classifies query complexity and routes to the cheapest model that can handle it. 73% cost reduction vs. always using the largest model.

62 · 18 · 3 models
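
The routing idea described in this showcase entry can be illustrated with a small sketch. It is not the author's implementation: the model names and cost tiers are taken from the Foundation Models section, while the keyword heuristic, the complexity scores, and the `max_complexity` values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str
    cost_tier: int       # 1 = "$", 2 = "$$", 3 = "$$$" as on the model cards
    max_complexity: int  # assumed: highest complexity score this model handles


MODELS = [
    ModelOption("Gemini 2.5 Flash", 1, 1),
    ModelOption("Claude 3.5 Haiku", 1, 1),
    ModelOption("Claude Sonnet 4", 2, 2),
    ModelOption("Gemini 2.5 Pro", 3, 3),
]

# Hypothetical trigger words that suggest multi-step reasoning.
MULTI_STEP_WORDS = ("compare", "derive", "prove", "plan")


def classify_complexity(query: str) -> int:
    """Toy heuristic returning 1 (simple), 2 (moderate), or 3 (complex)."""
    words = query.lower().split()
    multi_step = any(w in words for w in MULTI_STEP_WORDS)
    long_query = len(words) > 30
    if multi_step and long_query:
        return 3
    if multi_step or long_query:
        return 2
    return 1


def route(query: str) -> ModelOption:
    """Pick the cheapest model whose assumed capability covers the query."""
    score = classify_complexity(query)
    capable = [m for m in MODELS if m.max_complexity >= score]
    return min(capable, key=lambda m: m.cost_tier)
```

In practice the classifier would more likely be a small model or a learned scorer than a keyword list; the selection step (cheapest capable model) stays the same.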

Trust & Safety

Grounded · Honest · Compliant · Transparent

Every claim backed by evidence. Every decision traceable. Every output auditable. Not by policy — by architecture.

Grounded

Every claim backed by evidence

  • Source Attribution

    Every agent response cites retrieved sources. No unsupported claims — retrieval chunks linked, confidence scored, gaps flagged.

  • Hallucination Detection

    Cross-reference pipeline checks outputs against retrieved evidence before delivery. Inconsistencies surfaced, not hidden.

  • Calibrated Uncertainty

    Models report what they don't know. Low-confidence answers are marked, not presented as fact. Uncertainty quantiles, not just point estimates.

Honest

Pushback over agreement

  • Honest Disagreement

    Agents push back on incorrect assumptions with evidence. If the premise is wrong, the system says so — never agrees to be agreeable.

  • Multi-Model Consensus

    Same query, multiple models. Disagreements surfaced explicitly. Consensus builds confidence; divergence signals caution.

  • Built-in Red Teaming

    Challenge workflows with adversarial inputs, edge cases, and contradictions before deployment. Know where your system breaks.
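
A multi-model consensus check of the kind described above might be sketched as a simple majority vote that also reports who disagreed. The function name, the string normalization, and the 0.5 agreement threshold are illustrative assumptions, and real answers would need semantic rather than exact-match comparison.

```python
from collections import Counter


def consensus(answers: dict[str, str], threshold: float = 0.5) -> dict:
    """Majority vote over model answers, with dissenting models listed explicitly.

    `answers` maps a model name to its answer. The majority answer is returned
    only if its share of votes is strictly above `threshold`; otherwise None.
    """
    if not answers:
        raise ValueError("no answers to compare")
    normalized = {model: a.strip().lower() for model, a in answers.items()}
    counts = Counter(normalized.values())
    top_answer, top_votes = counts.most_common(1)[0]
    agreement = top_votes / len(normalized)
    dissenters = sorted(m for m, a in normalized.items() if a != top_answer)
    return {
        "answer": top_answer if agreement > threshold else None,
        "agreement": agreement,
        "dissenters": dissenters,
    }


result = consensus({"model_a": "Yes", "model_b": "yes ", "model_c": "No"})
# result["answer"] is "yes" and result["dissenters"] is ["model_c"]
```

Returning `None` below the threshold keeps a split vote from being presented as a confident answer.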

Compliant

Regulation-ready by design

  • EU AI Act Readiness

    High-risk obligations enforceable August 2, 2026. Auto-generated Article 86 documentation, audit trails, and explainability artifacts.

  • Signed Evidence Packages

    Every analysis produces a cryptographically signed artifact: methodology, assumptions, results, validation, limitations. Tamper-evident.

  • Swiss Data Sovereignty

    All compute in Switzerland (Zurich). Data never leaves Swiss jurisdiction. FADP/nDSG compliant by architecture, not by promise.
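
The idea behind a signed evidence package can be illustrated with the sketch below. The page does not state which signature scheme is used, so this example assumes HMAC-SHA256 with a shared secret as a simple stand-in; an auditable deployment would more plausibly use asymmetric signatures so that verifiers never hold the signing key. The package fields shown are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(package: dict) -> bytes:
    """Serialize deterministically so equal content always yields equal bytes."""
    return json.dumps(package, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_package(package: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical serialization."""
    return hmac.new(key, canonical_bytes(package), hashlib.sha256).hexdigest()


def verify_package(package: dict, signature: str, key: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_package(package, key), signature)


key = b"demo-key-not-for-production"  # placeholder secret for the example
package = {"methodology": "example method", "results": {"estimate": 0.12}}
signature = sign_package(package, key)

package["results"]["estimate"] = 0.50  # any later change breaks verification
assert not verify_package(package, signature, key)
```

Signing makes changes detectable after the fact; it does not by itself prevent them.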

Transparent

Nothing hidden, everything traceable

  • Full Decision Trace

    Every agent decision logged: tools called, alternatives considered, reasoning exposed. Research-grade audit trail for every run.

  • Reproducibility by Design

    Every workflow run versioned, hashable, re-runnable. Pin model versions, fix seeds, lock tools — deterministic replay guaranteed.

  • Cost & Carbon Accounting

    Per-query cost tracking by model. Carbon emissions from Swiss grid mix. No hidden costs, no unmetered usage.
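
A reproducibility hash over pinned versions, seeds, and tools could be computed along the lines of this sketch. The manifest layout and the `run_hash` function are assumptions for illustration; the page does not specify what the platform hashes. Matching hashes indicate identical configuration, not identical model output.

```python
import hashlib
import json


def run_hash(model_versions: dict, seed: int, tool_versions: dict, inputs: dict) -> str:
    """SHA-256 over a canonical JSON manifest of everything pinned for a run."""
    manifest = {
        "models": model_versions,
        "seed": seed,
        "tools": tool_versions,
        "inputs": inputs,
    }
    # Sorted keys and fixed separators make the serialization order-independent.
    payload = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder version strings, not real model or tool identifiers.
h1 = run_hash({"model_a": "v1"}, seed=42, tool_versions={"search": "1.0"}, inputs={"q": "demo"})
h2 = run_hash({"model_a": "v1"}, seed=43, tool_versions={"search": "1.0"}, inputs={"q": "demo"})
assert h1 != h2  # changing the seed changes the hash
```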

Learn

Guest Speakers

Upcoming · June 18, 2026

Dr. Ilya Sutskever

Co-founder, SSI

What AI Safety Means for Enterprise Deployment

Recorded · May 14, 2026

Dr. Judea Pearl

Professor, UCLA

Causal Reasoning in the Age of Large Language Models

Recorded · April 23, 2026

Amanda Askell

AI Policy Lead, Anthropic

Designing AI Systems That Know What They Don't Know