ETH Zurich · Agentic AI Education Platform

Build. Collaborate.
Stay at the frontier.

A professional intelligence hub for building agentic AI workflows, evaluating foundation models, and connecting with a community of industry leaders.

127 Members · 342 Workflows · 1,247 Completed

Workspace

Three Ways to Work

Choose your interface. The platform adapts to your expertise — from code-free guided analysis to full Jupyter-compatible notebooks. All modes share the same models, tools, and evidence generation.

Guided Analysis

Step-by-step wizard for structured workflows. Choose a goal, select data, configure methods — the platform handles the orchestration. No code required.

Goal-driven workflow
Template library
Auto-method selection
Built-in validation

AI Copilot

Natural language interface backed by specialized agents. Ask questions, get grounded answers with citations. Agents route to the right tools and models.

Multi-agent routing
Source attribution
Tool use visible
Cost tracking per query

Expert Notebook

Interactive code environment for full control. Write Python, build pipelines, run experiments. Jupyter-compatible with integrated model access and MCP tools.

Python execution
Jupyter-compatible cells
Integrated model API
MCP tool imports
Export as .ipynb

Foundation Models

Google

Gemini 2.5 Flash

Fast, efficient reasoning model for high-throughput agentic tasks. Excellent cost-performance ratio.

1M-token context · Cost: $

Google

Gemini 2.5 Pro

Most capable reasoning model with deep thinking. Best for complex multi-step workflows.

1M-token context · Cost: $$$

Anthropic

Claude Sonnet 4

Balanced intelligence and speed. Strong at code generation, analysis, and nuanced reasoning.

200K-token context · Cost: $$

Anthropic

Claude 3.5 Haiku

Fastest Anthropic model. Ideal for real-time agent routing and lightweight tasks.

200K-token context · Cost: $

Learn

Tutorials

Beginner · CAS GenAI

30 min

Your First Agentic Workflow

Build a simple agent that uses tools to answer questions. Learn agent loops, tool calling, and structured responses using LangGraph.

6 steps
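The agent loop this tutorial covers can be sketched in plain Python. This is a minimal sketch, not LangGraph code: `fake_model` is a hypothetical, rule-based stand-in for an LLM call, and the `TOOLS` registry and message format are illustrative assumptions. The loop asks the model for either a tool call or a final answer, executes the tool, feeds the result back, and stops at a step limit.

```python
import json
from dataclasses import dataclass
from typing import Callable

# Hypothetical tool registry: tool name -> plain Python function.
TOOLS: dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
}


@dataclass
class Message:
    role: str     # "user", "assistant", or "tool"
    content: str


def fake_model(history: list[Message]) -> dict:
    """Hypothetical stand-in for an LLM call.

    Returns either {"tool": name, "args": {...}} or {"final": text}.
    """
    last = history[-1]
    if last.role == "user":
        # Pretend the model decided it needs the calculator tool.
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    # After seeing a tool result, answer using it.
    return {"final": f"The answer is {last.content}."}


def run_agent(question: str, max_steps: int = 5) -> tuple[str, list[Message]]:
    history = [Message("user", question)]
    for _ in range(max_steps):
        decision = fake_model(history)
        if "final" in decision:
            history.append(Message("assistant", decision["final"]))
            return decision["final"], history
        # Execute the requested tool and feed the observation back to the model.
        result = TOOLS[decision["tool"]](**decision["args"])
        history.append(Message("assistant", json.dumps(decision)))
        history.append(Message("tool", result))
    raise RuntimeError("step limit reached without a final answer")


answer, trace = run_agent("What is 2 + 3?")
print(answer)  # The answer is 5.
```

A framework such as LangGraph replaces the hand-written loop with a graph of nodes and conditional edges, but the control flow is the same: decide, act, observe, repeat.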

Intermediate · CAS GenAI

60 min

RAG Pipeline from Scratch

Ingest documents, chunk intelligently, embed with multiple models, and build a retrieval pipeline. Compare embedding strategies and measure retrieval quality.

8 steps
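Chunking and retrieval evaluation can be illustrated with two small helpers. This sketch assumes the simplest baseline, fixed-size word windows with overlap, rather than the structure- or sentence-aware chunking a tutorial on "intelligent" chunking would go on to build; `chunk_words` and `recall_at_k` are hypothetical names. Recall@k measures what fraction of the known-relevant chunks appear among the top k retrieved.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("require 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant chunk IDs found among the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


print(chunk_words("0 1 2 3 4 5 6 7 8 9", size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
print(recall_at_k(["c1", "c7", "c3"], {"c1", "c3"}, k=2))  # 0.5
```

Comparing embedding strategies then means holding the chunker and labelled queries fixed and comparing recall@k across embedding models.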

Advanced · CAS GenAI

90 min

Multi-Agent Orchestration

Design agent teams — router agents, specialist agents, critic agents. Learn delegation patterns, consensus mechanisms, and graceful failure handling.

10 steps
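The router, specialist, and critic roles, together with graceful failure handling, can be sketched as plain functions. Everything here is hypothetical: the agents are deterministic stand-ins for model-backed agents, `flaky_agent` always fails so the fallback path is exercised, and escalating to a human is only one possible final policy.

```python
from typing import Callable, Optional

Agent = Callable[[str], str]


def math_agent(task: str) -> str:
    """Hypothetical specialist: handles tasks of the form 'sum <a> <b>'."""
    _, a, b = task.split()
    return str(int(a) + int(b))


def flaky_agent(task: str) -> str:
    """Hypothetical specialist that always fails, to exercise the fallback path."""
    raise TimeoutError("upstream model timed out")


def critic(answer: str) -> bool:
    """Hypothetical critic: accept only integer answers."""
    return answer.lstrip("-").isdigit()


def route(task: str) -> list[Agent]:
    """Router: ordered list of specialists to try for this task."""
    return [flaky_agent, math_agent] if task.startswith("sum") else []


def orchestrate(task: str) -> tuple[Optional[str], list[str]]:
    trace: list[str] = []
    for agent in route(task):
        try:
            answer = agent(task)
        except Exception as exc:  # graceful failure: record it, try the next specialist
            trace.append(f"{agent.__name__}: failed ({exc})")
            continue
        if critic(answer):
            trace.append(f"{agent.__name__}: accepted")
            return answer, trace
        trace.append(f"{agent.__name__}: rejected by critic")
    trace.append("no specialist succeeded; escalate to a human")
    return None, trace


print(orchestrate("sum 2 40"))
# ('42', ['flaky_agent: failed (upstream model timed out)', 'math_agent: accepted'])
```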

Intermediate · CAS GenAI

45 min

Tool Use with MCP

Connect agents to external tools via the Model Context Protocol (MCP). Build custom MCP servers and integrate databases, APIs, and enterprise systems.

7 steps
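MCP messages are JSON-RPC 2.0 requests and responses. The sketch below is a simplified, in-process dispatcher showing the shape of the `tools/list` and `tools/call` exchanges. The `word_count` tool is hypothetical, and the sketch omits the initialization handshake, capability negotiation, argument validation against `inputSchema`, and the transport layer (for example stdio). Real servers would normally be built with an official MCP SDK.

```python
import json

# Hypothetical tool exposed by this toy server.
TOOLS = {
    "word_count": {
        "description": "Count the words in a text.",
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        # No validation here: a missing "text" argument would raise KeyError.
        "handler": lambda args: str(len(args["text"].split())),
    }
}


def handle(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 request string and return the response string."""
    req = json.loads(raw)
    rid, method = req.get("id"), req.get("method")
    if method == "tools/list":
        result = {"tools": [
            {"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
            for n, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        params = req.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return json.dumps({"jsonrpc": "2.0", "id": rid,
                               "error": {"code": -32602, "message": "Unknown tool"}})
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": rid,
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": rid, "result": result})


request = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
           "params": {"name": "word_count", "arguments": {"text": "a b c"}}}
print(handle(json.dumps(request)))
```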

Intermediate · Both

45 min

Model Comparison & Evaluation

Systematically compare foundation models on your tasks. Build evaluation harnesses, measure quality/cost/latency trade-offs, and guard against sycophantic agreement.

6 steps
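An evaluation harness scores each model on the same task set and records quality, cost, and latency side by side. In this minimal sketch the tasks, per-token prices, and both models (`small_model`, `large_model`) are hypothetical stubs. Quality is exact-match accuracy, and word counts stand in for token counts. A real harness would call model APIs and use the provider's reported token usage.

```python
import time
from statistics import mean
from typing import Callable

# Hypothetical task set: (prompt, expected answer).
TASKS = [("capital of France?", "Paris"), ("2 + 2?", "4"), ("capital of Italy?", "Rome")]

# Hypothetical prices in CHF per 1,000 tokens, not real vendor pricing.
PRICE_PER_1K = {"small": 0.0005, "large": 0.01}


def small_model(prompt: str) -> str:
    # Stub that gets one answer wrong.
    return {"capital of France?": "Paris", "2 + 2?": "4"}.get(prompt, "Milan")


def large_model(prompt: str) -> str:
    # Stub that answers every task correctly.
    return dict(TASKS)[prompt]


def evaluate(name: str, model: Callable[[str], str]) -> dict[str, object]:
    correct, tokens, latencies = 0, 0, []
    for prompt, expected in TASKS:
        start = time.perf_counter()
        answer = model(prompt)
        latencies.append(time.perf_counter() - start)
        correct += int(answer.strip() == expected)           # exact-match scoring
        tokens += len(prompt.split()) + len(answer.split())  # crude token proxy
    return {
        "model": name,
        "accuracy": correct / len(TASKS),
        "cost_chf": tokens / 1000 * PRICE_PER_1K[name],
        "mean_latency_s": mean(latencies),
    }


for name, fn in [("small", small_model), ("large", large_model)]:
    print(evaluate(name, fn))
```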

Advanced · Both

60 min

Structured Outputs for Production

Guarantee JSON schema compliance. Use constrained decoding, Pydantic models, and validation pipelines for reliable, auditable production systems.

8 steps
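Constrained decoding enforces a schema while tokens are generated; a validation pipeline adds a check afterwards and re-prompts on failure. This sketch shows only the validate-and-retry half, with the standard library standing in for Pydantic to keep it dependency-free. The `RiskAssessment` schema and the `generate` callable are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RiskAssessment:
    risk_level: str     # one of "low", "medium", "high"
    score: float        # in [0, 1]
    reasons: list[str]


def parse_assessment(raw: str) -> RiskAssessment:
    """Parse and validate model output; raise ValueError on any violation."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != {"risk_level", "score", "reasons"}:
        raise ValueError("expected an object with keys risk_level, score, reasons")
    if data["risk_level"] not in {"low", "medium", "high"}:
        raise ValueError("risk_level must be low, medium, or high")
    score = data["score"]
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 1:
        raise ValueError("score must be a number in [0, 1]")
    reasons = data["reasons"]
    if not isinstance(reasons, list) or not all(isinstance(r, str) for r in reasons):
        raise ValueError("reasons must be a list of strings")
    return RiskAssessment(data["risk_level"], float(score), reasons)


def generate_validated(generate: Callable[[str], str], prompt: str,
                       retries: int = 2) -> RiskAssessment:
    """Call the model, validate, and re-prompt with the error message on failure."""
    last_error = ""
    for _ in range(retries + 1):
        full_prompt = prompt if not last_error else f"{prompt}\nPrevious output was invalid: {last_error}"
        try:
            return parse_assessment(generate(full_prompt))
        except ValueError as exc:
            last_error = str(exc)
    raise ValueError(f"no valid output after {retries + 1} attempts: {last_error}")
```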

Community

Discussions

Which model handles regulatory text best in production?

Dr. Sarah Meier · 24 replies · 2 hours ago

MCP tools we've built — share yours

Marco Bernasconi · 18 replies · 5 hours ago

Paper discussion: Causal Foundation Models reliability concerns (April 2026)

Prof. Anna Kovács · 31 replies · 1 day ago

Cost optimization tricks for multi-agent workflows

Lucas Tran · 12 replies · 2 days ago

Community

Active Challenges

Expert · 34 participants

Build an explainability agent for automated decisions

Create an agent that can explain any ML model's decision in natural language, with causal reasoning and counterfactual explanations. Must comply with EU AI Act Article 86 requirements.

Deadline: June 30, 2026

Advanced · 21 participants

RAG pipeline under CHF 0.10 per query

Design a retrieval-augmented generation pipeline that maintains quality while keeping per-query costs below CHF 0.10. Evaluate on the community benchmark dataset.

Deadline: July 15, 2026

Evaluate

Rigor & Governance

Model Arena

Side-by-side model comparison with structured evaluation

Compliance Sandbox

Test workflows against regulatory requirements before deployment

Evidence Packages

Signed, auditable compliance artifacts with full methodology trail

Cost & Carbon

Real-time sustainability and efficiency metrics per workflow

Evaluate

Evidence Packages

Every analysis generates a signed, tamper-evident artifact. Exportable as PDF, executable notebook, or structured JSON.

Methodology

Which method was used and why. Identification strategy and assumptions.

Data Profile

Dataset description, feature distributions, missing values, quality checks.

Results

Point estimates with confidence intervals, effect sizes, statistical significance.

Validation

Refutation tests, sensitivity analysis, robustness checks. What could invalidate the finding.

Limitations

Where causal claims break down, known confounders, generalizability bounds.

Decision Trace

Full log of agent decisions, tool calls, model invocations. Reproducibility hash.

Export as: PDF · Jupyter Notebook (.ipynb) · JSON Metadata · LaTeX
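The page does not specify how evidence packages are signed. One minimal sketch, under that uncertainty: serialize the package canonically, hash it with SHA-256 to get a content (reproducibility) hash, and attach an HMAC-SHA256 tag so any later change is detectable. HMAC is a shared-key stand-in that only convinces holders of the key; signatures that third parties can verify would need an asymmetric scheme such as Ed25519, not shown here. The package fields and key are illustrative.

```python
import hashlib
import hmac
import json


def canonical_bytes(package: dict) -> bytes:
    # Deterministic serialization: sorted keys, no extra whitespace, UTF-8.
    return json.dumps(package, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def seal(package: dict, key: bytes) -> dict:
    body = canonical_bytes(package)
    return {
        "package": package,
        "sha256": hashlib.sha256(body).hexdigest(),                    # content hash
        "signature": hmac.new(key, body, hashlib.sha256).hexdigest(),  # shared-key MAC
    }


def verify(sealed: dict, key: bytes) -> bool:
    body = canonical_bytes(sealed["package"])
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return (hashlib.sha256(body).hexdigest() == sealed["sha256"]
            and hmac.compare_digest(expected, sealed["signature"]))


key = b"demo-key-not-for-production"  # hypothetical key
package = {
    "methodology": "difference-in-differences",
    "results": {"ate": 0.12},
    "limitations": ["single region"],
}
sealed = seal(package, key)
print(verify(sealed, key))  # True
```

Because keys are sorted before hashing, two runs that produce the same content yield the same hash regardless of field order.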

For Instructors

Instructor Console

Full control over content, cohorts, budgets, and assessment. The platform adapts to any CAS programme structure. Integrates with ETH Moodle via LTI.

Cohort Management

Create cohorts, assign students, set programme dates. Integrates with Moodle via LTI.

Content Management

Create and organise tutorials, scenarios, and exercises. Align with any CAS programme structure.

Budget Configuration

Set per-user and per-cohort budgets for model API usage. Monitor spending in real time.

Scenario Configuration

Configure evaluation scenarios with custom rubrics, datasets, and scoring dimensions.

Live Monitoring

See student activity in real time: who is working, which models they're using, and where they're stuck.

Review & Assessment

Review student workflows, evidence packages, and notebook submissions. Export grades to Moodle.
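Per-user and per-cohort budgets amount to checking two running totals before each model call. This is a hypothetical sketch of such a ledger; `BudgetLedger` and its caps are illustrative, not the platform's API. Both caps are checked before anything is recorded, so a rejected call leaves no partial charge. Floats keep the sketch short; real accounting would use `decimal.Decimal` or integer rappen.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    pass


class BudgetLedger:
    """Tracks model-API spend (CHF) against per-user and per-cohort caps."""

    def __init__(self, user_cap: float, cohort_cap: float, cohort_of: dict[str, str]):
        self.user_cap = user_cap
        self.cohort_cap = cohort_cap
        self.cohort_of = cohort_of  # user id -> cohort id
        self.user_spend: dict[str, float] = defaultdict(float)
        self.cohort_spend: dict[str, float] = defaultdict(float)

    def charge(self, user: str, cost_chf: float) -> None:
        cohort = self.cohort_of[user]
        # Check both caps first so a rejected call records nothing.
        if self.user_spend[user] + cost_chf > self.user_cap:
            raise BudgetExceeded(f"user {user} would exceed CHF {self.user_cap:.2f}")
        if self.cohort_spend[cohort] + cost_chf > self.cohort_cap:
            raise BudgetExceeded(f"cohort {cohort} would exceed CHF {self.cohort_cap:.2f}")
        self.user_spend[user] += cost_chf
        self.cohort_spend[cohort] += cost_chf


ledger = BudgetLedger(user_cap=5.0, cohort_cap=8.0,
                      cohort_of={"ana": "cas-2026", "ben": "cas-2026"})
ledger.charge("ana", 4.0)
ledger.charge("ben", 3.0)
try:
    ledger.charge("ben", 2.0)  # within ben's cap, but the cohort would reach 9.0
except BudgetExceeded as exc:
    print(exc)  # cohort cas-2026 would exceed CHF 8.00
```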

Learn

Research Feed

Causal Foundation Models: Promise and Production-Readiness

Zhang et al. · ICML 2026

Systematic evaluation of CausalPFN and Do-PFN reliability. Found poor uncertainty coverage in out-of-distribution settings.

Causal AI · Foundation Models

LLM Agents as Causal Orchestrators, Not Causal Reasoners

Kiciman et al. · NeurIPS 2025

LLMs perform poorly at causal reasoning but excel at routing to formal causal methods. The orchestration pattern outperforms end-to-end approaches.

Agents · Causal AI

Structured Outputs at Scale: Constrained Decoding in Production

OpenAI Research · arXiv 2026

How to guarantee JSON schema compliance at inference time without quality degradation. Benchmarks across 7 models.

Structured Output · Production

The MCP Standard: Universal Tool Integration for AI Agents

Anthropic · Technical Report 2025

Model Context Protocol specification and adoption patterns. How tool ecosystems scale beyond single-vendor APIs.

MCP · Tools

Industry

Expert Network

Dr. Sarah Meier

Head of AI, Swiss Re

12 workflows

Marco Bernasconi

Principal Engineer, PostFinance

8 workflows

Lucas Tran

VP Analytics, Zurich Insurance

15 workflows

Industry

Portfolio Showcase

Multi-Model Regulatory Review Agent

by Dr. Elena Rossi

An agent pipeline that reviews regulatory documents across 3 models, synthesizes findings, and generates a compliance report with citations.

47 · 12 · 3 models

Agentic Document Q&A with Evidence Trail

by Thomas Gruber

RAG pipeline that answers questions from uploaded PDFs, with full evidence trail showing which chunks were retrieved and why.

38 · 9 · 2 models

Cost-Optimized Routing Agent

by Marco Bernasconi

Smart router that classifies query complexity and routes to the cheapest model that can handle it. 73% cost reduction vs. always using the largest model.

62 · 18 · 3 models
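The routing idea behind the Cost-Optimized Routing Agent (classify query complexity, then pick the cheapest model that can handle it) can be sketched in a few lines. The model names, capability tiers, prices, and keyword heuristic are all hypothetical. In practice the complexity classifier could itself be a small model, and capability tiers should come from evaluation results rather than being assigned by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    capability: int      # hypothetical tier: 1 (simple) to 3 (complex)
    price_per_1k: float  # hypothetical CHF per 1,000 tokens


CATALOGUE = [
    Model("large", 3, 0.015),
    Model("small", 1, 0.001),
    Model("medium", 2, 0.004),
]


def classify_complexity(query: str) -> int:
    """Toy heuristic: long or multi-step queries need a more capable model."""
    words = [w.strip("?.,!") for w in query.lower().split()]
    multi_step = any(w in words for w in ("compare", "derive", "plan", "why"))
    if len(words) > 40 or (multi_step and len(words) > 15):
        return 3
    if multi_step or len(words) > 15:
        return 2
    return 1


def route(query: str) -> Model:
    """Cheapest model whose capability tier covers the query's complexity."""
    needed = classify_complexity(query)
    eligible = [m for m in CATALOGUE if m.capability >= needed]
    return min(eligible, key=lambda m: m.price_per_1k)


print(route("What is the VAT rate?").name)        # small
print(route("Compare the two policies").name)     # medium
```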

Trust & Safety

Grounded · Honest · Compliant · Transparent

Every claim backed by evidence. Every decision traceable. Every output auditable. Not by policy — by architecture.

Grounded

Every claim backed by evidence

Source Attribution
Every agent response cites retrieved sources. No unsupported claims — retrieval chunks linked, confidence scored, gaps flagged.
Hallucination Detection
Cross-reference pipeline checks outputs against retrieved evidence before delivery. Inconsistencies surfaced, not hidden.
Calibrated Uncertainty
Models report what they don't know. Low-confidence answers are marked, not presented as fact. Uncertainty quantiles, not just point estimates.

Honest

Pushback over agreement

Honest Disagreement
Agents push back on incorrect assumptions with evidence. If the premise is wrong, the system says so — never agrees to be agreeable.
Multi-Model Consensus
Same query, multiple models. Disagreements surfaced explicitly. Consensus builds confidence; divergence signals caution.
Built-in Red Teaming
Challenge workflows with adversarial inputs, edge cases, and contradictions before deployment. Know where your system breaks.
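Multi-model consensus can be approximated by a majority vote over normalized answers, flagging the query when agreement falls below a threshold. This sketch assumes short, categorical answers (for example yes/no or a label); free-text answers would need semantic comparison rather than string matching. The two-thirds threshold and the model names are arbitrary illustrative choices.

```python
from collections import Counter


def consensus(answers: dict[str, str], threshold: float = 2 / 3) -> dict:
    """Majority vote over normalized answers from several models.

    Returns the majority answer, the agreement ratio, the dissenting models,
    and whether divergence should be flagged. Ties always fall below the
    default threshold, so they are flagged.
    """
    if not answers:
        raise ValueError("need at least one answer")
    normalized = {model: " ".join(a.lower().split()) for model, a in answers.items()}
    top, votes = Counter(normalized.values()).most_common(1)[0]
    agreement = votes / len(normalized)
    return {
        "answer": top,
        "agreement": agreement,
        "dissenters": sorted(m for m, a in normalized.items() if a != top),
        "flag_divergence": agreement < threshold,
    }


print(consensus({"model_a": "Yes", "model_b": "yes ", "model_c": "No"}))
```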

Compliant

Regulation-ready by design

EU AI Act Readiness
High-risk obligations enforceable August 2, 2026. Auto-generated Article 86 documentation, audit trails, and explainability artifacts.
Signed Evidence Packages
Every analysis produces a cryptographically signed artifact: methodology, assumptions, results, validation, limitations. Tamper-evident.
Swiss Data Sovereignty
All compute in Switzerland (Zurich). Data never leaves Swiss jurisdiction. FADP/nDSG compliant by architecture, not by promise.

Transparent

Nothing hidden, everything traceable

Full Decision Trace
Every agent decision logged: tools called, alternatives considered, reasoning exposed. Research-grade audit trail for every run.
Reproducibility by Design
Every workflow run versioned, hashable, re-runnable. Pin model versions, fix seeds, lock tools: replay is as deterministic as the underlying models allow.
Cost & Carbon Accounting
Per-query cost tracking by model. Carbon emissions estimated from the Swiss grid mix. No hidden costs, no unmetered usage.
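Per-query cost and carbon accounting reduces to two products: tokens times price, and estimated energy times grid carbon intensity. All figures in this sketch are hypothetical parameters. `kwh_per_1k` is an assumed energy estimate, and `grid_g_per_kwh` should come from a published figure for the Swiss grid mix rather than the placeholder used in the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRates:
    input_per_1k: float   # CHF per 1,000 input tokens (hypothetical)
    output_per_1k: float  # CHF per 1,000 output tokens (hypothetical)
    kwh_per_1k: float     # estimated energy per 1,000 tokens processed (assumption)


def query_footprint(rates: ModelRates, tokens_in: int, tokens_out: int,
                    grid_g_per_kwh: float) -> dict[str, float]:
    """Cost in CHF, energy in kWh, and emissions in grams CO2-eq for one query."""
    cost = tokens_in / 1000 * rates.input_per_1k + tokens_out / 1000 * rates.output_per_1k
    energy_kwh = (tokens_in + tokens_out) / 1000 * rates.kwh_per_1k
    return {
        "cost_chf": cost,
        "energy_kwh": energy_kwh,
        "co2_g": energy_kwh * grid_g_per_kwh,  # kWh x grid intensity (g CO2-eq per kWh)
    }


# Placeholder numbers chosen for easy arithmetic, not real prices or grid data.
print(query_footprint(ModelRates(1.0, 2.0, 0.5), tokens_in=1000, tokens_out=500,
                      grid_g_per_kwh=100.0))
# {'cost_chf': 2.0, 'energy_kwh': 0.75, 'co2_g': 75.0}
```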

Learn

Guest Speakers

Upcoming · June 18, 2026

Dr. Ilya Sutskever

Co-founder, SSI

What AI Safety Means for Enterprise Deployment

Recorded · May 14, 2026

Dr. Judea Pearl

Professor, UCLA

Causal Reasoning in the Age of Large Language Models

Recorded · April 23, 2026

Amanda Askell

AI Policy Lead, Anthropic

Designing AI Systems That Know What They Don't Know
