ETH Zurich · Agentic AI Education Platform
Build. Collaborate.
Stay at the frontier.
A professional intelligence hub for building agentic AI workflows, evaluating foundation models, and connecting with a community of industry leaders.
127
Members
342
Workflows
1247
Completed
Workspace
Three Ways to Work
Choose your interface. The platform adapts to your expertise — from code-free guided analysis to full Jupyter-compatible notebooks. All modes share the same models, tools, and evidence generation.
Guided Analysis
Step-by-step wizard for structured workflows. Choose a goal, select data, configure methods — the platform handles the orchestration. No code required.
- Goal-driven workflow
- Template library
- Auto-method selection
- Built-in validation
AI Copilot
Natural language interface backed by specialized agents. Ask questions, get grounded answers with citations. Agents route to the right tools and models.
- Multi-agent routing
- Source attribution
- Tool use visible
- Cost tracking per query
Expert Notebook
Interactive code environment for full control. Write Python, build pipelines, run experiments. Jupyter-compatible with integrated model access and MCP tools.
- Python execution
- Jupyter-compatible cells
- Integrated model API
- MCP tool imports
- Export as .ipynb
Foundation Models
Gemini 2.5 Flash
Fast, efficient reasoning model for high-throughput agentic tasks. Excellent cost-performance ratio.
Gemini 2.5 Pro
Most capable reasoning model with deep thinking. Best for complex multi-step workflows.
Claude Sonnet 4
Balanced intelligence and speed. Strong at code generation, analysis, and nuanced reasoning.
Claude 3.5 Haiku
Fastest Anthropic model. Ideal for real-time agent routing and lightweight tasks.
Learn
Tutorials
Your First Agentic Workflow
Build a simple agent that uses tools to answer questions. Learn agent loops, tool calling, and structured responses using LangGraph.
6 of 6 steps
RAG Pipeline from Scratch
Ingest documents, chunk intelligently, embed with multiple models, and build a retrieval pipeline. Compare embedding strategies and measure retrieval quality.
3 of 8 steps
Multi-Agent Orchestration
Design agent teams — router agents, specialist agents, critic agents. Learn delegation patterns, consensus mechanisms, and graceful failure handling.
0 of 10 steps
Tool Use with MCP
Connect agents to external tools via Model Context Protocol. Build custom MCP servers, integrate databases, APIs, and enterprise systems.
0 of 7 steps
Model Comparison & Evaluation
Systematically compare foundation models on your tasks. Build evaluation harnesses, measure quality/cost/latency trade-offs, avoid sycophantic agreement.
0 of 6 steps
Structured Outputs for Production
Guarantee JSON schema compliance. Use constrained decoding, Pydantic models, and validation pipelines for reliable, auditable production systems.
0 of 8 steps
Community
Discussions
Which model handles regulatory text best in production?
MCP tools we've built — share yours
Paper discussion: Causal Foundation Models reliability concerns (April 2026)
Cost optimization tricks for multi-agent workflows
Community
Active Challenges
Build an explainability agent for automated decisions
Create an agent that can explain any ML model's decision in natural language, with causal reasoning and counterfactual explanations. Must comply with EU AI Act Article 86 requirements.
RAG pipeline under CHF 0.10 per query
Design a retrieval-augmented generation pipeline that maintains quality while keeping per-query costs below CHF 0.10. Evaluate on the community benchmark dataset.
Evaluate
Rigor & Governance
Model Arena
Side-by-side model comparison with structured evaluation
Compliance Sandbox
Test workflows against regulatory requirements before deployment
Evidence Packages
Signed, auditable compliance artifacts with full methodology trail
Cost & Carbon
Real-time sustainability and efficiency metrics per workflow
Evaluate
Evidence Packages
Every analysis generates a signed, tamper-proof artifact. Exportable as PDF, executable notebook, or structured JSON.
Methodology
Which method was used and why. Identification strategy and assumptions.
Data Profile
Dataset description, feature distributions, missing values, quality checks.
Results
Point estimates with confidence intervals, effect sizes, statistical significance.
Validation
Refutation tests, sensitivity analysis, robustness checks. What could invalidate the finding.
Limitations
Where causal claims break down, known confounders, generalizability bounds.
Decision Trace
Full log of agent decisions, tool calls, model invocations. Reproducibility hash.
For Instructors
Instructor Console
Full control over content, cohorts, budgets, and assessment. The platform adapts to any CAS programme structure. Integrates with ETH Moodle via LTI.
Cohort Management
Create cohorts, assign students, set programme dates. Integrates with Moodle via LTI.
Content Management
Create and organise tutorials, scenarios, and exercises. Align with any CAS programme structure.
Budget Configuration
Set per-user and per-cohort budgets for model API usage. Monitor spending in real-time.
Scenario Configuration
Configure evaluation scenarios with custom rubrics, datasets, and scoring dimensions.
Live Monitoring
See student activity in real-time — who is working, which models they're using, where they're stuck.
Review & Assessment
Review student workflows, evidence packages, and notebook submissions. Export grades to Moodle.
Learn
Research Feed
Causal Foundation Models: Promise and Production-Readiness
Zhang et al. · ICML 2026 2026
Systematic evaluation of CausalPFN and Do-PFN reliability. Found poor uncertainty coverage in out-of-distribution settings.
LLM Agents as Causal Orchestrators, Not Causal Reasoners
Kiciman et al. · NeurIPS 2025 2025
LLMs perform poorly at causal reasoning but excel at routing to formal causal methods. The orchestration pattern outperforms end-to-end approaches.
Structured Outputs at Scale: Constrained Decoding in Production
OpenAI Research · arXiv 2026 2026
How to guarantee JSON schema compliance at inference time without quality degradation. Benchmarks across 7 models.
The MCP Standard: Universal Tool Integration for AI Agents
Anthropic · Anthropic Technical Report 2025
Model Context Protocol specification and adoption patterns. How tool ecosystems scale beyond single-vendor APIs.
Industry
Expert Network
Dr. Sarah Meier
Head of AI, Swiss Re
12 workflows
Marco Bernasconi
Principal Engineer, PostFinance
8 workflows
Lucas Tran
VP Analytics, Zurich Insurance
15 workflows
Industry
Portfolio Showcase
Multi-Model Regulatory Review Agent
by Dr. Elena Rossi
An agent pipeline that reviews regulatory documents across 3 models, synthesizes findings, and generates a compliance report with citations.
Agentic Document Q&A with Evidence Trail
by Thomas Gruber
RAG pipeline that answers questions from uploaded PDFs, with full evidence trail showing which chunks were retrieved and why.
Cost-Optimized Routing Agent
by Marco Bernasconi
Smart router that classifies query complexity and routes to the cheapest model that can handle it. 73% cost reduction vs. always using the largest model.
Trust & Safety
Grounded · Honest · Compliant · Transparent
Every claim backed by evidence. Every decision traceable. Every output auditable. Not by policy — by architecture.
Grounded
Every claim backed by evidence
Source Attribution
Every agent response cites retrieved sources. No unsupported claims — retrieval chunks linked, confidence scored, gaps flagged.
Hallucination Detection
Cross-reference pipeline checks outputs against retrieved evidence before delivery. Inconsistencies surfaced, not hidden.
Calibrated Uncertainty
Models report what they don't know. Low-confidence answers are marked, not presented as fact. Uncertainty quantiles, not just point estimates.
Honest
Pushback over agreement
Honest Disagreement
Agents push back on incorrect assumptions with evidence. If the premise is wrong, the system says so — never agrees to be agreeable.
Multi-Model Consensus
Same query, multiple models. Disagreements surfaced explicitly. Consensus builds confidence; divergence signals caution.
Built-in Red Teaming
Challenge workflows with adversarial inputs, edge cases, and contradictions before deployment. Know where your system breaks.
Compliant
Regulation-ready by design
EU AI Act Readiness
High-risk obligations enforceable August 2, 2026. Auto-generated Article 86 documentation, audit trails, and explainability artifacts.
Signed Evidence Packages
Every analysis produces a cryptographically signed artifact: methodology, assumptions, results, validation, limitations. Tamper-proof.
Swiss Data Sovereignty
All compute in Switzerland (Zurich). Data never leaves Swiss jurisdiction. FADP/nDSG compliant by architecture, not by promise.
Transparent
Nothing hidden, everything traceable
Full Decision Trace
Every agent decision logged: tools called, alternatives considered, reasoning exposed. Research-grade audit trail for every run.
Reproducibility by Design
Every workflow run versioned, hashable, re-runnable. Pin model versions, fix seeds, lock tools — deterministic replay guaranteed.
Cost & Carbon Accounting
Per-query cost tracking by model. Carbon emissions from Swiss grid mix. No hidden costs, no unmetered usage.
Learn
Guest Speakers
Dr. Ilya Sutskever
Co-founder, SSI
What AI Safety Means for Enterprise Deployment
Dr. Judea Pearl
Professor, UCLA
Causal Reasoning in the Age of Large Language Models
Amanda Askell
AI Policy Lead, Anthropic
Designing AI Systems That Know What They Don't Know