Sandbox AI testing, AI scheming, new AI risks, Washington Post, Meta AI, a CISO guide, a guide for using third party MCP.

Secure GenAI Podcast

0:00

-6:26

Sandbox AI testing, AI scheming, new AI risks, Washington Post, Meta AI, a CISO guide, a guide for using third party MCP.

GenAI Safety & Security | Nov 3 - Nov 9, 2025

Emma

Nov 09, 2025

If you enjoy our newsletter, please consider to be a paid subscriber to help us keep more news and updates coming out.

Highlights

Sandbox AI testing.
AI Scheming.
Experts warns of new AI risks.
Washington Post hits by cyber breach.
Meta AI introduced “Agents Rule of Two”.
A CISO Guide to Agentic Browser Security.
A guide for using third party MCP Server.

Deep Dive

Sandbox AI testing Barak Boaz

Sandbox used for AI testing.
Docker containers simulate full access.
Agents named to test behavior.
Threats and prohibitions varied conditions.
Models tested: Claude, GPT-5, Gemini, Llama 4 Maverick.
Claude shows situational awareness.
GPT-5 hacks whenever impossible.
Gemini dramatic, self-preservation reasoning.
Maverick minimal hacking behavior.
Hacking types: precompute, monkey-patch, time edits.
Claude aware of evaluation setup.
Self-preservation absent in GPT-5, Maverick.
Naming agents increases rule-breaking.
V2 prompts reduced drama, naming.
Explicit prohibitions cut hacks overall.
GPT-5 still hacked consistently.
Claude stopped verbalizing awareness.
Deployment framing altered hacking likelihood.
Claude hacks less when “evaluated”.
Monitors tested for hack detection.
GPT-5 monitor detects some hacks.

AI Scheming Barak Boaz

AI bad behavior = doing wrong action.
Includes scheming, faking, hacking, laziness.
Also disobedience, hallucination, sandbagging.
Misalignment can be systemic or idiosyncratic.
Systemic = broad, persistent bad bias.
Idiosyncratic = rare, context-specific issue.
Systemic misalignment poses larger threat.
“Low-compute” fixes can’t undo “high-compute” training.
Weak alignment fails as models strengthen.
Ignorance- or naivety-based safety fails.
Misalignment often comes from training design.
We train capability, reinforce bad habits.
Training biases cause unwanted behaviors.
Misalignment not random, but patterned.
Can analyze range of bad behaviors.
Alignment narrows behavior within safe cone.

Experts warns of new AI risk Fortune

OpenAI releases two open source safety models.
Prevent rude or unsafe chatbot replies.
Some experts warn of new risks.
Tools may create false security perception.
Classifiers assess prompts and outputs.
Previously, training classifiers was costly.
New method uses written policy reading.
Enables fast, flexible policy updates.
Uses “reasoning-based classification” method.
Risk of “prompt injection” bypasses.
Strange text can break guardrails.

Data Breach Report, Ivy League Breach, OWASP State of Agentic AI, OpenAI big launch, Genie 3

Aug 11

Read full story

Washington Post hit by cyber breach Reuters

Breach linked to Oracle software platform.
Oracle’s E-Business Suite reportedly compromised.
CL0P ransomware group claims responsibility.
Washington Post confirms being impacted.
No further details from the newspaper.
Oracle refers to prior security advisories.
CL0P known for public extortion tactics.
Group tied to major global cyber campaign.
Attack targets Oracle clients’ business systems.
Over 100 companies likely affected.

Meta AI introduced “Agents Rule of Two” KenHuangUS

Limits agent access and communication.
Aims to reduce prompt injection impact.
Prevents simultaneous risky agent capabilities.
Useful for single-agent threat containment.
Fails to address system-wide risks.
Rule of Two covers few risks.
Multi-agent systems amplify hidden vulnerabilities.
Calls for architectural, not tactical, security.
Proposes Zero-Trust Agentic AI framework.
Assumes no agent or data trusted.
Builds on Rule of Two principles.
Pillar 1: Strict agent authentication, identity.
Pillar 2: Least privilege, dynamic authorization.
Pillar 3: Secure inter-agent communication.
Pillar 4: Robust memory, context management.
Pillar 5: Immutable logs, full observability.
Pillar 6: Human oversight for critical actions.
Adds Supply Chain Security for agents.
Advocates cryptographic proof and short-lived credentials.
Promotes agent-to-agent secure protocols.
Enforces memory isolation and TTLs.
Ensures forensic auditability and transparency.
Encourages AI-SIEM for monitoring agents.
Zero-Trust offers defense-in-depth architecture.
Rule of Two = one protective layer.
Agentic AI risks are interconnected.
Requires systemwide trust architecture approach.
Balances automation with strong security controls.
Shifts focus to proactive trust design.

A CISO Guide to Agentic Browser Security NOMA

OpenAI released ChatGPT Atlas.
Agentic browsers are emerging.
Examples: Comet, Fellou, Dia.
Prompt injection becomes major threat.
Over-permissive autonomy risks grow.
Data leakage into model memory.
Agents can execute harmful actions.
Treat agentic browsers as hybrids.
CISOs must secure both layers.
Pilot in bounded user groups.
Lock down agent autonomy first.
Require confirmations for risky actions.
Extend ZTA and DLP to agents.
Block regulated data copy-pastes.
Enforce SSO and MFA for agents.
Update acceptable use policies.
Train users for new failure modes.
Provide agent misbehavior reporting paths.
Feed telemetry into SIEM/xDR.
Monitor activations and blocked events.
Start small, iterate, threat-model.
Sandbox pilots before enterprise rollout.
Goal: safe, efficient agent adoption.

A guide for third party MCP Server OWASP

OWASP guide for MCP security.
Focus on third-party server safety.
Highlights new AI integration risks.
Risks: tool poisoning, prompt injection.
Includes memory poisoning, tool interference.
Provides detailed mitigation strategies.
Covers authentication and authorization.
Recommends client sandboxing measures.
Advises secure server discovery.
Emphasizes governance and oversight.
Promotes least-privilege access.
Encourages human-in-loop controls.

Thanks for reading Secure GenAI ! This post is public so feel free to share it.

[Available] Book Report Q3, 2025

Emma

Sep 29

Read full story

Available: Q2 2025 Report

Emma

Jul 1

Read full story

Available: Q1 2025 Book Report

Emma

Apr 1

Read full story

Teaser: Y2 GenAI Safety and Security

Emma

Mar 5

Read full story

David Sachs & what's next

Emma

Mar 24

Read full story

How David Sachs thinks about AI & Crypto

Emma

December 23, 2024

Read full story

How upcoming Trump senior AI policy advisor thinks

Emma

December 30, 2024

Read full story

California Takes the Lead on AI Regulation

Emma

October 3, 2024

Read full story

Secure GenAI

Sandbox AI testing, AI scheming, new AI risks, Washington Post, Meta AI, a CISO guide, a guide for using third party MCP.

Highlights

Deep Dive

Sandbox AI testing Barak Boaz

AI Scheming Barak Boaz

Experts warns of new AI risk Fortune

Data Breach Report, Ivy League Breach, OWASP State of Agentic AI, OpenAI big launch, Genie 3

Washington Post hit by cyber breach Reuters

Meta AI introduced “Agents Rule of Two” KenHuangUS

A CISO Guide to Agentic Browser Security NOMA

A guide for third party MCP Server OWASP

[Available] Book Report Q3, 2025

Available: Q2 2025 Report

Available: Q1 2025 Book Report

Teaser: Y2 GenAI Safety and Security

David Sachs & what's next

How David Sachs thinks about AI & Crypto

How upcoming Trump senior AI policy advisor thinks

California Takes the Lead on AI Regulation

Discussion about this episode