Balancing Performance and Security in zkVM Benchmarking
A Framework for Meaningful Comparison in an Emerging Ecosystem
Credits
This “manifesto” of sorts is the result of great conversations and feedback from:
Mikerah Quintyne-Collins (Stoffel MPC) - @badcryptobitch
John Guibas (Succinct) - @jtguibas
Jacob Everly (Risc0) - @JacobEv3rly
Justin Thaler (a16z) - @SuccinctJT
Kobi Gurkan (Bain Capital Crypto) - @kobigurk
Justin Drake (Ethereum Foundation) - @drakefjustin
Imtiaz Majeed (Orb Labs) - @imtimaj
Over the past eight or so months, I've had considerable time to reflect on the state of zkVM development and benchmarking. This period has included numerous conversations with implementers, academics, and users, each bringing their own perspective on what makes a zkVM successful in practice. Through these discussions and my own experiences building and deploying these systems, I've come to believe we need a more nuanced conversation about how we evaluate zkVM implementations.
This reflection was sparked by Justin Thaler's Summer 2024 article (Understanding Jolt: Clarifications and reflections) discussing Jolt's benchmarking approach. In it, Thaler argued that implementation-specific optimizations and features like precompiles should be considered "confounding factors" when evaluating zkVM performance. While I initially found myself strongly disagreeing with this perspective, the past months have led me to a more measured view – though one that still diverges significantly from Thaler's position.
What I've come to understand is that while Thaler's concerns about increased attack surface and implementation complexity are valid, the conclusion he draws about how we should approach benchmarking doesn't fully align with the practical needs of the ecosystem.
The decision of whether to use a current zkVM implementation, despite potential security risks from optimizations or novel features, must be driven by comprehensive data analysis rather than theoretical purity alone. This means carefully weighing quantified performance benefits against concrete security risks, considering factors like the maturity of specific optimizations, the criticality of the application, and the actual threat model. The key isn't to strip away implementation details in our evaluation, but rather to develop more sophisticated frameworks for understanding how implementation choices affect both performance and security in real-world deployments.
The Evolution of Virtual Machine Evaluation
The challenge of meaningfully comparing different virtual machine implementations is not new to computing. The rivalry between the Java Virtual Machine (JVM) and V8 JavaScript engine provides particularly relevant lessons for the zkVM ecosystem. Despite radically different approaches to achieving portable code execution, both platforms ultimately required sophisticated evaluation frameworks that went far beyond simple performance metrics.
In the early 2000s, the JVM dominated enterprise computing with its "Write Once, Run Anywhere" promise. When V8 emerged in 2008, it took a fundamentally different approach to virtual machine design through aggressive just-in-time compilation. Initially, both platforms were primarily evaluated against native performance benchmarks - a reasonable but incomplete metric that missed crucial dimensions like startup time, memory usage, and security boundary overhead.
The community eventually realized that meaningful comparison required a more nuanced approach. This led to standardized suites like SPECjvm98 and its successors, which measured not just raw performance but also real-world workload behavior and memory management efficiency, alongside evaluation practices that accounted for each platform's security boundaries. These frameworks enabled fair comparison between fundamentally different virtual machine implementations while acknowledging their distinct trade-offs and optimization strategies.
Learning from Hardware's Hard Lessons
The semiconductor industry offers sobering lessons about the relationship between optimization complexity and security. The Spectre vulnerability is a prime example: decades of aggressive performance optimization through speculative execution created subtle attack surfaces that weren't publicly disclosed until 2018. This catastrophic class of security flaws affected virtually every modern processor and forced a fundamental rethinking of hardware security models.
Just as the JVM and V8 communities developed sophisticated frameworks to evaluate different virtual machine implementations, the semiconductor industry had to evolve its approach to balancing performance optimization with security guarantees.
But here's the crucial point: we didn't halt semiconductor production or abandon performance optimization. If we had, the American semiconductor industry wouldn't be the global powerhouse it is today. Instead, the industry adapted. Evaluation practice grew to weigh security considerations alongside traditional SPEC CPU performance metrics. Intel and AMD developed sophisticated microarchitectural security testing frameworks and formal verification approaches. Architectural simulators such as SESC were extended to model security boundaries and potential information leakage. The industry found ways to push performance boundaries while maintaining rigorous security evaluation.
Most importantly, the semiconductor industry learned to treat security evaluation as a first-class citizen in its benchmarking methodology. Evaluation grew from the once-simple SPEC CPU numbers into comprehensive processes that consider everything from branch prediction accuracy to power consumption to potential timing-channel vulnerabilities.
The parallel to zkVMs is clear. When Thaler warns about the increased surface area for bugs with each additional precompile or optimization, he's not wrong. Adding complexity to zero-knowledge systems, which already operate at the bleeding edge of cryptographic security, demands extraordinary caution. You probably shouldn't put a zkVM in a 747's flight control system just yet.
However, the lesson from both virtual machine evolution and hardware security isn't to avoid optimization - it's to develop more sophisticated frameworks for evaluating and managing these trade-offs.
The False Choice of Pure vs. Practical
The suggestion that we should primarily "ascertain the intrinsic performance profiles of different proof systems", in isolation from their implementations, creates a false choice. Implementation quality isn't merely a confounding factor – it's a core attribute of any system's capabilities and security properties. The real challenge lies in developing robust frameworks for evaluating and managing these trade-offs, just as the semiconductor industry did.

When we examine successful platforms in computing history, from x86 processors to the JVM, their success came from striking this balance between optimization and safety through systematic evaluation of both. The notion that benchmarking should somehow discount engineering achievements misses the point entirely. We need comprehensive frameworks that help teams evaluate both theoretical foundations and practical implementations, and understand the full spectrum of trade-offs they're making.
Initially, I found myself resistant to deeply engaging with this problem – the task of holistically evaluating a proof system implementation felt overwhelmingly complex, and it was tempting to fall back on broad generalizations about "theoretical purity" versus "practical engineering." However, by focusing specifically on business-facing properties – like consistent performance under load, operational costs at scale, and security guarantees that match real-world threat models – we can cut through this complexity to meaningful evaluation criteria. While this approach requires more nuanced analysis than simply comparing raw performance numbers or theoretical properties in isolation, it provides the actionable insights that teams actually need when choosing and deploying these systems.
A More Nuanced Approach to zkVM Benchmarking
Rather than either stripping away implementation specifics or blindly embracing all optimizations, we need benchmarking approaches that evaluate the complete system profile. Current benchmarking approaches often focus too narrowly on raw performance metrics or theoretical properties in isolation, missing the complex interplay between implementation choices, security guarantees, and real-world performance characteristics.
1. Real-World Workload Assessment
When measuring performance on real-world workloads, we must go beyond simple throughput metrics. This requires:
Creating benchmark suites that reflect actual production use cases, not just synthetic tests that showcase best-case performance
Documenting the specific security assumptions made by each optimization under real load conditions
Understanding how different workload patterns affect both performance and security guarantees
Measuring the impact of proof composition and recursive proving strategies on end-to-end performance
For example, while a zkVM might show excellent performance on simple program logic, real applications often involve complex memory access patterns and control flow that can significantly impact both performance and security assumptions. We need benchmarks that capture these nuances.
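To make this concrete, here is a minimal sketch of what such a suite might record per run. The `Prover` trait and the `Workload`/`RunReport` types are illustrative assumptions rather than any existing zkVM's API; the point is that each measurement carries its security-relevant context (which precompiles were exercised, under what load) instead of a single throughput number.

```rust
use std::time::{Duration, Instant};

/// A representative workload; real suites would load actual production
/// inputs (transaction batches, state diffs), not synthetic best cases.
struct Workload {
    name: &'static str,
    input: Vec<u8>,
}

/// What we record per run: throughput alone is not enough.
struct RunReport {
    workload: &'static str,
    proving_time: Duration,
    peak_memory_bytes: u64,
    cycles: u64,
    /// Security-relevant context: which precompiles this workload
    /// actually exercised under load.
    precompiles_used: Vec<&'static str>,
}

/// Placeholder for whatever prover interface a given zkVM exposes;
/// this trait is an assumption for illustration only.
trait Prover {
    fn prove(&self, input: &[u8]) -> (u64 /* cycles */, Vec<&'static str> /* precompiles */);
    fn peak_memory_bytes(&self) -> u64;
}

fn run_suite<P: Prover>(prover: &P, workloads: &[Workload]) -> Vec<RunReport> {
    workloads
        .iter()
        .map(|w| {
            let start = Instant::now();
            let (cycles, precompiles_used) = prover.prove(&w.input);
            RunReport {
                workload: w.name,
                proving_time: start.elapsed(),
                peak_memory_bytes: prover.peak_memory_bytes(),
                cycles,
                precompiles_used,
            }
        })
        .collect()
}
```

Even this toy harness makes it awkward to publish a proving-time number without also publishing which precompiles produced it – which is exactly the habit a good benchmark suite should enforce.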
2. System-Wide Security Analysis
Evaluating the entire system means understanding how different components interact and affect the overall security perimeter. This includes:
Mapping the impact of precompiles on the trusted computing base
Understanding how different optimizations might compound to create unexpected security implications
Analyzing how the choice of proving system affects the ability to formally verify the implementation
Evaluating the trade-offs between native operations and precompiles in terms of both performance and security
The semiconductor industry's experience with Spectre and Meltdown demonstrates how optimizations can have non-obvious security implications. In zkVMs, each precompile and optimization potentially expands the attack surface in ways that might not be immediately apparent.
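One lightweight way to make this mapping explicit is a registry that ties every precompile to its contribution to the trusted computing base. The sketch below is hypothetical; the field names and the lines-of-constraint-code proxy are assumptions, not a standard schema.

```rust
/// Illustrative audit states for a single precompile.
enum AuditStatus {
    Unaudited,
    InternallyReviewed,
    ExternallyAudited { report: &'static str },
    FormallyVerified,
}

/// One entry in a hypothetical trusted-computing-base map. Every
/// precompile a zkVM ships should be traceable to a record like this,
/// so benchmark results can be read in their security context.
struct PrecompileRecord {
    name: &'static str,
    /// Hand-written constraint code added to the TCB; a rough proxy
    /// for added attack surface.
    constraint_loc: u32,
    audit: AuditStatus,
    /// Soundness assumptions specific to this precompile, e.g. range
    /// checks or field-size bounds the circuit relies on.
    assumptions: Vec<&'static str>,
}

/// A toy aggregate: how much unaudited constraint code does a given
/// deployment depend on?
fn unaudited_constraint_loc(registry: &[PrecompileRecord]) -> u32 {
    registry
        .iter()
        .filter(|r| matches!(r.audit, AuditStatus::Unaudited))
        .map(|r| r.constraint_loc)
        .sum()
}
```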
3. Implementation Trade-offs
When assessing performance benefits and security implications, we need concrete metrics for both sides of the equation:
Quantifying performance gains in terms of proof generation time, verification costs, and resource utilization
Measuring the increased complexity and potential attack surface introduced by each optimization
Understanding the long-term maintenance implications of different implementation choices
Evaluating the reversibility of optimization decisions and their impact on system evolution
This analysis should extend beyond immediate performance metrics to consider factors like development velocity, debugging complexity, and the ability to respond to security incidents.
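As a sketch of what concrete metrics for both sides of the equation could look like, the record below pairs measured gains with rough complexity proxies for a single optimization. The fields and the screening rule are illustrative assumptions, not a standard methodology; a real evaluation would weight them by application criticality and threat model.

```rust
/// Hypothetical record of a single optimization decision.
struct OptimizationTradeoff {
    name: &'static str,
    /// Measured speedup on the real-world suite (e.g. 3.2 means 3.2x
    /// faster proving), not a best-case synthetic figure.
    proving_speedup: f64,
    /// Change in verification cost, in whatever unit the deployment
    /// cares about (gas, milliseconds, recursion depth).
    verification_cost_delta: f64,
    /// Rough complexity proxies for the added attack surface.
    added_loc: u32,
    new_trusted_components: u32,
    /// Can the optimization be disabled behind a flag if a soundness
    /// issue is found?
    reversible: bool,
}

/// A deliberately naive screening rule: flag optimizations whose added
/// complexity is large relative to their measured benefit, or that
/// cannot be rolled back.
fn needs_extra_scrutiny(t: &OptimizationTradeoff) -> bool {
    let complexity = t.added_loc as f64 + 500.0 * t.new_trusted_components as f64;
    !t.reversible || complexity / t.proving_speedup > 10_000.0
}
```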
4. Verification and Audit Framework
Formal verification and security audits must be integrated into the evaluation process from the start:
Developing frameworks for formally verifying optimized implementations
Creating clear documentation of verification assumptions and limitations
Establishing standards for security audits that consider both theoretical and practical aspects
Building tools to help track and verify security properties as implementations evolve
Most importantly, we need to understand the interplay between verification capabilities and implementation choices. Some optimizations might provide substantial performance benefits but make formal verification significantly more difficult.
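One way to keep that interplay visible is to track verification claims as structured data that lives alongside the implementation, so that code drifting ahead of its proofs is detectable in CI. The manifest below is a hypothetical sketch, not the format of any existing verification tool.

```rust
/// A single verification or audit claim attached to the codebase.
struct VerificationClaim {
    /// The component the claim covers, e.g. "keccak precompile" or
    /// "memory-consistency argument".
    component: &'static str,
    /// What was actually proven or audited.
    property: &'static str,
    /// Explicit assumptions the proof relies on (field size, hash
    /// soundness, absence of prover-side nondeterminism, ...).
    assumptions: Vec<&'static str>,
    /// Commit hash the claim was checked against, so drift between
    /// implementation and verification is visible.
    verified_at_commit: &'static str,
}

/// CI-style check: which claims no longer point at the current code?
fn stale_claims<'a>(
    claims: &'a [VerificationClaim],
    current_commit: &str,
) -> Vec<&'a VerificationClaim> {
    claims
        .iter()
        .filter(|c| c.verified_at_commit != current_commit)
        .collect()
}
```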
Practical Implications
This more nuanced approach to benchmarking has several practical implications:
Benchmark results should include security context alongside performance metrics
Implementation choices should be evaluated based on their complete system impact
Security assumptions should be explicitly documented and tracked
Performance optimizations should be reversible when possible
Formal verification should be considered as part of the initial design, not an afterthought
The goal isn't to discourage optimization but to ensure that performance improvements are achieved without compromising the fundamental security properties that make zkVMs valuable. This requires developing more sophisticated evaluation frameworks that can capture the nuanced relationships between implementation choices, performance characteristics, and security guarantees.
Moving forward, the industry needs to establish standard benchmarking methodologies that incorporate these considerations. This will help teams make informed decisions about zkVM implementation and deployment while maintaining the security properties that are essential for their applications.
The challenge lies not in choosing between performance and security, but in developing evaluation frameworks that help us understand and manage the trade-offs effectively. By taking this more comprehensive approach to benchmarking, we can better serve the needs of both implementers and users while maintaining the high security standards required for zero-knowledge systems.
The Path Forward
The development of zkVMs stands at a critical juncture. While we've made significant progress in understanding the relationship between implementation choices and system properties, several key challenges remain. Moving forward, the industry needs to focus on:
Infrastructure for Systematic Evaluation
Development of standardized benchmarking suites that incorporate both performance and security metrics
Creation of tools for continuous monitoring of security-performance trade-offs
Establishment of shared datasets for comparing different implementations
Building automated analysis pipelines for tracking optimization impacts
Community Standards and Best Practices
Defining acceptable trade-offs for different application contexts
Creating documentation standards for security assumptions
Establishing common vocabulary for discussing optimization impacts
Building shared knowledge bases for implementation patterns
Technical Foundations
Investment in formal verification tools that can scale with implementation complexity
Development of reversible optimization techniques
Creation of security-preserving precompile frameworks
Research into provably secure optimization strategies
Conclusion
The future success of zkVMs depends on our ability to balance the competing demands of performance, security, and practical implementation. By adopting more nuanced approaches to benchmarking and evaluation, we can build systems that meet real-world needs while maintaining robust security guarantees.
This requires moving beyond simplistic comparisons to develop sophisticated frameworks that capture the full complexity of zkVM implementations. While this path forward presents significant challenges, it also offers the opportunity to build more robust and useful systems.
The key to success lies not in avoiding complexity, but in developing the tools and methodologies to manage it effectively. As the field matures, these more nuanced approaches to evaluation and implementation will become increasingly crucial for building systems that can be deployed with confidence in production environments.