A Blueprint for Building a Quantum Computer
key insights
- General-purpose quantum computers capable of efficiently solving difficult problems will be physically large, comprising millions or possibly billions of quantum bits in distributed systems.
- Quantum computer architecture matches available physical qubit technologies to applications.
- The driving force of an architecture is quantum error correction, guarding against loss of fragile quantum data.
- Even with quantum computers, constant factors matter as much as asymptotic computational class; researchers worldwide are working to bring error-corrected clock speeds up to operationally attractive levels.
Small-scale quantum computing devices built on a variety of underlying physical implementations exist in the laboratory, where they have been evolving for over a decade, and have demonstrated the fundamental characteristics necessary for building systems. The challenge lies in extending these systems to be large enough, fast enough, and accurate enough to solve problems that are intractable for classical systems, such as the factoring of large numbers and the exact simulation of other quantum mechanical systems. The architecture of such a computer will be key to its performance. Structurally, when built, a "quantum computer" will in fact be a hybrid device, with quantum computing units serving as coprocessors to classical systems. The program, much control circuitry, and substantial pre- and postprocessing functions will reside on the classical side of the system. The organization of the quantum system itself, the algorithmic workloads for which it is designed, its speed and capabilities in meeting those goals, its interfaces to the classical control logic, and the design of the classical control systems are all the responsibility of quantum computer architects.
In this article, we review the progress that has been made in developing architectures for full-scale quantum computers. We highlight the process of integrating the basic elements that have already been developed, and introduce the challenges that remain in delivering on the promise of quantum computing.
The most famous development to date in quantum algorithms is Shor's algorithm for factoring large numbers in polynomial time.33 While the vernacular press often talks of factoring large numbers "in seconds" using a quantum computer, in reality it is not even possible to discuss the prospective performance of a system without knowing the physical and logical clock speed, the topology of the interconnect among the elements, the number of logical quantum bits (qubits) available in the system, and the details of the algorithmic implementation; in short, without specifying the architecture. Figure 1 illustrates the impact that architecture can have on the bottom-line viability of a quantum computer; here, the architecture used can make the difference between an interesting proof-of-concept device and an immediate threat to all RSA encryption.
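To make concrete what the quantum computer actually contributes to factoring, the sketch below shows the classical post-processing step of Shor's algorithm: the quantum core computes only the order r of a modulo N, and a classical machine recovers the factors from r. The function name and structure here are illustrative, not taken from the article.

```python
from math import gcd

def factors_from_order(a, r, N):
    """Classical post-processing in Shor's algorithm: given the order r
    of a modulo N (the quantity the quantum core computes), recover
    nontrivial factors of N. Fails for odd r or trivial gcds, in which
    case the algorithm retries with a different random a."""
    if r % 2 != 0:
        return None
    half = pow(a, r // 2, N)          # a^(r/2) mod N
    f1 = gcd(half - 1, N)
    if 1 < f1 < N:
        return f1, N // f1
    return None

# Toy example: N = 15, a = 7 has order 4 (7^4 = 2401 = 160*15 + 1).
print(factors_from_order(7, 4, 15))  # -> (3, 5)
```

Everything above runs in polynomial time classically; the exponential speedup lies entirely in obtaining r, which is why clock speed and error correction overheads in the quantum part dominate the system's viability.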
In developing a quantum computer architecture we have much to learn from classical computer architecture, but with a few important caveats. Foremost among these caveats is that the delicate nature of quantum information demands that memory elements be very active. Second, long wires or long-distance connections inside a quantum computer are either nonexistent, requiring nearest neighbor, cellular automaton-like transfer of data, or are at best poor quality, requiring much effort to transfer even a single qubit from place to place using quantum teleportation and error management techniques. Thus, the principles of classical computer architecture can be applied, but the answers arrived at likely will differ substantially from classical architectures.
Quantum computer architecture as a field remains in its infancy, but carries much promise for producing machines that, on certain problems, vastly exceed current classical capabilities. As we begin to design larger quantum computers, it must be recognized that large systems are not simply larger versions of small systems. The conceptual stack of subfields that must all contribute to a scalable, real-world machine is shown in Figure 2, divided into a set of layers. In this article, we discuss the elements of this structure in turn, all leading to the central core of quantum computer architecture.
At the bottom of the stack we have the technologies for storing individual qubits, and processing or transporting them to take part in a larger computation. How small groups of qubits will interconnect is the first problem quantum computer architecture must solve. Given the fragility of quantum data, how can many qubits be kept "alive" long enough to complete a complex computation? The solution to this problem, the field of quantum error correction (QEC), began with studies demonstrating that arbitrarily accurate computation is theoretically possible even with imperfect systems, but our concern here is the design of subsystems for executing QEC, which can be called the quantum computer microarchitecture.26 Recent progress in experimentally demonstrated building blocks and the implementation of QEC are the first two topics addressed in this article.
At the top of the stack, machines will be designed for specific workloads, to run certain algorithms (such as factoring or simulation) that exist in computational complexity classes believed to be inaccessible to classical computers. Without these algorithms, there will be no economic incentive to build and deploy machines.
With context established at both the top and bottom of the stack, we present progress that has been made toward integrated architectures, and finish with a detailed example of the immense scale-up in size and slowdown in speed arising from the error correction needs of a full-scale, digital quantum computer. We note that the process of writing this article has been made substantially easier by the appearance in the last few years of excellent reviews on many architecture-relevant subfields.1,3,4,9,13,19,28
Qubit Technologies
At the lowest level of our Figure 2, we have the technological building blocks for the quantum computer. The first significant attempt to characterize the technology needed to build a computer came in the mid-1990s, when DiVincenzo listed criteria that a viable quantum computing technology must have: (1) a two-level physical system to function as a qubit;a (2) a means to initialize the qubits into a known state; (3) a universal set of gates between qubits; (4) measurement; and (5) long memory lifetime.10 These criteria were later augmented with two communication criteria, the ability to convert between stationary and "flying" qubits, and the ability to transmit the latter between two locations.
In any qubit technology, the first criterion is the most vital: What is the state variable? Equivalent to the electrical charge in a classical computer, what aspect of the physical system encodes the basic "0" or "1" state? The initialization, gates, and measurement process then follow from this basic step. Many groups worldwide are currently working with a range of state variables, from the direction of quantum spin of an electron, atom, nucleus, or quantum dot, through the magnetic flux of a micron-scale current loop, to the state of a photon or photon group (its polarization, position or timing).
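Whatever the physical state variable, the abstraction the architect works with is the same: a qubit is a pair of complex amplitudes, initialization puts it in a known state, gates are linear transformations, and measurement yields probabilities. The minimal sketch below (names and representation are our own, not from the article) shows DiVincenzo's criteria 2-4 as plain arithmetic.

```python
import math

# A qubit is a pair of complex amplitudes (alpha, beta) with
# |alpha|^2 + |beta|^2 = 1, independent of the physical state variable
# (spin direction, magnetic flux, photon polarization) that encodes it.
zero = (1 + 0j, 0j)            # initialization into a known state (criterion 2)

def hadamard(q):
    """One single-qubit gate from a universal set (criterion 3)."""
    a, b = q
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

def measure_probs(q):
    """Measurement (criterion 4): probabilities of reading 0 or 1."""
    a, b = q
    return abs(a) ** 2, abs(b) ** 2

plus = hadamard(zero)           # equal superposition of 0 and 1
print(measure_probs(plus))      # -> (0.5, 0.5), up to float rounding
```

The hard part, of course, is not this arithmetic but making a physical system obey it: keeping the amplitudes coherent, coupling qubits on demand, and reading them out without disturbing their neighbors.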
The accompanying table lists a selection of qubit technologies that have been demonstrated in the laboratory, with examples of the material and the final device given for each state variable.
Controlling any kind of physical system all the way down to the quantum level is difficult; making qubits interact with each other but with nothing else is even harder; and maintaining that control over computationally useful lifetimes is an immense experimental challenge. While experimental progress in the last decade has been impressive, moving beyond one- and two-qubit demonstrations, we are still a very long way from entangling, storing, and manipulating qubits on anything like the scale of classical computing and bits. Here, we are able to discuss only selected examples of the various technologies on offer; for more information, we recommend the recent survey by Ladd et al.19
Ion trap devices and optical systems currently lead in the number of qubits that can be held in a device, controlled, and entangled. Ions are trapped in a vacuum by electromagnetic potentials, and 14-qubit entanglement has been demonstrated (the largest entangled state in any form shown to date).27 Complex bench-top linear optical circuits are capable of entangling eight-photon qubit states, and have been shown to perform nontrivial computation over short timescales.40 Neither of these approaches scales in its current form, but scalable approaches are also under development. Groups headed by Wineland, Monroe, Chuang, and others have demonstrated the necessary building blocks for ion traps.21 Integrated nanophotonics (using photons as qubits) made on silicon chips provides the route to getting optics off of the lab bench and into easily scalable systems, and is making substantial progress in groups such as O'Brien's.19
Solid-state electronic devices for gate-based computation, while currently trailing in terms of size of entangled state demonstrated, hold out great promise for mass fabrication of qubits.19 By trapping a single extra electron in a 3D block of semiconducting material, a quantum dot acquires a quantum spin relative to its surroundings that can hold a qubit of data. For flux qubits, the state variable is the quantum state of the magnetic flux generated by a micron-scale ring of current in a loop of superconducting wire (Figure 3).
In solid-state technologies, the experimental focus has been on improving the control of individual qubits rather than growing their numbers, but that focus has begun to shift. Jaw-Shen Tsai has noted that superconducting qubit memory lifetime has more than doubled every year since 1999, and has now reached the point where quantum error correction becomes effective.
Overall, the prospects are very good for systems consisting of tens of qubits to appear in multiple technologies over the next few years, allowing experimental confirmation of the lower reaches of the scaling behavior of quantum algorithms and the effectiveness of quantum error correction.
However, one factor that is often unappreciated when looking at these qubit technologies is the physical scale of the devices, particularly in comparison with classical digital technologies. Many people associate quantum effects with tiny objects, but most of these technologies use devices that are enormous compared to the transistors in modern computer chips. Transistors really are reaching down to atomic scales, with vendors having shipped chips fabricated with a 22-nanometer process at the end of 2011. In these chips, the smallest features will be only about 40 times the silicon crystal lattice cell size. In contrast, although the atoms themselves are of course tiny, ion traps are limited to inter-atom spacing of perhaps tens of microns for RF and optical control. Nanophotonic systems will require components tens of microns across, to accommodate the 1.5μm wavelength light that is desirable for telecommunications and silicon optics. Superconducting flux qubits require a current ring microns across. All of these technologies result in qubit devices that are macroscopic, or nearly so, with areal densities a million times less than computer chips. This fact will have enormous impact on large-scale architectures, as we will see.
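The density gap can be made concrete with rough arithmetic. The pitches below are illustrative assumptions (a few tens of nanometers for a transistor, a few tens of microns for an ion or flux-qubit device), chosen to show how the "million times less dense" figure arises.

```python
# Illustrative arithmetic; both pitches are rough assumptions, not
# measured values for any particular device.
transistor_pitch_m = 30e-9    # ~tens of nm between classical transistors
qubit_pitch_m = 30e-6         # ~tens of um between ions or flux qubit loops

linear_ratio = qubit_pitch_m / transistor_pitch_m
areal_ratio = linear_ratio ** 2    # density scales with area, not length

print(f"{linear_ratio:.0f}x linear, {areal_ratio:.0e}x areal")
# -> 1000x linear, 1e+06x areal
```

A factor of a thousand in linear pitch becomes a factor of a million in areal density, which is why a billion-qubit machine cannot simply be a chip, but must be a distributed system.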
First steps in quantum architecture. At this lowest level, the job of quantum architecture is to determine how individual qubits or qubit blocks interconnect and communicate in order to process data. There are three main areas where experimental groups have begun to consider architectural implications in designing their systems.
Heterogeneity. Some researchers have been investigating technological heterogeneity by using combinations of electron spin, nuclear spin, magnetic flux, and photon polarization in a single system.5 It is, however, equally important to consider structural heterogeneity, both in operational capability and in interconnect speed or fidelity. Martinis's group has recently demonstrated a chip with a functional distinction between two flux qubits and memory storage elements, leading them to refer to it as a quantum von Neumann architecture.24 In many technologies, including some forms of quantum dots and Josephson junction qubits, measurement of a qubit requires an additional physical device, consuming die space and making layout of both classical and quantum components more complex.
Integration and classical control. Increasing the number of on-chip devices will require improving on-chip integration of control circuits and multiplexing of I/O pins, to get away from multiple rack-mount units for controlling each individual qubit. Early efforts at considering the on-chip classical control overhead as systems grow include Oskin's design,30 and recent work by Levy et al. details the on-chip hardware's impact on error correction.22 Kim uses his expertise in micro-mirrors to focus on the classical control systems for systems requiring switched optical control.16,18 Controlling meso-scale systems (tens to low hundreds of physical qubits) will require substantial investment in systems engineering. Experimental laboratories likely will have to contract or hire professional staff with mixed-signal circuit design experience to build extensive control circuits, and many technologies require cryogenic circuits.
Interconnects. Even within individual chips, we are already seeing the beginning of designs with multiple types of interconnects. Integration levels will likely reach hundreds to low thousands of qubits in a single chip or subsystem, but reaching millions to billions of devices in a system will require interconnects that remain only on the drawing board. In some proposals, the inter-subsystem connections are physically similar to intra-subsystem connections, while in others they can be radically different. One ion trap proposal, for example, uses shared, quantum, vibrational modes for intra-node operations and optical connections between separate ion traps.29
Error Correction
The heroic efforts of experimentalists have brought us to the point where approximately 10 qubits can be controlled and entangled. Getting to that stage has been a monumental task as the fragile quantum system must be isolated from the environment, and its state protected from drifting. What will happen, then, when we push to hundreds, thousands, or millions of qubits?
This brings us to the next level of Figure 2, the microarchitecture for quantum computers. If a quantum architecture were designed where the gates of an algorithm ran directly on the types of individual physical qubits we have been describing, it would not work. Both the accuracy of the gate operations and the degree of isolation from the environment required to perform robust computation of any significant size lie far outside the realm of feasibility. To make matters worse, quantum data is subject to a restriction known as the "no-cloning theorem," which means standard classical methods of controlling errors cannot be implemented.39 Quantum data cannot be "backed up," or copied for simple repetition-code processing to detect and correct errors.
The possibility of performing quantum computation is saved, however, by quantum error correction. Some techniques are based on classical error correction and erasure correction, while others are based on uniquely quantum approaches.9 In all cases, a number of physical qubits are combined to form one or more logical qubits. The number of physical qubits per logical qubit is determined by the quantum operation error rates, the physical memory lifetime, and the accuracy required of the algorithm, and can vary from tens to possibly thousands. A key property of a code is its threshold, the accuracy to which all physical qubit operations must perform in order for the code to work. Once enough physical qubits can be interacted to make interacting logical qubits, with all physical device operations accurate to below the threshold, then error correction will keep the quantum computer error-free for the runtime of the algorithm.
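The threshold idea can be seen in the error arithmetic of the simplest code. The sketch below is the classical analogue of the quantum three-qubit bit-flip code: a quantum implementation cannot copy the state (no-cloning) and instead measures parities via ancilla qubits, but the resulting majority-vote error arithmetic is the same. The function and its parameters are our own illustration.

```python
from math import comb

def logical_error_rate(p, copies=3):
    """Probability that majority vote over `copies` redundant bits fails,
    each flipped independently with probability p. For three copies this
    is 3p^2(1-p) + p^3, which is below p exactly when p < 1/2 -- the
    'threshold' of this toy code."""
    k = copies // 2 + 1   # number of flips needed to fool the majority
    return sum(comb(copies, j) * p**j * (1 - p)**(copies - j)
               for j in range(k, copies + 1))

print(logical_error_rate(0.01))        # ~3e-4: far better than 1e-2
print(logical_error_rate(0.6) > 0.6)   # True: above threshold, encoding hurts
```

Real quantum codes have much lower thresholds (well below 1/2) because syndrome extraction itself is noisy, but the qualitative behavior is the same: below threshold, adding qubits suppresses logical errors; above it, encoding makes things worse.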
When assessing a classical error correcting code, an important aspect is the code rate, the ratio of delivered, corrected symbols to the number of raw symbols used. High rates are achieved in part by using very large blocks that encode many bits. Block-based codes have been explored in quantum codes, but suffer from the drawback that logical operations on logical qubits within the block are difficult to execute fault tolerantly. When selecting a quantum code, the rate is important, but the demands made on the implementing hardware and the ease of logical operations are critical.
Performing error correction is not a simple task; in fact, the vast majority of the processing power of a universal quantum computer will be used to correct errors in the quantum state. Application algorithms and data processing make their appearance only at a level well above the real-time, (physical) qubit-by-qubit work of error correction. An architecture for a universal quantum computer will therefore have as its primary goal the execution of specific types of error correction.
The earliest ideas for QEC naturally took advantage of classical error correction techniques. After solving the problems of measuring error syndromes without destroying the quantum state and computing on encoded states without inadvertently spreading errors (known in QC literature as fault tolerance, referring to runtime errors rather than mid-computation failure of hardware components), application of classical error correction became relatively straightforward.9
A promising form of error correction is surface code computation, which grew out of work by Kitaev and others on topological quantum computing. Raussendorf and collaborators created the 2D and 3D versions suitable for solid-state and photonic systems, respectively.11,31 Fowler, Devitt, and others have extended the practicality of these results, including demonstrating that the real-time error processing required of the classical half of the machine is a tractable engineering problem.8 The code rate of surface codes is poor, but their requirement only for nearest-neighbor connections will allow them to work at a higher physical error rate than other methods on some attractive hardware platforms.
Beyond digital quantum error correction for arbitrary states, other approaches can be used to (partially) isolate qubits from undesirable interactions. Decoherence-free subspaces encode a logical qubit in the phase difference of two physical qubits, suppressing the effect of certain types of noise. Techniques known as spin echo and dynamic decoupling similarly can be used to partially reverse the impact of systematic effects on memory, going with the error for a while and against it for a while, canceling out the effect. Purification (error detection for specific states) will be especially useful for communication, either internal or external to the system.
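The "with the error for a while, against it for a while" cancellation of a spin echo can be sketched in a few lines. The model below (our own simplification) treats the error as a constant unknown detuning omega that accumulates phase linearly; a pi-pulse halfway through negates the accumulated phase, so the second half of the evolution undoes the first.

```python
def free_evolution(phase, omega, t):
    """Phase accumulated by a qubit superposition at an unknown but
    constant detuning omega (a systematic error, not random noise)."""
    return phase + omega * t

def echo_sequence(omega, T):
    """Hahn echo: evolve for T/2, apply a pi-pulse (which negates the
    accumulated relative phase), then evolve for T/2 again."""
    phase = free_evolution(0.0, omega, T / 2)
    phase = -phase                       # pi-pulse flips the phase sign
    phase = free_evolution(phase, omega, T / 2)
    return phase

print(echo_sequence(omega=2.7, T=1.0))   # -> 0.0: systematic drift cancelled
```

The cancellation is exact only for errors that are constant over the sequence; noise that fluctuates faster than the pulse spacing survives, which is why echo techniques complement rather than replace digital error correction.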
The implementation of error correction is perhaps the key near-term experimental goal. As experimental capabilities have grown, groups have begun competing to demonstrate quantum error correction in increasingly complete forms. Blatt's group performed multiple rounds of an error correction-like circuit that detects and corrects certain types of errors using certain simplifications.32 Pan's group has recently shown an eight-photon entangled state related to the unit cell of 3D surface error correction.40
Microarchitectures for error correction. As in classical computer architecture, microarchitecture is the bridge between physical device capabilities and the architecture. Microarchitecture in the quantum case can be understood to be the level dedicated to efficient execution of quantum error management, while the system architecture is the organization of microarchitecture blocks into a complete system. There are several specifically quantum elements that must be considered for this microarchitecture.
Clock speed. The conversion factor from physical gate cycle to logical gate cycle has a strong, underappreciated impact on the performance of an algorithm. It depends on a number of architectural features, as well as the error correction code itself. For the ion trap-based system analyzed by Clark et al.,6,26 a 10μsec physical gate results in a 1.6msec error correction cycle time using the basic form of Steane's error correcting code, which encodes one logical qubit in seven physical ones. The code will need to be applied in recursive fashion, resulting in growth of the physical system by an order of magnitude and an increase in the logical clock cycle to 260msec, not far below the topmost quantum line in Figure 1. This dramatic increase illustrates the effect of the base physical error rate on the architecture and performance, and will be discussed later.
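A back-of-envelope model shows how the quoted figures compound. The per-level multiplier below is inferred from the numbers in the text (1.6msec per QEC cycle over a 10μsec gate), not a general constant; real concatenated-code overheads depend on the circuit details.

```python
physical_gate = 10e-6                  # 10 usec physical gate time
level1_cycle = 1.6e-3                  # one Steane QEC cycle (quoted figure)
per_level_factor = level1_cycle / physical_gate   # = 160x per code level

# Concatenating the code multiplies the cycle time by roughly the same
# factor again (a simplifying assumption of this model):
level2_cycle = level1_cycle * per_level_factor
print(f"level-2 logical cycle ~ {level2_cycle * 1e3:.0f} msec")
# -> level-2 logical cycle ~ 256 msec, matching the ~260 msec quoted

# Qubit overhead of concatenation: 7 physical per logical, per level.
print(7 ** 2)   # -> 49 physical qubits per logical qubit at level 2
```

Four orders of magnitude separate the physical gate from the level-2 logical gate, which is why a quantum computer's logical clock speed, not its physical gate time, determines whether an algorithm finishes in hours or decades.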
Efficient ancilla factories. Most numeric quantum algorithms depend heavily on a three-qubit gate called a controlled-controlled NOT gate, or Toffoli gate. In most quantum error correction paradigms, direct execution of a Toffoli gate on encoded logical qubits is not possible. Instead, the Toffoli gate is performed using several operations, including one that consumes a specially prepared ancilla (temporary variable) state. The ancilla is created using distillation (a quantum error detection code), which takes noisy physical states and builds more accurate logical states. Creation of these states may dominate the workload of the machine, and recent work has assumed that 75% to 90% of the machine is dedicated to their production. Isailovic et al. referred to this need as running quantum applications "at the speed of data," that is, producing the generic ancilla states rapidly enough that they are not the performance bottleneck.14
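The "speed of data" balance can be framed as simple throughput arithmetic. All four parameters in the sketch below are hypothetical round numbers chosen for illustration; the point is only that modest consumption rates and distillation latencies push the factory share of the machine into the range quoted above.

```python
def factory_fraction(toffolis_per_cycle, distill_cycles,
                     factory_qubits, logic_qubits):
    """Fraction of the machine devoted to ancilla factories so that
    distilled states are produced exactly as fast as the algorithm
    consumes them. All parameters are hypothetical, for illustration."""
    # Each factory delivers one state every `distill_cycles` cycles, so
    # sustaining `toffolis_per_cycle` consumption needs this many:
    factories_needed = toffolis_per_cycle * distill_cycles
    factory_total = factories_needed * factory_qubits
    return factory_total / (factory_total + logic_qubits)

# e.g. 10 Toffolis consumed per logical cycle, 50-cycle distillation
# latency, 100 qubits per factory, 10,000 qubits of application logic:
print(f"{factory_fraction(10, 50, 100, 10_000):.0%}")   # -> 83%
```

Under these assumed numbers the factories occupy 83% of the machine, squarely in the 75% to 90% range that recent designs allocate.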
Balancing error management mechanisms. A functioning quantum computer almost certainly will not rely on a single type of error correction, but will incorporate different forms of error correction/detection/suppression at different levels. Different error management techniques have different strengths and weaknesses, and the combinatorial space for integrating multiple types is large.
Defects. A quantum architecture must also take into account that fabrication will inevitably be an imperfect process. Qubits may be declared defective because they fail to correctly hold the correct state variable carrier (for example, to trap a single electron), because memory lifetime is short or gate control imprecise, or because they fail to couple properly to other qubits. For gate-based, error-corrected systems, calculations show that a stringent definition of declaring a device to be functional pays for itself in reduced error correction demands.36 A system's resilience to low yield is very microarchitecture-dependent. Alternatively, the digital quantum error correction itself can be adapted to tolerate loss.34
Workloads
So far we have looked at the lower two levels of Figure 2. Before investigating a complete quantum computer architecture, we need to consider the algorithms and programs that will be run on the physical hardware: the workload for the quantum computer. We therefore skip to the top of the stack: quantum programming and quantum algorithms. We are still in the time of Babbage, trying to figure out what Knuth, Lampson, and Torvalds will do with a quantum computer. It has been widely believed that Shor's factoring algorithm33 and quantum simulation3,4,20 will provide the two driving reasons to build machines. There are, however, a number of other useful and interesting quantum algorithms, seven of which are being investigated by teams involved in IARPA's Quantum Computer Science Program.b Bacon and van Dam1 and Mosca28 have published surveys covering quantum random walks, game theory, linear algebra, group theory, and more. Our understanding of how to design new quantum algorithms that asymptotically outperform classical ones continues to grow, though the number of people who can put the concepts into practice remains small.
Given the applications we have, how large a computer is needed to run them, and how should it be structured? Only a few quantum algorithms have been evaluated for suitability for actual implementation. Shor's algorithm is commonly used as a benchmark, both for its importance and clarity, and because the arithmetic and quantum Fourier transform on which it is founded are valuable building blocks for other algorithms.7,12,26,36 Unfortunately, the size and speed of a machine needed to run the algorithm have been widely misunderstood. Architects have suggested a physical machine comprising high millions to billions of qubits to factor a 2,048-bit number, a size that experimental physicists find staggering.7,16,26,36
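The route from "2,048-bit number" to "millions to billions of physical qubits" is short arithmetic once the overheads are assumed. The sketch below uses one known circuit family's logical qubit count of roughly 2n+3 for an n-bit modulus, and an assumed 10^4 physical qubits per logical qubit to cover error correction and ancilla distillation; both numbers are illustrative, and different codes and error rates shift the result by orders of magnitude.

```python
def shor_machine_size(n_bits, logical_per_bit=2, extra=3,
                      physical_per_logical=10_000):
    """Rough size of a machine to factor an n-bit number with Shor's
    algorithm. The ~2n+3 logical-qubit count and the 10^4 physical
    qubits per logical qubit are illustrative assumptions."""
    logical = logical_per_bit * n_bits + extra
    return logical, logical * physical_per_logical

logical, physical = shor_machine_size(2048)
print(logical, physical)   # -> 4099 40990000 (~41 million physical qubits)
```

A few thousand logical qubits is unremarkable; it is the three-to-four order-of-magnitude error correction multiplier that produces machine sizes experimental physicists find staggering.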
In part because designs for a Shor machine have proven to be intimidatingly large, consensus is building that a Shor machine will not be the first commercially practical system, and interest in designing machines for quantum chemistry is growing. In a somewhat unexpected result, Brown's group has shown that certain quantum simulation algorithms expected to be computationally tractable on quantum computers are turning out to have dismayingly large resource requirements.6 However, the field of quantum simulation is varied, and these simulators remain attractive; they are simply going to take more quantum computational resources (hence, more calendar years to develop and more dollars to deploy) than originally hoped.
A key element of the process of developing applications will be programming tools for quantum computers, and enough language designs were under way by 2005 to warrant a survey with a couple of hundred references.13 In what are arguably the first examples of true "quantum programming," as distinct from "quantum algorithm design," shortly after the publication of Shor's factoring algorithm, Vedral et al. and Beckman et al. produced detailed descriptions of the circuits (sequences of gate operations) necessary for the modular exponentiation portion of the algorithm, which appeared as a single line in Shor's original paper.2,38 The next step in implementation is to adapt such a description for execution on a particular machine, as in the block marked "Architecture-aware algorithm implementation" in Figure 2. Matching implementation choices to the strengths of the machine, including choosing adder circuits that match the application-level system interconnect and trading off time and space, will be a collaboration between the programmer and the tools.37 Maslov and others have studied efficient architecture-aware compilation,25 an important element in the IARPA program. Compiler backends for specific experimental hardware configurations will soon be important, as will methods for debugging quantum programs in situ.
Quantum System Architectures
Finally we come to the central element in Figure 2. A complete system design will specify everything from the "business-level" requirements for an algorithm and machine capable of outperforming classical computers, through the details of the algorithm's implementation, the strength of error correction required and type of error management applied, the corresponding execution time of logical Toffoli gates (including the ancilla distillation discussed earlier), and the microarchitecture of individual areas of the system.
The DiVincenzo criteria are fundamental and necessary, but not sufficient to build a practical large-scale system. Considering instead the issues of quantum computer architecture results in a different focus, highlighting such mundane engineering criteria as being large enough and fast enough to be useful, and small enough and cheap enough to be built. Very loosely, meeting the DiVincenzo criteria can be viewed as the responsibility of experimental physicists, while the latter criteria are the responsibility of computer engineers.
Small-scale quantum architecture can be said to have begun with Lloyd's 1993 proposal for a molecular chain computer,23 the first for a potentially buildable device. The word "scalable" attained a permanent position in the lexicon with Kielpinski et al.'s 2002 proposal for an ion trap that can shuffle and manipulate individual atoms,17 an approach that continues to pay dividends. These ideas and many others for multi-qubit devices, such as the quantum von Neumann approach or scalable ion traps with distinct operation and storage sites, sketch local areas of the system using the technological building blocks, but provide no direct guidance on how to organize large systems that meet the goal of solving one or more problems that are classically intractable.
When considering the macro architecture, certain aspects of the design become clear. Because all of memory is expected to be active, the system will probably not consist of separate CPU and memory banks connected via wide, shared buses. A more uniform array of microarchitecture error correction building blocks is the obvious approach, tempered by issues such as defective devices and the needs of the classical control subsystems. Each of these microarchitecture building blocks may utilize heterogeneous technology with an internal storage/computation distinction.24 Individual chips or ion traps will not be large enough to execute some algorithms (notably Shor) at scale, likely forcing the adoption of a multicomputer structure.15,29,36
Large-scale designs are going to be difficult to create and evaluate without the appropriate tools. Further investment in automated tools for co-design of internally heterogeneous hardware and compilation of software is critical. One good example of this practice is Svore and Cross, working with Chuang, who have developed tool chains with round-trip engineering and error correction in mind.35
Architectural analyses exist for ion trap systems using Steane error correction, and multiple, distinct forms of nanophotonic and solid-state systems using the surface code.7,16,26,36 We next take up one moderately complete architecture as an example.
An Architecture at Scale
We can use error management and application workloads to determine the broad outlines of a computer that could run a useful algorithm at full scale. The size of a quantum computer grows depending on the algorithm's demand for logical qubits, the quantum error correction scheme, the gate and memory error rate, and other factors such as the yield of functional qubits in the system. Overall, including space for various temporary variables and the ancilla state distillation, the scale-up factor from logical to physical qubits can reach half a million. As a specific example, the resource growth in one architecture36 can be assigned approximately as follows:
- Establish a substantially post-classical goal of factoring an L = 2,048-bit number using Shor's algorithm, requiring
- 6L logical qubits to run a time-efficient form of the algorithm, growing by
- 8 × to build "singular factories" for the state distillation process, allowing the algorithm to run at the speed of data,
- 1.33 × to provide "wiring" room to move logical qubits around within the system,
- 10,000 × to run the Raussendorf-Harrington-Goyal form of the surface code with an error correcting code distance d = 56, and finally
- 4 × to work around a yield of functional devices of 40%.
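Multiplying out the factors listed above reproduces the overall system size. A minimal sketch; every number is taken from the text, and the arithmetic is all this adds:

```python
# Back-of-the-envelope reproduction of the scale-up factors listed above.
L = 2048                    # bits in the number to be factored
logical_qubits = 6 * L      # time-efficient form of Shor's algorithm

factors = {
    "singular factories":   8,        # ancilla state distillation
    "wiring room":          1.33,     # moving logical qubits around
    "surface code, d = 56": 10_000,   # Raussendorf-Harrington-Goyal form
    "40% device yield":     4,        # working around defective devices
}
physical_qubits = logical_qubits
for name, f in factors.items():
    physical_qubits *= f

print(f"logical qubits:  {logical_qubits:,}")
print(f"scale-up factor: {physical_qubits / logical_qubits:,.0f}")  # ~425,600, i.e. "half a million"
print(f"physical qubits: {physical_qubits:,.0f}")                   # ~5.2 billion
```

The product lands in the billions of physical qubits, the same order as the roughly six billion cited for this architecture, and the per-logical-qubit scale-up of roughly 425,000 matches the "half a million" figure quoted earlier.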
A singular factory is simply a region of the computer assigned by the programmer or compiler to the creation of the ancillae discussed here. The size, number, and position of the singular factories are dependent on the physical error rate, the size of the computation to be performed, and the type of interconnects available. The space for wiring is a surface code-dependent factor, not required when using other error correction methods, and is probably a substantial underestimate, though researchers are currently looking for compact compilations of programs on the surface code that will minimize this factor. The chosen code distance d is strongly dependent on the application algorithm itself and on the physical gate error rate. Shor's algorithm for L = 2,048 demands that, roughly speaking, we must be able to run 10^15 logical operations with a high probability of correctly executing them all. This work was done assuming a physical error rate of 0.2%, which is not very far below the surface code threshold of 0.75%. These two factors determine the large distance of 56, and in the case of the surface code required resources grow as d^2, giving the high scale-up factor.11 The final factor of four is strongly dependent on the details of the microarchitecture and the yield.
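How the error rate and operation count translate into a code distance can be sketched with a commonly used heuristic for surface-code logical error scaling. The prefactor A and the simple per-operation error budget are assumptions of this sketch, not values from the text, so the result should only be expected to land in the neighborhood of the d = 56 quoted above:

```python
# Estimate the surface-code distance needed for ~10^15 reliable logical operations.
p, p_th = 0.002, 0.0075          # physical error rate and threshold (from the text)
target_ops = 1e15                # logical operations that must all succeed (from the text)
per_op_budget = 1.0 / target_ops # crude per-operation logical error budget (assumption)

# Common heuristic scaling for the surface code (prefactor A is an assumption):
#   P_logical ~ A * (p / p_th) ** ((d + 1) / 2)
A = 0.1
ratio = p / p_th

# Smallest odd distance d whose logical error rate fits the budget:
d = 3
while A * ratio ** ((d + 1) / 2) > per_op_budget:
    d += 2
print(f"estimated code distance: d = {d}")
```

With these assumptions the loop settles on a distance in the high 40s to 50s, consistent with the d = 56 used in the architecture; the exact value shifts with the prefactor and with how the error budget is apportioned across the computation.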
This results in a final system size of six billion physical qubits for the main quantum state itself, each of which must be independently controllable. In this particular architecture, this staggering number must be augmented with additional qubits for communications, on-chip optical switches, delay lines, and many external supporting lasers, optical switches, and measurement devices, all deployed in a large multicomputer configuration.
The performance of the system is determined by the error correction time and the complexity of executing the application gates on top of the error correction code. The surface code cycle time on this architecture for measuring all error syndromes is ~50 μsec, far slower than the 100 psec planned for physical gates. A Toffoli gate will require ~50 msec, a factor of 1,000 from QEC cycle to logical gate, for this code distance. Demonstrating how system-level issues affect performance, the QEC cycle time is limited by contention for access to on-chip waveguides.
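A quick sanity check relating the three timescales just quoted (all values are from the text):

```python
# Relating the timescales of this architecture.
physical_gate = 100e-12   # 100 psec planned physical gate time
qec_cycle     = 50e-6     # ~50 usec surface-code syndrome-measurement cycle
toffoli       = 50e-3     # ~50 msec logical Toffoli gate at this code distance

print(f"QEC cycle / physical gate: {qec_cycle / physical_gate:,.0f}x")  # ~500,000x
print(f"Toffoli / QEC cycle:       {toffoli / qec_cycle:,.0f}x")        # ~1,000x
```

The half-million-fold gap between physical gate time and error-corrected cycle time is the "constant factor" referred to in the key insights: it is the error-corrected clock, not the physical gate, that sets the application-level speed.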
In part to address some of these architectural limitations, the QuDOS architecture was developed.16 QuDOS, if built, would be 100 times faster, largely due to increased parallelism in measuring the error syndromes. Overall, coordinated changes of physical technology, error correction mechanism, and architecture may gain 3-4 orders of magnitude in performance, demonstrating the impact of the field of quantum computer architecture.
Conclusion
A principal lesson learned so far in research on quantum computer architectures is that systems capable of solving classically intractable problems will be large, although the search for the smallest commercially attractive machine continues. Device sizes will limit integration levels, affecting architecture. Determining logical clock speed requires making many design decisions, but dramatically affects what can and cannot be effectively computed (as shown in Figure 1). Architectural problems cover a broad range, but have received only modest amounts of attention compared to near-term experimental hurdles, leaving much room for high-impact research that can help guide the focus of experimental work.
To sharpen the community focus on building systems, it seems to be time to begin demanding Moore's Law-like improvements in system capacity. Reviewers of papers and funding proposals should look for realistic estimates of logical Toffoli gate time, incorporating error correction, for some target logical fidelity. Even more ambitiously, we recommend requiring realistic estimates of application performance.
Developing a sense of community is critical. Creating a shared understanding including vocabulary, concepts, and important problems among the physics and CS theory, algorithm design, physics experiment, engineering, and architecture communities has proven to be difficult, and few journals or conferences currently provide good venues for such interdisciplinary endeavors, but we expect the number will grow.
Let us close with a question that provokes answers ranging from "Already has" (in reference to direct quantum simulation of a specific quantum system20) to "Twenty years" to "Never," and all these from people actually working in the field:
When will the first paper appear in Science or Nature in which the point is the results of a quantum computation, rather than the machine itself? That is, when will a quantum computer do science, rather than be science?
Sidebar: Approaches to Quantum Information Processing
Data within a quantum computer is typically stored as quantum bits or qubits. Like a classical bit, a qubit has two states, 0 and 1, but, unlike a classical bit, a qubit may be in a superposition of the two states, being in a certain sense in both 0 and 1 at the same time. A quantum gate operation that changes the state of this qubit can act on both values simultaneously. Each element in the superposition has a (complex) weight, say α for 0 and β for 1. When measuring a superposed state, only a single result (0 or 1) is returned, but the probability of measuring 0 is |α|^2 and of measuring 1 is |β|^2. We cannot predict which outcome we will see, only its relative likelihood.
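These measurement statistics can be illustrated with a minimal classical sketch. The amplitudes alpha and beta below are illustrative values chosen for this example, not figures from the text:

```python
import random

# Minimal sketch of single-qubit measurement statistics.
# alpha and beta are illustrative amplitudes; |alpha|^2 + |beta|^2 must equal 1.
alpha, beta = 0.6, 0.8
assert abs(alpha**2 + beta**2 - 1.0) < 1e-9

def measure():
    """Return 0 with probability |alpha|^2, else 1."""
    return 0 if random.random() < alpha**2 else 1

random.seed(1)  # fixed seed for reproducibility
samples = [measure() for _ in range(100_000)]
fraction_zero = samples.count(0) / len(samples)
print(fraction_zero)  # close to |alpha|^2 = 0.36
```

Each individual measurement yields only 0 or 1; the weights alpha and beta are visible only in the statistics over many repeated preparations and measurements, which is exactly why extracting answers from a quantum computer is subtle.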
The power of superposition extends when we consider qubit registers: by analogy with a classical register, many qubits act together to store data as bit strings. In the quantum case, the register can be in a superposition of all possible register values. For example, a 3-qubit register can be in the superposed state of all eight values 000 to 111, all with different weights. In some such cases, when the superposition contains more than a single qubit, the qubits can be entangled with each other. The individual qubits no longer act independently, and exhibit much more strongly correlated behavior than is possible for classical systems. As with a single qubit, when quantum gates are performed on a register, operations are performed on all values simultaneously.
Extracting the relevant data is the difficult part of quantum computing. Only one element of a superposition can ever be measured, and we cannot choose which one it is. Algorithm designers aim to manipulate the weights so that, when the time comes to measure the result, the majority of the weight is given to a state that is a solution to the input problem.
Several different methods have been developed to use these fundamental elements of quantum computing. The most widely considered is circuit-based computation. Directly analogous with classical digital computation, data is stored in qubits and manipulated by the application of gate operations. In general, the first step of a circuit-based computation is to create an equal superposition of all register states. Gate operations between qubits then change the weights in the superposition, usually creating entanglement in the process.
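The first steps of a circuit-based computation described above can be sketched with a tiny state-vector simulation. This is an illustrative two-qubit example, not a fragment of any real quantum programming toolchain: a Hadamard gate creates a superposition, and a CNOT then entangles the two qubits:

```python
import numpy as np

# Single-qubit Hadamard and identity, and the two-qubit CNOT (control = first qubit).
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

state = np.array([1, 0, 0, 0], dtype=complex)  # register starts in |00>
state = np.kron(H, I) @ state                  # H on first qubit: equal weight on |00> and |10>
state = CNOT @ state                           # entangle: (|00> + |11>) / sqrt(2)
print(np.round(state.real, 3))                 # [0.707 0.    0.    0.707]
```

The final state places all its weight on the correlated outcomes 00 and 11: measuring one qubit determines the other, the strongly correlated behavior described above that has no classical counterpart.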
A separate approach is adiabatic quantum computation. As with the circuit model, the output state is measured to give the final answer. In this case, however, the state is designed to be the low energy ground state of a quantum system in the quantum computer. The key to the computation is to adjust the coupling between quantum systems in the device to allow it to relax into that specific ground state.
Other approaches include measurement-based quantum computation, in which a large entangled state is reduced to the desired output state simply by carefully choosing how to measure the qubits, and direct simulation, in which the quantum states are designed to model a different physical system, rather than calculate a value numerically.