Usage
Example
from qulacs import QuantumCircuit, QuantumState
from qulacs.circuit import QuantumCircuitOptimizer
from qulacs.gate import Y,CNOT,merge
from mpi4py import MPI # import is required for MPI execution even if MPI classes are not used
comm = MPI.COMM_WORLD
mpirank = comm.Get_rank()
# Define quantum states and initialize them with random values
state = QuantumState(3, use_multi_cpu=True)
state.set_Haar_random_state()
# Define quantum circuits and update quantum states
circuit = QuantumCircuit(3)
circuit.add_X_gate(0)
merged_gate = merge(CNOT(0, 1), Y(1))
circuit.add_gate(merged_gate)
circuit.add_RX_gate(1, 0.5)
# quantum circuit optimization
Opt_type = "Heavy"
if Opt_type == "Light":
# Optimize using SWAP (reduces inter-node communication in update processing)
swap_level = 1
# Quantum gates of a quantum circuit is merged by a greedy method
# until the target quantum bit becomes the specified size.
if state.get_device_name() == "multi-cpu":
QuantumCircuitOptimizer().optimize_light(circuit, swap_level)
else:
QuantumCircuitOptimizer().optimize_light(circuit)
elif Opt_type == "Heavy":
# Maximum quantum gate size allowed to be merged
max_block_size = 2
# More powerful SWAP/FusedSWAP optimizations than swap_level = 1
swap_level = 2
# The optimisation also takes into account the exchange of exchangeable quantum gates.
# If the quantum circuit size is large, the optimisation process time may increase significantly.
if state.get_device_name() == "multi-cpu":
QuantumCircuitOptimizer().optimize(circuit, max_block_size, swap_level)
else:
QuantumCircuitOptimizer().optimize(circuit, max_block_size)
# Update states using circuit
circuit.update_quantum_state(state)
# Sampling from quantum states
samples = state.sampling(20, 2022) # with random seed = 2022
if mpirank == 0:
print(samples)
The second argument to QuantumState must be use_multi_cpu=True .
(This distributes state vectors across multiple compute nodes, enabling parallel processing by MPI.)
List of APIs with functional differences from Qulacs
QuantumState class
QuantumState(qubits, use_multi_cpu)
use_multi_cpu = False
Positions the entire state vector within one node (same behavior as Qulacs)
use_multi_cpu = True
- Distributes the state vector across multiple nodes.However, (N - k) ≦ log2(S) does not distribute. (S is the number of MPI ranks, N is the number of qubits, and k is the minimum number of qubits per process (constant k = 2).)
The qubit is classified as local qubit and global qubit and is arranged as follows:
local qubit: Placed in same node
global qubit: Distributed across multiple nodes
The top log2(number of ranks) qubits are classified as global qubits
state.get_device_name()
Returns the device where the state vector is located.
Return value
Explanation
“cpu”
State vector created in 1 node
“multi-cpu”
State vector distributed across nodes (ranks)
(“gpu”)
Qulacs means that the state vector is placed on the GPU, but mpiQulacs does not support placement on the GPU.
state.load(vector)
When state vectors are distributed, only the vector elements of the entire state vector that are placed at each rank must be specified as arguments. (For calls to state.load with rank N, only vector elements that are placed in rank N should be specified as arguments to the load method.) If you want to load the entire state vector as an argument, see Method for loading/getting entire distributed state vector.
state.get_vector()
If the state vectors are distributed, returns only the state vector of each rank (that is, a portion of the entire state vector). If you want to get the entire state vector as a return value, see Method for loading/getting entire distributed state vector .
state.to_string()
For a state vector with a large number of elements, the first 256 elements are output.
If the state vectors are distributed, the first part of each rank’s state vector (part of the whole state vector) is output.
The number of local qubit and global qubit is also output.
Sample output-- rank 0 -------------------------------------- *** Quantum State *** * MPI rank / size : 0 / 2 * Qubit Count : 20 (local / global : 19 / 1 ) * Dimension : 262144 * state vector is too long, so the (128 x 2) elements are output. * State vector (rank 0): (1,0) ... -- rank 1 -------------------------------------- * State vector (rank 1): (0,0) ...
state.set_Haar_random_state([seed])
Initializes each element of the state vector with a random number.
If the state vector is distributed,
If you do not specify seed, the random seed value generated by rank 0 is first shared among all ranks.
Based on the specified or shared seed, each rank uses (seed + rank number) as the final seed value. Therefore, if the state vectors are distributed, the state vectors generated will be different if the number of divisions is different, even if the same seed value is specified for the argument.
state.sample(number_sampling [, seed])
If seed is not specified, a random seed value generated at rank 0 is shared among all ranks and used as the seed value.
FusedSWAP(qubit_idx1, qubit_idx2, block_size)
Swaps the placement of block_size qubits from the index specified in qubit_idx1 with the placement of block_size qubits from the index specified in qubit_idx2.
- The effect of the gate is equivalent to continuously applying a SWAP gate to the range of qubit index being swapped.When a FusedSWAP gate is used, its replacement process is completed with one gate.
The primary use of FusedSWAP is to speed up gate operations when state vectors are distributed.
- As the qubit count increases, the information in the top log2(number of ranks) qubits is distributed across the ranks. Quantum gate operations on qubits (global qubits) that are placed across ranks are slower than operations on qubits (local qubits) that are placed in the same rank because they require communication between compute nodes.If you have a lot of gate operations on the global qubit, you can reduce the gate operations on qubits across the ranks by using a FusedSWAP gate to swap the placement of the global qubit and the local qubit. This reduces the amount of communication during gate operation and speeds up the operation.
Please also refer to Example of use of FusedSWAP .
Specifying seed in update_quantum_state
QuantumGate.update_quantum_state(state, seed)
QuantumCircuit.update_quantum_state(state, seed)
Random numbers used in gates using random numbers such as Measurement can be fixed. Once seed is specified, the random number generator initialized by seed is inherited.
Automatic FusedSWAP gate insertion of QuantumCircuitOptimizer
Add the swap_level argument to the optimize method of QuantumCircuitOptimizer. By specifying swap_level = 1, SWAP/FusedSWAP gates are automatically inserted to reduce gate operations on the global qubit. This reduces the amount of communication during gate operation and increases the speed.
optimize(circuit, block_size, swap_level=0)
optimize_light(circuit, swap_level=0)
swap_level = 0
Do not insert SWAP/FusedSWAP gate.
swap_level = 1
Inserting SWAP/FusedSWAP gate reduces and speeds up inter-node communication in update operations.
swap_level = 2
Insert SWAP/FusedSWAP gates while changing the gate order. You might be able to reduce traffic even further than swap_level = 1.
In mpiQulacs v1.3.0 and later, it is possible to perform optimization on a gate-by-gate basis and optimization using FusedSWAP at the same time.
Note
Note
Environment variable settings
QULACS_NUM_THREADS (1 - 1024)
Specifies the maximum number of threads to use with mpiQulacs. Takes precedence over OMP_NUM_THREADS.
Example of use of FusedSWAP
import numpy as np
from qulacs import QuantumCircuit, QuantumState
from mpi4py import MPI
comm = MPI.COMM_WORLD
mpirank = comm.Get_rank()
mpisize = comm.Get_size()
np.random.seed(seed=32)
def build_circuit(depth, nqubits, mpisize):
global_qc = int(np.log2(mpisize))
local_qc = nqubits - global_qc
swapped = False
circuit = QuantumCircuit(nqubits)
for _ in range(depth):
# operation on local qubit
for i in range(local_qc):
circuit.add_H_gate(i)
# Swap the placement of global qubit and local qubit before operating on global qubit
# (Place global qubits in the same rank)
if global_qc != 0:
circuit.add_FusedSWAP_gate(local_qc - global_qc, local_qc, global_qc)
swapped = not swapped
# operation on global qubit
for i in range(global_qc):
circuit.add_H_gate(local_qc - global_qc + i)
# In the for statement above, we added FusedSWAP depth times,
# we added FusedSWAP as needed to return to the original qubit sequence.
if (global_qc != 0) and swapped:
circuit.add_FusedSWAP_gate(local_qc - global_qc, local_qc, global_qc)
return circuit
if __name__ == '__main__':
nqubits = 36
st = QuantumState(nqubits, use_multi_cpu=True)
circuit = build_circuit(10, nqubits, mpisize)
circuit.update_quantum_state(st)
Method for loading/getting entire distributed state vector
As described in List of APIs with functional differences from Qulacs , QuantumState.load(vector), QuantumState.get_vector() processes only those vector elements that are placed at each rank in the entire state vector.
If you want to load/get_vector the entire state vector as an argument/return value, define a method like this and use it instead of load, get_vector:.
import numpy as np
from mpi4py import MPI
def load_state_vector(state, vector):
"""
Loads the given entire state vector into the given state.
Args:
state (qulacs.QuantumState): a quantum state
vector: a state vector to load
"""
if state.get_device_name() == 'multi-cpu':
mpicomm = MPI.COMM_WORLD
mpirank = mpicomm.Get_rank()
mpisize = mpicomm.Get_size()
vector_len = len(vector)
idx_start = vector_len // mpisize * mpirank
idx_end = vector_len // mpisize * (mpirank + 1)
state.load(vector[idx_start:idx_end])
else:
state.load(vector)
def get_state_vector(state):
"""
Gets the entire state vector from the given state.
Args:
state (qulacs.QuantumState): a quantum state
Return:
vector: a state vector
"""
if state.get_device_name() == 'multi-cpu':
mpicomm = MPI.COMM_WORLD
mpisize = mpicomm.Get_size()
vec_part = state.get_vector()
len_part = len(vec_part)
vector_len = len_part * mpisize
vector = np.zeros(vector_len, dtype=np.complex128)
mpicomm.Allgather([vec_part, MPI.DOUBLE_COMPLEX],
[vector, MPI.DOUBLE_COMPLEX])
return vector
else:
return state.get_vector()