About mpiQulacs

mpiQulacs is a quantum simulator based on Qulacs. It supports parallel execution across multiple processes and multiple nodes, which allows large-scale quantum circuits to be simulated at high speed.

Each mpiQulacs release is based on a specific version of Qulacs:

| mpiQulacs | Underlying Qulacs |
|---|---|
| v1.2.2 and earlier | v0.3.0 |
| v1.3.0 and later | v0.3.1 |

Key Differences from Qulacs

Features specific to mpiQulacs include the following.

(See Usage for more information)

You can specify the use_multi_cpu flag when you create a QuantumState instance.
- QuantumState(qubits, use_multi_cpu)
  - “use_multi_cpu = True” indicates that the state vector is distributed across multiple compute nodes (multiple ranks) where appropriate.

    In general, always specify True.

    Note that even when True is specified, the state vector is not distributed if the number of qubits is small, that is, if (N - k) ≤ log₂(S).

    Here S is the number of MPI ranks, N is the number of qubits, and k is the minimum number of qubits per process (a constant, k = 2).
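The distribution condition above can be checked with a few lines of plain Python. This is only a sketch of the stated inequality, not mpiQulacs code: the function name `is_distributed` is hypothetical, and it assumes the number of ranks S is a power of two so that log₂(S) is an integer.

```python
def is_distributed(n_qubits: int, n_ranks: int, k: int = 2) -> bool:
    """Return True if the rule (N - k) <= log2(S) does NOT hold.

    Hypothetical helper: it only evaluates the inequality from the text.
    Assumes n_ranks (S) is a power of two.
    """
    if n_ranks < 1 or (n_ranks & (n_ranks - 1)) != 0:
        raise ValueError("n_ranks must be a power of two")
    log2_ranks = n_ranks.bit_length() - 1  # exact log2 for a power of two
    # The state vector stays on one rank when (N - k) <= log2(S).
    return (n_qubits - k) > log2_ranks


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "qubits on 4 ranks ->", is_distributed(n, 4))
```

With 4 ranks and k = 2, the inequality gives a threshold of 4 qubits: states with 4 or fewer qubits stay undistributed, and 5 qubits is the smallest distributed size.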

You can check whether the state vector is distributed by calling state.get_device_name().

Return value of state.get_device_name()

| Return Value | Explanation |
|---|---|
| “cpu” | State vector created on a single node |
| “multi-cpu” | State vector distributed across nodes (ranks) |
| (“gpu”) | In Qulacs, this indicates that the state vector is placed on the GPU. mpiQulacs does not support placement on the GPU. |

Special behavior when state vectors are distributed
- The following methods return only the part of the state vector held by the calling node, not the whole state vector.
  - state.get_vector()
  - state.to_string()
- Even when the same seed is specified, state.set_Haar_random_state(seed) generates a different state vector if the number of partitions of the state vector is different.
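To illustrate what "part of the whole state vector" means, the sketch below computes which amplitude indices a given rank would hold. It is a model in plain Python, not mpiQulacs code: the helper `local_index_range` is hypothetical, and it assumes the 2^N amplitudes are split into S equal contiguous chunks in rank order (consistent with the upper qubits being the distributed ones), with S a power of two.

```python
def local_index_range(n_qubits: int, n_ranks: int, rank: int) -> tuple[int, int]:
    """Return the half-open range [start, stop) of amplitude indices on `rank`.

    Hypothetical model: assumes equal contiguous chunks in rank order
    and that n_ranks is a power of two not exceeding 2**n_qubits.
    """
    if not 0 <= rank < n_ranks:
        raise ValueError("rank out of range")
    total = 1 << n_qubits          # number of amplitudes in the full state
    if n_ranks > total or total % n_ranks != 0:
        raise ValueError("n_ranks must evenly divide 2**n_qubits")
    chunk = total // n_ranks       # amplitudes held by each rank
    return rank * chunk, (rank + 1) * chunk


if __name__ == "__main__":
    # 5 qubits on 4 ranks: each rank holds 8 of the 32 amplitudes.
    for r in range(4):
        print("rank", r, "holds indices", local_index_range(5, 4, r))
```

Under this model, a call such as state.get_vector() on one rank would correspond to one of these chunks, so reconstructing the full vector requires collecting the chunks from all ranks.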
Random seed can now be specified in update_quantum_state().
- Added API
  - QuantumGateBase::update_quantum_state(state, seed)
  - QuantumCircuit::update_quantum_state(state, seed)
The FusedSWAP gate has been added.
- When the state vector is distributed, the upper (higher-index) qubits are the ones spread across the ranks.
  
  Qubits that span ranks are called global qubits, and qubits that are contained within a single rank are called local qubits.
  
  Quantum gate operations on global qubits are slower than operations on local qubits because they involve communication between compute nodes.
  
  By using a FusedSWAP gate, the global and local qubits can be repositioned to reduce the amount of communication during gate operations.
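The idea of global and local qubits, and of repositioning them, can be sketched in plain Python. This is an illustrative model only: `split_qubits` and `swap_blocks` are hypothetical helpers, the block-swap signature is an assumption and not the mpiQulacs FusedSWAP API, and the number of ranks is assumed to be a power of two.

```python
def split_qubits(n_qubits: int, n_ranks: int) -> tuple[list[int], list[int]]:
    """Return (local, global) qubit positions for a distributed state.

    Hypothetical model: the top log2(n_ranks) positions are global.
    """
    n_global = n_ranks.bit_length() - 1   # exact log2 for a power of two
    n_local = n_qubits - n_global
    return list(range(n_local)), list(range(n_local, n_qubits))


def swap_blocks(layout: list[int], first: int, second: int, width: int) -> list[int]:
    """Exchange two non-overlapping blocks of `width` positions in `layout`.

    layout[p] is the logical qubit currently stored at physical position p.
    """
    lo, hi = sorted((first, second))
    if lo < 0 or lo + width > hi or hi + width > len(layout):
        raise ValueError("blocks overlap or fall outside the layout")
    new = list(layout)
    new[first:first + width] = layout[second:second + width]
    new[second:second + width] = layout[first:first + width]
    return new


if __name__ == "__main__":
    local, global_ = split_qubits(5, 4)
    print("local positions:", local, "global positions:", global_)
    layout = list(range(5))               # identity layout to start
    print("after block swap:", swap_blocks(layout, 0, 3, 2))
```

In this model, swapping the two global positions with two local ones moves logical qubits 3 and 4 into local positions, so a run of gates on those qubits could proceed without inter-rank communication until the layout is swapped back.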
An optimization that uses FusedSWAP gates has been added to QuantumCircuitOptimizer.
mpiQulacs is optimized for the 512-bit SVE instructions of the A64FX.