10. Troubleshooting

10.1. Error on Running Programs

10.1.1. Segmentation fault

Make sure the program code contains the statement from mpi4py import MPI.
If you run a program that uses MPI without this import, a segmentation fault occurs. (Programs that use qiskit-qulacs do not need an MPI import statement.)
If a segmentation fault occurs even though the import statement is present, contact your Fujitsu representative.
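
The check described above can be sketched as a small pre-submission helper. This is a minimal sketch that uses only the Python standard library. The function name has_mpi_import is hypothetical and is not part of mpi4py or mpiQulacs. It only inspects the source text and does not run the program.

```python
import ast


def has_mpi_import(source: str) -> bool:
    """Return True if the source contains `from mpi4py import MPI`.

    Hypothetical helper for illustration; it parses the code without running it.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Match absolute imports of the form: from mpi4py import MPI [as ...]
        if (
            isinstance(node, ast.ImportFrom)
            and node.level == 0
            and node.module == "mpi4py"
            and any(alias.name == "MPI" for alias in node.names)
        ):
            return True
    return False


# Example: one script body with the import, one without it.
with_import = "from mpi4py import MPI\nprint('ok')\n"
without_import = "import numpy as np\n"
print(has_mpi_import(with_import))     # True
print(has_mpi_import(without_import))  # False
```

A program that uses only qiskit-qulacs would be reported as False by this helper even though, as noted above, it needs no MPI import.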

10.1.2. Cannot allocate memory in static TLS block

This error is caused by a bug in the GNU C library (glibc) installed on the system.
If the file named in the error message is /lib64/libgomp.so.1, the workaround is to set the environment variable LD_PRELOAD=/lib64/libgomp.so.1 in job.sh (or in ipcluster_config.py when using Jupyter).
If the file named in the error message is anything other than /lib64/libgomp.so.1, contact your Fujitsu representative.
(Reference) A permanent fix requires re-creating the wheel package by building the Python package from source on the quantum simulator system.
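
The decision above (apply the workaround only when the file named in the error is /lib64/libgomp.so.1) can be sketched in Python as follows. The function names are hypothetical. The message shape "<path>: cannot allocate memory in static TLS block" is an assumption about how the error is printed, so adjust the pattern if your output differs.

```python
import re

SYSTEM_LIBGOMP = "/lib64/libgomp.so.1"

# Assumed message shape: "<path>: cannot allocate memory in static TLS block"
_TLS_ERROR = re.compile(
    r"(\S+): cannot allocate memory in static TLS block", re.IGNORECASE
)


def tls_error_target(message: str):
    """Return the file path named in the error message, or None if no match."""
    match = _TLS_ERROR.search(message)
    return match.group(1) if match else None


def recommended_action(message: str) -> str:
    """Map an error message to the action this section describes."""
    target = tls_error_target(message)
    if target is None:
        return "not a static TLS block error"
    if target == SYSTEM_LIBGOMP:
        return f"set LD_PRELOAD={SYSTEM_LIBGOMP}"
    return "contact your Fujitsu representative"


# Illustrative message only; not copied from a real log.
example = "ImportError: /lib64/libgomp.so.1: cannot allocate memory in static TLS block"
print(recommended_action(example))
```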

Warning

Searching the Internet for this error message turns up a workaround that sets the environment variable LD_PRELOAD to the path of a suitable libgomp.so. Do not set LD_PRELOAD to any path other than /lib64/libgomp.so.1, which is installed on the system by default, because a different library may conflict with mpiQulacs.

10.2. Error on Job Submission

When you run the squeue command, the (REASON) column for a submitted job that is still pending (ST is PD) may show one of the following.

  • (Resources): Jobs of other users are not visible on this system, so your job may remain pending (PD) even when no running job is displayed. Check node availability with the sinfo command.

  • (PartitionNodeLimit): Displayed when the job requests more nodes than the queue has. Ask your Fujitsu representative for the maximum number of nodes in each queue.

$ squeue
         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            82     Batch test.job  USER PD       0:00      4 (PartitionNodeLimit)
            85     Batch test.job  USER PD       0:00      2 (Resources)
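
As an illustration, the pending reasons can be pulled out of squeue output with a few lines of Python. This is a sketch only. It assumes the default column layout shown above and job names without spaces, and the helper name pending_reasons and the ADVICE table are hypothetical.

```python
import re

# Advice paraphrased from the two reasons described in this section.
ADVICE = {
    "Resources": "check node availability with sinfo",
    "PartitionNodeLimit": "request no more nodes than the queue has",
}


def pending_reasons(squeue_output: str):
    """Return (job_id, reason) pairs for pending (PD) jobs."""
    results = []
    for line in squeue_output.splitlines():
        fields = line.split()
        # Columns: JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
        # The header row is skipped because its ST field is not "PD".
        if len(fields) < 8 or fields[4] != "PD":
            continue
        reason = re.fullmatch(r"\((.+)\)", " ".join(fields[7:]))
        if reason:
            results.append((fields[0], reason.group(1)))
    return results


sample = """\
         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            82     Batch test.job  USER PD       0:00      4 (PartitionNodeLimit)
            85     Batch test.job  USER PD       0:00      2 (Resources)
"""

for job_id, reason in pending_reasons(sample):
    print(job_id, reason, "->", ADVICE.get(reason, "no advice in this section"))
```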

When you use commands such as srun, sacct, or sinfo, errors like the following may occur.
If you encounter one of these errors, or if any of the commands does not work correctly, contact your Fujitsu representative.
Error messages:
  • srun: error: Application launch failed: Communication connection failure

  • slurm_load_partitions: Unable to contact slurm controller (connect failure)

  • sacct: error: Problem talking to the database: Connection refused

  • slurm_receive_msg: Zero Bytes were transmitted or received