Failed to read 8 bytes from input stream at first SCF iteration #6132

Open
16 tasks
stilldown opened this issue Apr 9, 2025 · 5 comments · May be fixed by #6194
Labels
EXX and lr-TDDFT (Related to EXX or lr-TDDFT)

Comments

stilldown commented Apr 9, 2025

Describe the bug

When running ABACUS with

      OMP_NUM_THREADS=12 nohup mpirun -n 2 --map-by socket --bind-to none abacus | tee output.log &

the program crashed at the first step of the SCF iteration when using the HSE functional. I built with -DDEBUG_INFO=ON to provide more details for debugging.
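The launch line combines 2 MPI ranks with 12 OpenMP threads each and disables binding (--bind-to none), so the run asks for 24 cores in total. A minimal pre-launch sketch that checks this product against the available cores might look like the following. The function names are hypothetical, and it assumes a single-node run where nproc (GNU coreutils) reports the usable logical cores.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch check for a hybrid MPI x OpenMP run on a single node.

# Print the number of cores a run occupies: MPI ranks * OpenMP threads per rank.
total_cores() {
  local ranks="$1" threads="$2"
  echo $(( ranks * threads ))
}

# Succeed (return 0) if ranks * threads fits within the available cores.
layout_fits() {
  local ranks="$1" threads="$2" available="$3"
  [ "$(total_cores "$ranks" "$threads")" -le "$available" ]
}

# Example usage (assumes nproc is available):
#   if layout_fits 2 12 "$(nproc)"; then
#     OMP_NUM_THREADS=12 mpirun -n 2 --map-by socket --bind-to none abacus | tee output.log
#   else
#     echo "warning: 2 ranks x 12 threads oversubscribes this node" >&2
#   fi
```

With --bind-to none the threads are not pinned, so oversubscribing the node usually shows up as a slowdown rather than a clear error; checking the product up front keeps that variable out of the debugging.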

DIAMINODUT5-HSE.tar.gz

Expected behavior

No response

To Reproduce

Before using the toolchain, I modified the scripts install_openmpi.sh and install_elpa.sh to enable CUDA-aware MPI and cuSOLVERMp support, and to disable compilation of the GPU version of ELPA.
OpenMPI configure command (from install_openmpi.sh):

      ./configure CFLAGS="${CFLAGS}" \
        --prefix=${pkg_install_dir} \
        --libdir="${pkg_install_dir}/lib" \
        --with-zlib=${ZLIB} \
        --with-libevent=internal \
        --with-cuda=${CUDA_PATH} \
        --with-ucx=${UCX} \
        --with-ucc=${UCC} \
        ${EXTRA_CONFIGURE_FLAGS} \
        > configure.log 2>&1 || tail -n ${LOG_LINES} configure.log
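After rebuilding OpenMPI with --with-cuda, one way to confirm that the resulting library is actually CUDA-aware is to query ompi_info. The sketch below assumes the ompi_info --parsable --all output format described in the Open MPI CUDA FAQ, where CUDA support appears as an mpi_built_with_cuda_support:value:true line; please check this against your Open MPI version. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an Open MPI build has CUDA support compiled in.
# Reads the output of `ompi_info --parsable --all` on stdin.

# Print "yes" and return 0 if CUDA support is reported; print "no" and return 1 otherwise.
ompi_cuda_aware() {
  if grep -q 'mpi_built_with_cuda_support:value:true'; then
    echo yes
    return 0
  fi
  echo no
  return 1
}

# Usage on a real installation (assumes ompi_info from this build is first on PATH):
#   ompi_info --parsable --all | ompi_cuda_aware
```

Running this against the ompi_info that mpirun actually resolves to also catches the case where a system OpenMPI shadows the toolchain build.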

ELPA configure loop (excerpt from install_elpa.sh):

      # Build only the CPU target; the GPU ("nvidia") build of ELPA is disabled.
      for TARGET in "cpu" ; do
        [ "$TARGET" = "nvidia" ] && [ "$ENABLE_CUDA" != "__TRUE__" ] && continue
        # disable cpu if cuda is enabled
        # (commented out so that the CPU build still runs when CUDA is enabled)
        # [ "$TARGET" != "nvidia" ] && [ "$ENABLE_CUDA" = "__TRUE__" ] && continue
        echo "Installing from scratch into ${pkg_install_dir}/${TARGET}"
        mkdir -p "build_${TARGET}"
        cd "build_${TARGET}"
        if [ "${with_amd}" != "__DONTUSE__" ] && [ "${WITH_FLANG}" = "yes" ] ; then
        echo "AMD fortran compiler detected, enable special option operation"
        # ... (rest of the loop body in install_elpa.sh omitted)

toolchain_gnu.sh:

./install_abacus_toolchain.sh \
        --with-gcc=install \
        --with-intel=no \
        --with-openblas=install \
        --with-openmpi=install \
        --with-cmake=install \
        --with-scalapack=install \
        --with-libxc=install \
        --with-fftw=install \
        --with-elpa=install \
        --with-cereal=install \
        --with-rapidjson=install \
        --with-libtorch=install \
        --with-libnpy=install \
        --with-libri=install \
        --with-libcomm=install \
        --with-4th-openmpi=no \
        --enable-cuda \
        --gpu-ver=86 \
        | tee compile.log

build_abacus_gnu.sh:

cmake -B $BUILD_DIR -DCMAKE_INSTALL_PREFIX=$PREFIX \
        -DCMAKE_CXX_COMPILER=g++ \
        -DMPI_CXX_COMPILER=mpicxx \
        -DLAPACK_DIR=$LAPACK \
        -DSCALAPACK_DIR=$SCALAPACK \
        -DUSE_ELPA=ON \
        -DELPA_DIR=$ELPA \
        -DCEREAL_INCLUDE_DIR=$CEREAL \
        -DFFTW3_DIR=$FFTW3 \
        -DLibxc_DIR=$LIBXC \
        -DENABLE_LCAO=ON \
        -DENABLE_LIBXC=ON \
        -DUSE_OPENMP=ON \
        -DENABLE_RAPIDJSON=ON \
        -DRapidJSON_DIR=$RAPIDJSON \
        -DUSE_CUDA=ON \
        -DUSE_CUDA_MPI=ON \
        -DENABLE_DEEPKS=ON \
        -DTorch_DIR=$LIBTORCH \
        -Dlibnpy_INCLUDE_DIR=$LIBNPY \
        -DENABLE_LIBRI=ON \
        -DLIBRI_DIR=$LIBRI \
        -DLIBCOMM_DIR=$LIBCOMM \
        -DENABLE_CUSOLVERMP=ON \
        -DCAL_CUSOLVERMP_PATH=$CUDA_PATH/lib64 \
        -DDEBUG_INFO=ON

Environment

No response

Additional Context

Attached build log: build - 副本.log

Task list for Issue attackers (only for developers)

  • Verify the issue is not a duplicate.
  • Describe the bug.
  • Steps to reproduce.
  • Expected behavior.
  • Error message.
  • Environment details.
  • Additional context.
  • Assign a priority level (low, medium, high, urgent).
  • Assign the issue to a team member.
  • Label the issue with relevant tags.
  • Identify possible related issues.
  • Create a unit test or automated test to reproduce the bug (if applicable).
  • Fix the bug.
  • Test the fix.
  • Update documentation (if necessary).
  • Close the issue and inform the reporter (if applicable).
mohanchen added the "EXX and lr-TDDFT" label (Related to EXX or lr-TDDFT) on Apr 9, 2025
mohanchen (Collaborator)

Thank you for reporting this issue. We will have someone look into it.

PeizeLin (Collaborator) commented Apr 9, 2025

restart_load=True is set in INPUT, but no corresponding restart files are provided here.
You can try restart_load=False instead.
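Until ABACUS itself warns about this, a pre-flight check in the job script can catch the mismatch described above. This is a minimal sketch with several assumptions not taken from the ABACUS source: INPUT holds whitespace-separated key value lines with # comments, restart_load is written as 1/0 or true/false, and the caller passes the directory that should hold the restart data (for example OUT.<suffix>/restart; check the ABACUS documentation for the exact location). All function names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: warn if INPUT enables restart_load but no restart data exists.

# Print the value of restart_load from an INPUT file (empty if the key is absent).
restart_load_value() {
  local input="$1"
  # Strip "#" comments, then keep the second field of the last restart_load line.
  sed 's/#.*//' "$input" | awk '$1 == "restart_load" { v = $2 } END { print v }'
}

# Return 0 if the value looks like a "true" boolean (assumed spellings).
is_true() {
  case "$1" in
    1|[Tt]|[Tt]rue|TRUE|.true.) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 1 with a warning on stderr if restart_load is on but the directory is missing or empty.
check_restart() {
  local input="$1" restart_dir="$2"
  if is_true "$(restart_load_value "$input")"; then
    if [ ! -d "$restart_dir" ] || [ -z "$(ls -A "$restart_dir")" ]; then
      echo "warning: restart_load is enabled in $input but $restart_dir is missing or empty;" \
           "set restart_load to False or provide restart files" >&2
      return 1
    fi
  fi
  return 0
}
```

For example, running check_restart INPUT OUT.ABACUS/restart || exit 1 before mpirun would stop the job with a readable warning before ABACUS starts; the OUT.ABACUS path assumes the default suffix.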

stilldown (Author) commented Apr 9, 2025

@PeizeLin It works, but it is confusing: I sometimes reuse the INPUT file. If there is no restart density, could ABACUS automatically initialize the density and ignore the restart file, or give an explicit warning?

mohanchen (Collaborator)

> @PeizeLin It works. but it is confusing that I sometimes reuse the INPUT file and if there is no restart density, could it be automatically initialized the density and ignore the restart file or give some explicit warning.

We will try to add a warning. Thanks for your feedback.

xuan112358 (Collaborator)

> @PeizeLin It works. but it is confusing that I sometimes reuse the INPUT file and if there is no restart density, could it be automatically initialized the density and ignore the restart file or give some explicit warning.

With #6194, if no restart information is found, an explicit warning is printed, the density is initialized automatically, and ABACUS runs as usual.
