Resolve the segmentation fault occurring in the pw float implementation #6130

Merged
merged 43 commits into from
May 16, 2025
Changes from 13 commits
7b855e6
add unit test
A-006 Apr 8, 2025
4c779e8
add intergrate test
A-006 Apr 8, 2025
f9e7710
fix process
A-006 Apr 8, 2025
a8d72af
modify jd
A-006 Apr 8, 2025
5faf27e
update bug
A-006 Apr 8, 2025
4773008
set fftw float
A-006 Apr 8, 2025
503bf58
add the float BPCG
A-006 Apr 8, 2025
4b5df98
add float test
A-006 Apr 8, 2025
16cb172
fix compile bug
A-006 Apr 8, 2025
f5c1fc1
fix error
A-006 Apr 9, 2025
677e5d6
fix the compile test
A-006 Apr 9, 2025
98decc4
Merge branch 'develop' into fft_float2
A-006 Apr 9, 2025
f632326
Merge branch 'develop' into fft_float2
A-006 Apr 10, 2025
300713c
add
A-006 Apr 10, 2025
1dbacf8
remove the test file
A-006 Apr 18, 2025
f565945
change the file
A-006 Apr 18, 2025
e1601ee
revert bug
A-006 Apr 18, 2025
f6fd16d
set the float type
A-006 Apr 18, 2025
bed7852
Merge branch 'develop' into fft_float2
A-006 Apr 18, 2025
80344ac
reset the FFT_MEASURE
A-006 Apr 18, 2025
c60bf81
update unittest
A-006 Apr 18, 2025
ed18346
change readme
A-006 Apr 22, 2025
1f66367
update threashold
A-006 Apr 22, 2025
4c63669
Merge branch 'develop' into fft_float2
A-006 Apr 22, 2025
7553e06
use the test file
A-006 Apr 22, 2025
385b010
fix unresonable comments
A-006 Apr 22, 2025
2e13c7f
update eslover before all runners
A-006 Apr 27, 2025
2bf18b9
Merge branch 'develop' into fft_float2
A-006 Apr 27, 2025
a224da7
fix compile bug
A-006 Apr 27, 2025
59b73f5
fix bug
A-006 Apr 27, 2025
f750e10
Merge branch 'develop' into fft_float2
mohanchen Apr 29, 2025
d193075
update README
A-006 May 6, 2025
a9b53a1
change chebyshev MPI part
A-006 May 6, 2025
5c156e5
Merge branch 'develop' into fft_float2
A-006 May 8, 2025
d5084f6
add new test
A-006 May 8, 2025
aa443f1
delete old test
A-006 May 8, 2025
2d2a550
remove old tests
A-006 May 9, 2025
b1f144e
add change
A-006 May 13, 2025
c60d13f
Merge branch 'develop' into fft_float2
A-006 May 13, 2025
df3c712
update tick
A-006 May 13, 2025
ca1b0d9
add back marco
A-006 May 13, 2025
5ca14cd
update change
A-006 May 14, 2025
be074ab
Merge branch 'develop' into fft_float2
A-006 May 14, 2025
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -31,7 +31,7 @@ jobs:

- name: Configure
run: |
-cmake -B build -DBUILD_TESTING=ON -DENABLE_DEEPKS=ON -DENABLE_MLKEDF=ON -DENABLE_LIBXC=ON -DENABLE_LIBRI=ON -DENABLE_PAW=ON -DENABLE_GOOGLEBENCH=ON -DENABLE_RAPIDJSON=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=1
+cmake -B build -DBUILD_TESTING=ON -DENABLE_DEEPKS=ON -DENABLE_MLKEDF=ON -DENABLE_LIBXC=ON -DENABLE_LIBRI=ON -DENABLE_PAW=ON -DENABLE_GOOGLEBENCH=ON -DENABLE_RAPIDJSON=ON -DCMAKE_EXPORT_COMPILE_COMMANDS=1 -DENABLE_FLOAT_FFTW=ON

# Temporarily removed because no one maintains this now.
# And it will break the CI test workflow.
4 changes: 4 additions & 0 deletions source/module_base/test/math_chebyshev_test.cpp
@@ -625,6 +625,8 @@ TEST_F(MathChebyshevTest, tracepolyA_float)

TEST_F(MathChebyshevTest, checkconverge_float)
{
+#ifdef __MPI
+#undef __MPI
const int norder = 100;
p_fchetest = new ModuleBase::Chebyshev<float>(norder);

@@ -648,5 +650,7 @@ TEST_F(MathChebyshevTest, checkconverge_float)

delete[] v;
delete p_fchetest;
+#define __MPI
+#endif
}
#endif
2 changes: 1 addition & 1 deletion source/module_basis/module_pw/pw_transform.cpp
@@ -210,7 +210,7 @@ void PW_Basis::recip2real(const std::complex<FPTYPE>* in, FPTYPE* out, const boo
#endif
for (int i = 0; i < this->nst * this->nz; ++i)
{
-fft_bundle.get_auxg_data<FPTYPE>()[i] = std::complex<double>(0, 0);
+fft_bundle.get_auxg_data<FPTYPE>()[i] = std::complex<FPTYPE>(0, 0);
}

#ifdef _OPENMP
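
The hunk above replaces a hard-coded `std::complex<double>` zero with one whose precision follows the template parameter `FPTYPE`. A minimal sketch of that pattern follows. The helper `zero_fill` is hypothetical and is not part of ABACUS; it stands in for the loop over `fft_bundle.get_auxg_data<FPTYPE>()`, whose real interface is not shown on this page.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption for illustration, not ABACUS code):
// zero a buffer whose element type follows the floating-point parameter FPTYPE.
template <typename FPTYPE>
void zero_fill(std::complex<FPTYPE>* data, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        // The zero has the same precision as the buffer element, so the
        // float instantiation never goes through a double temporary.
        data[i] = std::complex<FPTYPE>(0, 0);
    }
}
```

Calling `zero_fill(buf.data(), buf.size())` on a `std::vector<std::complex<float>>` deduces `FPTYPE = float`, so the same source line serves both precisions.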
2 changes: 1 addition & 1 deletion source/module_basis/module_pw/test/pw_test.cpp
@@ -40,7 +40,7 @@ int main(int argc, char **argv)
int kpar;
kpar = 1;
#ifdef __ENABLE_FLOAT_FFTW
-precision_flag = "single";
+precision_flag = "mixing";
#else
precision_flag = "double";
#endif
18 changes: 14 additions & 4 deletions source/module_esolver/esolver_fp.cpp
@@ -24,17 +24,27 @@ namespace ModuleESolver
ESolver_FP::ESolver_FP()
{
std::string fft_device = PARAM.inp.device;

std::string fft_precsion;

// LCAO basis doesn't support GPU acceleration on FFT currently
if(PARAM.inp.basis_type == "lcao")
{
fft_device = "cpu";
}

-pw_rho = new ModulePW::PW_Basis_Big(fft_device, PARAM.inp.precision);
+if(PARAM.inp.precision == "single" || PARAM.inp.precision == "mixing")
+{
+fft_precsion = "mixing";
+}else{
+fft_precsion = "double";
+}
+#if (not defined(__ENABLE_FLOAT_FFTW) and (defined(__CUDA) || defined(__ROCM)))
+if (fft_device == "gpu")
+fft_precsion = "double";
+#endif
+pw_rho = new ModulePW::PW_Basis_Big(fft_device, fft_precsion);
if (PARAM.globalv.double_grid)
{
-pw_rhod = new ModulePW::PW_Basis_Big(fft_device, PARAM.inp.precision);
+pw_rhod = new ModulePW::PW_Basis_Big(fft_device, fft_precsion);
}
else
{
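
The constructor change above no longer passes `PARAM.inp.precision` straight to `PW_Basis_Big`; it first maps it to an FFT precision. A rough sketch of that decision as a pure function follows. The name `select_fft_precision` and the two boolean parameters are assumptions made for illustration: in the source, the float-FFTW and GPU-build conditions are preprocessor macros, not runtime values, and the LCAO override of `fft_device` is left out here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not ABACUS code) mirroring the branch structure of the hunk.
// float_fftw_enabled and gpu_build replace compile-time macros with runtime flags
// purely so the logic can be exercised in a test.
std::string select_fft_precision(const std::string& precision,
                                 const std::string& fft_device,
                                 bool float_fftw_enabled,
                                 bool gpu_build)
{
    // "single" and "mixing" both request the mixed-precision FFT path.
    std::string fft_precision = (precision == "single" || precision == "mixing") ? "mixing" : "double";

    // Without float FFTW support in a GPU build, a GPU device falls back to double.
    if (!float_fftw_enabled && gpu_build && fft_device == "gpu")
    {
        fft_precision = "double";
    }
    return fft_precision;
}
```

Keeping the mapping in one place means `pw_rho` and `pw_rhod` receive the same value, which is what the hunk achieves with the shared `fft_precsion` variable.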
4 changes: 2 additions & 2 deletions source/module_hamilt_pw/hamilt_pwdft/structure_factor.cpp
@@ -68,8 +68,7 @@ void Structure_Factor::setup_structure_factor(const UnitCell* Ucell, const Paral
// std::ofstream ofs( outstr.c_str() ) ;
bool usebspline;
if(nbspline > 0) { usebspline = true;
-} else { usebspline = false;
-}
+} else { usebspline = false;}

if(usebspline)
{
@@ -147,6 +146,7 @@ void Structure_Factor::setup_structure_factor(const UnitCell* Ucell, const Paral
inat++;
}
}

if (device == "gpu") {
if (PARAM.globalv.has_float_data) {
resmem_cd_op()(this->c_eigts1, Ucell->nat * (2 * rho_basis->nx + 1));
28 changes: 28 additions & 0 deletions source/module_hamilt_pw/hamilt_pwdft/test/CMakeLists.txt
@@ -2,6 +2,7 @@ remove_definitions(-D__DEEPKS)
remove_definitions(-D__CUDA)
remove_definitions(-D__ROCM)
remove_definitions(-D__EXX)
remove_definitions(-DUSE_PAW)

AddTest(
TARGET pwdft_soc
@@ -26,4 +27,31 @@ AddTest(
TARGET radial_proj_test
LIBS parameter base device ${math_libs}
SOURCES radial_proj_test.cpp ../radial_proj.cpp
)

AddTest(
TARGET structure_factor_test
LIBS parameter ${math_libs} base device planewave
SOURCES structure_factor_test.cpp ../structure_factor.cpp ../parallel_grid.cpp
../../../module_cell/unitcell.cpp
../../../module_io/output.cpp
../../../module_cell/update_cell.cpp
../../../module_cell/bcast_cell.cpp
../../../module_cell/print_cell.cpp
../../../module_cell/atom_spec.cpp
../../../module_cell/atom_pseudo.cpp
../../../module_cell/pseudo.cpp
../../../module_cell/read_stru.cpp
../../../module_cell/read_atom_species.cpp
../../../module_cell/read_atoms.cpp
../../../module_cell/read_pp.cpp
../../../module_cell/read_pp_complete.cpp
../../../module_cell/read_pp_upf100.cpp
../../../module_cell/read_pp_upf201.cpp
../../../module_cell/read_pp_vwr.cpp
../../../module_cell/read_pp_blps.cpp
../../../module_elecstate/read_pseudo.cpp
../../../module_elecstate/cal_wfc.cpp
../../../module_elecstate/cal_nelec_nband.cpp
../../../module_elecstate/read_orb.cpp
)
128 changes: 128 additions & 0 deletions source/module_hamilt_pw/hamilt_pwdft/test/structure_factor_test.cpp
@@ -0,0 +1,128 @@
#include "gtest/gtest.h"
#include "gmock/gmock.h"
#include <string>
#include <cmath>
#include <complex>
#include "module_cell/unitcell.h"
#include "module_elecstate/module_dm/test/prepare_unitcell.h"
#define private public
#include "module_parameter/parameter.h"
#include "module_hamilt_pw/hamilt_pwdft/structure_factor.h"
#undef private
/************************************************
* unit test of class Structure_Factor
***********************************************/

/**
* - Tested Functions:
* - Structure_Factor::set, called with a PW_Basis pointer and nbspline
* - Structure_Factor::setup_structure_factor, checked through
*   z_eigts1/2/3 (double) and c_eigts1/2/3 (float)
*/

// Empty constructors and destructors so the test links without the
// full InfoNonlocal and Magnetism sources
InfoNonlocal::InfoNonlocal()
{
}
InfoNonlocal::~InfoNonlocal()
{
}

Magnetism::Magnetism()
{
}
Magnetism::~Magnetism()
{
}

class StructureFactorTest : public testing::Test
{
protected:
Structure_Factor SF;
std::string output;
ModulePW::PW_Basis* rho_basis;
UnitCell* ucell;
UcellTestPrepare utp = UcellTestLib["Si"];
Parallel_Grid* pgrid;
std::vector<int> nw = {13};
int nlocal = 0;
void SetUp()
{
rho_basis=new ModulePW::PW_Basis;
ucell = utp.SetUcellInfo(nw, nlocal);
ucell->set_iat2iwt(1);
pgrid = new Parallel_Grid;
rho_basis->npw=10;
rho_basis->gcar=new ModuleBase::Vector3<double>[10];
// for (int ig=0;ig<rho_basis->npw;ig++)
// {
// rho_basis->gcar[ig]=1.0;
// }
}
};

TEST_F(StructureFactorTest, set)
{
const ModulePW::PW_Basis* rho_basis_in = nullptr; // initialized so set() never receives an indeterminate pointer
const int nbspline_in =10;
SF.set(rho_basis_in,nbspline_in);
EXPECT_EQ(nbspline_in, 10);
}


TEST_F(StructureFactorTest, setup_structure_factor_double)
{
rho_basis->npw = 10;
SF.setup_structure_factor(ucell,*pgrid,rho_basis);

for (int i=0;i< ucell->nat * (2 * rho_basis->nx + 1);i++)
{
EXPECT_EQ(SF.z_eigts1[i].real(),1);
EXPECT_EQ(SF.z_eigts1[i].imag(),0);
}

for (int i=0;i< ucell->nat * (2 * rho_basis->ny + 1);i++)
{
EXPECT_EQ(SF.z_eigts2[i].real(),1);
EXPECT_EQ(SF.z_eigts2[i].imag(),0);
}

for (int i=0;i< ucell->nat * (2 * rho_basis->nz + 1);i++)
{
EXPECT_EQ(SF.z_eigts3[i].real(),1);
EXPECT_EQ(SF.z_eigts3[i].imag(),0);
}
}

TEST_F(StructureFactorTest, setup_structure_factor_float)
{
PARAM.sys.has_float_data = true;
rho_basis->npw = 10;
SF.setup_structure_factor(ucell,*pgrid,rho_basis);

for (int i=0;i< ucell->nat * (2 * rho_basis->nx + 1);i++)
{
EXPECT_EQ(SF.c_eigts1[i].real(),1);
EXPECT_EQ(SF.c_eigts1[i].imag(),0);
}

for (int i=0;i< ucell->nat * (2 * rho_basis->ny + 1);i++)
{
EXPECT_EQ(SF.c_eigts2[i].real(),1);
EXPECT_EQ(SF.c_eigts2[i].imag(),0);
}

for (int i=0;i< ucell->nat * (2 * rho_basis->nz + 1);i++)
{
EXPECT_EQ(SF.c_eigts3[i].real(),1);
EXPECT_EQ(SF.c_eigts3[i].imag(),0);
}
}

int main()
{
testing::InitGoogleTest();
return RUN_ALL_TESTS();
}
4 changes: 4 additions & 0 deletions source/module_hamilt_pw/hamilt_stodft/test/test_sto_tool.cpp
@@ -33,9 +33,13 @@ void hamilt::HamiltSdftPW<T, Device>::hPsi_norm(const T* psi_in, T* hpsi, const

template class hamilt::HamiltPW<std::complex<double>, base_device::DEVICE_CPU>;
template class hamilt::HamiltSdftPW<std::complex<double>, base_device::DEVICE_CPU>;
template class hamilt::HamiltPW<std::complex<float>, base_device::DEVICE_CPU>;
template class hamilt::HamiltSdftPW<std::complex<float>, base_device::DEVICE_CPU>;
#if ((defined __CUDA) || (defined __ROCM))
template class hamilt::HamiltPW<std::complex<double>, base_device::DEVICE_GPU>;
template class hamilt::HamiltSdftPW<std::complex<double>, base_device::DEVICE_GPU>;
template class hamilt::HamiltPW<std::complex<float>, base_device::DEVICE_GPU>;
template class hamilt::HamiltSdftPW<std::complex<float>, base_device::DEVICE_GPU>;
#endif

/**
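
The added lines above are explicit template instantiations for `std::complex<float>`, next to the existing `std::complex<double>` ones. A minimal, self-contained sketch of the mechanism follows. `Scaler` is a hypothetical class used only for illustration; it does not mirror the `HamiltPW` or `HamiltSdftPW` interfaces.

```cpp
#include <cassert>
#include <complex>

// Hypothetical class template standing in for a class whose member
// definitions live in a .cpp file rather than in the header.
template <typename T>
class Scaler
{
  public:
    explicit Scaler(T factor) : factor_(factor) {}
    T apply(T x) const;

  private:
    T factor_;
};

// Out-of-line member definition.
template <typename T>
T Scaler<T>::apply(T x) const
{
    return factor_ * x;
}

// Explicit instantiations: they make the compiler emit code for exactly these
// types in this translation unit. If the float line were missing and another
// file used Scaler<std::complex<float>> without seeing the definition above,
// linking would fail with an undefined reference.
template class Scaler<std::complex<double>>;
template class Scaler<std::complex<float>>;
```

This is why a test file that starts exercising the float path usually needs a matching `template class ...<std::complex<float>, ...>;` line for each class it stubs.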
2 changes: 1 addition & 1 deletion source/module_io/read_set_globalv.cpp
@@ -72,7 +72,7 @@ void ReadInput::set_globalv(const Input_para& inp, System_para& sys)
bool float_cond = false;
#endif
sys.has_double_data = (inp.precision == "double") || (inp.precision == "mixing") || float_cond;
-sys.has_float_data = (inp.precision == "float") || (inp.precision == "mixing") || float_cond;
+sys.has_float_data = (inp.precision == "single") || (inp.precision == "mixing") || float_cond;
}

/// @note Here para.inp has not been synchronized of all ranks.
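
The one-word change above matters because the INPUT file added in this PR sets `precision single`, a value that a comparison against `"float"` never matches. The sketch below reproduces the two flag expressions as a standalone function. `PrecisionFlags` and `precision_flags` are hypothetical names, and `float_cond` is passed in as a plain bool because the preprocessor conditions that set it are not visible in this hunk.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the two System_para flags set in set_globalv.
struct PrecisionFlags
{
    bool has_double_data;
    bool has_float_data;
};

// Same boolean expressions as the corrected hunk; float_cond is an input here
// (an assumption made so the function can be tested in isolation).
PrecisionFlags precision_flags(const std::string& precision, bool float_cond)
{
    PrecisionFlags flags;
    flags.has_double_data = (precision == "double") || (precision == "mixing") || float_cond;
    flags.has_float_data = (precision == "single") || (precision == "mixing") || float_cond;
    return flags;
}
```

With the old spelling, `precision single` and a false `float_cond` would leave `has_float_data` false, so code that allocates float buffers only when that flag is set would skip them.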
38 changes: 38 additions & 0 deletions tests/integrate/102_PW_BPCG_GPU_float/INPUT
@@ -0,0 +1,38 @@
INPUT_PARAMETERS
#Parameters (General)
suffix autotest
pseudo_dir ../../PP_ORB

gamma_only 0
calculation scf
symmetry 1
relax_nmax 1
out_level ie
smearing_method gaussian
smearing_sigma 0.02

#Parameters (3.PW)
ecutwfc 40
scf_thr 1e-7
scf_nmax 100
bndpar 2

#Parameters (LCAO)
basis_type pw
ks_solver bpcg
device gpu
precision single
chg_extrap second-order
out_dm 0
pw_diag_thr 0.00001

cal_force 1
#test_force 1
cal_stress 1
#test_stress 1

mixing_type broyden
mixing_beta 0.4
mixing_gg0 1.5

pw_seed 1
4 changes: 4 additions & 0 deletions tests/integrate/102_PW_BPCG_GPU_float/KPT
@@ -0,0 +1,4 @@
K_POINTS
0
Gamma
2 2 2 0 0 0
9 changes: 9 additions & 0 deletions tests/integrate/102_PW_BPCG_GPU_float/README
@@ -0,0 +1,9 @@
This test covers:
*GaAs-deformation
*PW
*kpoints 2*2*2
*sg15 pseudopotential
*smearing_method gaussian
*ks_solver bpcg
*mixing_type broyden-kerker
*mixing_beta 0.4
23 changes: 23 additions & 0 deletions tests/integrate/102_PW_BPCG_GPU_float/STRU
@@ -0,0 +1,23 @@
ATOMIC_SPECIES
As 1 As_dojo.upf upf201
Ga 1 Ga_dojo.upf upf201

LATTICE_CONSTANT
1 // add lattice constant, 10.58 ang

LATTICE_VECTORS
5.33 5.33 0.0
0.0 5.33 5.33
5.33 0.0 5.33
ATOMIC_POSITIONS
Direct //Cartesian or Direct coordinate.

As
0
1
0.300000 0.3300000 0.27000000 0 0 0

Ga //Element Label
0
1 //number of atom
0.00000 0.00000 0.000000 0 0 0
8 changes: 8 additions & 0 deletions tests/integrate/102_PW_BPCG_GPU_float/result.ref
@@ -0,0 +1,8 @@
etotref -4869.7470518349809936
etotperatomref -2434.8735259175
totalforceref 5.207670
totalstressref 37241.465646
pointgroupref C_1
spacegroupref C_1
nksibzref 8
totaltimeref 10.28
4 changes: 4 additions & 0 deletions tests/integrate/102_PW_BPCG_GPU_float/threshold
@@ -0,0 +1,4 @@
threshold 1
force_threshold 1
stress_threshold 1
fatal_threshold 1
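
The `threshold` file pairs with `result.ref`: each computed quantity is compared against its reference value within a tolerance. The comparison rule itself is not shown on this page, so the sketch below assumes a plain absolute-difference check. `within_threshold` is a hypothetical function, not the test harness's actual code.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical check (an assumption, not the real autotest script):
// accept a computed value when it lies within `threshold` of the reference.
bool within_threshold(double computed, double reference, double threshold)
{
    return std::fabs(computed - reference) <= threshold;
}
```

Under this assumption, a threshold of 1 on `etotref` accepts any total energy between roughly -4870.747 and -4868.747, which is far looser than a typical double-precision test would need.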