Failed tests when compiling with openmp #971

Open
jalvesz opened this issue Mar 29, 2025 · 7 comments · May be fixed by #988
Labels
bug (Something isn't working), build: cmake (Issue with stdlib's CMake build files), compiler: gfortran (Specific to GCC Fortran compiler), platform: Windows (Build issues specific to the Windows platform)

Comments

@jalvesz
Contributor

jalvesz commented Mar 29, 2025

Description

I tested building and running the tests with OpenMP support by adding the -fopenmp flag:

cmake -B build -G Ninja -DBUILD_TESTING=on -DCMAKE_Fortran_FLAGS=-fopenmp -DCMAKE_MAXIMUM_RANK:String=4 -DCMAKE_BUILD_TYPE=Release -DCMAKE_Fortran_COMPILER=gfortran
cmake --build build
ctest --test-dir build/test

several of the tests failed:

86% tests passed, 11 tests failed out of 77

Label Time Summary:
quadruple_precision    =   0.17 sec*proc (2 tests)

Total Test time (real) =  12.68 sec

The following tests FAILED:
         12 - chaining_maps (SEGFAULT)
         13 - open_maps (SEGFAULT)
         14 - maps (SEGFAULT)
         15 - intrinsics (Failed)
         30 - linalg_pseudoinverse (Failed)
         38 - blas_lapack (Failed)
         43 - sorting (Exit code 0xc0000374)
         47 - mean (Failed)
         59 - string_intrinsic (Failed)
         64 - string_to_number (Failed)
         69 - simps (Failed)

I wonder if one of the CI jobs should include OpenMP in order to catch such behaviour early?

Expected Behaviour

All tests should pass.

Version of stdlib

master

Platform and Architecture

Windows / gfortran 14.2.0

Additional Information

No response

@jalvesz added the bug label Mar 29, 2025
@jvdp1
Member

jvdp1 commented Mar 29, 2025

Strange that it failed on some procedures like blas_lapack or sorting. I agree that we should include OpenMP in at least one of the CI jobs.

@jalvesz
Contributor Author

jalvesz commented Apr 2, 2025

I was looking at the intrinsics test and saw that it fails for sum and dot_product with xdp. There seems to be a tolerance issue. For instance, if I print the tolerance and the relative errors here

https://github.com/fortran-lang/stdlib/blob/60d0a769216322243e28a63b92ed7668d2df80d5/test/intrinsics/test_intrinsics.fypp#L213C1-L218C87

adding a print *, '${t}$ dot err:', tolerance, err(1:3)

I get without openmp:
real(xdp) dot err: 1.08420217248550443401E-0017 3.25260651745651330202E-0019 0.00000000000000000000 0.00000000000000000000

With openmp:
real(xdp) dot err: 1.08420217248550443401E-0017 2.22044604925031308085E-0016 0.00000000000000000000 5.55111512312578270212E-0016

For the latter, the errors seem to be funnily close to epsilon(0.d0) = 2.220446049250313E-016 ... I'm intrigued here, and I wonder if the other tests might be suffering from something similar.
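
One possible reading of these numbers (an assumption, not something confirmed in this thread) is that part of the xdp computation passes through a lower-precision intermediate in the OpenMP build, which would leave a relative error near that lower precision's epsilon. The Python sketch below only illustrates that general effect, with float64 and float32 standing in for xdp and double precision; `round_to_float32` is a helper introduced for illustration and has nothing to do with stdlib.

```python
import struct
import sys


def round_to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


EPS64 = sys.float_info.epsilon  # machine epsilon of float64, 2**-52
EPS32 = 2.0 ** -23              # machine epsilon of float32

reference = 1.0 / 3.0                  # value held in the higher precision
demoted = round_to_float32(reference)  # same value after a lower-precision step
rel_err = abs(demoted - reference) / abs(reference)

# The error is far above the float64 epsilon but bounded by the float32 one.
print(f"eps64={EPS64:.3e}  eps32={EPS32:.3e}  rel_err={rel_err:.3e}")
```

A tolerance derived from the higher-precision epsilon would reject such a result, which is the same pattern as the xdp tolerance and the errors printed above.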

@perazz
Member

perazz commented Apr 8, 2025

Yes, unfortunately I also noticed this a while ago:

https://github.com/fortran-lang/fpm/blob/7535cab6efc89dd5a294f0d9643b5eebd6b237f0/src/fpm_meta.f90#L139-L142

I have never had time to dig into the issue, though.

I don't use openmp much, but I believe every time there is a static (save) variable somewhere, that must be declared THREADPRIVATE, otherwise all threads will write to it, causing unpredictable behavior.
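
The difference between a shared static variable and a per-thread copy can be sketched outside Fortran. The Python example below is only an analogy, under the assumption that `threading.local` plays the role of THREADPRIVATE and a plain object plays the role of a saved variable; the names `Shared` and `count_mismatches` are hypothetical.

```python
import threading

N_THREADS = 4


class Shared:
    """Plain object: its attributes are visible to every thread."""


def count_mismatches(storage_factory) -> int:
    """Each thread writes its id, waits for the others, then reads it back."""
    storage = storage_factory()
    barrier = threading.Barrier(N_THREADS)
    mismatches = []
    lock = threading.Lock()

    def worker(tid: int) -> None:
        storage.value = tid      # write to the (possibly shared) variable
        barrier.wait()           # make sure every thread has written
        if storage.value != tid: # another thread overwrote the value
            with lock:
                mismatches.append(tid)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(N_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(mismatches)


shared_mismatches = count_mismatches(Shared)            # one copy for all threads
private_mismatches = count_mismatches(threading.local)  # one copy per thread
print(shared_mismatches, private_mismatches)
```

With the shared object only the last writer reads back its own value, while the per-thread storage never produces a mismatch.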

@jalvesz
Contributor Author

jalvesz commented Apr 11, 2025

On a different machine (without the hash_functions tests, #976) I got "only" the following failures when using OpenMP (here using GNU from msys2 instead of equation.com):

96% tests passed, 3 tests failed out of 73

Label Time Summary:
quadruple_precision    =   2.32 sec*proc (2 tests)

Total Test time (real) =  96.33 sec

The following tests FAILED:
         37 - sorting (SEGFAULT)
         60 - filesystem (Failed)
         63 - subprocess (Failed)

running: ctest --test-dir build/test --rerun-failed --output-on-failure

1/3 Test #37: sorting ..........................***Exception: SegFault 10.68 sec
# Testing: sorting
  Starting char_ord_sorts ... (1/22)
  Starting string_ord_sorts ... (2/22)
  Starting bitset_large_ord_sorts ... (3/22)
  Starting bitset_64_ord_sorts ... (4/22)
  Starting int_radix_sorts ... (5/22)
  Starting real_radix_sorts ... (6/22)
  Starting int_sorts ... (7/22)
  Starting char_sorts ... (8/22)
  Starting string_sorts ... (9/22)
  Starting bitset_large_sorts ... (10/22)
  Starting bitset_64_sorts ... (11/22)
 ORD_SORT did not sort String Decrease.
 i =                     1
  Starting int_sort_indexes_default ... (12/22)
string_dummy(i-1:i) =
 ORD_SORT did not sort Bitset Random.
 i =                   235
bitset64_dummy(i-1:i) = 0000000000000000000000000000000000000000000000000000011110000110 0000000000000000000000000000000000000000000000000000000011101100
  Starting char_sort_indexes_default ... (13/22)
  Starting string_sort_indexes_default ... (14/22)
 reverse + work ORD_SORT did not sort Bitset Random.
 i =                     5
bitset64_dummy(i-1:i) = 0000000000000000000000000000000000000000000000000000111101000101 0000000000000000000000000000000000000000000000000000111101000111
  Starting bitset_large_sort_indexes_default ... (15/22)
 reverse + work ORD_SORT did not sort Bitset Decrease.
 i =                  2048
 SORT did not sort Bitset Decrease.
 i =                     1
bitsetl_dummy(i-1:i) = 00000000000000000000111111111111 00000000000000000000111111111110
bitset64_dummy(i-1:i) = 0000000000000000000000000000000000000000000000000000101000010001 0000000000000000000000000000000000000000000000000000000001100001
  Starting bitset_64_sort_indexes_default ... (16/22)
 RADIX_SORT did not sort Blocks.
 i =                    31
  Starting int_sort_indexes_low ... (17/22)
dummy(i-1:i)     83      0
  Starting char_sort_indexes_low ... (18/22)
 reverse ORD_SORT did not sort Bitset Random.
 i =                   537
bitset64_dummy(i-1:i) = 0000000000000000000000000000000000000000000000000000110111100111 0000000000000000000000000000000000000000000000000000110111100110
 reverse + work ORD_SORT did not sort Char. Decrease.
 i =                     1
       ... bitset_64_ord_sorts [FAILED]
  Message: Condition not fullfilled
char_dummy(i-1:i) =  pppp pppo
  Starting bitset_large_sort_indexes_low ... (20/22)
 SORT_INDEX did not sort Bitset Decrease.
 i =                     3
  Starting bitset_64_sort_indexes_low ... (21/22)
 reverse SORT did not sort Bitset Decrease.
 i =                     1
  Starting int_ord_sorts ... (22/22)
bitset64_dummy(i-1:i) = 0000000000000000000000000000000000000000000000000000101000110100 0000000000000000000000000000000000000000000000000000011011001111
bitsetl_dummy(i-1:i) = 0000000000000000000000000000000000000000000000000000111111111111 0000000000000000000000000000000000000000000000000000111111111110
       ... bitset_64_sort_indexes_default [FAILED]
  Message: Condition not fullfilled
  Starting string_sort_indexes_low ... (19/22)
       ... bitset_64_sorts [FAILED]
  Message: Condition not fullfilled
 ORD_SORT did not sort Blocks.
 i =                  2436
dummy(i-1:i)  46123  46124
 SORT_INDEX did not sort Blocks.
 i =                   256
a(index_low(i-1:i)   3805   3804
 SORT_INDEX did not sort Char. Decrease.
 i =                  4806
char_dummy(i-1:i) onkg dilo
       ... char_sort_indexes_default [FAILED]
  Message: Condition not fullfilled
 reverse RADIX_SORT did not sort Blocks.
 i =                   427
dummy(i-1:i)  65109  65108
       ... int_radix_sorts [FAILED]
  Message: Condition not fullfilled
       ... bitset_64_sort_indexes_low [PASSED]
 SORT_INDEX did not sort Blocks.
 i =                     2
a(index_default(i-  65534  65533
 reverse + work ORD_SORT did not sort Char. Decrease.
 i =                     1
char_dummy(i-1:i) =  afnn cfpi
       ... char_ord_sorts [FAILED]
  Message: Condition not fullfilled
 SORT did not sort Blocks.
 i =                  8437
dummy(i-1:i)   4709   4708
 SORT_INDEX did not sort String Decrease.
 i =                   229
string_dummy(i-1:
 SORT_INDEX did not sort Char. Decrease.
 i =                  7427
       ... string_sort_indexes_default [FAILED]
  Message: Condition not fullfilled
char_dummy(i-1:i) enme enif
       ... char_sort_indexes_low [FAILED]
  Message: Condition not fullfilled
 reverse + work ORD_SORT did not sort Blocks.
 i =                  2889
dummy(i-1:i)   8638  35054
       ... real_radix_sorts [PASSED]
 reverse + work ORD_SORT did not sort String Decrease.

    Start 60: filesystem
2/3 Test #60: filesystem .......................***Failed    0.65 sec
# Testing: filesystem
  Starting fs_is_directory_dir ... (1/2)
  Starting fs_is_directory_file ... (2/2)
       ... fs_is_directory_file [FAILED]
  Message: Cannot delete test file: File cannot be deleted
       ... fs_is_directory_dir [PASSED]
1 test(s) failed!
ERROR STOP

Error termination. Backtrace:
#0  0xd7e05dac in ???
#1  0xd7d819d1 in ???
#2  0xd7c7ed5b in ???
#3  0x14962cb5 in ???
#4  0x14962d01 in ???
#5  0x14961318 in __tmainCRTStartup
        at D:/M/B/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:259
#6  0x14961425 in mainCRTStartup
        at D:/M/B/src/mingw-w64/mingw-w64-crt/crt/crtexe.c:179
#7  0xadde259c in ???
#8  0xaf12af37 in ???
#9  0xffffffff in ???

    Start 63: subprocess
3/3 Test #63: subprocess .......................   Passed    1.66 sec

33% tests passed, 2 tests failed out of 3

Total Test time (real) =  13.24 sec

The following tests FAILED:
         37 - sorting (SEGFAULT)
         60 - filesystem (Failed)

@perazz
Member

perazz commented Apr 11, 2025

Regarding the filesystem tests, it seems it may be enough to ensure that the test file name is different for each thread.
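
As a rough illustration of the per-thread file name idea, here is a Python sketch rather than the actual stdlib test. The file naming scheme and the `create_and_delete` helper are assumptions made for the example.

```python
import os
import tempfile
import threading

N_THREADS = 4


def create_and_delete(directory: str, tid: int, errors: list) -> None:
    # Hypothetical naming scheme: the thread id makes each file name unique,
    # so no thread tries to delete a file another thread already removed.
    path = os.path.join(directory, f"test_file_{tid}.txt")
    try:
        with open(path, "w") as handle:
            handle.write("scratch")
        os.remove(path)
    except OSError as exc:
        errors.append((tid, exc))


errors: list = []
with tempfile.TemporaryDirectory() as directory:
    threads = [
        threading.Thread(target=create_and_delete, args=(directory, tid, errors))
        for tid in range(N_THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    leftover = os.listdir(directory)  # files that were not cleaned up

print("errors:", errors, "leftover:", leftover)
```

Because no two threads share a path, each deletion targets a file that only its own thread created.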

@PierUgit
Contributor

> I don't use openmp much, but I believe every time there is a static (save) variable somewhere, that must be declared THREADPRIVATE, otherwise all threads will write to it, causing unpredictable behavior.

As you mention, the problem can arise only if the saved variable (which can be a module variable, which is saved by design) is written; there is no issue when the variable is only read. But as a general rule, given the importance of multithreading in HPC nowadays, the thread-safety status of all stdlib routines should be documented: which ones are thread-safe and which ones are not.

@jalvesz
Contributor Author

jalvesz commented Apr 15, 2025

> Regarding the filesystem tests, it seems it may be enough to ensure that the test file name is different for each thread.

I would have said it is better to have the deletion executed by a single thread, e.g. by adding !$omp single where appropriate, no?

@jalvesz added the build: cmake, platform: Windows, and compiler: gfortran labels Apr 17, 2025