-
Notifications
You must be signed in to change notification settings - Fork 65
Remove GLib dependency within bin/ #184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
This commit is the first in a series of commits to remove gLib. This commit moves an unused thread pool library within traceAnalyzer to the utils directory, which may find its use elsewhere.
This reverts commit 1155efd.
This commit is the first in series (ignoring reverted) to remove gLib. This commit introduces [hashmap.h](https://github.com/sheredom/hashmap.h) in an effort to remove GHashTable dependencies from the project. Future commits shall introduce similar header-only libs as well.
just FYI, it might be better to start with a small PR :) |
Thanks for the suggestion. So let's begin with something like "remove GLib dependency from main executable" first? Since gLib usage within the algo part is more extensive and complicated, maybe I shall get rid of GLib in BTW the quality gate check is complaining about duplicated code on |
Move from GThreadPool to a homebrewed threadpool adapted from https://nachtimwald.com/2019/04/12/thread-pool-in-c Note that this code **only** runs with UBSan enabled. Not for production use (yet).
Sounds good! |
Finalize move from GThreadPool to a homebrewed threadpool adapted from https://nachtimwald.com/2019/04/12/thread-pool-in-c Note, that simulation results differ from original. Reason for this is under investigation.
@1a1a11a Hi! I would say that this PR is near finished. In the meantime, I'm observing an at-most 1% difference in simulation results for some algorithms (e.g. FIFO) with Right now sanitizers are not throwing any useful error, and vvverbose logs look good as well. Since PS: that |
loop in @JasonGuo98 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR is focused on removing GLib dependencies in favor of using native pthreads and custom data structures. Key changes include replacing GLib’s GMutex, GThreadPool, and GHashTable with pthread-based synchronization primitives and custom thread pool/hashmap implementations, as well as updating include paths and launch configurations.
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
libCacheSim/traceReader/reader.c | Replace non-thread safe boolean flag with a volatile qualifier |
libCacheSim/profiler/threadpool.c | Introduce custom threadpool implementation adapted from external source |
libCacheSim/profiler/simulator.c | Replace GThreadPool and GMutex with pthread equivalents and update index conversion |
libCacheSim/profiler/profilerLRU.c | Switch from g_hash_table to custom hashmap functions |
libCacheSim/profiler/dist.c | Replace GLib hash table calls with custom hashmap functions |
libCacheSim/include/libCacheSim/threadpool.h | New header for threadpool API |
libCacheSim/include/libCacheSim/hashmap_defs.in | New header for hashmap definitions |
.vscode/launch.json | Updated executable path and arguments |
sim_mt_params_t *params = (sim_mt_params_t *)user_data; | ||
int idx = GPOINTER_TO_UINT(data) - 1; | ||
// int idx = GPOINTER_TO_UINT(data) - 1; | ||
int idx = ((unsigned int)(unsigned long)(data)) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conversion from a pointer to an unsigned integer for index calculation may be error-prone on platforms where pointer sizes differ. Consider using a safer, more portable approach for this conversion to prevent potential issues.
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either we pass pointer to int
, or we use size_t
in place of unsigned long
.
If we don't care about portability to 32-bit platforms or non amd64/aarch64, we just ignore this.
@1a1a11a What do you think?
Did a quick test, it seems that the thread pool library that I'm quoting is problematic. I switched to a battle-tested thread pool library, and (from preliminary test results) the problem is gone :) Gotta do some git rebase afterwards...
Upd: seems more like a problem with simulation steps, rather than reader problem. Introduced with the latest commit. Overlooked the problem since TSan keeps reporting the other way. |
Add these two lines to respective functions in INFO("actual progress %d\n", progress);
INFO("actual num_of_caches %d\n", num_of_caches); Before commit af3ff57:
After commit af3ff57:
The logic of the original code says: while (progress < num_of_caches - 1) {
print_progress((double)progress / (double)(num_of_caches - 1) * 100);
}
Update: it is designed for purpose, my bad. Pushing a fix. Anyways, marking this PR as ready for review. |
The previous commit af3ff57 introduced pthread_cond_t to synchronize progress. However, in some circumstances, the simulate_* functions may exit prematurely due to incorrect conditional wait. This commit fix the abovementioned problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add some documentation on how to use the data structures in the repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually hashmap.h
is well documented. I shall then add code documentation and brief README for threadpool.h
@@ -267,11 +274,12 @@ void cal_working_set_size(reader_t *reader, int64_t *wss_obj, | |||
continue; | |||
} | |||
|
|||
if (g_hash_table_contains(obj_table, (gconstpointer)req->obj_id)) { | |||
if (hashmap_get(&new_obj_table, (const void *)(req->obj_id), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just use obj_id as the key?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is done as so. There is a custom hasher for obj_id
(that I copied from gLib implementation.) Instead of treating everything as pointers as gLib does, this custom hasher takes the value of obj_id
. Passing obj_id
as pointer is just a requirement by function signature (since there may be a need to use other stuff than uint64 as key)
libCacheSim/profiler/threadpool.c
Outdated
|
||
#include "../include/libCacheSim/threadpool.h" | ||
|
||
#include <pthread.h> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move threadpool to utils or dataStructure and add a quick README
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed
Yes, you can add a separate section for data structures in doc |
Just noted that there seems to be a deadlock problem that I didn't notice while pushing. Trying to fix |
In Release builds, access to params->n_caches is optimized to a stack access. In Debug builds however, access is by pointer. Failure to initialize this leads to infinite loop in _simulate, thus deadlock in simulate_* functions.
|
See #133.