Explanation of what's going on:
Threads are controlled through the struct thr_ctrl* threads array. Each entry contains two mutexes, setup and done, and two condition variables, cond_set and cond_done.
The setup mutex is held by the worker thread at all times, except while the worker sleeps in cond_wait (cond_set, setup), which releases it implicitly. The setup mutex guards access to the variables in the thr_ctrl struct.
The done mutex is held by the main thread at all times, except while the main thread sleeps in cond_wait (cond_done, done), which likewise releases it implicitly.
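The handshake described above can be sketched with plain POSIX threads. This is only an illustration under assumptions, not the library's code: the names setup, done, cond_set and cond_done come from the text, while ctrl_sketch, has_job, quit, finished, worker and run_once are hypothetical. The job here (squaring an int) is a placeholder.

```cpp
#include <pthread.h>

// Hypothetical stand-in for one thr_ctrl entry; the real struct differs.
struct ctrl_sketch {
    pthread_mutex_t setup, done;
    pthread_cond_t  cond_set, cond_done;
    bool has_job  = false;   // guarded by setup
    bool quit     = false;   // guarded by setup
    bool finished = false;   // guarded by done
    int  input  = 0;         // guarded by setup
    int  output = 0;         // guarded by done

    ctrl_sketch() {
        pthread_mutex_init(&setup, nullptr);
        pthread_mutex_init(&done, nullptr);
        pthread_cond_init(&cond_set, nullptr);
        pthread_cond_init(&cond_done, nullptr);
    }
    ~ctrl_sketch() {
        pthread_cond_destroy(&cond_done);
        pthread_cond_destroy(&cond_set);
        pthread_mutex_destroy(&done);
        pthread_mutex_destroy(&setup);
    }
};

static void* worker(void* p)
{
    ctrl_sketch* c = static_cast<ctrl_sketch*>(p);
    pthread_mutex_lock(&c->setup);              // worker keeps setup locked
    for (;;) {
        while (!c->has_job && !c->quit)         // loop guards against spurious wakeups
            pthread_cond_wait(&c->cond_set, &c->setup);  // setup is released only here
        if (c->quit)
            break;
        c->has_job = false;
        int r = c->input * c->input;            // placeholder job

        pthread_mutex_lock(&c->done);           // succeeds once main sleeps on cond_done
        c->output = r;
        c->finished = true;
        pthread_cond_signal(&c->cond_done);
        pthread_mutex_unlock(&c->done);
    }
    pthread_mutex_unlock(&c->setup);
    return nullptr;
}

// Runs one job on a worker thread and returns its result.
int run_once(int value)
{
    ctrl_sketch c;
    pthread_mutex_lock(&c.done);                // main keeps done locked
    pthread_t t;
    pthread_create(&t, nullptr, worker, &c);

    pthread_mutex_lock(&c.setup);               // free only while the worker is idle
    c.input = value;
    c.has_job = true;
    pthread_cond_signal(&c.cond_set);
    pthread_mutex_unlock(&c.setup);

    while (!c.finished)
        pthread_cond_wait(&c.cond_done, &c.done);  // done is released only here
    c.finished = false;
    int result = c.output;

    pthread_mutex_lock(&c.setup);               // ask the idle worker to exit
    c.quit = true;
    pthread_cond_signal(&c.cond_set);
    pthread_mutex_unlock(&c.setup);
    pthread_join(t, nullptr);
    pthread_mutex_unlock(&c.done);
    return result;
}
```

Note that in this sketch the main thread always waits for the running job before it touches setup again. Because the two threads take the mutexes in opposite order, handing over a second job without waiting first could deadlock.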
If you want to use multithreading, you have to do the following:
- Call init_threads(0) at the beginning and free_threads () at the end of the program
- Create routines that each work on only one part of the job
- Create a wrapper routine that reads the parameters from the thr_ctrl struct and then calls the routine
- Make the caller divide the job into several parts and call thread_start () for each part
- Optimization (optional): Make the number of threads depend on the size of the object to avoid thread overhead for small objects.
- The maximum number of threads you may start is threads_avail (). This function returns 0 if threads are already working, which prevents thread recursion and the resulting deadlock.
- Make the caller wait for the job with thread_wait (). Alternatively, you can use thread_wait_useful () and pass a pointer to a useful_job_t job to do something useful while waiting instead of rescheduling. (Not recommended.)
- Optimization (optional): Make the caller do the last part instead of a thread
- Compile with SMP defined (-DSMP)
- Link with smp.o and -lpthread.
That's it.
See matrix.h: TVector<T> Matrix<T>::operator * (const Vector<T>& v) const for an example.
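The steps above can be sketched for a matrix-vector product as follows. This is a self-contained illustration under assumptions, not the code in matrix.h: std::thread stands in for the library's thread_start () and thread_wait (), whose signatures are not shown here, the max_threads argument stands in for the value of threads_avail (), and job_params, mat_vec_rows, mat_vec_wrapper, mat_vec and the min_rows threshold are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical parameter block; in the library these values would be
// passed through the thr_ctrl struct.
struct job_params {
    const std::vector<double>* mat;   // row-major, rows x cols
    const std::vector<double>* vec;
    std::vector<double>*       res;
    std::size_t cols;
    std::size_t row_begin, row_end;   // half-open row range of this part
};

// Routine that works on one part of the job only.
static void mat_vec_rows(const std::vector<double>& m, const std::vector<double>& v,
                         std::vector<double>& r, std::size_t cols,
                         std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i < end; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < cols; ++j)
            sum += m[i * cols + j] * v[j];
        r[i] = sum;
    }
}

// Wrapper: unpacks the parameters, then calls the routine.
static void mat_vec_wrapper(const job_params* p)
{
    mat_vec_rows(*p->mat, *p->vec, *p->res, p->cols, p->row_begin, p->row_end);
}

// Caller: divides the rows into parts, starts helpers, does the last part itself.
std::vector<double> mat_vec(const std::vector<double>& m, const std::vector<double>& v,
                            std::size_t rows, unsigned max_threads)
{
    const std::size_t cols = v.size();
    std::vector<double> r(rows, 0.0);

    // Fewer parts for small objects: at least min_rows rows per part (assumed threshold).
    const std::size_t min_rows = 64;
    const std::size_t by_size  = std::max<std::size_t>(1, rows / min_rows);
    const std::size_t parts    = std::min<std::size_t>(max_threads + 1, by_size);
    const std::size_t chunk    = rows / parts;

    std::vector<job_params> params(parts);      // sized up front, so pointers stay valid
    std::vector<std::thread> helpers;
    for (std::size_t k = 0; k < parts; ++k) {
        const std::size_t end = (k + 1 == parts) ? rows : (k + 1) * chunk;
        params[k] = job_params{&m, &v, &r, cols, k * chunk, end};
        if (k + 1 < parts)
            helpers.emplace_back(mat_vec_wrapper, &params[k]);   // "thread_start"
    }
    mat_vec_wrapper(&params[parts - 1]);        // caller does the last part
    for (std::thread& t : helpers)
        t.join();                               // "thread_wait"
    return r;
}
```

With max_threads equal to 0, or with fewer than 2 * min_rows rows, the caller computes the whole product itself and no thread is started. Each part writes to a disjoint range of the result, so the routine needs no locking of its own.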
Debugging:
- If you compile with THREAD_STAT defined, you will see a summary of the CPU time the threads used.
- If you compile with THREAD_DEBUG defined, you will get some debugging messages about CPU detection and thread setup.
- If you compile with DEBUG_THREAD defined, you will be swamped with thread synchronization debugging messages.