A. The big picture
B. Getting more information
C. Issues related to the C library
D. Problems, weird behaviors, potential bugs
E. Missing functions, wrong types, etc
F. C++ issues
G. Debugging LinuxThreads programs
H. Compiling multithreaded code; errno madness
I. X-Windows and other libraries
J. Signals and threads
K. Internals of LinuxThreads
Multi-threaded programming differs from Unix-style multi-processing in that all threads share the same memory space (and a few other system resources, such as file descriptors), instead of running in their own memory space as is the case with Unix processes.
Threads are useful for two reasons. First, they allow a program to exploit multi-processor machines: the threads can run in parallel on several processors, allowing a single program to divide its work between several processors, thus running faster than a single-threaded program, which runs on only one processor at a time. Second, some programs are best expressed as several threads of control that communicate together, rather than as one big monolithic sequential program. Examples include server programs, overlapping asynchronous I/O, and graphical user interfaces.
There are also some online tutorials. Follow the links from the LinuxThreads web page: http://pauillac.inria.fr/~xleroy/linuxthreads.
linux-threads@magenet.com.
You can subscribe to the latter by writing to
majordomo@magenet.com.
For Linux-specific questions, use comp.os.linux.development.apps and comp.os.linux.development.kernel. The latter is especially appropriate for questions about the interface between the kernel and LinuxThreads.
Very specific LinuxThreads questions, and in particular everything
that looks like a potential bug in LinuxThreads, should be mailed
directly to me (Xavier.Leroy@inria.fr).  Before mailing
me, make sure that your question is not answered in this FAQ.
On the other hand, you probably don't want to read the standard. It's very hard to read, written in standard-ese, and targeted to implementors who already know threads inside-out. A good book on POSIX threads provides the same information in a much more readable form. I can personally recommend Dave Butenhof's book, Programming with POSIX threads (Addison-Wesley). Butenhof was part of the POSIX committee and also designed the Digital Unix implementations of POSIX threads, and it shows.
Another good source of information is the X/Open Group Single Unix specification which is available both on-line and as a book and CD/ROM. That specification includes pretty much all the POSIX standards, including 1003.1c, with some extensions and clarifications.
Unfortunately, many popular Linux distributions (e.g. RedHat 4.2) come with libc 5.3.12 preinstalled -- the one that does not work with LinuxThreads. Fortunately, you can often find pre-packaged binaries of more recent versions of libc for these distributions. In the case of RedHat 4, there is an RPM package for libc-5.4 in the "contrib" area of RedHat FTP sites.
prep.ai.mit.edu and its many, many mirrors around the world.
See http://www.gnu.org/order/ftp.html
for a list of mirrors.
libc_r/dirent.c 
        libc_r/dirent.c:94: structure has no member named `dd_lock'
I haven't actually seen this problem, but several users reported it.
My understanding is that something is wrong in the include files of
your Linux installation (/usr/include/*). Make sure
you're using a supported version of the C library. (See section B).
/usr/include/sched.h: there are several occurrences of
_p that the C compiler does not understand
/usr/include/sched.h that comes with libc 5.3.12 is broken.
Replace it with the sched.h file contained in the
LinuxThreads distribution.  But really you should not be using libc
5.3.12 with LinuxThreads! (See question C.1.)
fdopen() on a file
descriptor opened on a pipe.  When I link it with LinuxThreads,
fdopen() always returns NULL!
pthread_create() !
top or ps
display N+2 processes that are running my program. What do all these
processes correspond to?
pthread_create.  That leaves one process
unaccounted for.  That extra process corresponds to the "thread
manager" thread, a thread created internally by LinuxThreads to handle
thread creation and thread termination.  This extra thread is asleep
most of the time.
This is perfectly acceptable behavior with respect to the POSIX
standard: for the default scheduling policy, POSIX makes no guarantees
of fairness, such as "the thread waiting for the mutex for the longest
time always acquires it first".  This allows implementations of
mutexes to remain simple and efficient.  Properly written
multithreaded code avoids that kind of heavy contention on mutexes,
and does not run into fairness problems.  If you need scheduling
guarantees, you should consider using the real-time scheduling
policies SCHED_RR and SCHED_FIFO, which have
precisely defined scheduling behaviors. 
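As a rough illustration of requesting one of these real-time policies, the sketch below tries to start a thread under SCHED_FIFO at the lowest real-time priority. The names start_fifo_thread and rt_worker are hypothetical, and the sketch assumes a system where the scheduling attribute calls shown are available. Real-time policies usually require special privileges, so a failure code such as EPERM is an expected outcome for ordinary users.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical worker: time-critical work would go here. */
static void *rt_worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Try to run a thread under SCHED_FIFO at the lowest real-time priority.
   Returns 0 on success, or an errno-style code (often EPERM when the
   caller is not allowed to use real-time policies). */
int start_fifo_thread(void)
{
    pthread_attr_t attr;
    struct sched_param param;
    pthread_t tid;
    int rc;

    memset(&param, 0, sizeof param);
    param.sched_priority = sched_get_priority_min(SCHED_FIFO);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    /* Ask for the policy in attr rather than the creator's policy. */
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    if (rc == 0)
        rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    if (rc == 0)
        rc = pthread_attr_setschedparam(&attr, &param);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, rt_worker, NULL);

    pthread_attr_destroy(&attr);
    if (rc != 0)
        return rc;
    return pthread_join(tid, NULL);
}
```

A caller would check the return value and fall back to the default policy when the request is refused.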
printf() in tight loops, and from the
printout it seems that only one thread is running, the other doesn't
print anything!
printf() performs
locking on stdout, and thus your two threads contend very
heavily for the mutex associated with stdout.  But if you
do some real work between two calls to printf(), you'll
see that scheduling becomes much smoother. 
<pthread.h>
and there seems to be a gross error in the pthread_cleanup_push
macro: it opens a block with { but does not close it!
Surely you forgot a } at the end of the macro, right?
pthread_cleanup_pop macro.  The POSIX standard
requires pthread_cleanup_push and
pthread_cleanup_pop to be used in matching pairs, at the
same level of brace nesting.  This allows
pthread_cleanup_push to open a block in order to
stack-allocate some data structure, and
pthread_cleanup_pop to close that block.  It's ugly, but
it's the standard way of implementing cleanup handlers.
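A minimal sketch of the pairing rule, assuming a worker that owns a heap buffer: the push and the pop sit at the same brace level of the same function. The names release_buffer, worker, run_cleanup_demo and the cleanup_calls counter are made up for illustration.

```c
#include <pthread.h>
#include <stdlib.h>

static int cleanup_calls = 0;   /* counts handler runs, for illustration only */

/* Cleanup handler: frees the buffer whether the thread is cancelled
   or reaches pthread_cleanup_pop(1) normally. */
static void release_buffer(void *p)
{
    free(p);
    cleanup_calls++;
}

static void *worker(void *arg)
{
    char *buf = malloc(128);

    (void)arg;
    if (buf == NULL)
        return NULL;

    pthread_cleanup_push(release_buffer, buf);  /* opens a block */
    buf[0] = '\0';
    pthread_testcancel();                       /* a cancellation point */
    pthread_cleanup_pop(1);                     /* closes the block; 1 = run the handler now */

    return NULL;
}

/* Runs the worker once and returns how many times the handler ran. */
int run_cleanup_demo(void)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return cleanup_calls;
}
```

Leaving the block between the push and the pop with return or goto is not allowed, which is a direct consequence of the macros opening and closing a brace.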
pthread_yield()? How
come LinuxThreads does not implement it?
pthread_yield(),
but then the POSIX guys discovered it was redundant with
sched_yield() and dropped it.  So, just use
sched_yield() instead.
<pthread.h>.
For instance, the second argument to pthread_create()
should be a pthread_attr_t, not a
pthread_attr_t *. Also, didn't you forget to declare 
pthread_attr_default?
thr_blah to
pthread_blah.  This is very annoying.  Why did you change
all the function names?
thr_* functions correspond to Solaris
threads, an older thread interface that you'll find only under
Solaris.  The pthread_* functions correspond to POSIX
threads, an international standard available for many, many platforms.
Even Solaris 2.5 and later support the POSIX threads interface.  So,
do yourself a favor and rewrite your code to use POSIX threads: this
way, it will run unchanged under Linux, Solaris, and quite a lot of
other platforms.
thr_suspend() and
thr_resume() functions to do that; why don't you?
Notice that thr_suspend() is inherently dangerous and
prone to race conditions.  For one thing, there is no control over where
the target thread stops: it can very well be stopped in the middle of
a critical section, while holding mutexes.  Also, there is no
guarantee as to when the target thread will actually stop.  For these
reasons, you'd be much better off using mutexes and conditions
instead.  The only situations that really require the ability to
suspend a thread are debuggers and some kinds of garbage collectors.
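One way the mutex-and-condition alternative could look is sketched below: the worker checks a "paused" flag at a point it knows is safe, so it never stops in the middle of a critical section. The struct gate type and the function names are hypothetical, not part of any library.

```c
#include <pthread.h>
#include <sched.h>

/* Hypothetical pause gate shared by a controller and one worker. */
struct gate {
    pthread_mutex_t lock;
    pthread_cond_t changed;
    int paused;     /* 1 while the worker should wait */
    int stop;       /* 1 when the worker should exit */
    long steps;     /* units of work done so far */
};

static void *gated_worker(void *arg)
{
    struct gate *g = arg;

    for (;;) {
        pthread_mutex_lock(&g->lock);
        while (g->paused && !g->stop)       /* safe point: nothing else is held */
            pthread_cond_wait(&g->changed, &g->lock);
        if (g->stop) {
            pthread_mutex_unlock(&g->lock);
            return NULL;
        }
        g->steps++;                         /* stand-in for real work */
        pthread_mutex_unlock(&g->lock);
        sched_yield();                      /* give the controller a chance at the lock */
    }
}

/* Update the flags, wake the worker, and report the work done so far. */
static long gate_set(struct gate *g, int paused, int stop)
{
    long steps;

    pthread_mutex_lock(&g->lock);
    g->paused = paused;
    g->stop = stop;
    steps = g->steps;
    pthread_cond_broadcast(&g->changed);
    pthread_mutex_unlock(&g->lock);
    return steps;
}

/* Resumes the worker, pauses it, then stops it.  Returns the step count
   if it did not change while paused, -1 otherwise. */
long run_gate_demo(void)
{
    struct gate g;      /* on the stack, but this function joins before returning */
    pthread_t tid;
    long at_pause, at_stop;

    pthread_mutex_init(&g.lock, NULL);
    pthread_cond_init(&g.changed, NULL);
    g.paused = 1;
    g.stop = 0;
    g.steps = 0;

    if (pthread_create(&tid, NULL, gated_worker, &g) != 0)
        return -1;

    while (gate_set(&g, 0, 0) == 0)     /* resume until some work is done */
        sched_yield();
    at_pause = gate_set(&g, 1, 0);      /* pause */
    at_stop = gate_set(&g, 1, 1);       /* stop */

    pthread_join(tid, NULL);
    pthread_cond_destroy(&g.changed);
    pthread_mutex_destroy(&g.lock);
    return (at_pause == at_stop) ? at_pause : -1;
}
```

The design choice is that the worker decides where it may stop, which is exactly the control that thr_suspend() does not give.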
If you really must suspend a thread in LinuxThreads, you can send it a
SIGSTOP signal with pthread_kill. Send
SIGCONT for restarting it.
Beware, this is specific to LinuxThreads and entirely non-portable.
Indeed, a truly conforming POSIX threads implementation will stop all
threads when one thread receives the SIGSTOP signal!
One day, LinuxThreads will implement that behavior, and the
non-portable hack with SIGSTOP won't work anymore.
pthread_attr_setstacksize() nor
pthread_attr_setstackaddr().  Why?
_POSIX_THREAD_ATTR_STACKSIZE and
_POSIX_THREAD_ATTR_STACKADDR (respectively) before using these
functions.
pthread_attr_setstacksize() lets the programmer specify
the maximum stack size for a thread.  In LinuxThreads, stacks start
small (4k) and grow on demand to a fairly large limit (2M), which
cannot be modified on a per-thread basis for architectural reasons.
Hence there is really no need to specify any stack size yourself: the
system does the right thing all by itself.  Besides, there is no
portable way to estimate the stack requirements of a thread, so
setting the stack size is pretty useless anyway.
pthread_attr_setstackaddr() is even more questionable: it
lets users specify the stack location for a thread.  Again,
LinuxThreads takes care of that for you.  Why you would ever need to
set the stack address escapes me.
PTHREAD_SCOPE_PROCESS value of the "contentionscope"
attribute.  Why?
PTHREAD_SCOPE_PROCESS.
_POSIX_THREAD_PROCESS_SHARED
before using this facility.
The goal of this extension is to allow different processes (with
different address spaces) to synchronize through mutexes, conditions
or semaphores allocated in shared memory (either SVR4 shared memory
segments or mmap()ed files).
The reason why this does not work in LinuxThreads is that mutexes, conditions, and semaphores are not self-contained: their waiting queues contain pointers to linked lists of thread descriptors, and these pointers are meaningful only in one address space.
Matt Messier and I spent a significant amount of time trying to design a suitable mechanism for sharing waiting queues between processes. We came up with several solutions that combined two of the following three desirable features, but none that combines all three:
pthread_cond_timedwait
clone()" fails.
Until suitable kernel support is available, you'd better use traditional interprocess communications to synchronize different processes: System V semaphores and message queues, or pipes, or sockets.
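As one example of the traditional mechanisms listed above, here is a small sketch of two processes synchronizing over a pipe: the parent blocks in read() until the child announces that it is ready. The function name wait_for_child_over_pipe and the one-byte 'R' token are assumptions made for the example.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* The parent waits until the child writes a one-byte "ready" token.
   Returns the token on success, -1 on failure. */
int wait_for_child_over_pipe(void)
{
    int fds[2];
    pid_t pid;
    char token = 0;
    ssize_t n;
    int status;

    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                     /* child */
        close(fds[0]);
        /* the child's share of the work would happen here */
        if (write(fds[1], "R", 1) != 1)
            _exit(1);
        _exit(0);
    }

    close(fds[1]);                      /* parent keeps only the read end */
    do {
        n = read(fds[0], &token, 1);    /* blocks until the child writes or exits */
    } while (n == -1 && errno == EINTR);
    close(fds[0]);
    waitpid(pid, &status, 0);

    return (n == 1) ? token : -1;
}
```

Unlike a process-shared mutex, nothing in this scheme depends on pointers being meaningful in both address spaces.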
pthread_create() !
pthread_create().
Recall that pthread_create() is a C function, and it must
be passed a C function as third argument.
If you want to use threads, I can only suggest egcs and glibc. You can find egcs at http://www.cygnus.com/egcs. egcs has libstdc++, which is MT safe under glibc 2. If you really want to use libg++, I have a libg++ add-on for egcs.
For running gdb on the main thread, you need to instruct gdb to ignore the signals used by LinuxThreads. Just do:
        handle SIGUSR1 nostop pass noprint
        handle SIGUSR2 nostop pass noprint
attach command of gdb?
/proc to control debugged processes, while
under Linux it uses the traditional ptrace(). The support
for threads is built in the /proc interface, but some
work remains to be done to have it in the ptrace()
interface.  In summary, it should not be impossible to get gdb to work
with LinuxThreads, but it's definitely not trivial.
Regarding the fact that the core file does not correspond to the thread that crashed, the reason is that the kernel will not dump core for a process that shares its memory with other processes, such as the other threads of your program. So, the thread that crashes silently disappears without generating a core file. Then, all other threads of your program die on the same signal that killed the crashing thread. (This is required behavior according to the POSIX standard.) The last one that dies is no longer sharing its memory with anyone else, so the kernel generates a core file for that thread. Unfortunately, that's not the thread you are interested in.
printf() are your best friends.  Try to debug
sequential parts in a single-threaded program first.  Then, put
printf() statements all over the place to get execution traces.
Also, check invariants often with the assert() macro.  In truth,
there is no other effective way (save for a full formal proof of your
program) to track down concurrency bugs.  Debuggers are not really
effective for concurrency problems, because they disrupt program
execution too much.
_REENTRANT defined. What difference does it make?
gethostbyname_r() as a reentrant equivalent to
gethostbyname().
_REENTRANT is defined, some
<stdio.h> functions are no longer defined as macros,
e.g. getc() and putc(). In a multithreaded
program, stdio functions require additional locking, which the macros
don't perform, so we must call functions instead.
<errno.h> redefines errno when
_REENTRANT is defined, so that errno refers to the
thread-specific errno location
rather than the global errno variable.  This is achieved by the
following #define in <errno.h>:
        #define errno (*(__errno_location()))
which causes each reference to errno to call the
__errno_location() function for obtaining the location
where error codes are stored.  libc provides a default definition of
__errno_location() that always returns
&errno (the address of the global errno variable). Thus,
for programs not linked with LinuxThreads, defining
_REENTRANT makes no difference w.r.t. errno processing.
But LinuxThreads redefines __errno_location() to return a
location in the thread descriptor reserved for holding the current
value of errno for the calling thread.  Thus, each thread operates on
a different errno location.
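The effect can be observed with a small probe, sketched below under the assumption that the program is compiled with _REENTRANT defined and linked with the thread library. A second thread compares the address behind its own errno with the address seen by the main thread. The struct probe type and both function names are hypothetical.

```c
#ifndef _REENTRANT
#define _REENTRANT      /* normally passed on the command line as -D_REENTRANT */
#endif
#include <errno.h>
#include <pthread.h>

/* Hypothetical probe record filled in by the second thread. */
struct probe {
    int *main_errno;    /* address of errno as seen by the main thread */
    int distinct;       /* set to 1 if the new thread sees another address */
};

static void *probe_errno(void *arg)
{
    struct probe *p = arg;

    /* &errno expands to the location returned for the calling thread. */
    p->distinct = (&errno != p->main_errno);
    return NULL;
}

/* Returns 1 if the two threads use different errno locations,
   0 if they share one, -1 if the thread could not be run. */
int threads_have_distinct_errno(void)
{
    struct probe p;
    pthread_t tid;

    p.main_errno = &errno;
    p.distinct = 0;

    if (pthread_create(&tid, NULL, probe_errno, &p) != 0)
        return -1;
    if (pthread_join(tid, NULL) != 0)
        return -1;
    return p.distinct;
}
```

Comparing addresses rather than values avoids depending on whether library calls happen to modify errno along the way.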
-D_REENTRANT?
getc() or
putc(), it will perform I/O without proper interlocking
of the stdio buffers; this can cause lost output, duplicate output, or
just crash other stdio functions.  If the code consults errno, it will
get back the wrong error code.  The following code fragment is a
typical example:
        do {
          r = read(fd, buf, n);
          if (r == -1) {
            if (errno == EINTR)   /* an error we can handle */
              continue;
            else {                /* other errors are fatal */
              perror("read failed");
              exit(100);
            }
          }
        } while (...);
Assume this code is not compiled with -D_REENTRANT, and
linked with LinuxThreads.  At run-time, read() is
interrupted.  Since the C library was compiled with
-D_REENTRANT, read() stores its error code
in the location pointed to by __errno_location(), which
is the thread-local errno variable.  Then, the code above sees that
read() returns -1 and looks up errno.  Since
_REENTRANT is not defined, the reference to errno
accesses the global errno variable, which is most likely 0.  Hence the
code concludes that it cannot handle the error and stops.
SIGUSR1 and SIGUSR2 in my programs! Why?
SIGUSR1 and SIGUSR2, LinuxThreads has no
other choice than using them.  I know this is unfortunate, and hope
this problem will be addressed in future Linux kernels, either by
freeing some of the regular signals (unlikely), or by providing more
than 32 signals (as per the POSIX 1003.1b realtime extensions).
In the meantime, you can try to use kernel-reserved signals either in
your program or in LinuxThreads.  For instance,
SIGSTKFLT and SIGUNUSED appear to be
unused in the current Linux kernels for the Intel x86 architecture.
To use these in LinuxThreads, the only file you need to change
is internals.h, more specifically the two lines:
        #define PTHREAD_SIG_RESTART SIGUSR1
        #define PTHREAD_SIG_CANCEL SIGUSR2
Replace them by e.g.
        #define PTHREAD_SIG_RESTART SIGSTKFLT
        #define PTHREAD_SIG_CANCEL SIGUNUSED
Warning: you're doing this at your own risk.
So, you can take the address of an "auto" variable and pass it to other threads via shared data structures. However, you need to make absolutely sure that the function doing this will not return as long as other threads need to access this address. It's the usual mistake of returning the address of an "auto" variable, only made much worse because of concurrency. It's much, much safer to systematically heap-allocate all shared data structures.
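A sketch of the heap-allocation approach, assuming a simple job record passed to a new thread: because the record lives on the heap, the function that starts the thread may return right away. The struct job type and the function names are invented for the example.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical unit of work shared between creator and thread. */
struct job {
    int input;
    int output;
};

static void *run_job(void *arg)
{
    struct job *j = arg;

    j->output = j->input * 2;
    return j;               /* hand the heap block back to whoever joins */
}

/* Starts a thread and returns at once.  This is safe because the job
   is heap-allocated and so outlives this function's stack frame. */
static int start_job(pthread_t *tid, int input)
{
    struct job *j = malloc(sizeof *j);

    if (j == NULL)
        return -1;
    j->input = input;
    j->output = 0;
    if (pthread_create(tid, NULL, run_job, j) != 0) {
        free(j);
        return -1;
    }
    return 0;
}

/* Runs one job and returns its result, or -1 on failure. */
int heap_job_demo(int input)
{
    pthread_t tid;
    void *res;
    int out;

    if (start_job(&tid, input) != 0)
        return -1;
    if (pthread_join(tid, &res) != 0)
        return -1;
    out = ((struct job *)res)->output;
    free(res);              /* the joiner is the last user of the block */
    return out;
}
```

Had start_job() passed the address of a local struct job instead, the thread could read it after the function returned.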
-D_REENTRANT.  It happens Xlib contains a
piece of code very much like the one in question H.2.  So, your Xlib fetches the error code from the
wrong errno location and concludes that an error it cannot handle
occurred.
README.Xfree3.3 in the LinuxThreads
distribution for patches and info on how to compile thread-safe X
libraries from the Xfree3.3 distribution.  The Xfree3.3 sources are
readily available in most Linux distributions, e.g. as a source RPM
for RedHat.  Be warned, however, that X Windows is a huge system, and
recompiling even just the libraries takes a lot of time and disk
space.
Another, less involved solution is to call X functions only from the
main thread of your program.  Even if all threads have their own errno
location, the main thread uses the global errno variable for its errno
location.  Thus, code not compiled with -D_REENTRANT
still "sees" the right error values if it executes in the main thread
only. 
-D_REENTRANT to avoid
the errno problems explained in question H.2.
-D_REENTRANT is needed.
-D_REENTRANT.
SIGUSR1 and SIGUSR2.  One of the two should
be recompiled to use different signals.  See question H.4.
sigaction(), it sets how the signal is handled not only
for itself, but for all other threads in the program as well.
On the other hand, signal masks are per-thread: each thread chooses
which signals it blocks independently of others.  At thread creation
time, the newly created thread inherits the signal mask of the thread
calling pthread_create().  But afterwards, the new thread
can modify its signal mask independently of its creator thread.
SIGKILL to a
particular thread using pthread_kill, all my threads are
killed!
SIGKILL or SIGINT
when no handler is installed on that signal).  This behavior makes a
lot of sense: when you type "ctrl-C" at the keyboard, or when a thread
crashes on a division by zero or a segmentation fault, you really want
all threads to stop immediately, not just the one that caused the
segmentation violation or that got the SIGINT signal.
(This assumes default behavior for those signals; see question
J.3 if you install handlers for those signals.)
If you're trying to terminate a thread without bringing the whole
process down, use pthread_cancel().
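A minimal sketch of terminating a single thread with pthread_cancel(), assuming the target thread regularly reaches a cancellation point (here, sleep()). The names sleeper and cancel_demo are hypothetical.

```c
#include <pthread.h>
#include <unistd.h>

/* Loops forever; sleep() is a cancellation point, so a pending
   cancellation request is acted upon there. */
static void *sleeper(void *arg)
{
    (void)arg;
    for (;;)
        sleep(1);
    return NULL;    /* not reached */
}

/* Returns 1 if the thread ended because it was cancelled, 0 if it
   ended some other way, -1 on error.  The rest of the process keeps
   running either way. */
int cancel_demo(void)
{
    pthread_t tid;
    void *result;

    if (pthread_create(&tid, NULL, sleeper, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)
        return -1;
    if (pthread_join(tid, &result) != 0)
        return -1;
    return result == PTHREAD_CANCELED;
}
```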
SIGFPE signal), then the handler is executed by that
thread.  This also applies to signals generated by
raise().
If the signal is sent to a particular thread using
pthread_kill(), then that thread executes the handler.
If the signal is sent via kill() or the tty interface
(e.g. by pressing ctrl-C), then the POSIX specs say that the handler
is executed by any thread in the process that does not currently block
the signal.  In other terms, POSIX considers that the signal is sent
to the process (the collection of all threads) as a whole, and any
thread that is not blocking this signal can then handle it.
The latter case is where LinuxThreads departs from the POSIX specs. In LinuxThreads, there is no real notion of ``the process as a whole'': in the kernel, each thread is really a distinct process with a distinct PID, and signals sent to the PID of a thread can only be handled by that thread. As long as no thread is blocking the signal, the behavior conforms to the standard: one (unspecified) thread of the program handles the signal. But if the thread to whose PID the signal is sent blocks the signal, and some other thread does not block it, then LinuxThreads will simply queue the signal in the first thread and execute the handler only when that thread unblocks the signal, instead of executing the handler immediately in the other thread that does not block the signal.
This is to be viewed as a LinuxThreads bug, but I currently don't see any way to implement the POSIX behavior without kernel support.
pthread_* functions are not async-signal safe, meaning
that you should not call them from signal handlers.  This
recommendation is not to be taken lightly: your program can deadlock
if you call a pthread_* function from a signal handler!
The only sensible things you can do from a signal handler are to set a
global flag, or call sem_post on a semaphore, to record
the delivery of the signal.  The remainder of the program can then
either poll the global flag, or use sem_wait() and
sem_trywait() on the semaphore.
Another option is to do nothing in the signal handler, and dedicate
one thread (preferably the initial thread) to wait synchronously for
signals, using sigwait(), and send messages to the other
threads accordingly.
sigwait(), other threads no longer receive the signals
sigwait() is waiting for!  What happens?
sigwait().  Basically, it installs signal handlers on all
signals waited for, in order to record which signal was received.
Since signal handlers are shared with the other threads, this
temporarily deactivates any signal handlers you might have previously
installed on these signals.
Though surprising, this behavior actually seems to conform to the
POSIX standard.  According to POSIX, sigwait() is
guaranteed to work as expected only if all other threads in the
program block the signals waited for (otherwise, the signals could be
delivered to other threads than the one doing sigwait(),
which would make sigwait() useless).  In this particular
case, the problem described in this question does not appear.
One day, sigwait() will be implemented in the kernel,
along with other POSIX 1003.1b extensions, and sigwait()
will have a more natural behavior (as well as better performance).
clone() system call, which is a generalization of
fork() allowing the new process to share the memory
space, file descriptors, and signal handlers of the parent.
Advantages of the "one-to-one" model include:
The "many-to-many" model combines both kernel-level and user-level scheduling: several kernel-level threads run concurrently, each executing a user-level scheduler that selects between user threads. Most commercial Unix systems (Solaris, Digital Unix, IRIX) implement POSIX threads this way. This model combines the advantages of both the "many-to-one" and the "one-to-one" model, and is attractive because it avoids the worst-case behaviors of both models -- especially on kernels where context switches are expensive, such as Digital Unix. Unfortunately, it is pretty complex to implement, and requires kernel support which Linux does not provide. Linus Torvalds and other Linux kernel developers have always been pushing the "one-to-one" model in the name of overall simplicity, and are doing a pretty good job of making kernel-level context switches between threads efficient. LinuxThreads is just following the general direction they set.