Measuring Stack Usage in Multi-threaded uClinux Apps

In regular Linux the MMU allows the stack to grow dynamically: the kernel simply maps in more physical pages as they are needed. In uClinux, however, the correct amount of stack for each thread must be allocated before the thread is created.

Too little stack and your program will corrupt the system in nasty, unpredictable ways. Thread stacks are malloced from the system heap, so an overflow writes to memory just outside the block allocated to the thread. That memory may well be in use by other parts of the system, perhaps even by a different application. These sorts of bugs can be very difficult to track down.

If you allocate too much stack you waste memory, a valuable resource on embedded systems. For example, I discovered I was allocating far too much stack and wasting Mbytes of memory, especially when multiple threads were running.

The standard approach is to try random values of stack until you find one that works. However I thought it might be a better idea to actually measure the amount of stack used by each thread. Then I could tweak the stack allocation to optimise memory usage and even check for stack overflows at run time.

Threadstack Library

I have written a small library, called threadstack, that provides two functions:
unsigned int threadstack_free(pthread_t *thread);
unsigned int threadstack_used(pthread_t *thread);

Here is a sample run on my Blackfin BF537 STAMP, when a 100k stack was allocated to a thread:
root:/var/tmp> ./test_threadstack
stack used: 1300
stack free: 100620
root:/var/tmp>

How it Works

The functions work by examining the memory allocated to the stack. The theory is that if the memory is non-zero, then it must have been used by the thread at some time (the entire block of memory used for the stack is set to 0 before the thread starts). So the routines search the stack memory, starting from the bottom, for the first non-zero value, and that is declared the “high water mark” – the point where the stack reached its maximum.
0xff <- stack top
0xff
0xfe <- high water mark
0x00
0x00
0x00 <- stack bottom
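
The scan described above can be sketched in a few lines of C. This is not the threadstack source: the names stack_free_bytes() and stack_used_bytes() are made up for illustration, and the code assumes a downward-growing stack whose memory was zero-filled before the thread started, with stack_base pointing at the lowest address of the block.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not the real threadstack code).
 * Assumption: the stack grows downward, so never-used memory sits at the
 * low addresses, and the whole block was zeroed before the thread started.
 * Returns the number of bytes below the high water mark (free stack).
 */
size_t stack_free_bytes(const unsigned char *stack_base, size_t stack_size)
{
    size_t i = 0;

    assert(stack_base != NULL);

    /* Walk up from the bottom until the first byte that was ever written. */
    while (i < stack_size && stack_base[i] == 0)
        i++;
    return i;
}

/* Everything from the high water mark to the top counts as used. */
size_t stack_used_bytes(const unsigned char *stack_base, size_t stack_size)
{
    return stack_size - stack_free_bytes(stack_base, stack_size);
}
```

For the six-byte stack in the diagram (00 00 00 fe ff ff, listed from bottom to top) the scan stops at the 0xfe byte, giving 3 bytes free and 3 bytes used. A thread whose deepest stack writes happen to be zeros would be under-counted, so the result is best treated as an estimate.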

The high water mark can only rise over time, so the best time to measure stack usage is after your thread has been running for a while.

One weakness with this approach is that if stack allocation is way too low your program may bomb before these routines get a chance to run. However in that case you will at least know there is a problem, and can increase stack to some high number (e.g. Mbytes) to get the program running, before using these functions to determine actual stack requirements.

Usage

In my uClinux Asterisk port I have added code to check for stack overflow just before a thread ends:
pthread_t thread = pthread_self();
assert(threadstack_free(&thread) > 10*1024);

This code checks that at least 10k of stack remained free the whole time the thread was running. If not, the assert will kill the program with an error message and tell me straight away that I need more stack. Much nicer than getting an obscure bug in the system due to a stack overflow on a thread. Now the program finds stack overflow bugs for me!

The example above runs from within the actual thread itself, hence the call to pthread_self() to discover the thread's handle. You can also call the functions from another thread (e.g. the main thread), for example to periodically meter stack usage.

Links

More information on multi-threaded applications for uClinux

Porting multi-threaded apps to uClinux

I have recently been working on improving the stability of uCasterisk, a port of Asterisk to uClinux. This required some research into memory management for multi-threaded apps on uClinux. I didn’t find any one resource that had everything I needed to know so I thought I would collate some of the information I found here as a resource for others. Thanks to all those (especially on the Blackfin forums) who helped answer my questions.

I am using the Blackfin flavour of uClinux and the uCasterisk application as an example, but this information should apply equally to other uClinux systems/applications.

MMU versus no-MMU

Asterisk is a pretty big application for uClinux: the executable is about 2.5M, and when running several calls it can consume 32M of system memory. The big difference between uCasterisk and other Asterisk implementations is the lack of an MMU. An MMU is handy when working with large, multi-threaded apps. For example, when a thread is kicked off you can allocate a virtual stack of say 2M, but physical memory will only be allocated as and when it is actually required (say due to a write to a previously unused part of the stack). If your thread never uses all of the stack, then the physical memory is available for other users.

On an MMU-less system you need to work out the maximum stack your thread may need, and allocate that. If you get it wrong, your application (and possibly the whole system) will bomb. This generally means you are wasting memory compared to the MMU case, as you always need to allocate the worst-case amount of memory required.

One possible advantage of MMU-less systems is no nasty surprises – any memory allocated really does exist, and no over-commitment is possible. On an MMU-based system physical memory isn’t actually allocated until you write to it, and it may be paged to disk just when you need it (although I understand there are options to control this behaviour).

Stacks for Threads

When you start an app, it gets allocated a stack. This is actually the stack for the main thread of the application. When you start a new thread (say with pthread_create()) the thread gets allocated a new stack from the system heap. The two stacks are completely unrelated. The size of each stack is independent, and you control each size in a different way (see below).

Tips for Porting to uClinux

Don’t enable stack checking. This feature is very useful for single-threaded apps; it causes the operating system to kill the app when it uses all of its stack space, which tells you straight away to increase the stack size. Unfortunately at present this feature hasn’t been extended to multi-threaded applications; using it with multi-threaded apps (at least on the Blackfin) causes problems as pointed out in the 2005R4 RC2 release notes and discussed here.

You control the application (main thread) stack with the -s option. On my Blackfin system the command line is:

bfin-uclinux-gcc -Wl,-elf2flt='-s 1000000' \
-o thread thread.c -pthread

In this example the stack is set to 1000000 bytes.

You control the size of the stack for each thread you create using pthread_attr_setstacksize(), for example (from the Asterisk utils.c file):

pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 0x1000000);
pthread_create(&thread, &attr, thread_func, NULL);
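
The fragment above omits declarations and error checks. A fuller sketch of the same calls might look like the following; run_with_stack() and worker() are hypothetical names used only for illustration, not code from Asterisk, and the stack size is passed in rather than hard-coded.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: sets a flag through its argument to show it ran. */
static void *worker(void *arg)
{
    int *ran = arg;

    assert(arg != NULL);
    *ran = 1;
    return NULL;
}

/*
 * Start one thread with an explicit stack size and wait for it to finish.
 * Returns 0 on success, otherwise the first non-zero pthread error code.
 */
int run_with_stack(size_t stack_size)
{
    pthread_attr_t attr;
    pthread_t thread;
    int ran = 0;
    int err;

    err = pthread_attr_init(&attr);
    if (err != 0)
        return err;

    /* Returns EINVAL if stack_size is below PTHREAD_STACK_MIN. */
    err = pthread_attr_setstacksize(&attr, stack_size);
    if (err == 0)
        err = pthread_create(&thread, &attr, worker, &ran);

    /* The attribute object is not needed after pthread_create(). */
    pthread_attr_destroy(&attr);
    if (err != 0)
        return err;

    err = pthread_join(thread, NULL);
    if (err != 0)
        return err;
    return ran ? 0 : -1;
}
```

Checking the return value of pthread_attr_setstacksize() matters here: if the call is rejected and the error ignored, the thread silently gets the default stack size instead of the one you asked for.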

Monitoring Memory Usage

cat /proc/meminfo can be very useful. Here is the output from my Blackfin STAMP BF533 board, taken while uCasterisk was running with several SIP calls in progress:

root:/var/log/asterisk> cat /proc/meminfo
MemTotal: 59784 kB
MemFree: 11084 kB
Buffers: 100 kB
Cached: 4172 kB
SwapCached: 0 kB
Active: 3828 kB
Inactive: 444 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 59784 kB
LowFree: 11084 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 4 kB
Writeback: 0 kB
Mapped: 0 kB
Slab: 43744 kB
CommitLimit: 29892 kB
Committed_AS: 0 kB
PageTables: 0 kB
VmallocTotal: 0 kB
VmallocUsed: 0 kB
VmallocChunk: 0 kB

The most important fields are MemFree (total system memory free) and Slab (system wide heap in use).
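
To watch these fields from inside a program rather than by hand, something like the following could be used. It is only a sketch: meminfo_kb() is a hypothetical helper, and it works on a text buffer that the caller has already read from /proc/meminfo (for example with fopen() and fread()).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up one field (e.g. "MemFree") in text laid out
 * like /proc/meminfo and return its value in kB.
 * Returns -1 if the field is not present.
 */
long meminfo_kb(const char *text, const char *field)
{
    size_t len;

    assert(text != NULL && field != NULL);
    len = strlen(field);

    while (*text != '\0') {
        long value;

        /* A match must start a line and be followed directly by ':'. */
        if (strncmp(text, field, len) == 0 && text[len] == ':' &&
            sscanf(text + len + 1, "%ld", &value) == 1)
            return value;

        /* Skip to the start of the next line, if there is one. */
        text = strchr(text, '\n');
        if (text == NULL)
            break;
        text++;
    }
    return -1;
}
```

Logging meminfo_kb(buf, "MemFree") and meminfo_kb(buf, "Slab") at intervals, for instance each time a call starts or ends, gives a simple record of how memory use changes over time.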

In earlier versions of Linux the CommitLimit field indicated the maximum Slab was allowed to reach before processes were killed (with Out-Of-Memory errors). However on my distro I discovered by experiment that the Slab can actually grow well beyond this limit, as the output above shows (Slab is 43744 kB against a CommitLimit of 29892 kB). Looking at the __vm_enough_memory() function in the kernel source file uClinux-dist/linux-2.6.x/mm/nommu.c, it appears that the memory allocator uses the OVERCOMMIT_GUESS method, which ignores the CommitLimit and allows up to 97% of memory to be allocated.

It is interesting to observe MemFree as you perform different operations. For example on uCasterisk when a new SIP call starts, a thread is created, which requires stack and heap space. I also noticed MemFree decreasing when I copied files on a ram file system – this caught me for a while as uCasterisk was chewing through available system memory writing Call Data Records to the ram disk and eventually causing Out of Memory errors.

ps and top are also useful, as they indicate the amount of memory allocated to the system/application.

Links

CommitLimit and OOM Killer
Why malloc is different under uClinux
Application Debugging on the Blackfin
Intro to Linux Apps on the Blackfin (skip to bottom of page)
Blackfin forum thread where I asked some questions on this topic

Summary

I hope this was useful – please email me or add a comment below if you have any comments/suggestions/corrections.

The YouBox – Hardware for YouOS

I have been following an idea I originally discovered in a Paul Graham essay about the advantages of placing applications on the web server rather than the desktop PC. Hotmail and Gmail are good examples. The application and the data for that application (your email) are stored on a server.

YouOS is an interesting development along this path – it is an entire operating system that runs on a server, complete with an IDE for application development and some really powerful collaboration models. One very powerful feature is the ability to move from one computer to another, fire up YouOS on the browser, and there are all your applications and data – just as you left them.

Looking into the future a little, there will come a day when ALL your applications and ALL your data can be stored on a server.

This transition seems to be already happening in some parts of the world. This point from the Ajax web site really interested me:

  • Web as the only Platform Thanks to the widespread adoption of public internet access, the so-called technology gap between countries and between socioeconomic groups is closing. Many people don’t actually own a PC, but do have regular access to the web at internet cafes or schools or friends’ homes. For this diverse category of user, there’s no point installing applications and keeping their data locally. The web is their only platform.

So we have a large number of people who use the web as a platform for economic reasons. YouOS will increase the power of the web platform. What would also help is lowering the cost of web access.

With YouOS the only application you need to run on your PC is a browser. Which suggests to me that you don’t need a PC anymore, just some bare-bones hardware with internet connectivity capable of running a browser.

One thought I have had is adding a monochrome LCD display and keyboard to an embedded Linux platform (like the hardware in a WRT54G router). These little routers retail for $60, so must cost about $20 to build. Add a keyboard and LCD and you have a device that still costs less than $50 to build. Then you would have a small, Wifi connected computer with plenty of CPU power/memory to run a browser, basic command line tools etc.

Combined with ubiquitous connectivity, this gives us a YouOS-Access-Device (YAD? or maybe “the YouBox”) that can replace the desktop in many applications. In a laptop form factor the YouBox could be really light and thin, almost disposable, and would use much less power than a regular laptop. It’s more like a larger version of a Palm. For a few extra dollars you could add sound-blaster type audio and the device is also a telephone.

I know there is a sub-$100 laptop project out there; the YouBox is another approach, one that uses the web-as-a-platform paradigm to optimise the hardware. One advantage of the YouBox is that it can be put together in small quantities using off the shelf components.

The YouBox concept still has many questions – for example the need for an internet backbone to connect the YouBox to a server and also the YouOS servers. However these may be a little easier to solve, for example if one backbone/server can handle X YouBox clients, you amortise the backbone/server cost by X.
