Building an Embedded Asterisk PBX Part 1

Over the last few days I have been bringing some telephony hardware to life. I have finally obtained all the parts I need and am assembling, testing, and blogging as I go!

This work is part of a project to develop and build “open” IP-PBX hardware. Now by building I mean really building. Like designing the circuits and Printed Circuit Boards (PCBs) and then hand-loading the PCBs with a soldering iron. The PBX is an embedded Asterisk design running on a Blackfin STAMP platform. This work is part of the Free Telephony Project.

Now the first priority is to start with a clean, tidy, professional work area:

Mmmmmmmmm. Oh well, I will tidy it up one day.

Lets start with the 4fx board. The 4fx board interfaces the Blackfin STAMP to the FXS & FXO modules. It has a little Xilinx XC9536 programmable logic chip (or CPLD). I had previously designed and simulated the Verilog code for this CPLD so all I had to do was program the chip using a JTAG cable that connects between my PC and a header on the 4fx card:

It actually took me a few hours of head scratching to get the chip to program. The problem was I has accidentally selected the wrong chip type when I synthesised the CPLD code. DOH! Anyway once I worked that out it programmed straight away which was a relief – you never know with a new design if you have messed up something fundamental like connecting power in reverse. So the first “sign of life” you get from a new board is always a big relief.

I then poked around with the scope while running a unit test program on the Blackfin that put the CPLD through a few tests. Just like software, it is very important to make sure the components of a hardware design a working before integrating the components into a larger design. The typical trap is that we get excited and try to move forward too fast, for example testing several new and unknown parts of the design all at once. Simple errors compound to tough bugs when combined with other untested hardware and software.

So I always try to test thoroughly at the earliest possible stage. In fact I often organise my designs so they can be broken apart into little chunks and tested, rather than thinking about testing as an after thought. In the case of the CPLD I ran many simulations using the Icarus Verilog simulation tools before even going near the hardware. Experience (OK plenty of screw-ups) has taught me that it takes much less effort to test carefully earlier than to debug later.

Anyway, back to the story. On the CPLD I messed up one pin’s position in the pin-locking file (easily fixed by recompiling the CPLD image), but apart from that the CPLD appears to be working fine. All the chip select signals are being generated in response to the commands from my test software.

OK, the next step is to see if I can make some LEDs on the board light under software control. The LEDs are connected to the CPLD and will be used to show the status of each telephony port. So if we can make the LEDs do their thing this will prove another chunk of the CPLD code is OK.

I modified the unit test program to write to the register that controls the LEDS:
bfsi_spi_init(baud, (1<<NCS_A) | (1<<NCS_B));

for(i=0; i<tests; i ) {
bfsi_spi_write_8_bits(NCS_B, select);
bfsi_spi_write_8_bits(NCS_A, data);

In the for loop, the first write sets the “destination” of the data (which SPI device we wish to write to). The second write sets the actual value. The way the LED is wired up if we write a 01 (binary) we should get the LED to glow red, and 10 (binary) to make it glow green. The for loop makes it repeat many times, just so I can see what is going on with my ancient analog scope. Only one write is actually neeeded.

I peer at the LED. It stares back, blank and just daring me to try:

I hit the magic command line:
root:~> insmod tspi_4fx.ko data=0x1

Hey – it worked! Thats not meant to happen! Not first time! WHOO-HOO! OK, lets try making it green:
root:~> insmod tspi_4fx.ko data=0x2


It is hard to explain feeling of achievement you can get from just making a LED light. You never really understand how much complex technology is between the vision and reality of making a simple LED come on – until you start to build chunks of that technology, solder the LED yourself, write the driver etc. Then you realise, and a simple LED turning on when you tell it to seems like an unlikely miracle! Anyone who has ever worked on making computers talk to hardware will understand what I mean.

Especially if you have had your share of times when that LED wouldn’t turn on. For like days or weeks.

OK so the next step was to test the FXO and FXS modules. Here they are all soldered and ready to smoke up, errr I mean test. The large, ugly resistors hanging off them are because I couldn’t easily source some very high (15M) and very low (0.5 ohm) resistors I needed in 0603/0805 packages. Can anyone send me a few please?

First I wanted to test the FXO module. I connected it directly to the Blackfin STAMP card, rather than using the 4fx card just yet. Golden rule – always test the minimum possible:

I already had some Asterisk software for the Blackfin running and tested (using other hardware). That meant I had tested and working software to test the unknown hardware. So it was just a matter of firing that up and seeing if it detected the card:
Welcome to:
____ _ _
/ __| ||_| _ _
_ _| | | | _ ____ _ _ \ \/ /
| | | | | | || | _ \| | | | \ /
| |_| | |__| || | | | | |_| | / \
| ___\____|_||_|_| |_|\____|/_/\_\

For further information see:

BusyBox v1.00 (2006.08.25-23:13 0000) Built-in shell (msh)
Enter 'help' for a list of built-in commands.

root:~> eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
Zapata Telephony Interface Registered on major 196
Registered Span 1 ('WCTDM/0') with 1 channels
Span ('WCTDM/0') is new master
iRxBuffer1 = 0xff800000
iTxBuffer1 = 0xff800080
ISR installed OK
Testing for ProSLIC
ProSLIC not loaded...
Testing for DAA...
VoiceDAA System: 04
ISO-Cap is now up, line side: 03 rev 06
Module 0: Installed -- AUTO FXO (FCC mode)
Found: Blackfin STAMP (1 modules)
Registered tone zone 0 (United States / North America)
4294895942 Polarity reversed (0 -> 1)

root:~> /var/tmp/asterisk -vc

Thats a pretty good result – the FXO port was detected OK. So then I started Asterisk and put a few calls through it. I placed a call into the PBX (using another Asterisk PBX running on an x86 box) and it detected the ring signal and went off hook OK:
*CLI> RING on 1/1!
NO RING on 1/1!
RING on 1/1!
NO RING on 1/1!
Jan 1 02:51:59 NOTICE[96]: chan_zap.c:5406 ss_thread: Got event 2 (Ring/Answer)

However the audio had lots of sharp clicks and pops. Crack-Crack-Crack every few seconds. Damn.

I spent half a day chasing this bug. I puzzled me a bit as I knew the circuit was straight out of the Silicon Labs data sheet and that I (and a few others) had carefully checked it. So I figured it must have been an assembly error like a wrong component or bad solder joint. Actually I wasn’t quite that logical: in the real world bugs tend to get your emotions involved. You really want it to work so you get a little stressed and start doing and thinking stupid things. So you end up checking a bunch of things you don’t need to (like the schematic five times) and perhaps missing some other more sensible checks – you don’t always think straight when your emotions are in play. Such is the psychology of bug hunts.

I started checking signals on the header and had trouble getting a good contact with my scope probe. I looked at the pin and there was some flux residue stuck to it. So I gave that part of the board a scrub with a fine brush and some solvent and then fired it up again to check that signal. Huh – now the audio is OK – clicks gone! WTF? I am still now sure what happened here – perhaps the brush dislodged a small short or the flux was conducting a little.

So anyway the FXO module (fxomod) seems to work OK now.

I then tried the FXS module and it worked on the first try. I was really happy about that – I was placing calls over it 5 minutes after the first time I applied power. Hardware development isn’t meant to work like that! Anyway I guess I will get my fair share of bugs later (it’s the conservation of bugs law), there is still plenty of development to go.

My next step is to integrate the FXS and FXO modules with the 4fx board. More on that in a later post.

This is what the whole thing looks like when put together with the STAMP, 4fx, and (for now) a single FXO module:

The idea is that you can stack more 4fx boards to get multiples of 4 ports. You could also stack other cards, for example BRI-ISDN, E1/T1, or cards that give you additional DSP horsepower.

You might have also noticed the SD-card. The driver for that was developed by Hans Eklund and the team at Rubico. They have done a fantastic job. I compiled the latest uClinux version with SD/MMC card support and it worked perfectly first time. It is really cool to read and write files to a SD card on the Blackfin, then transfer the card to a PC and find the files all there and readable. Such a simple hardware interface too (just a few wires).

Geekiness is contagious. Just last week I convinced my wife Rosemary to help me with some board stuffing. I started here off on a simple thru-hole kit to teach here soldering. A few days later here she is soldering tiny 0603 resistors and doing a fine job:

Thats all for now. I’ll might blog some more later as I work through the steps to bring up the rest of the board.


  1. Building an Embedded Asterisk PBX Part 2
  2. Building an Embedded Asterisk PBX Part 3
  3. More information on the Free Telephony Project here.
  4. Blackfin MMC/SD card how-to.
  5. More information on the design I am building here.
  6. Here is the (current) 4fx schematic in PDF form. It will probably change as the bugs are found and fixed.
  7. You can download source files for the schematics, PCB design, CPLD code here. Grab the latest hardware-x.y.tar.gz file. In the cpld directory there is a README that explains the CPLD code as well as “test benches” – Verilog code that tests other Verilog code.

mortgage calculator rate loan 2ndremortgage http home advice uk loanloan abacuspayday loans 30 dayhour faxing loans no 1loan physician bank america$3000 loan credit with badloan education acsa bad loan with personal creditloan 1003 applicationdaphne pornporn daphnyporn vain star darienangel bio dark pornbbs porn darkporn collection dark bbsporn dark girlporn dark portal Map

Open Source Hardware

An important part of the Free Telephony Project is the idea of “open hardware”. The hardware designs that are being developed are being placed in the public domain under the GPL.

As a business model, it’s a bit of an experiment.

The technology being developed has very strong business possibilities – for example the ability to build a powerful IP-PBX for a couple of hundred dollars, much less than current IP-PBXes and even less than a low end analog PBX.

I have had to fend off several corporate dudes who wanted me to join them in ventures to make lots of money. It’s hard to turn them down but as I was not interested in “closing” the hardware they ran away fast. This makes sense if you want to build a large business, you need “secret sauce”. In other words: Intellectual Property (IP).

What I would really like is some sponsors for my work who can work with the open hardware concept, but so far they all want to lock up the IP. Which of course is the right thing to do if you want to make lots of money.

This has made me think through the concept of open hardware:

  1. I think open software has been a good thing for the world, so I think open hardware is also good.
  2. If closed IP makes a small amount of people a lot of money – does opening the IP make a moderate amount of money for a large amount of people? The latter seems a better outcome to me. It also suggests that open hardware benefits small companies more than large ones.
  3. I think the specific benefit of open hardware is lower R&D costs. This is what is happening with my project – there is a small team of people designing DSP boards, BRI-ISDN hardware, doing Asterisk ports etc. So far I would estimate about 5 man-years of hardware R&D I now have available for free. If I like I can now re-use this open hardware in my local market, potentially without hurting the business of my co-developers. There is a spirit of cooperation rather than competition.
  4. A common perception is that “if the hardware design is open, people will just copy it and put you out of business”. Well after some thought I disagree. A business is much more than just the product design – for example you need support, capital, manufacture, service, and relationships with customers. So even if the whole design is open, you can still build a nice little business (but perhaps not a $100M business). You can also add proprietary components and build on the open technology, or focus on your local market.
  5. My pet favourite – open hardware allows us to invent new business models, for example developing countries could build advanced telephone systems for cost price. This is so much better than buying technology from a first-world profit-oriented business that must charge a 70% mark up to cover their overheads. This is the business model behind the one laptop per child project. A $100 laptop is possible if u remove the overheads, use community input and sponsership for R&D and build volume. Well a $100 IP-PBX is also possible. Another benefit is that the hardware can be built locally (remember the hardware design is free) overcoming import tariff problems and building local industry. Combining these elements means lots of people getting connected cheaply. And that is a very good thing for the world.


I have had some great discussions on this topic with Rich Bodo. He has coined the phrase Intellectual Antiproperty on his blog.
sex wmv free moviesprev movie xxxnaked movie stars youngmovies 1 cumshotsmovies porn 19703d adult moviesfree adult thumb moviehomemade adult moviesfull sites adult download movieadult galleries moviecredit 3000 loan check noscore loan 400 creditneed 5000 credit loan bad9 6 loan payday directlstudent consolidate steps loan 7 simple80 texas hard money loanus aa personal loanspay college accept loan iloan personal acceptance guaranteed onlineaccess bar group loanssex movies bondagemovies boylovemovie bunch bradybanks briana movies freeskye threesome britany movieschasethehottie movieschicago theatre moviechinese movies sexmovie chocolatchristian teen for retreats moviesst accredited funeral servicesholder account adult credit card merchantcrunch 1990 creditenglish grade credit assignments 9th extraprogram aacsb accreditedstatus university of coast california accreditationsociety saskatchewan of credit agriculturalunion community credit 1st garland Mapporn review 1280×720girl porn 13lesbian porn 14porn 14v year oldmin fere 15 pornmin 15 porn previews15 minute clips pornporn minutes 15 Map

Icarus Verilog Mini How To

For my Free Telephony Hardware project I need to program a small Xilinx CPLD to handle some SPI bus decoding.

I have been using the gEDA tools for schematic entry and PCB design. gEDA also includes Icarus Verilog so I decided to check it out.

Verilog is a hardware description language. Instead of drawing schematic circuit diagrams you use source code to describe digital logic. Like this:
module d_ff( d, clk, q, q_bar);
input d, clk;
output q, q_bar;
reg q;
reg q_bar;

always @ (posedge clk)
q <= d;
q_bar <= !d;

This is the Verilog source for d_ff.vl, a D flip-flop. Here is a “test bench” (d_ff_tb.vl) to drive the D flip-flop:
module d_ff_tb;

reg clock, reset, d;
wire q, q_bar;

initial begin
$dumpfile ("d_ff_tb.vcd");
$dumpvars (1, d_ff_tb);
$monitor ("clock=%b, d=%b, q=%b, q_bar=%b", clock, d, q, q_bar);
clock = 0;
d = 1;
#10 d = 0;
#20 $finish;

always begin
#5 clock = !clock;

d_ff d0(
.d (d),
.clk (clock),
.q (q),
.q_bar (q_bar)


You can see the instance of the d_ff at the bottom, and the notation used to tie its “terminals” to nets in the test bench. You compile and run like this:
$ iverilog -o d_ff_tb d_ff_tb.vl d_ff.vl
$ vvp d_ff_tb
VCD info: dumpfile d_ff_tb.vcd opened for output.
clock=0, d=1, q=x, q_bar=x
clock=1, d=1, q=1, q_bar=0
clock=0, d=0, q=1, q_bar=0
clock=1, d=0, q=0, q_bar=1
clock=0, d=0, q=0, q_bar=1
clock=1, d=0, q=0, q_bar=1

You can see the flip-flop being put through its paces. First we set d=1, then on the next rising edge of the clock you can see q and q_bar change. Then we try d=0 for a few clock cycles. There is also a companion program called GTKWave that allows you to plot waveforms:

BTW I gotta say I just love the splash screen from GTKwave! I always knew engineering was as cool as surfing:

I have done a little VHDL coding (VHDL is a cousin of Verilog) using Windows GUI based tools (using a Xilinx IDE and ModelSim) and actually found it quite painful to get started and run simple simulations. So I was pleasantly surprised with how easy it was to use Icarus to develop and test simple logic designs.

Even though I had never used Verilog before it only took me about 3 hours of research on the web to get the simple simulation above written and running. This is very cool, I actually found this much easier than my first VHDL experience on a Windows GUI. Thank you very much and kudos to Stephen Williams who wrote Icarus Verilog (and thanks also to the developers of GTKWave).

I think one of the great features of Icarus is that it uses the command line and text files for everything, rather than the typical bloated Windows GUI approach. I am a big fan of the command line & text files, for example in the other gEDA tools like gschem (schematic entry) and PCB (PCB layout) text files are used to store the designs. This makes powerful processing and automation possible like manipulating PCB designs using Perl.

As an aside the Pragmatic Programmer guys have a good article on the benefits of text files here.

I am a raw beginner at Verilog and found the tutorial information on the Asic World site to be very useful. Thanks Deepak Kumar Tala for putting that site together.

How To Make Your Kids Rich

This page describes a simple plan to generate $150,000 for your child by the time they turn 21.

The Plan

What you need to supply is:

  • Time: About 20 years is great, more or less time can work if you adjust the plan. The plan works very well if you start when your child is born.
  • Saving. You need to save $8 each work-day. You get weekends off so thats $40 a week. This is the tough bit. Most people can spare $8/day but very few can save it over the long term.

It works like this:

  1. You save $8 a day, $40 a week, or around $2000 a year.
  2. You invest this in a managed share fund that is indexed to the share market. Based on historical performance this will mean an average of around 10% appreciation every year. The share market has many ups and downs, but over the long term, 10% is about right if you reinvest dividends. You are interested in the long term, so all the swings will average out. If you don’t like the share market you could choose some other investment with similar returns like a property fund.
  3. You increase your savings each year based on inflation. For example if the average inflation is 3%, you increase your savings from $40/week in the first year to $41.20 a week in the second year.
  4. You keep this up for 20 years. Your saving and the magic of compound interest means your child now has over $150,000 at age 21. Even with inflation, that still a lot of money, just when they need it the most. Invested wisely, it could make a big difference to your childs life. The cool thing is that unlike you and me kids have a pretty long time horizon, which lets compound interest go to work with a vengeance.

The Spreadsheet

Here is an Excel spreadsheet that shows you how it all works. Here is the same table in PDF form. I wanted to put the table directly in this post but couldn’t work out a nice way of getting a formatted spreadsheet table into WordPress.

BTW I find Excel (or the spreadsheet of your choice) really cool for basic financial modelling like this. It’s a lot of fun to experiment with. If you think part of my model is wrong, try entering your own parameters and see what develops. What is the effect of 5% inflation, or putting $80/week into the plan?

The Results

Does it work? I think so. I started it about 8 years ago (when my first son was born and my daughter was 2) and while not exactly matching the model results both of them could now buy a (small) new car outright. My 8 year old probably has more savings than many 28 year olds. Still, my kids are a few years behind the curves in the spreadsheet which makes me wish I had put in more money earlier. Doh!

On problem is that I am struggling to put the $40/week in. I like to do it in lump sums of a couple of $k a year and my wife often objects when I suggest it. It’s kind of hard to visualise the compounded effect $2k can have over 20 years for many people, especially if you are focussed on paying down a mortgage or have bills to pay. However I can’t think of a better use for my money that my kids future. Can you?

In Australia the government is giving $4,000 to the parents of every new baby born. So when our 3rd child was born last year I invested this baby bonus it in my newborn sons name. I am not sure what most parents do with their baby bonus however I suspect a lot of it ends up being used for flat screen TVs. Now my 11 month old son needs to file a tax return as he made a few hundred dollars in dividends! As my daughter says “he can’t even talk and he is making money”!

capitalization of loan rate interest amountcompanies 100 top loan direct worldstafford 2008 rules loan texasadrienne sloanealberta legal default student loans courtaba loan calculator autoloan bad auto for credit $9,000payday collection loan ambassadorarticles loan adjustable postcard clients rateamerican microloan3 g ringtonetime 1 ringtones purchase100 free downloads ringtone100 free download ringtonetv ringtone 24 seriesmobile to phone ringtones add24 mp3 ringtoneg ringtone for free 3 Mappayoffs loanporthole loanmissouri quarters loanag loans minnesota loan on ratesrepayment formula loanrescue loanuk search loanloan industry servicing Map

Roads and Wealth Creation

For years I have been interested in travel to some of the more remote areas of Australia. Recently I bought an old (but tough) 4WD, and with my children Amy and William we set out for the Flinders Ranges, about 500km north of our home in Adelaide, to spend a few days camping and doing a little off road driving.

One day we visited Arkaba Station, a private property where for $35 we could spend a few hours on a self guided 4WD tour through the station. There was a mixture of terrains; creeks, some very steep (at least for me) and very rough roads. I was really an ideal trip for me, just right for my (very basic) 4WD skill level. We were the only ones on the track for the entire time, and the scenery was very nice, although I was focused mostly on the road. The kids really liked the steep bits and being shaken to pieces on some of the rough roads. They also spotted lots of wild life (Emus and Kangaroos, as well as sheep that are run on the station).

One section was covered with large rocks sticking out at all angles. Our speed was limited to about 5km/hr, and inside the car we were shaking and bouncing all over the place. We could have got out and walked faster. It occurred to me that this was probably a typical road for 99% of history, and that travel at walking pace (and being shaken to pieces if in a vehicle) was probably the norm. This must have made communications very difficult, expensive and therefore very rare.

Driving off road made me appreciate that a simple flat ribbon of bitumen that enables us to travel at highway speeds really is a miracle. The distances we can cover in a few hours on a modern road must have taken days 100 years ago using the poor roads and horse and carts of the time.

Roads as a creater of wealth

One question I have been thinking about lately is “where does wealth come from”. I mean, we all talk about economic growth, which to me means more money in a country/region as a whole, but if an economy grows, we all get richer, so where does the wealth come from? Who is pumping the money in? If we are getting richer, is some one else getting poorer?

Thinking about roads gave me a good example of wealth creation.

Roads allow us to move more efficiently, less effort is required than without them. The government took our taxes and invested them in this infrastructure, now it costs much less to get from A to B. The cost of travel has been reduced.

Lets look at an example. In 1850 Farmer Joe needs to travel a round trip of 100km. He uses his horse and cart and spends two days to travel. In those two days he cannot be doing any other work. Now in 2006 Farmer Joe jumps in his car and travels the 100km in 1 hour. He returns to the farm and can work for 2 full days (minus the 1 hour). That 2 days of extra work that Joe earns income from. This is additional wealth that has been created by the road.

The interesting thing is that no one else had lost anything to make Joe richer. It is a win-win situation. Communications infrastructure (the road) has created wealth, and no one loses.

What about the cost of the road? Well I am assuming that the government paid for this out of the same taxes Joe would have paid anyway. They just decided to spend the taxes on a road rather than a warship or new town hall or something. I am also guessing that the cost of the road is far less than it’s economic benefit to those who use it over the years.

Wealth Creation

Looking around at the people around me I notice that most of are much better off in a material sense that we were say 20 years ago. Much of the stuff we buy (especially consumer goods) is also much cheaper than it was 20 years ago. For example in 1989 I bought a VCR for $1100, I just bought a combined VCR/DVD player in 2006 for $150. Adjusting for inflation thats a cost reduction of about 15 times! Fifteen VCRs for the price on one.

Somehow, wealth has been created. We all have a lot more money. My mother tells my stories about “stretching” the food money to last until pay day in the early 70’s. Now our biggest problem is when are we going to get a flat screen LCD TV?

The only exception to this rule seems to be real estate. My theory here is that real estate prices have been pushed up by demand as people have enough discretionary money now to cover huge interest repayments. Real estate has soaked up excess wealth through the laws of supply and demand.

But I digress. The question I was looking into is “how is wealth created”?

The best answer I have worked out so far (worked out in discussions with several friends) is that: we all get richer as the cost of producing stuff is reduced.

So I think it works like this:

  • Technology is developed that allows something to be produced cheaper than before. For example fertilisers make food production more efficient. Silicon chips lower the cost of electronics.
  • This makes some product or service cheaper to deliver.
  • So we spend less to obtain a given product or service, and have more money free for other things. Like flat screen TVs, or huge interest payments on your mortgage. Or sending our kids to expensive private schools.

Advancing technology is not good for everybody. Some industries get displaced. The development or roads and cars probably put a lot of the horse and cart industry out of business. This must have caused a small group of people a lot of distress, despite the huge benefit to a large number of other people. And yet the world is clearly a better place for having efficient transport.

It is also interesting to see what we do with our increased material wealth. Somehow soaking it up in big interest payments or mobile phone bills doesn’t seem right when there are 10,000 children dying each day from preventable disease. But thats another story. I am currently reading a couple of books on these subjects, and recommend them to you:

Communications Technology for Wealth Creation

I see a lot of parallels in physical communications assets like roads and communications technologies like VOIP and the Internet. They all let us communicate faster and better.

They are engines for wealth creation.

So reducing the cost of communications is a good thing. It lets us communicate faster and cheaper. Reduce the cost enough and maybe some kid in a disadvantaged country can make his first telephone call or obtain an education over the web.


So on balance I think “decreasing the cost of stuff” is a good thing. It is a mechanism for creating wealth. Like the example of roads, it can be win-win – wealth can be created like magic without disadvantaging a large group of others.

I must admit I find this pretty cool. It suggests that we can actually increase wealth (and the material well being of everyone on the planet) in a win-win situation. For us fortunate people in the 1st world this means more gadgets or a nicer house. For the less fortunate it means hope that one day they will be able to eat 3 meals a day and not die of Malaria.
ringtone para nokia 11083589i ringtone nokiasprint free sanyo 4900 ringtonefree sanyo ringtone sprint pcs 8200ringtones vx3200 lg alltelalex etheringtonalltel ringtones lg vx3200composable free ringtone motorola 120e Mapfamosas actrices pornoadvantages analysis gravimetric ofteen porn videos amatuarsex photos amputeesex adora35p phone sexalicia machado porn3movies porn Mapfree porn cartoon moviesmovies porn privatesexy stars moviefree movie horse sexaebn streaming moviesporn movies cartoon freeasian free movies sexjob movie hand Mapnipples youngshemale analpussy bareflintstones hentaiphotos peniscock cummingunderage preteenmature wives Map

A Bit Exact DSP Fairy Tale

Once upon a time in a far off land there was an underpaid pale skinny guy working in an impoverished start up who wanted to build products with fixed point DSP chips in them. The problem was he had just about killed himself doing the same thing many times in the past.

In particular, he seemed to spend a zillion hours debugging intricate fixed point DSP assembler to get products working. All while some project manager who hadn’t written a line of code for 10 years was hovering behind him asking questions like “is it ready” and mumbling something about milestone dates that were ridiculous when proposed, and soon became hilarious after the project started.

So, our hero wondered, “is there a better way”.

How to Build a DSP Based Product

Many DSP products are developed like this:

The academics “brains trust” guys work out a clever new algorithm. Now these guys are way-smart, but tend to express themselves in algebra with lots of Greek letters. They usually have PhDs, and cost a lot, and there aren’t many of them. They might test their brain-child in a simulation algorithm like Matlab:

x = [1.1 2.0 3.0];
y = [2.5 3.0 4.0];
dot = x * y’;

Now this lets them test the algorithm quickly (dot = 20.75), but you can’t run code like this on a real world product due to (a) it runs 10 million times slower than real time and (b) Matlab licences cost $5,000 each and only run on $3000 PCs. Which is a shame when u are building a $99 product.

So some poor engineer then has to make the DSP algorithm work in real time (i.e. fast) on the “target” hardware, a $10 DSP chip. Actually, in general, these guys actually like the detail work this involves. The first step they might do is convert to floating point C:
float x = {1.1,2.0,3.0};
float y = {2.5,3.0,4.0};

float dot(float x[], float y[], int n) {
float d = 0.0;
int i;

for(i=0; i<n; i )
d = x[i]*y[i];

return d;

But the guys & gals in Marketing (they are a breed closely related to Managers, but tend to be more attractive and use airports more) have decided that this must be a low cost product, so a $10 fixed point DSP is chosen.

Now a fixed point DSP is a chip that can perform multiplies and adds quickly, as long as you only use integers. They don’t do floats. So the engineer converts the algorithm to fixed point:
short x = {11,20,30}; /* x*10 */
short y = {25,30,40}; /* x*10 */

int dot(short x[], short y[], int n) {
int d = 0; int i;

for(i=0; i<n; i ) d = x[i]*y[i];

return d; /* returns dot*100 */

Now look what has happened to the x[] and y[] arrays. We have multiplied each number by 10 so we can use an integer to hold the value. If we didn’t scale each number then we would lose information (for example converting the 1.1 to 1) which on certain algorithms would cause problems.

However because each input value is multiplied by 10, that means the result is multiplied by 100 ( 10x * 10y = 100xy ). So we need to take this output scaling into account in any other stages of the algorithm. One more piece of information we need carry around with us as we code. One more source of bugs.

This is a very simple scaling example. In practice it gets much scarier and means you need to hold a lot more information in your head when you program in fixed point (see complex sample below).

There is also the potential of overflow problems. As the output is scaled up by 100, it is easy to overflow the 32 bit range of the integer output, for example if we used x[] and y[] arrays with larger numbers.

I Feel the Need for Speed

The product requires that this routine must run as fast as possible on the DSP chips. So we need to code an optimised version of this routine in assembler:
int dot(short *x, short *y, int len)
int d;

"I0 = %3;\n\t"
"I1 = %4;\n\t"
"A1 = A0 = 0;\n\t"
"R0 = [I0 ] || R1 = [I1 ];\n\t"
"LOOP dot%= LC0 = %5 >> 1;\n\t"
"LOOP_BEGIN dot%=;\n\t"
"A1 = R0.H * R1.H, A0 = R0.L * R1.L
|| R0 = [I0 ]
|| R1 = [I1 ];\n\t"
"LOOP_END dot%=;\n\t"
"R0 = (A0 = A1);\n\t"
"%0 = R0 >> 1;\n\t"
: "=&d" (d), "=&d" (before), "=&d" (after)
: "a" (x), "a" (y), "a" (len)
: "I0", "I1", "A1", "A0", "R0", "R1"

return d;

This is assembler for the Blackfin DSP, the actual code will of course vary for different DSP chips. The important bit is this:
A1 = R0.H * R1.H, A0 = R0.L * R1.L
|| R0 = [I0 ]
|| R1 = [I1 ]

In a single clock cycle, our intrepid DSP performs two parallel 16-bit multiplies, two 16-bit adds, and fetches 2 32 bit words from memory, post incrementing the pointers after each fetch. It does all this 500 million times a second, and sometimes it even gives you the right answer.

But not often. First you need to spend a lot of time getting it all to work right. The assembler is much harder to grok than the previous versions of the algorithm, and you use up a lot of brain power on a myriad of fine details. Get any one of the details wrong and the product is a dud.

The target development environment is also tough. In many cases the feedback you use to debug might be an analogue signal flowing into or out of the product (that is processed internally by the DSP). Or a blinking LED. Sometimes you don’t even know when its really working, bugs can simply slip through. You can’t always stop the code and single step using a debugger as parts of the algorithm depend on real-time activity.

It’s a development/testing/maintenance nightmare.

Bit Exact Assembler

There is a way to ease the pain a little. Lets say there is a bug in the assembler version. The problem is that to debug the algorithm you may need to understand both the assembler and the DSP algorithm, I mean to debug something you need to know how it works, right?

This is OK for a simple algorithm, but what if it is even moderately complex? Then you are dealing with (i) a very complex algorithm and (ii) very complex assembler all at the same time. This is usually too much for mere mortals (and even minor deities). The result is long, painful development, failed projects, late nights, angry spouses, and lots of pizza (well its not all bad I guess).

The trick is to divide and conquer. Make sure we are only hitting one tough problem at any given time.

One approach I have found very useful is bit exact porting to assembler. There are two important steps:

1. At the fixed point C stage, you test very very carefully. Run batteries of tests and simulations. These can be performed on a regular PC in non real time, using powerful debuggers. The idea is to verify that (i) the algorithm is OK and (ii) the fixed point port works OK. In particular (ii) is very tough, so its nice to handle this in a relatively benign environment like a PC or workstation.

2. Port each function to real time assembler one by one. Test each function against the fixed point C reference. Make sure the functions give IDENTICAL output – right down to the last bit. This takes a lot of discipline – near enough is NOT good enough.

The benefits of bit exact coding are:

1. When coding assembler, you don’t have to worry or even understand the algorithm. You just have to make sure that the C version matches the assembler version. When they do, you know that the overall system will perform as expected. This lightens the load on your brain, leading to less bugs.

2. As you integrate modules of code together, problems are easy to spot, just look for where the output doesn’t match the fixed point C simulation. This is much simpler that performing batteries of tests in a real time environment.

3. A less senior engineer can be used to perform the assembly coding, as they don’t need algorithm knowledge. This makes it easier to find people for the project, lowering costs. Adding people to the project is less of an issue, as they can get up to speed and be productive quickly.

4. Any bugs are exposed early (while unit testing), rather than late (after integration). You get an honest measure of progress, rather than nasty surprises late in the project.

Where it fails:

1. It takes a lot of discipline to perform (1) CAREFULLY. There is a lot of pressure from managers (and engineers) to jump onto the sexy real time/hardware integration stage. If you are part way through (1), and there is a milestone due, it is very tempting to “declare victory” and move onto the next stage, WAY before you are ready. The results are predictable…

2. It takes discipline to make the assembler bit exact. I mean, look, it’s nearly the same, just a few bits are different? The problem is these little errors may lead to much bigger flaws in performance later on. Try changing a single bit in your operating system binary image and see how well it runs. In some (most) cases bit-exactness matters!

3. It is honest. It gives you a black and white go/no-go indication at various stages of the project. For example the fixed point C stage is not complete until it has shown perfect performance against the spec. This reduces the “room to move” managers have, you can’t declare a phase “complete” just to suit a milestone date. It really has to work. This sort of honesty doesn’t sit well with some managers, for example they may have to admit problems to a customer early on in a project.

Case Study – Non Bit Exact

Now here is what typically happens. Our hero is under pressure from his manager to progress, so he short circuits the fixed point C simulation stage and codes straight into assembler. The C simulation and bit exact assembler have results that are “close” but not really the same. No effort is made to synchronise the results, there isn’t enough time!

Time is also short so he can’t “waste time” on unit testing, there is a milestone with a big fat cash payment attached.

The assembly code is loaded into flash on some brand-new, untested hardware, power is applied, and some basic tests performed.

Damn. It doesn’t work.

So our hero and his team start trying to debug the real time assembler code. However debugging in real time is very time consuming. The best brains in the company spend months tracking down bugs that end up being simple one line fixes. Bugs constantly interact with each other, leading the team off in false directions.

He doesn’t go back to the C simulation as it now gives different results to the assembler and “we don’t have enough time” to make them match.

As the project falls behind, there is no one who can be added, as only people who understand both the algorithms and the assembler can be used. The team is debugging leading edge algorithms in assembler.

A year passes. Senior management is fuming. They keep getting told “real soon now”. However no-one really knows how long it will be, because no one really knows just how many bugs there are. Frustration and stress is very high all around.

Eventually the company runs out of funds and the project is cancelled. There are mass departures of some of the best and most capable people from the company. Our hero leaves to join another start up where he repeats the same process.

Case Study – Bit Exact

Our hero and some members of the brains trust spend many months carefully porting the algorithm to fixed point and testing the entire product in a robust simulation. This apparent lack of progress is tough for the managers to handle, but they understand the benefits and bravely support the team.

Lots of complex bugs are thrown up, and the team works through them one by one. They don’t take too long to fix because these tough problems are at least occurring in a PC environment, where it easy to track them down.

In parallel, the fixed point C modules are passed to a team to perform the assembler coding. This team is made of up many junior engineers, who are instructed to make sure each module is bit exact with the reference C simulation. A few of them don’t like this, but the more senior members keep them on track. The junior members do a good job, as all they need to focus on is the assembler, not the algorithm details.

A few bugs slip through unit testing to integration. A senior member who is a veteran of previous “death march” DSP projects is assigned to fix the bug. She discovers that the the problem was due to an exceptional case not covered in the unit test and sure enough the assembler module wasn’t bit-exact under that case. She had previously been a critic of the extra effort required for bit exactness but the speed in which this bug was found and fixed impresses her.

Early in the project some unforeseen problems are discovered during the work on the fixed point C simulation. Some parts are running slower than expected which means more assembler may be required in the final product. This throws up an early indication that the project is at risk of falling behind.

This critical management information is flagged early, while there was time to do something about it. It is decided to add some extra engineers to the assembler side. As soon as they learn the DSP instruction set they quickly come up to speed. They don’t need to know or understand the algorithm details.

The schedule still slips by a few weeks, but all the stakeholders understand that this is a better outcome than delivering a lemon on time. No further slips occur, and the project manager earns a great deal of trust and respect by delivering “bad news early” rather than trying to hit arbitrary milestone dates and risking the entire project.

When the real-time DSP software is integrated with the hardware there are still a few bugs to find. However by this stage the real time software is at a very high quality level so the bug count is low. Integration actually proceeds on schedule, amazing everybody, especially veterans of past projects.

The product is a success, the start up has a great IPO, and the pale skinny guy retires at 28 years old.

Complex Example

Here is a rather more complex example of fixed point C code from the Speex project:
static inline spx_word32_t cheb_poly_eva(
spx_word16_t *coef,
spx_word16_t x,
int m,
char *stack
int i;
spx_word16_t b0, b1;
spx_word32_t sum;

/*Prevents overflows*/
if (x>16383)
x = 16383;
if (x<-16383)
x = -16383;

/* Initialise values */

/* Evaluate Chebyshev series formulation
using an iterative approach */
sum = ADD32(EXTEND32(coef[m]),
for(i=2;i<=m;i )
spx_word16_t tmp=b0;
b0 = SUB16(MULT16_16_Q13(x,b0), b1);
b1 = tmp;
sum = ADD32(sum,

return sum;

In English (sort of), this function works the value of a polynomial at various points around a circle. Don’t ask me why. DSP engineers just like doing things like that. For more information/punishment see the file lsp.c from the Speex project.

Macros like ADD32() simulate the instruction set of the DSP chip in the simulation code, which makes porting this function actual DSP assembler a simpler step. However they also make the code much harder to read.

Renumbering components using Perl

I am designing a telephony version of the BlackfinOne DSP/uClinux board. To develop the design I am using the gEDA CAD tools. This board runs uClinux on a Blackfin BF532 processor, and has been designed using the open source gEDA tools which makes it really easy for anyone to re-use the design. Thank you very much to Dimitar Penev and Ivan Danov who designed and published this board. They have done an excellent job.

The first stage in any electronic design is usually to enter the schematic. The schematic shows the components and their wiring in a form that is easy to follow:

I wanted to add two Ethernet ports to the BlackfinOne design, so I found a reference design and entered the schematic using the gschem program.

After I entered some sheets for the Ethernet chips, I had a little problem. In the schematic above you can see that every component has a letter & number next to it, for example, C1, C2 for capacitors, or U1, U2 etc for chips. This is a unique ID for each part. The letter generally indicates what sort of part is is (capacitor, resistor, chip etc). These IDs are know as reference designators or as a “refdes”.

Now I wanted the refdes on my new Ethernet sheets to merge nicely with those on the existing BlackfinOne design. For example if the last capacitors refdes on the BlackfinOne design stopped at C51, then my C’s should start at C52.

There is a program to do this (refdes_renum described here in the gEDA tutorial) but unfortunately it renumbers all of the refdes in the design, and I only wanted to renumber the new parts that I had added. I didn’t want to mess up anything in the existing design. Please also see the Links section below for a new alternative – grenum.

Now of course it is possible to number the components manually in gschem via the GUI, but if you know a little Perl then its much more fun to hack out a solution! Now fortunately gschem uses text files to store the schematic. The refdes are stored in the schematic file like this:
C 7000 87800 1 0 0 bf-93LC46.sym
T 8200 89100 5 10 1 1 0 0 1

I think this means that the component (leading C) that is drawn using the bf-93LC46.sym symbol file has a refdes called U19. The line starting with T tells gschem where to place the refdes text on the sheet. Anyway all I am really interested in is reading and changing the refdes lines.

So the Perl script first scans all of the existing schematic files for lines that have the text /refdes=/. The number of the refdes is then extracted, and stored in a hash if it is the largest refdes so far. On the new schematics, I initially enter each refdes as C?, U? etc. The script scans these, and wherever it finds a ? it replaces it with the next refdes in that series.

Improvements and Reflections

There is one obvious improvement. The current script isn’t smart enough to use a refdes that hasn’t been used yet, for example if C1,C2,C4 are currently used it isn’t smart enough to use C3, it will use C5 instead.

This script took me about 70 minutes to write and it worked perfectly the first time. To be honest it didn’t really save me any time (I could have manually entered the refdes in less time) but it was lots more fun to do it this way. And I guess it will be useful the next time I have this problem. Perhaps now that I have posted on it some one else might also find it useful. Come to think of it writing this blog post took longer than writing the script! DOH!

BTW every time I work with Perl (which is maybe once every two months) I spend the first 20 minutes looking up web sites to work out which regexp to use, how to iterate through a hash, etc. If I don’t work on Perl and regexps constantly it all looks like line noise to me after a few weeks:
if ($line =~ /refdes=(\w)(\?)/) {

But hey, its no big deal to refresh my memory every few months.

Solving little problems like this with Perl is fun. You almost hope there isn’t a solution out there already, just so you have an excuse to write one!


I have also written a few Perl scripts for PCB (the PCB design application in the gEDA suite) here.

After I posted the above, Levente Kovacs kindly pointed out his grenum utility which is now included with gEDA! This has been recently added to gEDA and performs the same job as the Perl script above, but can also spot gaps in refdes sequences. As grenum is now part of the official gEDA distribution I would recommend using it rather than I share a trait common among engineers – a strong desire to “do it myself” rather than checking carefully for existing solutions! Thanks Levente!
14 payday 20 loan directory2500 cash quick loansquick 26 loan 37 paydaycredit bad loan 2b personal45 loan payday 31 advance uksites 37 payday loan 53loan cash payday advance 5 5006 loan online kentucky payday 8loan 6 cost 20,8 payday lowlink 60 payday loanmovies hentaimovie lesbiansex moviesmovies adultfree hentai moviesporn moviesmovie clips porn freemovies gay free Mapmp3 achha ji1990 mp3 temptationsnation acumen mp3drame mp3 adamatecktonik mp3 tepr acdgmp3 ackermann sucheachhe mp3 lagtemp3 blog 1990s Mapfree naked moviesdildo movies hugehardcore clips moviemovies dans pornblow free movies job bestwomen movies nakedgirls free teen moviesdouble penetration free movies Mapporn cliphunterporn clipmasterclips free pornporn anal of clipsof girl porn clipsporn clipsporno clips gratuitclipsporn Map

Using Make to build hardware

I have been working on some telephony hardware using the gEDA open source CAD software. Now designing Printed Circuit Boards (PCBs) is fun, but very GUI-centric. Click-Click-Click with the mouse all day. After a few frustrating experiences (OK I messed up the design a few times) I thought “there must be a better way” and “why can’t I use the command line and Make for this?”

Lucky for me the gEDA software uses text files to store the PCB layout so as an experiment I wrote some Perl scripts to handle some boring repetitive tasks. The bottom line is I am using Make to assemble a Printed Circuit Board design from smaller “modules” of PCB – just like software. I have written about it (with lots of fancy screen shots) enemafacesit moviessex women movies fatmovie favorite quotesfemale movies strippershitting females moviesfighting movie website the temptationfinal xi fantasy moviemovies free wmv adultfree sex anal movie downloadsringtone sanyo 5400 sprint freesamsung ringtone a680 midiamc 30 barrington moviesnokia easy 3585i text ringtone freeaudiovox ringtone 8910 cricketclowns 1000 ringtone nextelfree a950 ringtone samsung scha660 samsung sprint free ringtone Mapto credit allowing friends use2bcredit accept 2bcardsunion credit affinity federal 07481accreditation masters divinity aabc inmaine federal union credit acadia madawaskaaccreditation ahrpcard processingcom gambling merchant credit accountbureas 3 credit Map

How to make your Blackfin fly Part 2

This article describes how to write fast DSP code for your Blackfin.

For the last month or so I have been working with Jean-Marc Valin from the Speex project to optimise Speex for the Blackfin. Jean-Marc had previously performed some optimisation work in the middle of 2005 (sponsored by Analog Devices).

We built on this work, reducing the complexity for the encode operation from about 40 MIPs to 23 MIPs. On a typical 500MHz Blackfin this means you can now run (500/23) = 21 Speex encoders in real time.

The really cool thing is that you can compress 21 channels of toll-quality speech using open source hardware (the STAMP boards), using an open source voice codec.

We obtained gains from:

1. Algorithm improvements, for example Jean-Marc ported large parts of the code from 32-bit to 16-bit fixed point.

2. Profiling the code and optimising the implementation of the most CPU intensive parts, for example LSP encoding and decoding, vector quantisation.

3. Experimenting with and learning about the Blackfin (e.g. care and feeding of instruction and data cache). It took us a while to work out just how to make code run fast on this processor.

4. The gcc 4.1 compiler, which uses the Blackfin hardware loop instruction, making for() loops much faster.

Why The Blackfin is Different

Most DSPs just have a relatively small amount (say 64k) of very fast internal memory. In a uClinux environment, the Blackfin has a large amount (say 64M) of slow memory, and small amounts of fast cache and internal memory.

The advantage of this arrangement is that you can run big programs (like an operating system) on the same chip while also performing hard-core DSP operations. This really reduces systems costs over designs that need a separate DSP and micro controller.

The disadvantages for crusty old DSP programmers like me is that things don’t always run as fast as you would like them to, for example if your precious DSP code doesn’t happen to be in cache when it is called then you get hit with a big performance penalty.

Some examples

To get a feel for the Blackfin I have written a bunch of test programs, some of them based around code from Speex. They can be downloaded here.

The cycles program shows how to optimise a simple dot product routine, I have previously blogged on this here.

A Simple Library for Profiling

To work out where to optimise Speex I developed a simple library to help profile the code. It works like this. You include the samcycles.h header file and insert macros:

for(i=0; i<10; i );


around the functions you wish to profile. Then, when you run the program it dumps the number of cycles executed between each macro:
root:/var/tmp> ./test_samcycles
start, 0
end, 503
TOTAL, 503

Which shows that between “start” and “end” 503 cycles were executed. Here is a more complex output from the “dark interior” of the Speex algorithm:
root:/var/tmp> ./testenc male.wav male.out
start nb_encode, 0
move, 1352
autoc, 16149
lpc, 3180
lpc_to_lsp, 21739
whole frame analysis, 17797

Ignoring the magical DSP incantations here, we can see that some routines are much heavier on the cycles than others. So those are the ones that get targeted for optimisation. You often get big gains by optimising a small number of “inner loop” operations that are hogging
all of the CPU.

Care and Feeding of your Blackfin Cache

One interesting test was “writethru” – this simply tests writing to external memory using a really tight inner loop:
"P0 = %2;\n\t"
"R0 = 0;\n\t"
"%0 = CYCLES;\n\t"
"LOOP dot%= LC0 = %3;\n\t"
"LOOP_BEGIN dot%=;\n\t"
"W[P0 ] = R0;\n\t"
"LOOP_END dot%=;\n\t"

This also illustrates why DSPs are so good at number crunching – that inner “W[P0 ] = R0” instruction executes in one cycle, and the hardware loop means 0-cycles for the loop overhead. Try doing that on your Pentium.

However look at what happens when we try this on the Blackfin target which has “write through” data-cache enabled:
root:/var/tmp> ./writethru
Test 1: Write 100 16-bit shorts
686 542 597 542 542 542 542 597 542 542

The write test runs 10 times. On each run we print out the number of cycles it took to write 100 shorts. You can see the execution time decreasing as the instruction code and data gets placed into cache.

However there is something funny going on here. Even in the best case (542 cycles) we are taking something like 5.4 cycles for each write, and it should be executing in a single cycle. My 500 MHz DSP is performing a like a 100 MHz DSP. I think I am going to cry.

The reason is that in “write through” mode every write must flow through the “narrow pipe” that connects to the Blackfins external memory. This external memory operates at 100 MHz (at least on my STAMP), so a burst of writes gets throttled to this speed.

This is not good news for a DSP programmer, where you often have lots of vectors that need to get written to memory. Very quickly.

There are a couple of solutions here. One is to take a hammer to your Blackfin STAMP hardware and go buy a Texas Instruments DSP (just kidding).

Another less exciting way is to enable “write back” cache (a kernel configuration option):
root:/var/tmp> ./writethru
Test 1: Write 100 16-bit shorts
119 102 102 102 102 102 102 102 102 102

Now we are getting somewhere. Writing 100 shorts is taking about 100 cycles as expected. Note that the first run takes a little longer, this is probably because the program code had to be loaded into the instruction cache. In “write back” cache the values get stored in fast cache until the cache-line is flushed to external memory some time later.

On a system like the Blackfin, we may run a lot of other code between calls to the DSP routines. This effectively means that the instruction and data caches are often “flushed” between calls to our DSP routines. In practice this leads to extra overhead as our DSP instructions and data need to be reloaded into cache.

In the example above the overhead was about 20%. This is very significant in DSP coding. A way to reduce this overhead is to use internal memory…..

Internal Memory

The Blackfin has a small amount of internal data (e.g. 32k) and instruction memory (e.g. 16k). Internal memory has single cycle access for reads and writes. The Blackfin uClinux-dist actually has kernel-mode alloc() functions that allow internal memory to be accessed.

The Blackfin toolchain developers are busy working on support for using internal memory in user mode programs, see this thread from the Blackfin forums.

In the mean time I have written a kernel mode driver l1 alloc that allows user-mode programs to access internal memory:
/* try alloc-ing and freeing some memory */

pa = (void*)l1_data_A_sram_alloc(0x4);
printf("pa = 0xx\n",(int)pa);
ret = l1_data_A_sram_free((unsigned int)pa);

which produces the output:
root:~> /var/tmp/test_l1_alloc
pa = 0xff803e30

i.e. a chunk of memory with an address in internal memory bank A.

To see the effect of internal versus cache/external memory:
Test 1: data in external memory
ret = 100: run time: 173 103 103 103 103 103 103 103 103 103
Test 2: data in internal memory
ret = 100: run time: 103 103 103 103 103 103 103 103 103 103

After a few runs there is no difference – i.e. on Test 1 the data from external memory has been loaded into cache. However check out the difference in the first run – Test 2 is much faster. This means that by using internal memory we avoid the overhead where the DSP code/data is out of cache, for example when your DSP code is part of a much larger program.

I should mention that to make this driver work I needed to add a an entry to my
uClinux-dist/vendors/AnalogDevices/BF537-STAMP/device_table.txt file:
/dev/l1alloc c 664 0 0 254 0 0 0 -

then rebuild Linux as for some reason I couldn’t get mknod to work. Then:
root:~> ls /dev/l1alloc -l
crw-rw-r-- 1 0 0 254, 0 /dev/l1alloc
root:~> cp var/tmp/l1_alloc_k.ko .
root:~> insmod l1_alloc_k.ko
Using l1_alloc_k.ko
root:~> /var/tmp/test_l1_alloc

Another problem I had was that insmod wouldn’t load device drivers in /var/tmp, which is where I download files from my host. Hence the copy to / above.

Speex Benchmarks

Here are the current results for Speex on the Blackfin, operating at Quality=8 (15 kbit/s), Complexity=1, non-VBR (variable bit rate) mode:

The terms ext/int memory refers to where the Speex state and automatic variables are stored. The units are k-cycles to encode a single 20ms frame, averaged over a 6 second sample (male.wav).

(1) Write through cache, ext memory: 564
(2) Write through cache, int memory: 455
(3) Write back cache , ext memory: 465
(4) Write back cache , int memory: 438

So you can see that write-back cache (3) gave us performance close to that of using internal memory (2 & 4) – quite a significant gain.

Optimisation work is in progress so we hope to reduce these numbers a little further in the near future. Also, there is still plenty of scope for optimisation of the decoder, which currently consumes about 5 MIPs with the enhancer enabled.

To test out the current Speex code for the Blackfin (or other processors for that matter) you can download from Speex SVN:
svn co

Or you can download a current snapshot from here.

movies sapphic contentfree adult trailers moviemovies free voyeursapphic petite moviesdancing dirty moviefree clips movie ebony pornmovie pornosex movie hidden disneyfucking granny moviesmovie the blowbarrington alfreds of island rhodetheme g ali ringtonesringtones 3ringtones 311ringtones 99missed call ringtone 1mobile ringtones absolutely freeac ringtone dc Mapviagra pills 100mg priceviagra acapulcoviagra achetertramadol $79 180200 cod overnight tramadolper hcl tramadol acetaminophentramadol all aboutversus ambien xanax Mapringtone acdc midi12 ringtone adamringtones australia fair advancesecond ringtone 203310 ringtone composebalut abs cbn ringtone64 poliphony ringtones1000 words ringtone Mapporn 69 position69ing porngrade 6th porn7 penis porn inchporno 70mivies 70s pornporn pictures 70sporn s fan 80 Map

GSM Port for the Blackfin

For my uCasterisk project I needed a couple of optimised codecs for the Blackfin. This post discusses the steps taken to port GSM to the Blackfin.

The GSM codec for the Blackfin can be downloaded here.


1/ To make:

2/ To test:

Download tgsm (test program produced by make) to your target and also download a source speech file like:

to your Blackfin hardware and type:
root:/var/tmp> ./tgsm male.wav male.out
SNR = 10.4591 dB enc 114 dec 39 k cycles/frame

When it runs it prints out the number of cycles it took to execute each 20ms encode and decode frame.

You can then upload the output file (male.out) to your host and listen to it. On my Linux box I use “play male.sw”, the sw lets “play” recognise it as a 16-bit signed-word file.


I spent a day or so optimising the code, for example:

a) I wrote Blackfin versions of the macros in gsm/inc/private.h

b) Applied the profiling macros SAMCYCLES and worked out which parts of the code needed the most optimisation.

c) I looked at the assembler output of various functions (gcc -S or -save-temps options) and modified the C code for better output, such as using the hardware loop supported by gcc 4.1. A lot of the original GSM code was written for older x86 compilers, and lots of compiler-specific mods were evident. In many cases to speed up code I just went back to vanilla C and the Blackfin compiler did a better job!

e) By inspecting the assembler I found some important routines were making function calls inside their inner loops which is very inefficient. These were modified to remove the function calls.

f) Use some assembler in the tightest, most cycle-hungry loops.


Using gcc 4.1 and testing on a Blackfin STAMP BF533 board:
encode: 114,000 cycles/fr: (114,000/0.02s) = 5.7 MIPs
decode: 39,000 cycles/fr: (39,000/0.02s) = 1.95 MIPs

The initial number of cycles per encode was 274,000, decode 82,000.

Further Work

My gut feel is it might be possible to reduce the total (encode plus decode) cycles by perhaps another 30% with further optimisation.

a) The analysis and synthesis filter functions consume about 50,000 cycles per encode/decode cycle, they could be converted to assembler.

b) The RPE algorithm (rpe.c) could be optimised.

c) Blackfin internal memory might speed some operations, such as autocorrelation.

How To Profile

I have written a set of macros (samcycles.h) to sample the Blackfin cycles counter. Here is an example on how to use them:

a) Patch code.c:
patch -p0 < code_profile.patch

b) make, download tgsm and re-run on the target:
root:/var/tmp> ./tgsm male.wav male.out
start Gsm_Coder, 0
Gsm_Preprocess, 5312
Gsm_LPC_Analysis, 11406
Gsm_Short_Term_Analysis_Filter, 23483
Gsm_Long_Term_Predictor, 11525
Gsm_RPE_Encoding, 8308
Gsm_Long_Term_Predictor, 10947
Gsm_RPE_Encoding, 5411
Gsm_Long_Term_Predictor, 10701
Gsm_RPE_Encoding, 5422
Gsm_Long_Term_Predictor, 10696
Gsm_RPE_Encoding, 5409
end Gsm_Coder, 521
TOTAL, 109141
SNR = 10.4591 dB enc 115 dec 39 k cycles/frame

c) To investigate further, just add more SAMCYCLES() macros. Its a good idea to remove or disable the macros when you are finished, as they use a few thousand cycles:
patch -R -p0 < code_profile.patch


To Jean-Marc Valin and the Speex project, I used some of their assembler code (see COPYING.xiph for the copyright message related to this code).
loans $100loan state alaskapercent 125 home equity loanloan home alabama mobilefast 10,000 loanequity accelerated loansfraud advance fee loana loan paper c b Mapporn review 1280×72013 girl porn14 lesbian pornporn year old 14vfere min 15 pornpreviews 15 porn minclips minute porn 1515 minutes porn Mapagreement laptop loanlaserpro origination loan systemva loans about learnlegal templates document loan personalrepayment loan educational for legislationlenders in loan bad wi makingof loans lenght used rvlibrary loan Map