tip: transferring data faster than async_memcpy can do.


Solomon
11-29-2007, 08:39 PM
Folks,

In the inner loop of my application, I need to transfer 4 consecutive doubles, i.e. 32 bytes.

I make these 32-byte aligned in mono memory to improve the transfer rate, and also use async_memcpy() rather than memcpy(), even though I can't do any further calcs until I have these 4 numbers in poly.

Performance with async_memcpy() is good - around 900 cycles.
But when I worked out the mono memory bandwidth, it came out at only 716.8 MBytes/s
(900 cycles at 210 MHz to transfer 96 times 32 bytes)

This is much lower than the 3.2 GB/s that the mono DRAM is rated at.

For large transfers, I do get much better performance - in the region of 2.4 GB/s

I suspect that async_memcpy has a high setup cost.
I discovered that there are also lower-level calls you can make to transfer data from mono to poly - documented in the ClearSpeed Instruction Set Reference Manual.

I have wrapped these up as inline assembler in 2 routines:
cs_scatter_m2p ()
and
cs_gather_p2m ()
which take as arguments the source and destination array locations, the size (32 in my case), and a semaphore to use. One caveat: the mono array location (the source for cs_scatter_m2p()) is passed in a small poly array of 3 ints that also holds 2 bitmasks - both of which are set to all 1 bits.


Using these, the time taken to transfer 32 bytes falls to only c. 400 cycles - over 2x faster :-)

If you are interested, I will post my code to this list.
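
For anyone wanting to redo the bandwidth arithmetic above, here is a minimal sketch in plain C (not Cn). The function names are made up for illustration and are not part of any ClearSpeed API. It assumes the c. 400 cycle figure covers the same 96 x 32 bytes at 210 MHz as the 900 cycle figure, which the post does not state explicitly.

```c
/* Hypothetical helper: effective bandwidth in bytes per second.
 * cycles     - measured cycle count for the whole transfer
 * clock_hz   - clock frequency in Hz (210 MHz in the post)
 * transfers  - number of blocks moved (96 in the post)
 * bytes_each - size of each block in bytes (32 in the post)
 */
double effective_bandwidth(double cycles, double clock_hz,
                           double transfers, double bytes_each)
{
    double seconds = cycles / clock_hz;        /* elapsed time */
    return (transfers * bytes_each) / seconds; /* bytes per second */
}

/* Hypothetical helper: speedup of a new cycle count over an old one. */
double speedup(double old_cycles, double new_cycles)
{
    return old_cycles / new_cycles;
}
```

Calling effective_bandwidth(900, 210e6, 96, 32) gives about 716.8e6 bytes/s, matching the figure quoted for async_memcpy(). With 400 cycles the same call gives about 1612.8e6 bytes/s, and speedup(900, 400) is 2.25. Since the 400 is approximate, treat these as rough numbers.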

Solomon

clear-cut
12-03-2007, 09:14 PM
There is definitely some overhead to async_memcpyX2Y.

The 3.0 release adds a new header "async_functions.h" which provides low overhead access to the full range of PIO capabilities at a variety of different levels via inline assembly.

Paul
03-28-2008, 05:14 PM
Solomon, can you post the code? I am a new user and still using async_memcpy.