├── LICENSE ├── Makefile ├── README.md ├── dma-example.c ├── dma-gpio.c └── hw-addresses.h /LICENSE: -------------------------------------------------------------------------------- 1 | This is free and unencumbered software released into the public domain. 2 | 3 | Anyone is free to copy, modify, publish, use, compile, sell, or 4 | distribute this software, either in source code form or as a compiled 5 | binary, for any purpose, commercial or non-commercial, and by any 6 | means. 7 | 8 | In jurisdictions that recognize copyright laws, the author or authors 9 | of this software dedicate any and all copyright interest in the 10 | software to the public domain. We make this dedication for the benefit 11 | of the public at large and to the detriment of our heirs and 12 | successors. We intend this dedication to be an overt act of 13 | relinquishment in perpetuity of all present and future rights to this 14 | software under copyright law. 15 | 16 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 17 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 18 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 19 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR 20 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 21 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 22 | OTHER DEALINGS IN THE SOFTWARE. 
23 | 24 | For more information, please refer to 25 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | CC=gcc 2 | CFLAGS=-g $(DEFINES) 3 | 4 | all: help 5 | rpi1: DEFINES:=$(DEFINES) -DRPI_V1 6 | rpi2: DEFINES:=$(DEFINES) -DRPI_V2 7 | rpi3: DEFINES:=$(DEFINES) -DRPI_V3 8 | rpi1 rpi2 rpi3: dma-example dma-gpio 9 | 10 | dma-example: dma-example.c 11 | $(CC) $(CFLAGS) -o dma-example dma-example.c 12 | dma-gpio: dma-gpio.c 13 | $(CC) $(CFLAGS) -O2 -std=gnu99 -o dma-gpio dma-gpio.c -lrt 14 | 15 | clean: 16 | rm -rf dma-example dma-gpio 17 | 18 | help: 19 | @echo "Type make <target> where <target> is one of:" 20 | @echo " * rpi1 (for the original Raspberry Pi A/B/A+/B+)" 21 | @echo " * rpi2 (for the Raspberry Pi A2/B2)" 22 | @echo " * rpi3 (for Raspberry Pi B3)" 23 | @echo " Pi Zero users will most likely have success building for rpi1, since the Pi Zero uses the same chipset." 24 | @echo " Note: rpi2/3/zero support should be considered experimental." 25 | 26 | .PHONY: clean help 27 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Raspberry-Pi-DMA-Example 2 | ======================== 3 | 4 | Simplest example of copying memory from one region to another using DMA ("Direct Memory Access") in userland. 5 | 6 | Just type `make <target>`, where `<target>` is either `rpi1`, `rpi2`, or `rpi3`, and then `sudo ./dma-example` (sudo is required to get permission to write to the DMA peripheral). 7 | Type `make help` for more info. 8 | 9 | The example simply copies the string "hello world" from one place in memory to another through the use of the Raspberry Pi's DMA peripheral. 10 | 11 | Run `sudo ./dma-gpio` for an example which toggles a GPIO output pin at 500Hz using DMA. 
This code (dma-gpio.c) creates an 8ms circular buffer of future output states for all 64 IOs and uses DMA to sequentially copy this buffer into the memory-mapped GPIO registers at a rate of 250,000 frames per second. This allows one to output precise waveforms to any GPIO pin without worrying about Linux task scheduling. The PWM peripheral is used for pacing the DMA transaction, so simultaneous audio output will likely cause errors. Heavy network or USB usage will decrease the timing accuracy for frame rates of 500,000+ fps, due to bus contention, but even downloading a file at 1MB/sec only has a *very* small impact at 250,000 fps. 12 | 13 | Some code, namely for translating virtual addresses to physical ones within dma-example.c, was based on that found in the Raspberry Pi FM Transmitter which I *think* is by either Oliver Mattos or Oskar Weigl, but their website has been down for a while now. Some of the code can still be found here: http://www.raspians.com/turning-the-raspberry-pi-into-an-fm-transmitter/ 14 | 15 | Problems 16 | ====== 17 | 18 | The virtual->physical mapping function in `dma-example.c` is not cache-coherent. That means that the DMA engine might see different data than the CPU. The equivalent functions in `dma-gpio.c` behave correctly, so it is only a matter of porting that code. 19 | 20 | The code hasn't been tested extensively on hardware other than the Pi v1 (e.g. Pi 2, Pi 3, Pi Zero). There may be some latent bugs on other hardware versions. Please report any if found. 21 | 22 | License 23 | ====== 24 | 25 | I'm putting this code in the public domain. However, the two functions in dma-example.c - `makeVirtPhysPage` and `freeVirtPhysPage` - were based on code found in the FM Transmitter, which was GPL-licensed. If you want to use this code under a non-GPL license, I would recommend replacing those functions with your own code, just to be extra safe. **Disclaimer**: I am not a lawyer. 
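As a rough check of the figures quoted above for `dma-gpio`, the sketch below derives the frame rate from the default clock settings in `dma-gpio.c` (`NOMINAL_CLOCK_FREQ`, `BITS_PER_CLOCK`, `CLOCK_DIV`) and the number of frames in each half-period of a 500Hz square wave. The helper names `frames_per_sec` and `frames_per_half_period` are made up for this illustration and do not exist in the repository.

```c
#include <assert.h>
#include <stdint.h>

// Default clock settings, copied from dma-gpio.c.
#define NOMINAL_CLOCK_FREQ 500000000LL // PWM clock, 500 MHz
#define BITS_PER_CLOCK 10              // bits consumed per PWM cycle
#define CLOCK_DIV 200                  // divisor applied before the PWM peripheral

// Hypothetical helper: GPIO frames emitted per second with the settings above.
static inline int64_t frames_per_sec(void) {
    return NOMINAL_CLOCK_FREQ / BITS_PER_CLOCK / CLOCK_DIV;
}

// Hypothetical helper: number of frames between two pin flips for a square
// wave of the given frequency (one flip per half-period).
static inline int64_t frames_per_half_period(int64_t hz) {
    assert(hz > 0); // guard against division by zero
    return frames_per_sec() / (2 * hz);
}
```

With the defaults, `frames_per_sec()` evaluates to 250,000 (one frame every 4 uS) and `frames_per_half_period(500)` to 250, so a 500Hz output would flip the pin once every 250 frames.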
26 | -------------------------------------------------------------------------------- /dma-example.c: -------------------------------------------------------------------------------- 1 | /* 2 | * https://github.com/Wallacoloo/Raspberry-Pi-DMA-Example : DMA Raspberry Pi Examples 3 | * Author: Colin Wallace 4 | 5 | This is free and unencumbered software released into the public domain. 6 | 7 | Anyone is free to copy, modify, publish, use, compile, sell, or 8 | distribute this software, either in source code form or as a compiled 9 | binary, for any purpose, commercial or non-commercial, and by any 10 | means. 11 | 12 | In jurisdictions that recognize copyright laws, the author or authors 13 | of this software dedicate any and all copyright interest in the 14 | software to the public domain. We make this dedication for the benefit 15 | of the public at large and to the detriment of our heirs and 16 | successors. We intend this dedication to be an overt act of 17 | relinquishment in perpetuity of all present and future rights to this 18 | software under copyright law. 19 | 20 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 21 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 22 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 23 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR 24 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 25 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 26 | OTHER DEALINGS IN THE SOFTWARE. 27 | 28 | For more information, please refer to 29 | */ 30 | 31 | #include <sys/mman.h> //for mmap, mlock 32 | #include <unistd.h> //for NULL, lseek, read, sleep 33 | #include <stdio.h> //for printf 34 | #include <stdlib.h> //for exit, valloc 35 | #include <fcntl.h> //for file opening 36 | #include <stdint.h> //for uint32_t 37 | #include <string.h> //for memset 38 | 39 | #include "hw-addresses.h" // for DMA addresses, etc. 
40 | 41 | #define PAGE_SIZE 4096 //mmap maps pages of memory, so we must give it multiples of this size 42 | 43 | 44 | //-------- Relative offsets for DMA registers 45 | //DMA Channel register sets (format of these registers is found in DmaChannelHeader struct): 46 | #define DMACH(n) (0x100*(n)) 47 | //Each DMA channel has some associated registers, but only CS (control and status), CONBLK_AD (control block address), and DEBUG are writeable 48 | //DMA is started by writing address of the first Control Block to the DMA channel's CONBLK_AD register and then setting the ACTIVE bit inside the CS register (bit 0) 49 | //Note: DMA channels are connected directly to peripherals, so physical addresses should be used (affects control block's SOURCE, DEST and NEXTCONBK addresses). 50 | #define DMAENABLE 0x00000ff0 //bit 0 should be set to 1 to enable channel 0. bit 1 enables channel 1, etc. 51 | 52 | //flags used in the DmaChannelHeader struct: 53 | #define DMA_CS_RESET (1u<<31) 54 | #define DMA_CS_ACTIVE (1<<0) 55 | 56 | #define DMA_DEBUG_READ_ERROR (1<<2) 57 | #define DMA_DEBUG_FIFO_ERROR (1<<1) 58 | #define DMA_DEBUG_READ_LAST_NOT_SET_ERROR (1<<0) 59 | 60 | //flags used in the DmaControlBlock struct: 61 | #define DMA_CB_TI_DEST_INC (1<<4) 62 | #define DMA_CB_TI_SRC_INC (1<<8) 63 | 64 | //set bits designated by (mask) at the address (dest) to (value), without affecting the other bits 65 | //eg if x = 0b11001100 66 | // writeBitmasked(&x, 0b00000110, 0b11110011), 67 | // then x now = 0b11001010 (only bits 1 and 2 are taken from value) 68 | void writeBitmasked(volatile uint32_t *dest, uint32_t mask, uint32_t value) { 69 | uint32_t cur = *dest; 70 | uint32_t new = (cur & (~mask)) | (value & mask); 71 | *dest = new; 72 | *dest = new; //added safety for when crossing memory barriers. 
73 | } 74 | 75 | struct DmaChannelHeader { 76 | uint32_t CS; //Control and Status 77 | //31 RESET; set to 1 to reset DMA 78 | //30 ABORT; set to 1 to abort current DMA control block (next one will be loaded & continue) 79 | //29 DISDEBUG; set to 1 and DMA won't be paused when debug signal is sent 80 | //28 WAIT_FOR_OUTSTANDING_WRITES; set to 1 and DMA will wait until peripheral says all writes have gone through before loading next CB 81 | //24-27 reserved 82 | //20-23 PANIC_PRIORITY; 0 is lowest priority 83 | //16-19 PRIORITY; bus scheduling priority. 0 is lowest 84 | //9-15 reserved 85 | //8 ERROR; read as 1 when error is encountered. error can be found in DEBUG register. 86 | //7 reserved 87 | //6 WAITING_FOR_OUTSTANDING_WRITES; read as 1 when waiting for outstanding writes 88 | //5 DREQ_STOPS_DMA; read as 1 if DREQ is currently preventing DMA 89 | //4 PAUSED; read as 1 if DMA is paused 90 | //3 DREQ; copy of the data request signal from the peripheral, if DREQ is enabled. reads as 1 if data is being requested, else 0 91 | //2 INT; set when current CB ends and its INTEN=1. Write 1 to this bit to clear it 92 | //1 END; set when the transfer defined by current CB is complete. Write 1 to clear. 93 | //0 ACTIVE; write 1 to activate DMA (load the CB beforehand) 94 | uint32_t CONBLK_AD; //Control Block Address 95 | uint32_t TI; //transfer information; see DmaControlBlock.TI for description 96 | uint32_t SOURCE_AD; //Source address 97 | uint32_t DEST_AD; //Destination address 98 | uint32_t TXFR_LEN; //transfer length. 99 | uint32_t STRIDE; //2D Mode Stride. Only used if TI.TDMODE = 1 100 | uint32_t NEXTCONBK; //Next control block. 
Must be 256-bit aligned (32 bytes; 8 words) 101 | uint32_t DEBUG; //controls debug settings 102 | }; 103 | 104 | struct DmaControlBlock { 105 | uint32_t TI; //transfer information 106 | //31:27 unused 107 | //26 NO_WIDE_BURSTS 108 | //25:21 WAITS; number of cycles to wait between each DMA read/write operation 109 | //20:16 PERMAP; peripheral number to be used for DREQ signal (pacing). set to 0 for unpaced DMA. 110 | //15:12 BURST_LENGTH 111 | //11 SRC_IGNORE; set to 1 to not perform reads. Used to manually fill caches 112 | //10 SRC_DREQ; set to 1 to have the DREQ from PERMAP gate *reads* 113 | //9 SRC_WIDTH; set to 1 for 128-bit moves, 0 for 32-bit moves 114 | //8 SRC_INC; set to 1 to automatically increment the source address after each read (you'll want this if you're copying a range of memory) 115 | //7 DEST_IGNORE; set to 1 to not perform writes. 116 | //6 DEST_DREQ; set to 1 to have the DREQ from PERMAP gate *writes* 117 | //5 DEST_WIDTH; set to 1 for 128-bit moves, 0 for 32-bit moves 118 | //4 DEST_INC; set to 1 to automatically increment the destination address after each write (you'll want this if you're copying a range of memory) 119 | //3 WAIT_RESP; make DMA wait for a response from the peripheral during each write. Ensures multiple writes don't get stacked in the pipeline 120 | //2 unused (0) 121 | //1 TDMODE; set to 1 to enable 2D mode 122 | //0 INTEN; set to 1 to generate an interrupt upon completion 123 | uint32_t SOURCE_AD; //Source address 124 | uint32_t DEST_AD; //Destination address 125 | uint32_t TXFR_LEN; //transfer length. 126 | uint32_t STRIDE; //2D Mode Stride. Only used if TI.TDMODE = 1 127 | uint32_t NEXTCONBK; //Next control block. Must be 256-bit aligned (32 bytes; 8 words) 128 | uint32_t _reserved[2]; 129 | }; 130 | 131 | //allocate a page & simultaneously determine its physical address. 132 | //virtAddr and physAddr are essentially passed by-reference. 
133 | //this allows for: 134 | //void *virt, *phys; 135 | //makeVirtPhysPage(&virt, &phys) 136 | //now, virt[N] exists for 0 <= N < PAGE_SIZE, 137 | // and phys+N is the physical address for virt[N] 138 | //based on http://www.raspians.com/turning-the-raspberry-pi-into-an-fm-transmitter/ 139 | void makeVirtPhysPage(void** virtAddr, void** physAddr) { 140 | *virtAddr = valloc(PAGE_SIZE); //allocate one page of RAM 141 | 142 | //force page into RAM and then lock it there: 143 | ((int*)*virtAddr)[0] = 1; 144 | mlock(*virtAddr, PAGE_SIZE); 145 | memset(*virtAddr, 0, PAGE_SIZE); //zero-fill the page for convenience 146 | 147 | //Magic to determine the physical address for this page: 148 | uint64_t pageInfo; 149 | int file = open("/proc/self/pagemap", O_RDONLY); //open() takes O_* flags, not an fopen-style mode character 150 | lseek(file, ((size_t)*virtAddr)/PAGE_SIZE*8, SEEK_SET); //each pagemap entry is 8 bytes 151 | read(file, &pageInfo, 8); close(file); 152 | 153 | *physAddr = (void*)(size_t)((pageInfo & ((1ULL<<55)-1))*PAGE_SIZE); //bits 0-54 of a pagemap entry hold the page frame number; the upper bits are flags 154 | printf("makeVirtPhysPage virtual to phys: %p -> %p\n", *virtAddr, *physAddr); 155 | } 156 | 157 | //call with virtual address to deallocate a page allocated with makeVirtPhysPage 158 | void freeVirtPhysPage(void* virtAddr) { 159 | munlock(virtAddr, PAGE_SIZE); 160 | free(virtAddr); 161 | } 162 | 163 | //map a physical address into our virtual address space. memfd is the file descriptor for /dev/mem 164 | volatile uint32_t* mapPeripheral(int memfd, int addr) { 165 | // /dev/mem behaves as a file. We need to map that file into memory: 166 | void *mapped = mmap(NULL, PAGE_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, memfd, addr); 167 | //now, *mapped = memory at physical address of addr. 168 | if (mapped == MAP_FAILED) { 169 | printf("failed to map memory (did you remember to run as root?)\n"); 170 | exit(1); 171 | } else { 172 | printf("mapped: %p\n", mapped); 173 | } 174 | return (volatile uint32_t*)mapped; 175 | } 176 | 177 | int main() { 178 | //cat /sys/module/dma/parameters/dmachans gives a bitmask of DMA channels that are not used by GPU. 
Results: ch 1, 3, 6, 7 are reserved. 179 | //dmesg | grep "DMA"; results: Ch 2 is used by SDHC host 180 | //ch 0 is known to be used for graphics acceleration 181 | //Thus, applications can use ch 4, 5, or the LITE channels @ 8 and beyond. 182 | int dmaChNum = 5; 183 | //First, open the linux device, /dev/mem 184 | //dev/mem provides access to the physical memory of the entire processor+ram 185 | //This is needed because Linux uses virtual memory, thus the process's memory at 0x00000000 will NOT have the same contents as the physical memory at 0x00000000 186 | int memfd = open("/dev/mem", O_RDWR | O_SYNC); 187 | if (memfd < 0) { 188 | printf("Failed to open /dev/mem (did you remember to run as root?)\n"); 189 | exit(1); 190 | } 191 | //now map /dev/mem into memory, but only map specific peripheral sections: 192 | volatile uint32_t *dmaBaseMem = mapPeripheral(memfd, DMA_BASE); 193 | 194 | //configure DMA: 195 | //allocate 1 page for the source and 1 page for the destination: 196 | void *virtSrcPage, *physSrcPage; 197 | makeVirtPhysPage(&virtSrcPage, &physSrcPage); 198 | void *virtDestPage, *physDestPage; 199 | makeVirtPhysPage(&virtDestPage, &physDestPage); 200 | 201 | //write a few bytes to the source page: 202 | char *srcArray = (char*)virtSrcPage; 203 | srcArray[0] = 'h'; 204 | srcArray[1] = 'e'; 205 | srcArray[2] = 'l'; 206 | srcArray[3] = 'l'; 207 | srcArray[4] = 'o'; 208 | srcArray[5] = ' '; 209 | srcArray[6] = 'w'; 210 | srcArray[7] = 'o'; 211 | srcArray[8] = 'r'; 212 | srcArray[9] = 'l'; 213 | srcArray[10] = 'd'; 214 | srcArray[11] = 0; //null terminator used for printf call. 215 | 216 | //allocate 1 page for the control blocks 217 | void *virtCbPage, *physCbPage; 218 | makeVirtPhysPage(&virtCbPage, &physCbPage); 219 | 220 | //dedicate the first 8 words of this page to holding the cb. 
221 | struct DmaControlBlock *cb1 = (struct DmaControlBlock*)virtCbPage; 222 | 223 | //fill the control block: 224 | cb1->TI = DMA_CB_TI_SRC_INC | DMA_CB_TI_DEST_INC; //after each byte copied, we want to increment the source and destination address of the copy, otherwise we'll be copying to the same address. 225 | cb1->SOURCE_AD = (uint32_t)physSrcPage; //set source and destination DMA address 226 | cb1->DEST_AD = (uint32_t)physDestPage; 227 | cb1->TXFR_LEN = 12; //transfer 12 bytes 228 | cb1->STRIDE = 0; //no 2D stride 229 | cb1->NEXTCONBK = 0; //no next control block 230 | 231 | printf("destination was initially: '%s'\n", (char*)virtDestPage); 232 | 233 | //enable DMA channel (it's probably already enabled, but we want to be sure): 234 | writeBitmasked(dmaBaseMem + DMAENABLE/4, 1 << dmaChNum, 1 << dmaChNum); 235 | 236 | //configure the DMA header to point to our control block: 237 | volatile struct DmaChannelHeader *dmaHeader = (volatile struct DmaChannelHeader*)(dmaBaseMem + (DMACH(dmaChNum))/4); //dmaBaseMem is a uint32_t ptr, so divide by 4 before adding byte offset 238 | dmaHeader->CS = DMA_CS_RESET; //make sure to disable dma first. 239 | sleep(1); //give time for the reset command to be handled. 240 | dmaHeader->DEBUG = DMA_DEBUG_READ_ERROR | DMA_DEBUG_FIFO_ERROR | DMA_DEBUG_READ_LAST_NOT_SET_ERROR; // clear debug error flags 241 | dmaHeader->CONBLK_AD = (uint32_t)physCbPage; //we have to point it to the PHYSICAL address of the control block (cb1) 242 | dmaHeader->CS = DMA_CS_ACTIVE; //set active bit, but everything else is 0. 
243 | 244 | sleep(1); //give time for copy to happen 245 | 246 | printf("destination reads: '%s'\n", (char*)virtDestPage); 247 | 248 | //cleanup 249 | freeVirtPhysPage(virtCbPage); 250 | freeVirtPhysPage(virtDestPage); 251 | freeVirtPhysPage(virtSrcPage); 252 | return 0; 253 | } 254 | -------------------------------------------------------------------------------- /dma-gpio.c: -------------------------------------------------------------------------------- 1 | /* 2 | * https://github.com/Wallacoloo/Raspberry-Pi-DMA-Example : DMA Raspberry Pi Examples 3 | * Author: Colin Wallace 4 | 5 | This is free and unencumbered software released into the public domain. 6 | 7 | Anyone is free to copy, modify, publish, use, compile, sell, or 8 | distribute this software, either in source code form or as a compiled 9 | binary, for any purpose, commercial or non-commercial, and by any 10 | means. 11 | 12 | In jurisdictions that recognize copyright laws, the author or authors 13 | of this software dedicate any and all copyright interest in the 14 | software to the public domain. We make this dedication for the benefit 15 | of the public at large and to the detriment of our heirs and 16 | successors. We intend this dedication to be an overt act of 17 | relinquishment in perpetuity of all present and future rights to this 18 | software under copyright law. 19 | 20 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, 21 | EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF 22 | MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 23 | IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR 24 | OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, 25 | ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR 26 | OTHER DEALINGS IN THE SOFTWARE. 
27 | 28 | For more information, please refer to 29 | */ 30 | /* 31 | * processor documentation is at: http://www.raspberrypi.org/wp-content/uploads/2012/02/BCM2835-ARM-Peripherals.pdf 32 | * pg 38 for DMA 33 | * pg 61 for DMA DREQ PERMAP 34 | * pg 89 for gpio 35 | * pg 119 for PCM 36 | * pg 138 for PWM 37 | * pg 172 for timer info 38 | * Addendum is http://www.scribd.com/doc/127599939/BCM2835-Audio-clocks 39 | * 40 | * A few annotations for GPIO/DMA/PWM are available here: https://github.com/626Pilot/RaspberryPi-NeoPixel-WS2812/blob/master/ws2812-RPi.c 41 | * https://github.com/metachris/raspberrypi-pwm/blob/master/rpio-pwm/rpio_pwm.c 42 | * https://github.com/richardghirst/PiBits/blob/master/ServoBlaster/user/servod.c 43 | * 44 | * Cache info can be found here: http://www.freelists.org/post/raspi-internals/caches,18 45 | * 0x00000000 - L1 & L2 cache 46 | * 0x40000000 - L2 cache coherent (ie L1 writes are propagated to L2?) 47 | * 0x80000000 - L2 cache only 48 | * 0xc0000000 - direct uncached 49 | * 50 | * Useful DMA timings, etc: http://www.raspberrypi.org/forums/viewtopic.php?f=37&t=7696&start=50 51 | * 52 | * The general idea is to have a buffer of N blocks, where each block is the same size as the gpio registers, 53 | * and have the DMA module continually copying the data in this buffer into those registers. 54 | * In this way, we can have (say) 32 blocks, and then be able to buffer the next 32 IO frames. 55 | * 56 | * How is DMA transfer rate controlled? 57 | * We can use the DREQ (data request) feature. 58 | * PWM supports a configurable data consumption clock (defaults to 100MHz) 59 | * PWM (and SPI, PCM) can fire a DREQ signal any time its fifo falls below a certain point. 60 | * But we are never filling the FIFO, so DREQ would be permanently high. 
61 | * Could feed PWM with dummy data, and use 2 DMA channels (one to PWM, one to GPIO, both gated), but the write-time to GPIOs may vary from the PWM, so gating may be improper 62 | * Or we can use the WAITS portion of the CB header. This allows up to 31 cycle delay -> ~25MHz? 63 | * Will have to manually determine timing characteristics though. 64 | * Or use 2 dma channels: 65 | * Have one sending the data into PWM, which is DREQ limited 66 | * Have another copying from PWM Fifo to GPIOs at a non-limited rate. This is peripheral -> peripheral, so I think it will have its own data bus. 67 | * Unfortunately, the destination can only be one word. Luckily, we have 2 PWM channels - one for setting & one for clearing GPIOs. All gpios that are broken out into the header are in the first register (verified) 68 | * Sadly, it appears that the PWM FIFO cannot be read from. One can read the current PWM output, but only if the FIFO is disabled, in which case the DREQ is too. 69 | * 70 | **Or use 1 dma channel, but additionally write to a dreq-able peripheral (PWM): 71 | * By using control-blocks, one can copy a word to the GPIOs, then have the next CB copy a word to the PWM fifo, and repeat 72 | * By having BOTH control-blocks be dreq-limited by the PWM's dreq, they can BOTH be rate-limited. 73 | * PWM clock works as so: 500MHz / clock_div = PWM_BITRATE (note: bitrate!) 
74 | * PWM_BITRATE / PWM_RNG1 = #of FIFO writes/sec 75 | * Max PWM_BITRATE = 25MHz 76 | * Also, dest_addr = 0x7e20b000 // the testbus interface which is a dump peripheral that goes nowhere (http://www.raspberrypi.org/forums/viewtopic.php?f=37&t=7696&start=25 ) 77 | * 78 | * DMA Control Block layout: 79 | * repeat #srcBlock times: 80 | * 1.copy srcBlock to gpios 81 | * 2.zero srcBlock 82 | * 3.move byte to PWM (paced via DREQ) 83 | * These are largely redundant; it may be possible to use less memory (each cb uses 32 bytes of memory) 84 | * 85 | * Problem: each "frame" is currently 6 words (but the last word is padding), and 1 PAGE_SIZE is not an integer multiple of 6*4 86 | * Thus, the very last frame on each page cannot be used with DMA. Because of this, too, the virtual addressing of each frame is messed up - we must skip one frame per page. 87 | * One solution is to append 2 more pad words to each frame (so that it is 8 words in length). This fixes all issues, but increases ram usage and potentially cache problems (L2 is 128KB). However, since data reads are sequential, even if all data doesn't fit in cache, it will be prefetched. 88 | * Another solution is to decrease frame size to 4 words, and use 2 control blocks for each frame (thus eliminating the 1-byte padding in the center). This will have an even LARGER impact on ram usage - effectively using 20 words/frame vs current 14 words/frame & alternative 16words/frame 89 | * Another solution is to directly mix src data with CB data. Each CB has 2 words of padding, and a data frame is 5 words, and each CB must be aligned to 8 words. 
Therefore, the following is possible, assuming each frame requires 3 CBs: 90 | * CB1.1(padded) | CB1.2(padded) | CB1.3(padded) | CB2.1(padded) | CB2.2(padded) | CB2.3(unpadded) | SRC(5) | SRC(5) <- uses 56 words per 2 frames 91 | * HOWEVER, PAGE_SIZE is not an integral multiple of 56 words 92 | * Although, 1 of those CBs (the one which zeros the previous source) could be shared amongst multiple frames - that is, only zero every, say, 4 frames. The effect is: 93 | * *32 words for 1 frame grouped (5 src words - 2 means pad to 8 words for src) 94 | * 48 words for 2 frames grouped (10 src words - 2 means pad to 8 words for src) 95 | * 72 words for 3 frames grouped (15 src words - 2 means pad to 16 words for src) 96 | * 96 words for 4 frames grouped (20 src words - 2 means pad to 24 words for src) 97 | * 112 words for 5 frames grouped(25 src words - 2 means pad to 24 words for src) 98 | * 136 words for 6 frames grouped(30 src words - 2 means pad to 32 words for src) 99 | * 160 words for 7 frames grouped(35 src words - 2 means pad to 40 words for src) 100 | * 176 words for 8 frames grouped(40 src words - 2 means pad to 40 words for src) 101 | * 200 words for 9 frames grouped(45 src words - 2 means pad to 48 words for src) 102 | * 216 words for 10frames grouped(50 src words - 2 means pad to 48 words for src) 103 | * 240 words for 11frames grouped(55 src words - 2 means pad to 56 words for src) 104 | * 264 words for 12frames grouped(60 src words - 2 means pad to 64 words for src) 105 | * ...432 words for 20frames grouped(100src words - 2 means pad to 104 words for src) 106 | * ...*512 words for 24frames grouped(120src words - 2 means pad to 120 words for src) 107 | * As can be seen, this still requires extra padding. Could do 128 words for 5 frames, or 256 words for 11 frames (23.3 words/frame), and that requires funky math. 108 | * The 24 frame option would work OK. 
24 is a relatively easy number to work with, and 21.3 words/frame (limit is 21 words/frame) 109 | * Another solution is to use the 2D stride functionality. The source frame is really 4 words and the destination is really 2 words, a 1 word gap, and then the other 2 words. Thus 2d stride can be used to skip over that one word gap. 110 | * 111 | * How to determine the current source word being processed? 112 | * dma header points to the physical CONBLOCK_AD. This can be linked to the virtual source address via a map. 113 | * OR: STRIDE register is unused in 1D mode. Could write the src index that this block is linked to in that register. But then we can't use stride feature. 114 | * Rather, we can't use the stride feature on ONE cb per frame. So, use stride on the buffer->GPIO cb, and use the stride register to indicate index on the zeros-copy and the PWM cb. Can tell which CB we're looking at based on the 2DEN flag. If we're looking at the buffer->GPIO cb, then instead look at NEXTCON_BK 115 | * NOTE: if 2d stride is disabled, it appears that the DMA engine doesn't even load the STRIDE register (it's read as garbage). It may PERHAPS display the last loaded word. 116 | * Note: unused fields are read as "Don't care", meaning we can't use them to store user-data. 117 | * 118 | * http://www.raspberrypi.org/forums/viewtopic.php?f=44&t=26907 119 | * Says gpu halts all DMA for 16us every 500ms. To bypass, add 'disable_pvt=1' to /boot/cmdline.txt 120 | * http://www.raspberrypi.org/forums/viewtopic.php?f=37&t=7696&start=25 121 | * Says it's possible to get access to a 250MHz clock. 122 | * How to make DMA more consistent (ie reduce bus contention?): 123 | * disable interrupts 1 uS before any 'real' transaction, enable them afterwards 124 | * Make sure dummy writes DON'T READ FROM RAM (ie, use src_ignore = 1) 125 | * boot with disable_pvt=1 (prevents gpu from halting everything to adjust ram refresh rate twice per second) in /boot/cmdline.txt. 
Does this affect system stability? 126 | */ 127 | 128 | #include <sys/mman.h> //for mmap 129 | #include <sys/time.h> //for timespec 130 | #include <time.h> //for timespec / nanosleep (need -std=gnu99) 131 | #include <signal.h> //for sigaction 132 | #include <unistd.h> //for NULL 133 | #include <stdio.h> //for printf 134 | #include <stdlib.h> //for exit, valloc 135 | //#include <malloc.h> //some implementations declare valloc inside malloc.h 136 | #include <fcntl.h> //for file opening 137 | #include <stdint.h> //for uint32_t 138 | #include <string.h> //for memset 139 | #include <errno.h> //for errno 140 | #include <pthread.h> //for pthread_setschedparam 141 | 142 | #include "hw-addresses.h" 143 | 144 | //config settings: 145 | #define PWM_FIFO_SIZE 1 //The DMA transaction is paced through the PWM FIFO. The PWM FIFO consumes 1 word every N uS (set in clock settings). Once the fifo has fewer than PWM_FIFO_SIZE words available, it will request more data from DMA. Thus, a high buffer length will be more resistant to clock drift, but may occasionally request multiple frames in a short succession (faster than FRAMES_PER_SEC) in the presence of bus contention, whereas a low buffer length will always space frames AT LEAST 1/FRAMES_PER_SEC seconds apart, but may experience clock drift. 146 | #define SOURCE_BUFFER_FRAMES 8192 //number of gpio timeslices to buffer. These are processed at FRAMES_PER_SEC (250,000/sec with the clock settings below), so 1000 frames is 4 ms. Using a power-of-two is a good idea as it simplifies some of the arithmetic (modulus operations) 147 | #define SCHED_PRIORITY 30 //Linux scheduler priority. Higher = more realtime 148 | 149 | #define NOMINAL_CLOCK_FREQ 500000000 //PWM Clock runs at 500 MHz, unless overclocking 150 | #define BITS_PER_CLOCK 10 //# of bits to be used in each PWM cycle. Effectively acts as a clock divisor for us, since the PWM clock is in bits/second 151 | #define CLOCK_DIV 200 //# to divide the NOMINAL_CLOCK_FREQ by before passing it to the PWM peripheral. 
152 | //gpio frames per second is the nominal clock frequency divided by BITS_PER_CLOCK and divided again by CLOCK_DIV 153 | //At 500,000 frames/sec, memory bandwidth does not appear to be an issue (jitter of -1 to +2 uS) 154 | //attempting 1,000,000 frames/sec results in an actual 800,000 frames/sec, though with a lot of jitter. 155 | //Note that these numbers might vary with heavy network or usb usage. 156 | // eg at 500,000 fps, with 1MB/sec network download, jitter is -1 to +30 uS 157 | // at 250,000 fps, with 1MB/sec network download, jitter is only -3 to +3 uS 158 | #define FRAMES_PER_SEC NOMINAL_CLOCK_FREQ/BITS_PER_CLOCK/CLOCK_DIV 159 | #define SEC_TO_FRAME(s) ((int64_t)(s)*FRAMES_PER_SEC) 160 | #define USEC_TO_FRAME(u) (SEC_TO_FRAME(u)/1000000) 161 | #define FRAME_TO_SEC(f) ((int64_t)(f)*BITS_PER_CLOCK*CLOCK_DIV/NOMINAL_CLOCK_FREQ) 162 | #define FRAME_TO_USEC(f) FRAME_TO_SEC((int64_t)(f)*1000000) 163 | 164 | #define TIMER_CLO 0x00000004 //lower 32-bits of 1 MHz timer 165 | #define TIMER_CHI 0x00000008 //upper 32-bits 166 | 167 | 168 | #define PAGE_SIZE 4096 //mmap maps pages of memory, so we must give it multiples of this size 169 | #define GPFSEL0 0x00000000 //gpio function select. There are 6 of these (32 bit registers) 170 | #define GPFSEL1 0x00000004 171 | #define GPFSEL2 0x00000008 172 | #define GPFSEL3 0x0000000c 173 | #define GPFSEL4 0x00000010 174 | #define GPFSEL5 0x00000014 175 | //bits 2-0 of GPFSEL0: set to 000 to make Pin 0 an input. 001 is an output. Other combinations represent alternate functions 176 | //bits 3-5 are for pin 1. 177 | //... 178 | //bits 27-29 are for pin 9. 179 | //GPFSEL1 repeats, but bits 2-0 are Pin 10, 27-29 are pin 19. 180 | //... 181 | #define GPSET0 0x0000001C //GPIO Pin Output Set. There are 2 of these (32 bit registers) 182 | #define GPSET1 0x00000020 183 | //writing a '1' to bit N of GPSET0 makes that pin HIGH. 184 | //writing a '0' has no effect. 
185 | //GPSET0[0-31] maps to pins 0-31 186 | //GPSET1[0-21] maps to pins 32-53 187 | #define GPCLR0 0x00000028 //GPIO Pin Output Clear. There are 2 of these (32 bits each) 188 | #define GPCLR1 0x0000002C 189 | //GPCLR acts the same way as GPSET, but clears the pin instead. 190 | #define GPLEV0 0x00000034 //GPIO Pin Level. There are 2 of these (32 bits each) 191 | 192 | //physical addresses for the DMA peripherals, as found in the processor documentation: 193 | #define DMACH(n) (0x100*(n)) 194 | //DMA Channel register sets (format of these registers is found in DmaChannelHeader struct): 195 | //#define DMACH0 0x00000000 196 | //#define DMACH1 0x00000100 197 | //#define DMACH2 0x00000200 198 | //#define DMACH3 0x00000300 199 | //... 200 | //Each DMA channel has some associated registers, but only CS (control and status), CONBLK_AD (control block address), and DEBUG are writeable 201 | //DMA is started by writing address of the first Control Block to the DMA channel's CONBLK_AD register and then setting the ACTIVE bit inside the CS register (bit 0) 202 | //Note: DMA channels are connected directly to peripherals, so physical addresses should be used (affects control block's SOURCE, DEST and NEXTCONBK addresses). 203 | #define DMAENABLE 0x00000ff0 //bit 0 should be set to 1 to enable channel 0. bit 1 enables channel 1, etc. 
204 | 205 | //flags used in the DmaChannelHeader struct: 206 | #define DMA_CS_RESET (1u<<31) 207 | #define DMA_CS_ABORT (1<<30) 208 | #define DMA_CS_DISDEBUG (1<<29) //DMA will not stop when debug signal is asserted (bit 29; bit 28 is WAIT_FOR_OUTSTANDING_WRITES, see the DmaChannelHeader.CS comments) 209 | #define DMA_CS_PRIORITY(x) (((x)&0xf) << 16) //higher priority DMA transfers are serviced first, it would appear 210 | #define DMA_CS_PRIORITY_MAX DMA_CS_PRIORITY(7) 211 | #define DMA_CS_PANIC_PRIORITY(x) (((x)&0xf) << 20) 212 | #define DMA_CS_PANIC_PRIORITY_MAX DMA_CS_PANIC_PRIORITY(7) 213 | #define DMA_CS_END (1<<1) 214 | #define DMA_CS_ACTIVE (1<<0) 215 | 216 | #define DMA_DEBUG_READ_ERROR (1<<2) 217 | #define DMA_DEBUG_FIFO_ERROR (1<<1) 218 | #define DMA_DEBUG_READ_LAST_NOT_SET_ERROR (1<<0) 219 | 220 | //flags used in the DmaControlBlock struct: 221 | #define DMA_CB_TI_NO_WIDE_BURSTS (1<<26) 222 | #define DMA_CB_TI_PERMAP_NONE (0<<16) 223 | #define DMA_CB_TI_PERMAP_DSI (1<<16) 224 | //... (more found on page 61 of BCM2835 pdf) 225 | #define DMA_CB_TI_PERMAP_PWM (5<<16) 226 | //...
227 | #define DMA_CB_TI_SRC_DREQ (1<<10) 228 | #define DMA_CB_TI_SRC_INC (1<<8) 229 | #define DMA_CB_TI_DEST_DREQ (1<<6) 230 | #define DMA_CB_TI_DEST_INC (1<<4) 231 | #define DMA_CB_TI_TDMODE (1<<1) 232 | 233 | 234 | //https://dev.openwrt.org/browser/trunk/target/linux/brcm2708/patches-3.10/0070-bcm2708_fb-DMA-acceleration-for-fb_copyarea.patch?rev=39770 says that YLENGTH should actually be written as # of copies *MINUS ONE* 235 | #define DMA_CB_TXFR_LEN_YLENGTH(y) ((((y)-1)&0x3fff) << 16) //YLENGTH is 14 bits wide (bits 16-29) 236 | #define DMA_CB_TXFR_LEN_XLENGTH(x) ((x)&0xffff) 237 | #define DMA_CB_TXFR_YLENGTH_MASK (0x3fff << 16) 238 | #define DMA_CB_STRIDE_D_STRIDE(x) (((x)&0xffff) << 16) 239 | #define DMA_CB_STRIDE_S_STRIDE(x) ((x)&0xffff) 240 | 241 | 242 | //Dma Control Blocks must be located at addresses that are multiples of 32 bytes 243 | #define DMA_CONTROL_BLOCK_ALIGNMENT 32 244 | 245 | #define PWM_CTL 0x00000000 //control register 246 | #define PWM_STA 0x00000004 //status register 247 | #define PWM_DMAC 0x00000008 //DMA control register 248 | #define PWM_RNG1 0x00000010 //channel 1 range register (# output bits to use per sample) 249 | #define PWM_DAT1 0x00000014 //channel 1 data 250 | #define PWM_FIF1 0x00000018 //channel 1 fifo (write to this register to queue an output) 251 | #define PWM_RNG2 0x00000020 //channel 2 range register 252 | #define PWM_DAT2 0x00000024 //channel 2 data 253 | 254 | #define PWM_CTL_USEFIFO2 (1<<13) 255 | #define PWM_CTL_REPEATEMPTY2 (1<<10) 256 | #define PWM_CTL_ENABLE2 (1<<8) 257 | #define PWM_CTL_CLRFIFO (1<<6) 258 | #define PWM_CTL_USEFIFO1 (1<<5) 259 | #define PWM_CTL_REPEATEMPTY1 (1<<2) 260 | #define PWM_CTL_ENABLE1 (1<<0) 261 | 262 | #define PWM_STA_BUSERR (1<<8) 263 | #define PWM_STA_GAPERRS (0xf << 4) 264 | #define PWM_STA_FIFOREADERR (1<<3) 265 | #define PWM_STA_FIFOWRITEERR (1<<2) 266 | #define PWM_STA_ERRS (PWM_STA_BUSERR | PWM_STA_GAPERRS | PWM_STA_FIFOREADERR | PWM_STA_FIFOWRITEERR) 267 | 268 | #define PWM_DMAC_EN (1u<<31) 269 | #define PWM_DMAC_PANIC(P)
(((P)&0xff)<<8) 270 | #define PWM_DMAC_DREQ(D) (((D)&0xff)<<0) 271 | 272 | //The following is undocumented :( Taken from http://www.scribd.com/doc/127599939/BCM2835-Audio-clocks 273 | #define CM_PWMCTL 0xa0 274 | #define CM_PWMDIV 0xa4 275 | //each write to CM_PWMCTL and CM_PWMDIV requires the password to be written: 276 | #define CM_PWMCTL_PASSWD 0x5a000000 277 | #define CM_PWMDIV_PASSWD 0x5a000000 278 | //MASH is used to achieve fractional clock dividers by introducing artificial jitter. 279 | //if you want constant frequency (even if it may not be at 100% CORRECT frequency), use MASH0 280 | //if clock divisor is integral, then there's no need to use MASH, and anything above MASH1 can introduce jitter. 281 | #define CM_PWMCTL_MASH(x) (((x)&0x3) << 9) 282 | #define CM_PWMCTL_MASH0 CM_PWMCTL_MASH(0) 283 | #define CM_PWMCTL_MASH1 CM_PWMCTL_MASH(1) 284 | #define CM_PWMCTL_MASH2 CM_PWMCTL_MASH(2) 285 | #define CM_PWMCTL_MASH3 CM_PWMCTL_MASH(3) 286 | #define CM_PWMCTL_FLIP (1<<8) //use to invert clock polarity 287 | #define CM_PWMCTL_BUSY (1<<7) //read-only flag that indicates clock generator is running. 288 | #define CM_PWMCTL_KILL (1<<5) //write a 1 to stop & reset clock generator. USED FOR DEBUG ONLY 289 | #define CM_PWMCTL_ENAB (1<<4) //gracefully stop/start clock generator. BUSY flag will go low once clock is off. 290 | #define CM_PWMCTL_SRC(x) ((x)&0xf) //clock source. 0=gnd. 1=oscillator. 2-3=debug. 4=PLLA per. 5=PLLC per. 6=PLLD per. 7=HDMI aux. 8-15=GND 291 | #define CM_PWMCTL_SRC_OSC CM_PWMCTL_SRC(1) 292 | #define CM_PWMCTL_SRC_PLLA CM_PWMCTL_SRC(4) 293 | #define CM_PWMCTL_SRC_PLLC CM_PWMCTL_SRC(5) 294 | #define CM_PWMCTL_SRC_PLLD CM_PWMCTL_SRC(6) 295 | 296 | //max clock divisor is 4095 297 | #define CM_PWMDIV_DIVI(x) (((x)&0xfff) << 12) 298 | #define CM_PWMDIV_DIVF(x) ((x)&0xfff) 299 | 300 | struct DmaChannelHeader { 301 | //Note: dma channels 7-15 are 'LITE' dma engines (or is it 8-15?), with reduced performance & functionality.
302 | //Note: only CS, CONBLK_AD and DEBUG are directly writeable 303 | volatile uint32_t CS; //Control and Status 304 | //31 RESET; set to 1 to reset DMA 305 | //30 ABORT; set to 1 to abort current DMA control block (next one will be loaded & continue) 306 | //29 DISDEBUG; set to 1 and DMA won't be paused when debug signal is sent 307 | //28 WAIT_FOR_OUTSTANDING_WRITES(0x10000000); set to 1 and DMA will wait until peripheral says all writes have gone through before loading next CB 308 | //24-27 reserved 309 | //20-23 PANIC_PRIORITY; 0 is lowest priority 310 | //16-19 PRIORITY; bus scheduling priority. 0 is lowest 311 | //9-15 reserved 312 | //8 ERROR; read as 1 when error is encountered. error can be found in DEBUG register. 313 | //7 reserved 314 | //6 WAITING_FOR_OUTSTANDING_WRITES; read as 1 when waiting for outstanding writes 315 | //5 DREQ_STOPS_DMA(0x20); read as 1 if DREQ is currently preventing DMA 316 | //4 PAUSED(0x10); read as 1 if DMA is paused 317 | //3 DREQ; copy of the data request signal from the peripheral, if DREQ is enabled. reads as 1 if data is being requested (or PERMAP=0), else 0 318 | //2 INT; set when current CB ends and its INTEN=1. Write a 1 to this register to clear it 319 | //1 END; set when the transfer defined by current CB is complete. Write 1 to clear. 320 | //0 ACTIVE(0x01); write 1 to activate DMA (load the CB before hand) 321 | volatile uint32_t CONBLK_AD; //Control Block Address 322 | volatile uint32_t TI; //transfer information; see DmaControlBlock.TI for description 323 | volatile uint32_t SOURCE_AD; //Source address 324 | volatile uint32_t DEST_AD; //Destination address 325 | volatile uint32_t TXFR_LEN; //transfer length. ONLY THE LOWER 16 BITS ARE USED IN LITE DMA ENGINES 326 | volatile uint32_t STRIDE; //2D Mode Stride. Only used if TI.TDMODE = 1. NOT AVAILABLE IN LITE DMA ENGINES 327 | volatile uint32_t NEXTCONBK; //Next control block. 
Must be 256-bit aligned (32 bytes; 8 words) 328 | volatile uint32_t DEBUG; //controls debug settings 329 | //29-31 unused 330 | //28 LITE (0x10000000) 331 | //25-27 VERSION 332 | //16-24 DMA_STATE (dma engine state machine) 333 | //8-15 DMA_ID (AXI bus id) 334 | //4-7 OUTSTANDING_WRITES 335 | //3 unused 336 | //2 READ_ERROR 337 | //1 FIFO_ERROR 338 | //0 READ_LAST_NOT_SET_ERROR 339 | }; 340 | void logDmaChannelHeader(struct DmaChannelHeader *h) { 341 | printf("Dma Ch Header:\n CS: 0x%08x\n CONBLK_AD: 0x%08x\n TI: 0x%08x\n SOURCE_AD: 0x%08x\n DEST_AD: 0x%08x\n TXFR_LEN: %u\n STRIDE: 0x%08x\n NEXTCONBK: 0x%08x\n DEBUG: 0x%08x\n", h->CS, h->CONBLK_AD, h->TI, h->SOURCE_AD, h->DEST_AD, h->TXFR_LEN, h->STRIDE, h->NEXTCONBK, h->DEBUG); 342 | } 343 | 344 | struct DmaControlBlock { 345 | volatile uint32_t TI; //transfer information 346 | //31:27 unused 347 | //26 NO_WIDE_BURSTS 348 | //21:25 WAITS; number of cycles to wait between each DMA read/write operation 349 | //16:20 PERMAP(0x000Y0000); peripheral number to be used for DREQ signal (pacing). set to 0 for unpaced DMA. 350 | //12:15 BURST_LENGTH 351 | //11 SRC_IGNORE; set to 1 to not perform reads. Used to manually fill caches 352 | //10 SRC_DREQ; set to 1 to have the DREQ from PERMAP gate requests. 353 | //9 SRC_WIDTH; set to 1 for 128-bit moves, 0 for 32-bit moves 354 | //8 SRC_INC; set to 1 to automatically increment the source address after each read (you'll want this if you're copying a range of memory) 355 | //7 DEST_IGNORE; set to 1 to not perform writes. 356 | //6 DEST_DREQ; set to 1 to have the DREQ from PERMAP gate *writes* 357 | //5 DEST_WIDTH; set to 1 for 128-bit moves, 0 for 32-bit moves 358 | //4 DEST_INC; set to 1 to automatically increment the destination address after each write (you'll want this if you're copying a range of memory) 359 | //3 WAIT_RESP; make DMA wait for a response from the peripheral during each write.
Ensures multiple writes don't get stacked in the pipeline 360 | //2 unused (0) 361 | //1 TDMODE; set to 1 to enable 2D mode 362 | //0 INTEN; set to 1 to generate an interrupt upon completion 363 | volatile uint32_t SOURCE_AD; //Source address 364 | volatile uint32_t DEST_AD; //Destination address 365 | volatile uint32_t TXFR_LEN; //transfer length. 366 | //in 2D mode, TXFR_LEN is separated into two half-words to indicate Y transfers of length X, and STRIDE is added to the src/dest address after each transfer of length X. 367 | //30:31 unused 368 | //16-29 YLENGTH 369 | //0-15 XLENGTH 370 | volatile uint32_t STRIDE; //2D Mode Stride (amount to increment/decrement src/dest after each 1d copy when in 2d mode). Only used if TI.TDMODE = 1 371 | //16-31 D_STRIDE; signed (2's complement) byte increment/decrement to apply to destination addr after each XLENGTH transfer 372 | //0-15 S_STRIDE; signed (2's complement) byte increment/decrement to apply to source addr after each XLENGTH transfer 373 | volatile uint32_t NEXTCONBK; //Next control block. Must be 256-bit aligned (32 bytes; 8 words) 374 | uint32_t _reserved[2]; 375 | }; 376 | 377 | void logDmaControlBlock(struct DmaControlBlock *b) { 378 | printf("Dma Control Block:\n TI: 0x%08x\n SOURCE_AD: 0x%08x\n DEST_AD: 0x%08x\n TXFR_LEN: 0x%08x\n STRIDE: 0x%08x\n NEXTCONBK: 0x%08x\n unused: 0x%08x %08x\n", b->TI, b->SOURCE_AD, b->DEST_AD, b->TXFR_LEN, b->STRIDE, b->NEXTCONBK, b->_reserved[0], b->_reserved[1]); 379 | } 380 | 381 | struct PwmHeader { 382 | volatile uint32_t CTL; // 0x00000000 //control register 383 | //16-31 reserved 384 | //15 MSEN2 (0: PWM algorithm, 1:M/S transmission used) 385 | //14 reserved 386 | //13 USEF2 (0: data register is used for transmission, 1: FIFO is used for transmission) 387 | //12 POLA2 (0: 0=low, 1=high. 1: 0=high, 1=low (inversion)) 388 | //11 SBIT2; defines the state of the output when no transmission is in place 389 | //10 RPTL2; 0: transmission interrupts when FIFO is empty. 
1: last data in FIFO is retransmitted when FIFO is empty 390 | //9 MODE2; 0: PWM mode. 1: serializer mode 391 | //8 PWMEN2; 0: channel is disabled. 1: channel is enabled 392 | //7 MSEN1; 393 | //6 CLRF1; writing a 1 to this bit clears the channel 1 (and channel 2?) fifo 394 | //5 USEF1; 395 | //4 POLA1; 396 | //3 SBIT1; 397 | //2 RPTL1; 398 | //1 MODE1; 399 | //0 PWMEN1; 400 | volatile uint32_t STA; // 0x00000004 //status register 401 | //13-31 reserved 402 | //9-12 STA1-4; indicates whether each channel is transmitting 403 | //8 BERR; Bus Error Flag. Write 1 to clear 404 | //4-7 GAPO1-4; Gap Occurred Flag. Write 1 to clear 405 | //3 RERR1; Fifo Read Error Flag (attempt to read empty fifo). Write 1 to clear 406 | //2 WERR1; Fifo Write Error Flag (attempt to write to full fifo). Write 1 to clear 407 | //1 EMPT1; Reads as 1 if fifo is empty 408 | //0 FULL1; Reads as 1 if fifo is full 409 | volatile uint32_t DMAC; // 0x00000008 //DMA control register 410 | //31 ENAB; set to 1 to enable DMA 411 | //16-30 reserved 412 | //8-15 PANIC; DMA threshold for panic signal 413 | //0-7 DREQ; DMA threshold for DREQ signal 414 | uint32_t _padding1; 415 | volatile uint32_t RNG1; // 0x00000010 //channel 1 range register (# output bits to use per sample) 416 | //0-31 PWM_RNGi; #of bits to modulate PWM. (eg if PWM_RNGi=1024, then each 32-bit sample sent through the FIFO will be modulated into 1024 bits.) 417 | volatile uint32_t DAT1; // 0x00000014 //channel 1 data 418 | //0-31 PWM_DATi; Stores the 32-bit data to be sent to the PWM controller ONLY WHEN USEFi=0 (FIFO is disabled) 419 | volatile uint32_t FIF1; // 0x00000018 //channel 1 fifo (write to this register to queue an output) 420 | //writing to this register will queue a sample into the fifo. If 2 channels are enabled, then each even sample (0-indexed) is sent to channel 1, and odd samples are sent to channel 2.
WRITE-ONLY 421 | uint32_t _padding2; 422 | volatile uint32_t RNG2; // 0x00000020 //channel 2 range register 423 | volatile uint32_t DAT2; // 0x00000024 //channel 2 data 424 | //0-31 PWM_DATi; Stores the 32-bit data to be sent to the PWM controller ONLY WHEN USEFi=1 (FIFO is enabled). TODO: Typo??? 425 | }; 426 | 427 | struct GpioBufferFrame { 428 | //custom structure used for storing the GPIO buffer. 429 | //These BufferFrame's are DMA'd into the GPIO memory, potentially using the DmaEngine's Stride facility 430 | uint32_t gpset[2]; 431 | uint32_t gpclr[2]; 432 | }; 433 | 434 | struct DmaChannelHeader *dmaHeader; //must be global for cleanup() 435 | 436 | void setSchedPriority(int priority) { 437 | //In order to get the best timing at a decent queue size, we want the kernel to avoid interrupting us for long durations. 438 | //This is done by giving our process a high priority. Note, must run as super-user for this to work. 439 | struct sched_param sp; 440 | sp.sched_priority=priority; 441 | int ret; 442 | if ((ret = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp)) != 0) { 443 | printf("Warning: pthread_setschedparam (increase thread priority) returned non-zero: %i\n", ret); 444 | } 445 | } 446 | 447 | void writeBitmasked(volatile uint32_t *dest, uint32_t mask, uint32_t value) { 448 | //set bits designated by (mask) at the address (dest) to (value), without affecting the other bits 449 | //eg if x = 0b11001100 450 | // writeBitmasked(&x, 0b00000110, 0b11110011), 451 | // then x now = 0b11001010 (only bits 1 and 2 are taken from value) 452 | uint32_t cur = *dest; 453 | uint32_t new = (cur & (~mask)) | (value & mask); 454 | *dest = new; 455 | *dest = new; //best to be safe when crossing memory boundaries 456 | } 457 | 458 | 459 | uint64_t readSysTime(volatile uint32_t *timerBaseMem) { 460 | return ((uint64_t)*(timerBaseMem + TIMER_CHI/4) << 32) + (uint64_t)(*(timerBaseMem + TIMER_CLO/4)); 461 | } 462 | 463 | size_t ceilToPage(size_t size) { 464 | //round up to nearest page-size multiple 465 | if (size &
(PAGE_SIZE-1)) { 466 | size += PAGE_SIZE - (size & (PAGE_SIZE-1)); 467 | } 468 | return size; 469 | } 470 | 471 | uintptr_t virtToPhys(void* virt, int pagemapfd) { 472 | uintptr_t pgNum = (uintptr_t)(virt)/PAGE_SIZE; 473 | int byteOffsetFromPage = (uintptr_t)(virt)%PAGE_SIZE; 474 | uint64_t physPage; 475 | ///proc/self/pagemap is a uint64_t array where the index represents the virtual page number and the value at that index represents the physical page number. 476 | //So if virtual address is 0x1000000, read the value at *array* index 0x1000000/PAGE_SIZE and multiply that by PAGE_SIZE to get the physical address. 477 | //because files are bytestreams, one must explicitly multiply each array index by 8 to get the byte offset of that uint64_t entry. 478 | off_t seekRes = lseek(pagemapfd, pgNum*8, SEEK_SET); 479 | if (seekRes != (off_t)(pgNum*8)) { 480 | printf("WARNING: virtToPhys %p failed to seek (expected %lld got %lld. errno: %i)\n", virt, (long long)(pgNum*8), (long long)seekRes, errno); 481 | } 482 | read(pagemapfd, &physPage, 8); 483 | if (!(physPage & (1ull<<63))) { //bit 63 is set to 1 if the page is present in ram 484 | printf("WARNING: virtToPhys %p has no physical address\n", virt); 485 | } 486 | physPage = physPage & ~(0x1ffull << 55); //bits 55-63 are flags. 487 | uintptr_t mapped = (uintptr_t)(physPage*PAGE_SIZE + byteOffsetFromPage); 488 | return mapped; 489 | } 490 | 491 | uintptr_t virtToUncachedPhys(void *virt, int pagemapfd) { 492 | return virtToPhys(virt, pagemapfd) | 0x40000000; //bus address of the ram is 0x40000000. With this binary-or, writes to the returned address will bypass the CPU (L1) cache, but not the L2 cache. 0xc0000000 should be the base address if L2 must also be bypassed.
However, the DMA engine is aware of L2 cache - just not the L1 cache (source: http://en.wikibooks.org/wiki/Aros/Platforms/Arm_Raspberry_Pi_support#Framebuffer ) 493 | } 494 | 495 | 496 | //allocate some memory and lock it so that its physical address will never change 497 | void* makeLockedMem(size_t size) { 498 | //void* mem = valloc(size); //memory returned by valloc is not zero'd 499 | size = ceilToPage(size); 500 | void *mem = mmap( 501 | NULL, //let kernel place memory where it wants 502 | size, //length 503 | PROT_WRITE | PROT_READ, //ask for read and write permissions to memory 504 | MAP_SHARED | 505 | MAP_ANONYMOUS | //no underlying file; initialize to 0 506 | MAP_NORESERVE | //don't reserve swap space 507 | MAP_LOCKED, //lock into *virtual* ram. Physical ram may still change! 508 | -1, // File descriptor 509 | 0); //no offset into file (file doesn't exist). 510 | if (mem == MAP_FAILED) { 511 | printf("makeLockedMem failed\n"); 512 | exit(1); 513 | } 514 | memset(mem, 0, size); //simultaneously zero the pages and force them into memory 515 | mlock(mem, size); 516 | return mem; 517 | } 518 | 519 | //free memory allocated with makeLockedMem 520 | void freeLockedMem(void* mem, size_t size) { 521 | size = ceilToPage(size); 522 | munlock(mem, size); 523 | munmap(mem, size); 524 | } 525 | 526 | void* makeUncachedMemView(void* virtaddr, size_t bytes, int memfd, int pagemapfd) { 527 | //by default, writing to any virtual address will go through the CPU cache. 528 | //this function will return a pointer that behaves the same as virtaddr, but bypasses the CPU L1 cache (note that because of this, the returned pointer and original pointer should not be used in conjunction, else cache-related inconsistencies will arise) 529 | //Note: The original memory should not be unmapped during the lifetime of the uncached version, as then the OS won't know that our process still owns the physical memory. 
530 | bytes = ceilToPage(bytes); 531 | //first, just allocate enough *virtual* memory for the operation. This is done so that we can do the later mapping to a contiguous range of virtual memory: 532 | void *mem = mmap( 533 | NULL, //let kernel place memory where it wants 534 | bytes, //length 535 | PROT_WRITE | PROT_READ, //ask for read and write permissions to memory 536 | MAP_SHARED | 537 | MAP_ANONYMOUS | //no underlying file; initialize to 0 538 | MAP_NORESERVE | //don't reserve swap space 539 | MAP_LOCKED, //lock into *virtual* ram. Physical ram may still change! 540 | -1, // File descriptor 541 | 0); //no offset into file (file doesn't exist). 542 | //now, free the virtual memory and immediately remap it to the physical addresses used in virtaddr 543 | munmap(mem, bytes); //Might not be necessary; MAP_FIXED indicates it can map an already-used page 544 | for (int offset=0; offsetCS, DMA_CS_ACTIVE, 0); 587 | usleep(100); 588 | writeBitmasked(&dmaHeader->CS, DMA_CS_RESET, DMA_CS_RESET); 589 | } 590 | //could also disable PWM, but that's not imperative. 591 | } 592 | 593 | void cleanupAndExit(int sig) { 594 | cleanup(); 595 | printf("Exiting with error; caught signal: %i\n", sig); 596 | exit(1); 597 | } 598 | 599 | void sleepUntilMicros(uint64_t micros, volatile uint32_t* timerBaseMem) { 600 | //Note: cannot use clock_nanosleep with an absolute time, as the process clock may differ from the RPi clock. 601 | //this function doesn't need to be super precise, so we can tolerate interrupts. 602 | //Therefore, we can use a relative sleep: 603 | uint64_t cur = readSysTime(timerBaseMem); 604 | if (micros > cur) { //avoid overflow caused by unsigned arithmetic 605 | uint64_t dur = micros - cur; 606 | //usleep(dur); //nope, causes problems! 
607 | struct timespec t; 608 | t.tv_sec = dur/1000000; 609 | t.tv_nsec = (dur - t.tv_sec*1000000)*1000; 610 | nanosleep(&t, NULL); 611 | } 612 | } 613 | 614 | 615 | //int64_t _lastTimeAtFrame0; 616 | 617 | void queue(int pin, int mode, uint64_t micros, struct GpioBufferFrame* srcArray, volatile uint32_t* timerBaseMem, struct DmaChannelHeader* dmaHeader) { 618 | //This function takes a pin, a mode (0=off, 1=on) and a time. It then manipulates the GpioBufferFrame array in order to ensure that the pin switches to the desired level at the desired time. It will sleep if necessary. 619 | //Sleep until we are on the right iteration of the circular buffer (otherwise we cannot queue the command) 620 | uint64_t callTime = readSysTime(timerBaseMem); //only used for debugging 621 | uint64_t desiredTime = micros - FRAME_TO_USEC(SOURCE_BUFFER_FRAMES); 622 | sleepUntilMicros(desiredTime, timerBaseMem); 623 | uint64_t awakeTime = readSysTime(timerBaseMem); //only used for debugging 624 | 625 | //get the current source index at the current time: 626 | //must ensure we aren't interrupted during this calculation, hence the two timers instead of 1. 627 | //Note: getting the curTime & srcIdx don't have to be done for every call to queue - it could be done eg just once per buffer. 628 | // It should be calculated regularly though, to counter clock drift & PWM FIFO underflows 629 | // It is done in this function only for simplicity 630 | int srcIdx; 631 | uint64_t curTime1, curTime2; 632 | int tries=0; 633 | do { 634 | curTime1 = readSysTime(timerBaseMem); 635 | srcIdx = dmaHeader->STRIDE; //the source index is stored in the otherwise-unused STRIDE register, for efficiency 636 | curTime2 = readSysTime(timerBaseMem); 637 | ++tries; 638 | } while (curTime2-curTime1 > 1 || (srcIdx & DMA_CB_TXFR_YLENGTH_MASK)); //allow 1 uS variability. 
639 | //Uncomment the following lines and the above declaration of _lastTimeAtFrame0 to log jitter information: 640 | //int64_t curTimeAtFrame0 = curTime2 - FRAME_TO_USEC(srcIdx); 641 | //printf("Timing diff: %lli\n", (curTimeAtFrame0-_lastTimeAtFrame0)%FRAME_TO_USEC(SOURCE_BUFFER_FRAMES)); 642 | //_lastTimeAtFrame0 = curTimeAtFrame0; 643 | //if timing diff is positive, then curTimeAtFrame0 > _lastTimeAtFrame0 644 | //curTime2 - srcIdx2 > curTime1 - srcIdx1 645 | //curTime2 - curTime1 > srcIdx2 - srcIdx1 646 | //more uS have elapsed than frames; DMA cannot keep up 647 | 648 | //calculate the frame# at which to place the event: 649 | int usecFromNow = micros - curTime2; 650 | int framesFromNow = USEC_TO_FRAME(usecFromNow); 651 | if (framesFromNow < 10) { //Not safe to schedule less than ~10uS into the future (note: should be operating on usecFromNow, not framesFromNow) 652 | printf("Warning: behind schedule: %i (%i uSec) (tries: %i) (sleep %llu -> %llu (wanted %llu))\n", framesFromNow, usecFromNow, tries, callTime, awakeTime, desiredTime); 653 | framesFromNow = 10; 654 | } 655 | int newIdx = (srcIdx + framesFromNow)%SOURCE_BUFFER_FRAMES; 656 | //Now queue the command: 657 | if (mode == 0) { //turn output off 658 | srcArray[newIdx].gpclr[pin>31] |= 1u << (pin%32); 659 | } else { //turn output on 660 | srcArray[newIdx].gpset[pin>31] |= 1u << (pin%32); 661 | } 662 | } 663 | 664 | int main() { 665 | volatile uint32_t *gpioBaseMem, *dmaBaseMem, *pwmBaseMem, *timerBaseMem, *clockBaseMem; 666 | //emergency clean-up: 667 | for (int i = 0; i < 64; i++) { //catch all signals (like ctrl+c, ctrl+z, ...)
to ensure DMA is disabled 668 | struct sigaction sa; 669 | memset(&sa, 0, sizeof(sa)); 670 | sa.sa_handler = cleanupAndExit; 671 | sigaction(i, &sa, NULL); 672 | } 673 | setSchedPriority(SCHED_PRIORITY); 674 | //First, open the linux device, /dev/mem 675 | //dev/mem provides access to the physical memory of the entire processor+ram 676 | //This is needed because Linux uses virtual memory, thus the process's memory at 0x00000000 will NOT have the same contents as the physical memory at 0x00000000 677 | int memfd = open("/dev/mem", O_RDWR | O_SYNC); 678 | if (memfd < 0) { 679 | printf("Failed to open /dev/mem (did you remember to run as root?)\n"); 680 | exit(1); 681 | } 682 | int pagemapfd = open("/proc/self/pagemap", O_RDONLY); 683 | //now map /dev/mem into memory, but only map specific peripheral sections: 684 | gpioBaseMem = mapPeripheral(memfd, GPIO_BASE); 685 | dmaBaseMem = mapPeripheral(memfd, DMA_BASE); 686 | pwmBaseMem = mapPeripheral(memfd, PWM_BASE); 687 | timerBaseMem = mapPeripheral(memfd, TIMER_BASE); 688 | clockBaseMem = mapPeripheral(memfd, CLOCK_BASE); 689 | 690 | int outPin = 10; 691 | //now set our pin as an output: 692 | volatile uint32_t *fselAddr = (volatile uint32_t*)(gpioBaseMem + GPFSEL0/4 + outPin/10); 693 | writeBitmasked(fselAddr, 0x7 << (3*(outPin%10)), 0x1 << (3*(outPin%10))); 694 | //Note: PWM pacing still works, even with no physical outputs, so we don't need to set gpio pin 18 to its alternate function. 695 | 696 | //Often need to copy zeros with DMA. This array can be the source. Needs to all lie on one page 697 | void *zerosPageCached = makeLockedMem(PAGE_SIZE); 698 | void *zerosPage = makeUncachedMemView(zerosPageCached, PAGE_SIZE, memfd, pagemapfd); 699 | 700 | //configure DMA... 701 | //First, allocate memory for the source: 702 | size_t numSrcBlocks = SOURCE_BUFFER_FRAMES; //We want apx 1M blocks/sec. 
703 | size_t srcPageBytes = numSrcBlocks*sizeof(struct GpioBufferFrame); 704 | void *virtSrcPageCached = makeLockedMem(srcPageBytes); 705 | void *virtSrcPage = makeUncachedMemView(virtSrcPageCached, srcPageBytes, memfd, pagemapfd); 706 | printf("mappedPhysSrcPage: %p\n", virtToPhys(virtSrcPage, pagemapfd)); 707 | 708 | //cast virtSrcPage to a GpioBufferFrame array: 709 | struct GpioBufferFrame *srcArray = (struct GpioBufferFrame*)virtSrcPage; //Note: calling virtToPhys on srcArray will return NULL. Use srcArrayCached for that. 710 | struct GpioBufferFrame *srcArrayCached = (struct GpioBufferFrame*)virtSrcPageCached; 711 | //srcArray[0].gpset[0] = (1 << outPin); //set pin ON 712 | //srcArray[numSrcBlocks/2].gpclr[0] = (1 << outPin); //set pin OFF; 713 | 714 | //configure PWM clock: 715 | *(clockBaseMem + CM_PWMCTL/4) = CM_PWMCTL_PASSWD | ((*(clockBaseMem + CM_PWMCTL/4))&(~CM_PWMCTL_ENAB)); //disable clock 716 | do {} while (*(clockBaseMem + CM_PWMCTL/4) & CM_PWMCTL_BUSY); //wait for clock to deactivate 717 | *(clockBaseMem + CM_PWMDIV/4) = CM_PWMDIV_PASSWD | CM_PWMDIV_DIVI(CLOCK_DIV); //configure clock divider (running at 500MHz undivided) 718 | *(clockBaseMem + CM_PWMCTL/4) = CM_PWMCTL_PASSWD | CM_PWMCTL_SRC_PLLD; //source 500MHz base clock, no MASH. 
719 | *(clockBaseMem + CM_PWMCTL/4) = CM_PWMCTL_PASSWD | CM_PWMCTL_SRC_PLLD | CM_PWMCTL_ENAB; //enable clock 720 | do {} while ((*(clockBaseMem + CM_PWMCTL/4) & CM_PWMCTL_BUSY) == 0); //wait for clock to activate 721 | 722 | //configure rest of PWM: 723 | struct PwmHeader *pwmHeader = (struct PwmHeader*)(pwmBaseMem); 724 | 725 | pwmHeader->DMAC = 0; //disable DMA 726 | pwmHeader->CTL |= PWM_CTL_CLRFIFO; //clear pwm 727 | usleep(100); 728 | 729 | pwmHeader->STA = PWM_STA_ERRS; //clear PWM errors 730 | usleep(100); 731 | 732 | pwmHeader->DMAC = PWM_DMAC_EN | PWM_DMAC_DREQ(PWM_FIFO_SIZE) | PWM_DMAC_PANIC(PWM_FIFO_SIZE); //DREQ is activated at queue < PWM_FIFO_SIZE 733 | pwmHeader->RNG1 = BITS_PER_CLOCK; //used only for timing purposes; #writes to PWM FIFO/sec = PWM CLOCK / RNG1 734 | pwmHeader->CTL = PWM_CTL_REPEATEMPTY1 | PWM_CTL_ENABLE1 | PWM_CTL_USEFIFO1; 735 | 736 | //allocate memory for the control blocks 737 | size_t cbPageBytes = numSrcBlocks * sizeof(struct DmaControlBlock) * 3; //3 cbs for each source block 738 | void *virtCbPageCached = makeLockedMem(cbPageBytes); 739 | void *virtCbPage = makeUncachedMemView(virtCbPageCached, cbPageBytes, memfd, pagemapfd); 740 | //fill the control blocks: 741 | struct DmaControlBlock *cbArrCached = (struct DmaControlBlock*)virtCbPageCached; 742 | struct DmaControlBlock *cbArr = (struct DmaControlBlock*)virtCbPage; 743 | printf("#dma blocks: %zu, #src blocks: %zu\n", numSrcBlocks*3, numSrcBlocks); 744 | for (int i=0; i phys: 0x%08x (0x%08x)\n", i, virtToPhys(i+(void*)cbArrCached, pagemapfd), virtToUncachedPhys(i+(void*)cbArrCached, pagemapfd)); 770 | } 771 | //source: http://virtualfloppy.blogspot.com/2014/01/dma-support-at-last.html 772 | //cat /sys/module/dma/parameters/dmachans gives a bitmask of DMA channels that are not used by GPU. Results: ch 1, 3, 6, 7 are reserved.
773 | //dmesg | grep "DMA"; results: Ch 2 is used by SDHC host 774 | //ch 0 is known to be used for graphics acceleration 775 | //Thus, applications can use ch 4, 5, or the LITE channels @ 8 and beyond. 776 | //If using LITE channels, then we can't use the STRIDE feature, so that narrows it down to ch 4 and ch 5. 777 | int dmaCh = 5; 778 | //enable DMA channel (it's probably already enabled, but we want to be sure): 779 | writeBitmasked(dmaBaseMem + DMAENABLE/4, 1 << dmaCh, 1 << dmaCh); 780 | 781 | //configure the DMA header to point to our control block: 782 | dmaHeader = (struct DmaChannelHeader*)(dmaBaseMem + DMACH(dmaCh)/4); //must divide by 4, as dmaBaseMem is uint32_t* 783 | printf("Previous DMA header:\n"); 784 | logDmaChannelHeader(dmaHeader); 785 | //abort any previous DMA: 786 | //dmaHeader->NEXTCONBK = 0; //NEXTCONBK is read-only. 787 | dmaHeader->CS |= DMA_CS_ABORT; //make sure to disable dma first. 788 | usleep(100); //give time for the abort command to be handled. 789 | 790 | dmaHeader->CS = DMA_CS_RESET; 791 | usleep(100); 792 | 793 | writeBitmasked(&dmaHeader->CS, DMA_CS_END, DMA_CS_END); //clear the end flag 794 | dmaHeader->DEBUG = DMA_DEBUG_READ_ERROR | DMA_DEBUG_FIFO_ERROR | DMA_DEBUG_READ_LAST_NOT_SET_ERROR; // clear debug error flags 795 | uint32_t firstAddr = virtToUncachedPhys(cbArrCached, pagemapfd); 796 | printf("starting DMA @ CONBLK_AD=0x%08x\n", firstAddr); 797 | dmaHeader->CONBLK_AD = firstAddr; //(uint32_t)physCbPage + ((void*)cbArr - virtCbPage); //we have to point it to the PHYSICAL address of the control block (cb1) 798 | dmaHeader->CS = DMA_CS_PRIORITY(7) | DMA_CS_PANIC_PRIORITY(7) | DMA_CS_DISDEBUG; //high priority (max is 7) 799 | dmaHeader->CS = DMA_CS_PRIORITY(7) | DMA_CS_PANIC_PRIORITY(7) | DMA_CS_DISDEBUG | DMA_CS_ACTIVE; //activate DMA. 
800 | 801 | uint64_t startTime = readSysTime(timerBaseMem); 802 | printf("DMA Active @ %llu uSec\n", startTime); 803 | /*while (dmaHeader->CS & DMA_CS_ACTIVE) { 804 | logDmaChannelHeader(dmaHeader); 805 | } //wait for DMA transfer to complete.*/ 806 | for (int i=1; ; ++i) { //generate the output sequence: 807 | //logDmaChannelHeader(dmaHeader); 808 | //this just toggles outPin every few us: 809 | queue(outPin, i%2, startTime + 1000*i, srcArray, timerBaseMem, dmaHeader); 810 | } 811 | //Exit routine: 812 | cleanup(); 813 | printf("Exiting cleanly:\n"); 814 | freeUncachedMemView(virtCbPage, cbPageBytes); 815 | freeLockedMem(virtCbPageCached, cbPageBytes); 816 | freeUncachedMemView(virtSrcPage, srcPageBytes); 817 | freeLockedMem(virtSrcPageCached, srcPageBytes); 818 | freeUncachedMemView(zerosPage, PAGE_SIZE); 819 | freeLockedMem(zerosPageCached, PAGE_SIZE); 820 | close(pagemapfd); 821 | close(memfd); 822 | return 0; 823 | } 824 | 825 | -------------------------------------------------------------------------------- /hw-addresses.h: -------------------------------------------------------------------------------- 1 | /* 2 | * https://github.com/Wallacoloo/Raspberry-Pi-DMA-Example : DMA Raspberry Pi Examples 3 | * Author: Colin Wallace 4 | 5 | This is free and unencumbered software released into the public domain. 6 | 7 | Anyone is free to copy, modify, publish, use, compile, sell, or 8 | distribute this software, either in source code form or as a compiled 9 | binary, for any purpose, commercial or non-commercial, and by any 10 | means. 11 | 12 | In jurisdictions that recognize copyright laws, the author or authors 13 | of this software dedicate any and all copyright interest in the 14 | software to the public domain. We make this dedication for the benefit 15 | of the public at large and to the detriment of our heirs and 16 | successors. 
We intend this dedication to be an overt act of
relinquishment in perpetuity of all present and future rights to this
software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to
*/

/*
 * processor documentation for RPI1 at: http://www.raspberrypi.org/wp-content/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
 * pg 38 for DMA
 */

#pragma once

//-------- physical addresses for the peripherals, as found in the processor documentation:
#if defined(RPI_V1)
    #define TIMER_BASE 0x20003000
    #define DMA_BASE 0x20007000
    #define CLOCK_BASE 0x20101000 // Undocumented. Taken from http://www.scribd.com/doc/127599939/BCM2835-Audio-clocks
    #define GPIO_BASE 0x20200000
    #define PWM_BASE 0x2020C000
    #define GPIO_BASE_BUS 0x7E200000 //this is the physical bus address of the GPIO module. This is only used when other peripherals directly connected to the bus (like DMA) need to read/write the GPIOs
    #define PWM_BASE_BUS 0x7E20C000

#elif defined(RPI_V2) || defined(RPI_V3)
    // RPI2 and 3 use a different chipset, and the peripheral addresses have changed.
    #define TIMER_BASE 0x3F003000
    #define DMA_BASE 0x3F007000
    #define CLOCK_BASE 0x3F101000 // Undocumented. Extrapolated from RPI_V1 CLOCK_BASE
    #define GPIO_BASE 0x3F200000
    #define PWM_BASE 0x3F20C000
    #define GPIO_BASE_BUS 0x7E200000 //this is the physical bus address of the GPIO module.
    //This is only used when other peripherals directly connected to the bus (like DMA) need to read/write the GPIOs
    #define PWM_BASE_BUS 0x7E20C000

#else
    #error "Must define either RPI_V1, RPI_V2 or RPI_V3, based on target."
#endif

--------------------------------------------------------------------------------
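The constants in hw-addresses.h suggest a fixed relationship between the ARM physical address of a peripheral and the bus address that the DMA engine has to use: the offset inside the peripheral window is the same, and only the window base changes. The following is a minimal sketch of that translation, not code from the repository. The helper `periPhysToBus`, the `PERI_*` macros, and the 16 MiB window size are assumptions made for illustration; the base values are inferred from the `GPIO_BASE`/`GPIO_BASE_BUS` and `PWM_BASE`/`PWM_BASE_BUS` pairs above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed peripheral window bases, inferred from the constants in hw-addresses.h. */
#define PERI_PHYS_BASE_RPI1  0x20000000u /* RPI_V1 */
#define PERI_PHYS_BASE_RPI23 0x3F000000u /* RPI_V2 and RPI_V3 */
#define PERI_BUS_BASE        0x7E000000u /* as seen by bus masters such as DMA */
#define PERI_WINDOW_SIZE     0x01000000u /* 16 MiB; an assumption of this sketch */

/* Hypothetical helper: convert the ARM physical address of a peripheral
 * register into the bus address a DMA control block would reference.
 * physBase is the peripheral window base for the target board. */
static uint32_t periPhysToBus(uint32_t physAddr, uint32_t physBase) {
    uint32_t offset = physAddr - physBase; /* wraps to a huge value if physAddr < physBase */
    assert(offset < PERI_WINDOW_SIZE);     /* reject addresses outside the window */
    return PERI_BUS_BASE + offset;
}
```

For example, `periPhysToBus(0x20200000u, PERI_PHYS_BASE_RPI1)` and `periPhysToBus(0x3F200000u, PERI_PHYS_BASE_RPI23)` both yield 0x7E200000, which matches `GPIO_BASE_BUS` being identical in both branches of the header.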