├── ChangesLog
├── DEBUGFS.md
├── LICENSE
├── README.md
├── driver
│   ├── .gitignore
│   ├── 01goal.txt
│   ├── 03design.txt
│   ├── Makefile
│   ├── TODO.japanese
│   ├── load_ib_modules.sh
│   ├── pib.conf
│   ├── pib.files
│   ├── pib.h
│   ├── pib.spec
│   ├── pib_ah.c
│   ├── pib_cq.c
│   ├── pib_debugfs.c
│   ├── pib_dma.c
│   ├── pib_easy_sw.c
│   ├── pib_lib.c
│   ├── pib_mad.c
│   ├── pib_mad.h
│   ├── pib_mad_pma.c
│   ├── pib_main.c
│   ├── pib_mr.c
│   ├── pib_multicast.c
│   ├── pib_packet.h
│   ├── pib_pd.c
│   ├── pib_qp.c
│   ├── pib_rc.c
│   ├── pib_spinlock.h
│   ├── pib_srq.c
│   ├── pib_thread.c
│   ├── pib_trace.h
│   ├── pib_ucontext.c
│   └── pib_ud.c
├── libpib
│   ├── .gitignore
│   ├── AUTHORS
│   ├── COPYING
│   ├── Makefile
│   ├── README
│   ├── libpib-modprobe.conf
│   ├── libpib-pib.conf
│   ├── libpib-setup-pib.awk
│   ├── libpib.spec
│   ├── pib.driver
│   └── src
│       ├── pib.c
│       └── pib.map
├── pibnetd
│   ├── .gitignore
│   ├── Makefile
│   ├── byteorder.h
│   ├── logger.c
│   ├── main.c
│   ├── perf.c
│   ├── pibnetd.h
│   ├── pibnetd.spec
│   ├── pibnetd_packet.h
│   ├── pibping.c
│   ├── scripts
│   │   └── redhat-pibnetd.init
│   └── smp.c
└── test
    ├── .gitignore
    ├── Makefile
    ├── comp_vector.c
    ├── qp-roundrobin.c
    ├── query_pkey.c
    ├── show_mem_reg.c
    ├── show_mem_reg.txt
    ├── test-ib_reg_mr-01.c
    └── test-ipoib-01.c

/ChangesLog:
--------------------------------------------------------------------------------
1 | 2016-05-11 Minoru NAKAMURA
2 |
3 | * Resolve kernel version mismatch between mainline and RHEL
4 | * Update codes for kernel 4.3
5 |
6 | 2015-09-25 Minoru NAKAMURA
7 |
8 | * Update codes for kernel 4.2
9 |
10 | 2015-09-08 Minoru NAKAMURA
11 |
12 | * Fix compile errors on CentOS 6.7
13 |
14 | 2015-06-27 Minoru NAKAMURA
15 |
16 | * Fix compile errors on CentOS 7.1
17 |
18 | 2015-03-26 NeoCat
19 |
20 | * Update codes for kernel 3.19
21 |
22 | 2015-02-12 Minoru NAKAMURA
23 |
24 | * pib, pibnetd: version 0.4.6
25 | * pib: Change the L_Key/R_Key format according to IBA Spec.
Vol.1 10.6.3.3 LOCAL ACCESS KEYS
26 | * pib: Add SEND with Invalidate, Local Invalidate and Fast Register Physical MR operations
27 | * pibnetd: Fix problem that pibnetd fails to reconnect to a node that has been shut down abnormally
28 |
29 | 2014-10-06 Minoru NAKAMURA
30 |
31 | * pib: version 0.4.5
32 | * Update README.md
33 |
34 | 2014-10-05 Dotan Barak
35 |
36 | * Fix 64 bit divide operation
37 | * Return zero in dev_cap.max_sge_rd
38 | * Add IB_EVENT_CLIENT_REREGISTER support to the port's cap flags
39 |
40 | 2014-10-05 Minoru NAKAMURA
41 |
42 | * Fix build errors on 32-bit x86
43 | * Fix compile errors on CentOS 7 and Fedora 20
44 |
45 | 2014-10-04 Dotan Barak
46 |
47 | * Add IB_EVENT_CLIENT_REREGISTER support
48 | * Changed static allocation to dynamic allocation
49 | * Fixed casting problems in pointer <-> 64 bit variables
50 | * Add .gitignore to ignore all compilation products
51 | * Fixed typos
52 |
53 | 2014-10-29 Minoru NAKAMURA
54 |
55 | * pib: version 0.4.4
56 | * Update codes for kernel 3.17
57 |
58 | 2014-07-09 Minoru NAKAMURA
59 |
60 | * pib: version 0.4.3
61 | * Update codes for kernel 3.15
62 | * Remove floating point codes
63 |
64 | 2014-05-27 Minoru NAKAMURA
65 |
66 | * pib: version 0.4.2
67 | * Fix bug that the incorrect reinitialization of dev->thread.completion causes deadlock.
68 |
69 | 2014-05-03 Minoru NAKAMURA
70 |
71 | * pib: version 0.4.1
72 | * Add spec file
73 |
74 | 2014-03-25 Minoru NAKAMURA
75 |
76 | * pibnetd: version 0.4.0
77 | * Add spec file
78 |
79 | 2014-03-19 Minoru NAKAMURA
80 |
81 | * pib, pibnetd: version 0.3.5
82 | * Fix the wrong path by which a DR SMP MAD returns to the source node.
83 | * Fix the responder's acknowledge coalescing.
84 | * Fix the responder forgetting to set a syndrome value in the acknowledge packet.
85 | * Fix the requester's logic for receiving acknowledges.
86 | * Fix missing qp_sched.master_tid increment.
87 | * Add a new flow control mechanism to prevent local ACK timeout retries caused by large SEND & RDMA WRITE operations in the RC service.
88 | Under this mechanism, the sender can send only up to PIB_MAX_CONTIG_REQUESTS packets before it must wait for an acknowledgement.
89 | * Add a new flow control mechanism to prevent local ACK timeout retries caused by large RDMA READs in the RC service.
90 | The responder can send only up to PIB_MAX_CONTIG_READ_ACKS packets before it must wait for a Congestion Notification Packet (CNP).
91 | * Fix incorrect range check when removing pib_ack after a retried RDMA READ request is received.
92 | * Remove the flow control limiting the number of outstanding RDMA READs that was introduced in 0.3.4.
93 |
94 | 2014-03-11 Minoru NAKAMURA
95 |
96 | * pib: version 0.3.4
97 | * Fix comparison between unmasked PSNs. PSN comparison must use get_psn_diff().
98 | * In the RC service, a successful completion now postpones the local ACK timeout of all send WQEs in the waiting list.
99 | This avoids timeouts while RDMA READ requests are outstanding.
100 | * Add a new flow control mechanism: the requester side sets its internal maximum number of outstanding RDMA READ & Atomic operations to 1 when an RDMA READ request is resent due to a local ACK timeout.
101 | * Fix bug that outstanding RDMA READ & Atomic operations were checked against max_rd_atomic before a retried RDMA READ request removed the previous RDMA READs.
102 | * Enable specifying the kthread's priority
103 | * Expand the default size of the socket buffer to 16 MiB
104 | * Enable counting how many local ACK timeouts occur via /sys/class/infiniband/pib_X/local_ack_timeout.
105 |
106 | 2014-03-08 Minoru NAKAMURA
107 |
108 | * pib: version 0.3.3
109 | * Fix bug that the execution tracer used seq_file in an incorrect way.
110 | * Enable suppressing duplicate execution traces.
111 | * Enable the execution tracer to collect new events for completions and local ACK timer retries.
112 |
113 | 2014-03-05 Minoru NAKAMURA
114 |
115 | * pib, pibnetd: version 0.3.2
116 | * pib: Fix bug that get_sw_port_num may access a NULL pointer in pib_devs[] while pib.ko is unloading.
117 | * pib: Fix bug that process_subn fails to deliver trap responses to the subnet manager.
118 | * pibnetd: Fix incorrect type (u8 -> u16) in pib_packet_lrh_set/get_pktlen.
119 |
120 | 2014-02-27 Minoru NAKAMURA
121 |
122 | * pib, pibnetd: version 0.3.1
123 | * When a node unloads pib.ko, pibnetd should free the port which is connected to that node.
124 |
125 | 2014-02-25 Minoru NAKAMURA
126 |
127 | * pib, pibnetd: version 0.3.0
128 | * Add pibnetd
129 |
130 | 2014-02-11 Minoru NAKAMURA
131 |
132 | * pib: version 0.2.9
133 | * Work around ib_ipoib leaking address handles when the IB driver is unregistered
134 |
135 | 2014-02-05 Minoru NAKAMURA
136 |
137 | * pib: version 0.2.8
138 | * Add debugfs feature to collect execution traces
139 |
140 | 2014-01-31 Minoru NAKAMURA
141 |
142 | * pib: version 0.2.7
143 | * Add debugfs feature to inject an error
144 | * Enable changing the size of send/recv buffers
145 |
146 | 2014-01-29 Minoru NAKAMURA
147 |
148 | * pib: version 0.2.6
149 | * Add debugfs feature to inspect internal objects
150 | * Add solicited event processing
151 | * Change dev->lock locking
152 |
153 | 2014-01-25 Minoru NAKAMURA
154 |
155 | * pib: version 0.2.5
156 | * Add PerfMgt MAD processing
157 | * Enable ibnetdiscover and perfquery to work
158 |
159 | 2014-01-24 Minoru NAKAMURA
160 |
161 | * pib: version 0.2.4
162 | * Add GRH support
163 |
164 | 2014-01-23 Minoru NAKAMURA
165 |
166 | * pib: version 0.2.3
167 | * Fix the condition to generate IBV_EVENT_SQ_DRAINED
168 |
169 | 2014-01-19 Minoru NAKAMURA
170 |
171 | * pib: version 0.2.1
172 | * Fix some bugs
173 | * Enable the RDMA CM to work
174 |
175 | 2014-01-17 Minoru NAKAMURA
176 |
177 | * pib: version 0.2.0
178 | * Add multicast support
179 | * Enable IPoIB to work
180 |
181 | 2013-12-28 Minoru NAKAMURA
182 |
183 |
* pib: version 0.1.0
184 | * Add Subnet Management Packet (SMP) processing
185 | * Enable working with OpenSM
186 |
187 | 2013-12-09 Minoru NAKAMURA
188 |
189 | * pib: version 0.0.5
190 | * libpib: version 0.0.5
191 | * Work around the IB/core bug so that imm_data is passed correctly from ib_uverbs_send_wr to ib_send_wr when sending UD messages.
192 |
193 | 2013-10-30 Minoru NAKAMURA
194 |
195 | * libpib: version 0.0.2
196 | * Add spec file
197 |
198 | 2013-10-06 Minoru NAKAMURA
199 |
200 | * pib: version 0.0.1
201 | * Initial release
202 |
--------------------------------------------------------------------------------
/DEBUGFS.md:
--------------------------------------------------------------------------------
1 | Debugging support
2 | =================
3 |
4 | First ensure that debugfs is mounted.
5 |
6 |     # mount -t debugfs none /sys/kernel/debug
7 |
8 | A list of available debugging functions can be found in /sys/kernel/debug/pib/pib_X/.
9 |
10 |
11 | Object inspection
12 | --------------------
13 |
14 | Object inspection displays the following IB objects:
15 | _ucontext_, _cq_, _pd_, _mr_, _ah_, _srq_ and _qp_.
16 |
17 | Each IB object except the QP is assigned a unique number (OID) at creation time.
18 | OIDs range from 1 to N.
19 | Zero indicates an invalid OID.
20 |
21 | The QP's OID is the same as its QPN.
22 |
23 | In addition, each IB object except ucontext created via uverbs is assigned a *user handle id* (UHWD).
24 | User-land programs can get the user handle id from the handle field of struct ibv_pd, struct ibv_mr, struct ibv_ah, struct ibv_srq and struct ibv_qp.
25 |
26 | _ucontext_ displays a list of ucontext(s).
27 |
28 |     OID  CREATIONTIME                      PID   COMM
29 |     0007 [2014-02-08 02:59:05.631,314,154] 13445 ibv_rc_pingpong
30 |     0008 [2014-02-08 02:59:10.044,493,257] 13447 ibv_srq_pingpon
31 |
32 | _cq_ displays a list of completion queue(s).
33 | 34 | OID UCTX UHWD CREATIONTIME S MAX CUR TYPE NOTIFY 35 | 0001 KERN NOHWD [2014-02-08 02:46:03.057,814,080] OK 1280 0 NONE WAIT 36 | 0002 KERN NOHWD [2014-02-08 02:46:03.061,807,815] OK 1280 0 NONE WAIT 37 | 0003 KERN NOHWD [2014-02-08 02:46:03.070,253,748] OK 642 0 NONE WAIT 38 | 0004 KERN NOHWD [2014-02-08 02:46:03.070,370,634] OK 128 13 NONE WAIT 39 | 0005 KERN NOHWD [2014-02-08 02:46:03.076,738,203] OK 642 0 NONE WAIT 40 | 0006 KERN NOHWD [2014-02-08 02:46:03.076,851,870] OK 128 14 NONE WAIT 41 | 000d 7 0 [2014-02-08 02:59:05.631,369,851] OK 501 0 NONE WAIT 42 | 000e 8 1 [2014-02-08 02:59:10.044,547,908] OK 516 0 NONE WAIT 43 | 44 | * _UCTX_ displays an ucontext OID that this cq belongs to. If the cq is generated by kernel code, UCTX indicates *KERN*. 45 | * _UHWD_ displays this cq's user handle id. 46 | * _S_ indicates *OK" or "ERR" as this cq's state. 47 | * _TYPE_ indicates *NONE*(don't attach completion channel), *SOLI*(solicited only) or *COMP*(all completion). 48 | * _NOTIFY_ indicates *NOTIFY* or *WAIT*. 49 | 50 | _pd_ displays a list of protection domain(s). 51 | 52 | OID UCTX UHWD CREATIONTIME 53 | 0001 KERN NOHWD [2014-02-08 02:46:03.058,019,052] 54 | 0002 KERN NOHWD [2014-02-08 02:46:03.062,021,207] 55 | 0003 KERN NOHWD [2014-02-08 02:46:03.067,386,427] 56 | 0004 KERN NOHWD [2014-02-08 02:46:03.073,458,442] 57 | 000b 7 0 [2014-02-08 02:59:05.631,326,473] 58 | 000c 8 1 [2014-02-08 02:59:10.044,504,640] 59 | 60 | _mr_ displays a list of memory region(s). 
61 | 62 | OID UCTX UHWD CREATIONTIME PD START LENGTH LKEY RKEY DMA AC 63 | 0001 KERN NOHWD [2014-02-08 02:46:03.058,035,303] 0001 0000000000000000 ffffffffffffffff 63a43000 63a44000 DMA 1 64 | 0002 KERN NOHWD [2014-02-08 02:46:03.061,804,672] 0001 0000000000000000 ffffffffffffffff 28840001 38843001 DMA 1 65 | 0013 7 0 [2014-02-08 02:59:05.631,362,956] 000b 0000000001e4d000 0000000000001000 0bd78000 7bd7b000 USR 1 66 | 0014 8 1 [2014-02-08 02:59:10.044,541,101] 000c 00000000006a1000 0000000000001000 7a4e8000 4a4ef000 USR 1 67 | 68 | * _DMA_ indicates *DMA* or *USR*. 69 | 70 | _ah_ displays a list of address handle(s). 71 | 72 | OID UCTX UHWD CREATIONTIME PD DLID AC PORT 73 | 000017 KERN NOHWD [2014-02-08 02:46:03.361,759,808] 0001 0001 0 1 74 | 000019 KERN NOHWD [2014-02-08 02:46:03.361,770,318] 0002 0001 0 2 75 | 00001a KERN NOHWD [2014-02-08 02:46:06.193,034,040] 0003 c000 1 1 76 | 000023 KERN NOHWD [2014-02-08 02:46:06.199,621,847] 0004 c006 1 2 77 | 78 | _srq_ displays a list of share receive queue(s). 79 | 80 | OID UCTX UHWD CREATIONTIME PD S MAX CUR 81 | 0001 KERN NOHWD [2014-02-08 02:46:03.067,477,973] 0003 OK 256 0 82 | 0002 KERN NOHWD [2014-02-08 02:46:03.073,521,972] 0004 OK 256 0 83 | 0006 8 0 [2014-02-08 02:59:10.044,600,849] 000c OK 500 0 84 | 85 | _qp_ displays a list of queue pair(s). 
86 | 87 | OID UCTX UHWD CREATIONTIME PD QT STATE S-CQ R-CQ SRQ MAX-S CUR-S MAX-R CUR-R 88 | 000000 KERN NOHWD [2014-02-08 02:46:03.058,037,569] 0001 SMI RTS 0001 0001 0000 128 0 512 0 89 | 000001 KERN NOHWD [2014-02-08 02:46:03.058,604,390] 0001 GSI RTS 0001 0001 0000 128 0 512 0 90 | 000000 KERN NOHWD [2014-02-08 02:46:03.062,054,749] 0002 SMI RTS 0002 0002 0000 128 0 512 0 91 | 000001 KERN NOHWD [2014-02-08 02:46:03.063,060,784] 0002 GSI RTS 0002 0002 0000 128 0 512 0 92 | 547575 KERN NOHWD [2014-02-08 02:46:03.070,408,186] 0003 UD RTS 0004 0003 0000 128 0 256 0 93 | 547576 KERN NOHWD [2014-02-08 02:46:03.076,867,571] 0004 UD RTS 0006 0005 0000 128 0 256 0 94 | 5475ab 8 1 [2014-02-08 02:59:10.044,746,008] 000c RC INIT 000e 000e 0006 1 0 0 0 95 | 5475bb 9 0 [2014-02-08 03:01:35.975,160,059] 000d UD INIT 000f 000f 0000 1 0 500 0 96 | 97 | Execution trace 98 | --------------- 99 | 100 | _trace_ displays execution trace. 101 | 102 | * _API_ indicates that user-land programs or kernel modules call IB API. 103 | * _SEND_ indicates that pib's socket transmits an IB packet encapsulated in the UDP packet. 104 | * _RCV1_ indicates that pib's socket receive an UDP packet. 105 | * _RCV2_ indicates that pib's socket accepts the receiving UDP packet as the encapsulated IB packet. 106 | * _RTRY_ indicates that pib's requester perform retries due to local ack timeout. 107 | * _COMP_ indicates that pib generates a successful completion or a completion error. 108 | * _ASYNC_ indicates that an asynchronous error(including an event) is caused. 109 | * _TIME_ is an internal entry. 110 | 111 | You can insert a bookmark message into the list of execution trace. 112 | 113 | $ echo "Benchmark Start." 
> /sys/kernel/debug/pib/pib_0/trace
114 |
115 |     [2014-02-09 03:51:58.198,557,202] RCV1 UD/SEND_ONLY PORT:1 PSN:001cee LEN:0264 SLID:ffff DLID:0001 DQPN:000000
116 |     [2014-02-09 03:51:58.198,557,944] RCV2 UD/SEND_ONLY PORT:1 PSN:001cee DATA:0256 SQPN:000000
117 |     [2014-02-09 03:51:58.198,562,915] API req_notify_cq OID:0001
118 |     [2014-02-09 03:51:58.198,563,055] API poll_cq OID:0001
119 |     [2014-02-09 03:51:58.198,571,649] API destroy_ah OID:001d35
120 |     [2014-02-09 03:51:58.198,573,892] API post_recv OID:000000
121 |     [2014-02-09 03:51:58.198,574,445] API poll_cq OID:0001
122 |     [2014-02-09 03:52:06.477,762,060] TIME
123 |     [2014-02-09 03:52:06.477,762,084] BOOKMARK Benchmark Start.
124 |     [2014-02-09 03:52:08.198,759,305] TIME
125 |     [2014-02-09 03:52:08.198,759,409] API create_ah OID:001d36
126 |     [2014-02-09 03:52:08.198,763,268] API post_send OID:000000
127 |
128 | Error injection
129 | ---------------
130 |
131 | You can inject a CQ, QP or SRQ asynchronous error via _inject_err_.
132 |
133 |     $ echo "CQ 0004" > inject_err
134 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | pib - Pseudo InfiniBand HCA driver
2 | ==================================
3 |
4 | pib is a software-based InfiniBand HCA driver.
5 | It provides InfiniBand functions without a real IB HCA & fabric.
6 | pib aims to simulate InfiniBand behavior accurately rather than to achieve speed.
7 |
8 | pib consists of three components:
9 |
10 | - pib.ko - Linux kernel module
11 | - libpib - Userspace plug-in module for libibverbs
12 | - pibnetd - IB switch emulator for multi-host-mode
13 |
14 | Features
15 | ========
16 |
17 | In single-host-mode, pib creates up to 4 InfiniBand HCAs (the default is 2).
18 | These IB devices are pib_0, pib_1, pib_2 and pib_3.
19 | Each HCA contains up to 32 ports (the default is 2).
20 |
21 | In addition, pib creates one internal InfiniBand switch.
22 | All ports of pib's HCAs are connected to this switch.
23 |
24 | The current version of pib can drive the following interfaces:
25 |
26 | * kernel-level Verbs (in-kernel)
27 | * kernel-level MAD (in-kernel)
28 | * uVerbs (libibverbs)
29 | * uMAD (libibmad & libibumad)
30 | * Subnet Manager (opensm)
31 | * IPoIB (in-kernel)
32 | * RDMA Connection Manager (librdmacm)
33 | * IB diagnostic utilities (infiniband-diags)
34 |
35 | Debugging support features:
36 |
37 | * Inspect IB objects (ucontext, PD, MR, SRQ, CQ, AH, QP)
38 | * Trace API invocations, packet sending/receiving, async events/errors
39 | * Inject a specified error (QP/CQ/SRQ Error)
40 | * Select some implementation-dependent behaviour and enforce error checking.
41 | * Show warnings about pitfalls that IB programs should avoid.
42 |
43 | Other features:
44 |
45 | * The maximum size of inline data is 2048 bytes.
46 |
47 | Limitation
48 | ==========
49 |
50 | The current version is EXPERIMENTAL.
51 |
52 | The following features are not supported:
53 |
54 | - Unreliable Connected (UC)
55 | - Fast Memory Region (FMR)
56 | - Memory Windows (MW)
57 | - SEND with Invalidate operation
58 | - Virtual Lane (VL)
59 | - Flow control
60 |
61 | Supported OS
62 | ============
63 |
64 | pib supports the following Linux distributions:
65 |
66 | * Red Hat Enterprise Linux 6.x
67 | * CentOS 6.x
68 |
69 | pib conflicts with Mellanox OFED.
70 | Do not install pib in an environment where Mellanox OFED is deployed.
71 |
72 | Preparation
73 | ===========
74 |
75 | The following software packages are required for building pib:
76 |
77 | * rdma
78 | * libibverbs
79 | * kernel-devel
80 | * opensm
81 | * opensm-libs
82 |
83 | The following packages are recommended:
84 |
85 | * libibverbs-devel (for developing Verbs API programs)
86 | * libibverbs-utils
87 | * librdmacm
88 | * librdmacm-utils
89 | * librdmacm-devel (for developing RDMA API programs)
90 | * infiniband-diags (IB diagnostic tools)
91 |
92 | Building
93 | ========
94 |
95 | First, acquire the source code by cloning the git repository.
96 |
97 |     $ git clone https://github.com/nminoru/pib.git
98 |
99 | pib.ko
100 | ------
101 |
102 | To compile the pib.ko kernel module from source code, input the following commands.
103 |
104 |     $ cd pib/driver/
105 |     $ make
106 |     # make modules_install
107 |
108 | To create a binary RPM file, input the following commands.
109 |
110 | First, create pib's source RPM from the source code.
111 |
112 |     $ cp -r pib/driver pib-0.4.5
113 |     $ tar czvf $HOME/rpmbuild/SOURCES/pib-0.4.5.tar.gz pib-0.4.5/
114 |     $ cp pib/driver/pib.conf $HOME/rpmbuild/SOURCES/
115 |     $ cp pib/driver/pib.files $HOME/rpmbuild/SOURCES/
116 |     $ rpmbuild -bs pib/driver/pib.spec
117 |
118 | Next, build the binary RPM from the source RPM.
119 |
120 |     $ rpmbuild --rebuild $HOME/rpmbuild/SRPMS/pib-0.4.5-1.el6.src.rpm
121 |
122 | Finally, install the built binary RPM.
123 |
124 |     # rpm -ihv $HOME/rpmbuild/RPMS/x86_64/kmod-pib-0.4.5-1.el6.x86_64.rpm
125 |
126 | libpib
127 | ------
128 |
129 | The libpib userspace plug-in module is installed from a binary RPM built as follows.
130 |
131 |     $ cp -r pib/libpib libpib-0.0.6
132 |     $ tar czvf $HOME/rpmbuild/SOURCES/libpib-0.0.6.tar.gz libpib-0.0.6/
133 |     $ rpmbuild -bs pib/libpib/libpib.spec
134 |
135 |     $ rpmbuild --rebuild $HOME/rpmbuild/SRPMS/libpib-0.0.6-1.el6.src.rpm
136 |
137 |     # rpm -ihv $HOME/rpmbuild/RPMS/x86_64/libpib-0.0.6-1.el6.x86_64.rpm
138 |
139 | pibnetd
140 | -------
141 |
142 | To compile the pibnetd daemon from source code, input the following commands.
143 |
144 |     $ cd pib/pibnetd/
145 |     $ make
146 |     # install -m 755 -D pibnetd /usr/sbin/pibnetd
147 |     # install -m 755 -D scripts/redhat-pibnetd.init /etc/rc.d/init.d/pibnetd
148 |
149 | To create a binary RPM file, input the following commands.
150 |
151 |     $ cp -r pib/pibnetd pibnetd-0.4.1
152 |     $ tar czvf $HOME/rpmbuild/SOURCES/pibnetd-0.4.1.tar.gz pibnetd-0.4.1/
153 |     $ rpmbuild -bs pib/pibnetd/pibnetd.spec
154 |
155 |     $ rpmbuild --rebuild $HOME/rpmbuild/SRPMS/pibnetd-0.4.1-1.el6.src.rpm
156 |
157 |     # rpm -ihv $HOME/rpmbuild/RPMS/x86_64/pibnetd-0.4.1-1.el6.x86_64.rpm
158 |
159 | Download
160 | --------
161 |
162 | Source and binary RPMs for RHEL6 and CentOS6 are available at http://www.nminoru.jp/~nminoru/network/infiniband/src/
163 |
164 | Loading (single-host-mode)
165 | ==========================
166 |
167 | First, load the modules which pib.ko depends on.
168 |
169 |     # /etc/rc.d/init.d/rdma start
170 |
171 | Next, load pib.ko.
172 |
173 |     # modprobe pib
174 |
175 | Finally, run opensm.
176 |
177 |     # /etc/rc.d/init.d/opensm start
178 |
179 | pib.ko options
180 | --------------
181 |
182 | * debug_level
183 | * num_hca
184 | * phys_port_cnt
185 | * behavior
186 | * manner_warn
187 | * manner_err
188 | * addr
189 |
190 | Loading (multi-host-mode)
191 | =========================
192 |
193 | In multi-host-mode, pib can connect up to 32 hosts (to be precise, up to 32 ports).
194 |
195 |       Host A          Host X           Host B
196 |      (10.0.0.1)      (10.0.0.2)       (10.0.0.3)
197 |     +----------+    +-----------+    +----------+
198 |     | +------+ |    | +-------+ |    | +------+ |
199 |     | |pib.ko| |----| |pibnetd| |----| |pib.ko| |
200 |     | +------+ |    | +-------+ |    | +------+ |
201 |     |          |    |           |    | +------+ |
202 |     |          |    |           |    | |opensm| |
203 |     |          |    |           |    | +------+ |
204 |     +----------+    +-----------+    +----------+
205 |
206 | First, run pibnetd on a host.
207 |
208 |     # /etc/rc.d/init.d/pibnetd start
209 |
210 | Next, load pib.ko by running the modprobe command with the _addr_ parameter set to pibnetd's IP address.
211 |
212 |     # /etc/rc.d/init.d/rdma start
213 |     # modprobe pib addr=10.0.0.2
214 |
215 | With the default parameters, pib creates 2 IB devices with 2 ports each.
216 | In multi-host-mode it is better to limit this to 1 IB device with 1 port by specifying the _num_hca_ and _phys_port_cnt_ parameters.
217 |
218 |     # modprobe pib addr=10.0.0.2 num_hca=1 phys_port_cnt=1
219 |
220 | Finally, run opensm on one of the hosts that has loaded pib.ko.
221 |
222 |     # /etc/rc.d/init.d/opensm start
223 |
224 | Running
225 | =======
226 |
227 | For instance, ibv_devinfo (included in the libibverbs-utils package) shows a result like this.
228 |
229 |     $ ibv_devinfo
230 |     hca_id: pib_0
231 |         transport:          InfiniBand (0)
232 |         fw_ver:             0.2.000
233 |         node_guid:          000c:2925:551e:0400
234 |         sys_image_guid:     000c:2925:551e:0200
235 |         vendor_id:          0x0001
236 |         vendor_part_id:     1
237 |         hw_ver:             0x0
238 |         phys_port_cnt:      2
239 |
240 | Performance counter
241 | -------------------
242 |
243 |     # perfquery
244 |
245 | Debugging support
246 | =================
247 |
248 | pib provides some debugging functions via debugfs to help develop IB programs.
249 |
250 | First ensure that debugfs is mounted.
251 |
252 |     # mount -t debugfs none /sys/kernel/debug
253 |
254 | A list of available debugging functions can be found in /sys/kernel/debug/pib/pib_X/.
255 |
256 | See DEBUGFS.md for detailed information.
257 |
258 |
259 | FAQ
260 | ===
261 |
262 | ibv_reg_mr() fails for regions larger than 64 KB
263 | ------------------------------------------------
264 |
265 | pib permits an unprivileged program to use InfiniBand userspace verbs.
266 | However, the Linux operating system limits the maximum amount of memory that an unprivileged process may lock via mlock(), and ibv_reg_mr() calls mlock() internally.
267 | This default max-locked-memory limit is only 64 K bytes.
268 |
269 | To avoid this problem, run your program in privileged mode or increase the max-locked-memory limit for unprivileged users.
270 |
271 | If you choose the latter, add the following two lines to the file /etc/security/limits.conf and then reboot.
272 |
273 |     * soft memlock unlimited
274 |     * hard memlock unlimited
275 |
276 | Alternatively, you can set the limit temporarily with `ulimit -l unlimited`.
277 |
278 | Future work
279 | ===========
280 |
281 | IB functions
282 | ------------
283 |
284 | * Fast Memory Registration (FMR)
285 | * Peer-Direct
286 | * Alternate path
287 | * Unreliable Connection (UC)
288 | * Extended Reliable Connected (XRC)
289 | * Memory Window
290 |
291 | Debugging support
292 | -----------------
293 |
294 | * Packet filtering
295 |
296 | Software components
297 | -------------------
298 |
299 | * MPI
300 | * User Direct Access Programming Library (uDAPL)
301 | * iSCSI Extensions for RDMA (iSER)
302 | * SCSI RDMA Protocol (SRP)
303 |
304 | Other
305 | -----
306 |
307 | * Systemd init script support
308 | * Support for other Linux distributions
309 | * Kernel update package
310 | * IPv6 support
311 | * Translate the Japanese comments in the source code into English :-)
312 |
313 | Contact
314 | =======
315 |
316 | [https://twitter.com/nminoru_jp](https://twitter.com/nminoru_jp)
317 |
318 |
319 |
320 | License
321 | =======
322 |
323 | GPL version 2 or BSD license
324 |
--------------------------------------------------------------------------------
/driver/.gitignore:
--------------------------------------------------------------------------------
1 | *.o
2 | *.o.cmd
3 | *.ko
4 | *.ko.*
5 | *.mod.c
6 | .tmp_versions
7 | Module.symvers
8 | modules.order
--------------------------------------------------------------------------------
/driver/01goal.txt:
--------------------------------------------------------------------------------
1 | TODO
2 |
3 | * Behavior
4 |
5 | - Delayed WC
6 | - In RC communication, check the QPN of packets from unintended peers?
7 | - mlx allows posting WRs even in QP states where posting should not be possible.
8 | Actual sending/receiving becomes possible only from RTR / RTS onward.
9 | - Both WQEs and CQEs may exist in larger numbers than configured.
10 |
11 | * Manner check
12 |
13 | - ibv_post_send or ibv_post_recv was executed in the RESET state.
14 | - In RC communication, a packet was received from a QP other than dest_qp.
15 | - WQEs/CQEs were lost when transitioning to RESET via ibv_modify_qp.
16 | - In UD communication from the user, the most significant bit of the WR's q_key was specified.
17 | - Sending while the QP is in RESET or INIT
18 | - Receiving while the QP is in RESET
19 | - Check that no WQEs or CQEs (WCs) are lost on QP reset, CQ error, etc.
20 | - ibv_req_notify was issued while CQEs remain in the CQ
21 |
22 | * Function
23 |
24 |
25 |
26 |
27 |
28 |
--------------------------------------------------------------------------------
/driver/03design.txt:
--------------------------------------------------------------------------------
1 | * Lock priority
2 |
3 | dev(lock) -> qp -> dev(schedule) -> srq -> pd -> cq -> dev(wq)
4 |
5 | In mad.c, spin_lock_irqsave is taken outside of ib_post_send.
6 | A spin_lock must be used here rather than a semaphore.
7 |
8 | spin_lock_irqsave(&qp_info->send_queue.lock, flags);
9 | if (qp_info->send_queue.count < qp_info->send_queue.max_active) {
10 | 	ret = ib_post_send(mad_agent->qp, &mad_send_wr->send_wr,
11 | 			   &bad_send_wr);
12 | 	list = &qp_info->send_queue.list;
13 | } else {
14 | 	ret = 0;
15 | 	list = &qp_info->overflow_list;
16 | }
17 |
18 | if (!ret) {
19 | 	qp_info->send_queue.count++;
20 | 	list_add_tail(&mad_send_wr->mad_list.list, list);
21 | }
22 | spin_unlock_irqrestore(&qp_info->send_queue.lock, flags);
23 |
24 | * HCA & switch emulation
25 |
26 | pib's HCA emulation sends and receives IB-defined packets as UDP packets.
27 | # There is no CRC.  There are no fields for credit-based
flow control, nor an MSN.
28 |
29 | pib creates one thread and one UDP socket per emulated HCA.
30 |
31 | - DR SMP: implemented as specified.
32 | - Unicast: if the destination LID is known, send directly.
33 | - Multicast: hand off to the easy switch for forwarding.
34 |
35 | Real IB hardware sends along the physical links the HCA is attached to, but in pib normal
36 | unicast traffic bypasses the easy switch and is sent directly to the socket of the peer's port.
37 |
38 | However, multicast and DR SMP MADs go through pibnetd.
39 |
40 | * MR
41 |
42 | The L_Key and R_Key of an MR are index values into the mr_table[] kept in the PD.
43 | However, only the low-order bits are used as the array index; the high-order bits
44 | are a random value chosen per PD.
45 |
46 | * GID
47 |
48 | IB requires a 64-bit unique address (EUI-64).
49 |
50 | In multi-host-mode, values must be chosen so that they do not collide across hosts.
51 |
52 | For this purpose, the MAC address (EUI-48) of the first Ethernet device found in the host is borrowed
53 | and packed into the upper bits to synthesize a 64-bit value.
54 |
55 | 0x0000        Not used
56 | 0x0100        Easy switch (for single-host-mode)
57 | 0x0101~0x01FF pibnetd (for multi-host-mode)
58 | 0x0200        SystemImageGUID
59 | 0xyyzz        NodeGUID or PortGUID
60 |   - yy is the HCA number, starting from 3
61 |   - zz is 0 for the NodeGUID; 1 or greater is a PortGUID
62 |
63 | * Queue
64 |
65 | Send processing handles the Send Queue with three queues: submitted (not yet sent), sending (in flight), and waiting (awaiting ACK).
66 |
67 | - ibv_post_send appends to the submitted queue.
68 |
69 |   - Possible unless the QP is in RESET or INIT.
70 |
71 | - Background processing takes Send WRs off the submitted queue and appends them to the sending queue.
72 |   This is done only while the QP is in RTS.
73 |
74 |   - A WR with SEND_FENCE is not appended to the sending queue while earlier RDMA READ or Atomic operations are in progress.
75 |   - WRs are not appended to the sending queue when the number of concurrent RDMA READ / Atomic operations exceeds the limit.
76 |
77 | - Packets are sent from the head of the sending queue.
78 |   This is done only while the QP is in RTS or SQD.
79 |
80 |   - When one round of sending completes, the entry moves from the sending queue to the waiting queue.
81 |   - If sending continues without any ACK/NAK, entries are moved to the waiting queue.
82 |
83 | - When the QP enters the Send Drain (SQD) state, entries in the submitted queue are not moved to the sending queue.
84 |
85 | - When a send fails (NAK received, retransmission), all entries in the waiting queue are moved back to the sending queue,
86 |   and all unacknowledged packets are cleared.
87 |
88 | - An ACK may apply to entries either in the sending queue or in the waiting queue.
89 |
90 | * Congestion controls
91 |
92 | - After sending more than a certain number of request packets, the requester stops sending until an ACK is returned.
93 |
94 | - For RDMA READ and Atomic operations, there is a congestion control: once a retry occurs, retry_cnt
is internally set to the minimum value.
95 | Each subsequent successful RDMA READ or Atomic operation increments retry_cnt
96 | by 1.
97 |
98 | - RDMA READ needs one more refinement.
--------------------------------------------------------------------------------
/driver/Makefile:
--------------------------------------------------------------------------------
1 | ifeq ($(KERNELRELEASE),)
2 |
3 | KVERSION ?= $(shell uname -r)
4 |
5 | BUILD_DIR ?= /lib/modules/${KVERSION}/build
6 |
7 | PWD := $(shell pwd)
8 |
9 | modules:
10 | 	$(MAKE) -C $(BUILD_DIR) M=$(PWD) modules
11 |
12 | modules_install:
13 | 	$(MAKE) -C $(BUILD_DIR) M=$(PWD) modules_install
14 |
15 | clean:
16 | 	rm -rf *~ *.o .*.cmd *.mod.c *.ko *.ko.unsigned .depend \
17 | 	.tmp_versions modules.order Module.symvers Module.markers
18 |
19 | .PHONY: modules modules_install clean
20 |
21 | else
22 |
23 | # Called from kernel build system -- just declare the module(s).
24 |
25 | obj-m := pib.o
26 | pib-y := pib_main.o pib_dma.o pib_lib.o \
27 | 	pib_ucontext.o pib_pd.o pib_qp.o pib_multicast.o pib_cq.o pib_srq.o pib_ah.o pib_mr.o \
28 | 	pib_mad.o pib_mad_pma.o pib_easy_sw.o \
29 | 	pib_thread.o pib_ud.o pib_rc.o \
30 | 	pib_debugfs.o
31 |
32 | endif
33 |
--------------------------------------------------------------------------------
/driver/TODO.japanese:
--------------------------------------------------------------------------------
1 | TODO
2 |
3 | - Unify WQ scheduler and QP scheduler
4 | - Discard the lid_table[] table in single-host-mode
5 | - Investigate the AH object leak caused by ib_ipoib.ko
6 |
7 | - Make ICRC & VCRC, PktLen conform to the IBA specification
8 | - Speed up unloading of pib.ko
9 |
10 | - Message Sequence Number (MSN)
11 | - LID Mask Control (LMC)
12 | - LOCAL_DMA_LKEY
13 |
14 | - Redesign the RNR timer.
15 |
16 | - Redesign receive_acknowledge
17 |
18 | - Asynchronous events/errors
19 |   - QP events
20 |     - IBV_EVENT_PATH_MIG
21 |     - IBV_EVENT_PATH_MIG_ERR
22 |   - Port events
23 |     - IBV_EVENT_GID_CHANGE
24 |   - CA events
25 |     - IBV_EVENT_DEVICE_FATAL
26 |
27 | - Add verification of protection domains (QP, SRQ, MR, AH)
28 |   - QP and SRQ:
29 |     verify in ibv_post_send().
30 |   - UD-QP and AH:
31 |     verify in ibv_post_send().
--------------------------------------------------------------------------------
/driver/load_ib_modules.sh:
--------------------------------------------------------------------------------
1 | #! /bin/sh
2 |
3 | modprobe ib_core
4 | modprobe ib_uverbs
5 | modprobe ib_addr
6 | modprobe ib_umad
7 | modprobe ib_cm
8 | modprobe ib_mad
9 | # modprobe ib_ipoib
10 | modprobe ib_sa
11 | modprobe iw_cm
12 | modprobe ib_ucm
13 | modprobe rdma_ucm
14 | modprobe rdma_cm
--------------------------------------------------------------------------------
/driver/pib.conf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/nminoru/pib/f2f508c685b28d51398c00fbe6407528fd8c5055/driver/pib.conf
--------------------------------------------------------------------------------
/driver/pib.files:
--------------------------------------------------------------------------------
1 | %defattr(644,root,root,755)
2 | /lib/modules/%2-%1
3 | /etc/depmod.d/pib.conf
--------------------------------------------------------------------------------
/driver/pib.spec:
--------------------------------------------------------------------------------
1 | Name: pib
2 | Version: 0.4.6
3 | Release: 1%{?dist}
4 | Summary: Pseudo InfiniBand (pib) HCA Kernel Driver
5 | Group: System/Kernel
6 | License: GPLv2 or BSD
7 | Url: http://www.nminoru.jp/
8 | Source0: %{name}-%{version}.tar.gz
9 | Source1: %{name}.files
10 | Source2: %{name}.conf
11 | BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX)
12 | BuildRequires:
%kernel_module_package_buildreqs 13 | BuildArch: i686 x86_64 14 | 15 | %kernel_module_package -f %{SOURCE1} default 16 | 17 | %description 18 | Pseudo InfiniBand (pib) HCA Kernel Driver 19 | 20 | %prep 21 | 22 | %setup -q 23 | set -- * 24 | mkdir source 25 | mv "$@" source/ 26 | mkdir obj 27 | 28 | %build 29 | for flavor in %flavors_to_build; do 30 | rm -rf obj/$flavor 31 | cp -r source obj/$flavor 32 | make -C %{kernel_source $flavor} M=$PWD/obj/$flavor 33 | done 34 | 35 | %install 36 | export INSTALL_MOD_PATH=$RPM_BUILD_ROOT 37 | export INSTALL_MOD_DIR=extra/%{name} 38 | for flavor in %flavors_to_build; do 39 | make -C %{kernel_source $flavor} modules_install M=$PWD/obj/$flavor 40 | done 41 | 42 | install -m 644 -D %{SOURCE2} $RPM_BUILD_ROOT/etc/depmod.d/%{name}.conf 43 | 44 | %clean 45 | rm -rf $RPM_BUILD_ROOT 46 | 47 | %changelog 48 | * Thu Feb 12 2015 Minoru NAKAMURA - 0.4.6 49 | - Add SEND with Invalidate, Local Invalidate and Fast Register Physical MR operations 50 | 51 | * Tue Nov 06 2014 Minoru NAKAMURA - 0.4.5 52 | - Add i686 support 53 | 54 | * Tue Oct 30 2014 Minoru NAKAMURA - 0.4.4 55 | - Update codes for kernel 3.17 56 | 57 | * Tue Jul 09 2014 Minoru NAKAMURA - 0.4.3 58 | - Update codes for kernel 3.15 59 | 60 | * Tue May 27 2014 Minoru NAKAMURA - 0.4.2 61 | - Fix a bug where incorrect reinitialization of dev->thread.completion caused a deadlock 62 | 63 | * Sat May 03 2014 Minoru NAKAMURA - 0.4.1 64 | - Initial spec file 65 | -------------------------------------------------------------------------------- /driver/pib_ah.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_ah.c - Address Handle(AH) functions 3 | * 4 | * Copyright (c) 2013,2014 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license.
7 | */ 8 | #include 9 | #include 10 | 11 | #include "pib.h" 12 | #include "pib_trace.h" 13 | 14 | 15 | struct ib_ah * 16 | pib_create_ah(struct ib_pd *ibpd, struct ib_ah_attr *ah_attr) 17 | { 18 | struct pib_dev *dev; 19 | struct pib_ah *ah; 20 | unsigned long flags; 21 | u32 ah_num; 22 | 23 | if (!ah_attr) 24 | return ERR_PTR(-EINVAL); 25 | 26 | dev = to_pdev(ibpd->device); 27 | 28 | ah = kmem_cache_zalloc(pib_ah_cachep, GFP_KERNEL); 29 | if (!ah) 30 | return ERR_PTR(-ENOMEM); 31 | 32 | INIT_LIST_HEAD(&ah->list); 33 | getnstimeofday(&ah->creation_time); 34 | 35 | spin_lock_irqsave(&dev->lock, flags); 36 | ah_num = pib_alloc_obj_num(dev, PIB_BITMAP_AH_START, PIB_MAX_AH, &dev->last_ah_num); 37 | if (ah_num == (u32)-1) { 38 | spin_unlock_irqrestore(&dev->lock, flags); 39 | goto err_alloc_ah_num; 40 | } 41 | dev->nr_ah++; 42 | list_add_tail(&ah->list, &dev->ah_head); 43 | ah->ah_num = ah_num; 44 | spin_unlock_irqrestore(&dev->lock, flags); 45 | 46 | ah->ib_ah_attr = *ah_attr; 47 | 48 | pib_trace_api(dev, IB_USER_VERBS_CMD_CREATE_AH, ah_num); 49 | 50 | return &ah->ib_ah; 51 | 52 | err_alloc_ah_num: 53 | kmem_cache_free(pib_ah_cachep, ah); 54 | 55 | return ERR_PTR(-ENOMEM); 56 | } 57 | 58 | 59 | int pib_modify_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr) 60 | { 61 | struct pib_dev *dev; 62 | struct pib_ah *ah; 63 | 64 | if (!ibah || !ah_attr) 65 | return -EINVAL; 66 | 67 | dev = to_pdev(ibah->device); 68 | ah = to_pah(ibah); 69 | 70 | pib_trace_api(dev, IB_USER_VERBS_CMD_MODIFY_AH, ah->ah_num); 71 | 72 | ah->ib_ah_attr = *ah_attr; 73 | 74 | return 0; 75 | } 76 | 77 | 78 | int pib_query_ah(struct ib_ah *ibah, struct ib_ah_attr *ah_attr) 79 | { 80 | struct pib_dev *dev; 81 | struct pib_ah *ah; 82 | 83 | if (!ibah || !ah_attr) 84 | return -EINVAL; 85 | 86 | dev = to_pdev(ibah->device); 87 | ah = to_pah(ibah); 88 | 89 | pib_trace_api(dev, IB_USER_VERBS_CMD_QUERY_AH, ah->ah_num); 90 | 91 | *ah_attr = ah->ib_ah_attr; 92 | 93 | return 0; 94 | } 95 | 96 | 97 | int 
pib_destroy_ah(struct ib_ah *ibah) 98 | { 99 | struct pib_dev *dev; 100 | struct pib_ah *ah; 101 | unsigned long flags; 102 | 103 | if (!ibah) 104 | return 0; 105 | 106 | dev = to_pdev(ibah->device); 107 | ah = to_pah(ibah); 108 | 109 | pib_trace_api(dev, IB_USER_VERBS_CMD_DESTROY_AH, ah->ah_num); 110 | 111 | spin_lock_irqsave(&dev->lock, flags); 112 | list_del(&ah->list); 113 | dev->nr_ah--; 114 | pib_dealloc_obj_num(dev, PIB_BITMAP_AH_START, ah->ah_num); 115 | spin_unlock_irqrestore(&dev->lock, flags); 116 | 117 | kmem_cache_free(pib_ah_cachep, ah); 118 | 119 | return 0; 120 | } 121 | 122 | -------------------------------------------------------------------------------- /driver/pib_cq.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_cq.c - Completion Queue(CQ) functions 3 | * 4 | * Copyright (c) 2013-2015 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 7 | */ 8 | #include 9 | #include 10 | 11 | #include "pib.h" 12 | #include "pib_spinlock.h" 13 | #include "pib_trace.h" 14 | 15 | 16 | static int insert_wc(struct pib_cq *cq, const struct ib_wc *wc, int solicited); 17 | static void cq_overflow_handler(struct pib_work_struct *work); 18 | 19 | 20 | static struct ib_cq * 21 | create_cq(struct ib_device *ibdev, int entries, int vector, int dummy, 22 | struct ib_ucontext *context, 23 | struct ib_udata *udata) 24 | { 25 | int i; 26 | struct pib_dev *dev; 27 | struct pib_cq *cq; 28 | struct pib_cqe *cqe, *cqe_next; 29 | unsigned long flags; 30 | u32 cq_num; 31 | 32 | if (!ibdev) 33 | return ERR_PTR(-EINVAL); 34 | 35 | dev = to_pdev(ibdev); 36 | 37 | if (entries < 1 || dev->ib_dev_attr.max_cqe <= entries) 38 | return ERR_PTR(-EINVAL); 39 | 40 | if (dev->ib_dev_attr.max_cq <= dev->nr_cq) 41 | return ERR_PTR(-ENOMEM); 42 | 43 | cq = kmem_cache_zalloc(pib_cq_cachep, GFP_KERNEL); 44 | if (!cq) 45 | return ERR_PTR(-ENOMEM); 46 | 47 | INIT_LIST_HEAD(&cq->list); 48 | 
getnstimeofday(&cq->creation_time); 49 | 50 | spin_lock_irqsave(&dev->lock, flags); 51 | cq_num = pib_alloc_obj_num(dev, PIB_BITMAP_CQ_START, PIB_MAX_CQ, &dev->last_cq_num); 52 | if (cq_num == (u32)-1) { 53 | spin_unlock_irqrestore(&dev->lock, flags); 54 | goto err_alloc_cq_num; 55 | } 56 | dev->nr_cq++; 57 | list_add_tail(&cq->list, &dev->cq_head); 58 | cq->cq_num = cq_num; 59 | spin_unlock_irqrestore(&dev->lock, flags); 60 | 61 | cq->state = PIB_STATE_OK; 62 | cq->notify_flag = 0; 63 | cq->has_notified = 1; /* assume the CQ has already been notified at creation */ 64 | 65 | cq->ib_cq.cqe = entries; 66 | cq->nr_cqe = 0; 67 | 68 | pib_spin_lock_init(&cq->lock); 69 | 70 | INIT_LIST_HEAD(&cq->cqe_head); 71 | INIT_LIST_HEAD(&cq->free_cqe_head); 72 | PIB_INIT_WORK(&cq->work, dev, cq, cq_overflow_handler); 73 | 74 | /* allocate CQEs internally */ 75 | 76 | for (i=0 ; i<entries ; i++) { 77 | struct pib_cqe *cqe; 78 | 79 | cqe = kmem_cache_zalloc(pib_cqe_cachep, GFP_KERNEL); 80 | if (!cqe) 81 | goto err_allloc_ceq; 82 | 83 | INIT_LIST_HEAD(&cqe->list); 84 | list_add_tail(&cqe->list, &cq->free_cqe_head); 85 | } 86 | 87 | pib_trace_api(dev, IB_USER_VERBS_CMD_CREATE_CQ, cq_num); 88 | 89 | return &cq->ib_cq; 90 | 91 | err_allloc_ceq: 92 | list_for_each_entry_safe(cqe, cqe_next, &cq->free_cqe_head, list) { 93 | list_del_init(&cqe->list); 94 | kmem_cache_free(pib_cqe_cachep, cqe); 95 | } 96 | 97 | spin_lock_irqsave(&dev->lock, flags); 98 | list_del(&cq->list); 99 | dev->nr_cq--; 100 | pib_dealloc_obj_num(dev, PIB_BITMAP_CQ_START, cq_num); 101 | spin_unlock_irqrestore(&dev->lock, flags); 102 | 103 | err_alloc_cq_num: 104 | kmem_cache_free(pib_cq_cachep, cq); 105 | 106 | return ERR_PTR(-ENOMEM); 107 | } 108 | 109 | 110 | #ifdef PIB_CQ_FLAGS_TIMESTAMP_COMPLETION_SUPPORT 111 | struct ib_cq *pib_create_cq(struct ib_device *ibdev, 112 | const struct ib_cq_init_attr *attr, 113 | struct ib_ucontext *context, 114 | struct ib_udata *udata) 115 | { 116 | return create_cq(ibdev, attr->cqe, attr->comp_vector, attr->flags, context, udata); 117 | } 118 | #else 119 | struct ib_cq *pib_create_cq(struct ib_device *ibdev, int entries, int vector, 120 | struct ib_ucontext *context, 121
| struct ib_udata *udata) 122 | { 123 | return create_cq(ibdev, entries, vector, 0, context, udata); 124 | } 125 | #endif 126 | 127 | 128 | int pib_destroy_cq(struct ib_cq *ibcq) 129 | { 130 | struct pib_dev *dev; 131 | struct pib_cq *cq; 132 | struct pib_cqe *cqe, *cqe_next; 133 | unsigned long flags; 134 | 135 | if (!ibcq) 136 | return 0; 137 | 138 | dev = to_pdev(ibcq->device); 139 | cq = to_pcq(ibcq); 140 | 141 | pib_trace_api(dev, IB_USER_VERBS_CMD_DESTROY_CQ, cq->cq_num); 142 | 143 | pib_spin_lock_irqsave(&cq->lock, flags); 144 | list_for_each_entry_safe(cqe, cqe_next, &cq->cqe_head, list) { 145 | list_del_init(&cqe->list); 146 | kmem_cache_free(pib_cqe_cachep, cqe); 147 | } 148 | list_for_each_entry_safe(cqe, cqe_next, &cq->free_cqe_head, list) { 149 | list_del_init(&cqe->list); 150 | kmem_cache_free(pib_cqe_cachep, cqe); 151 | } 152 | cq->nr_cqe = 0; 153 | pib_spin_unlock_irqrestore(&cq->lock, flags); 154 | 155 | spin_lock_irqsave(&dev->lock, flags); 156 | list_del(&cq->list); 157 | dev->nr_cq--; 158 | pib_dealloc_obj_num(dev, PIB_BITMAP_CQ_START, cq->cq_num); 159 | 160 | pib_cancel_work(dev, &cq->work); 161 | spin_unlock_irqrestore(&dev->lock, flags); 162 | 163 | kmem_cache_free(pib_cq_cachep, cq); 164 | 165 | return 0; 166 | } 167 | 168 | 169 | int pib_modify_cq(struct ib_cq *ibcq, u16 cq_count, u16 cq_period) 170 | { 171 | struct pib_dev *dev; 172 | struct pib_cq *cq; 173 | 174 | pr_err("pib: pib_modify_cq\n"); 175 | 176 | if (!ibcq) 177 | return -EINVAL; 178 | 179 | dev = to_pdev(ibcq->device); 180 | cq = to_pcq(ibcq); 181 | 182 | pib_trace_api(dev, PIB_USER_VERBS_CMD_MODIFY_CQ, cq->cq_num); 183 | 184 | return 0; 185 | } 186 | 187 | 188 | int pib_resize_cq(struct ib_cq *ibcq, int entries, struct ib_udata *udata) 189 | { 190 | struct pib_dev *dev; 191 | struct pib_cq *cq; 192 | 193 | pr_err("pib: pib_resize_cq\n"); 194 | 195 | if (!ibcq) 196 | return -EINVAL; 197 | 198 | dev = to_pdev(ibcq->device); 199 | cq = to_pcq(ibcq); 200 | 201 | pib_trace_api(dev, 
IB_USER_VERBS_CMD_RESIZE_CQ, cq->cq_num); 202 | 203 | return 0; 204 | } 205 | 206 | 207 | int pib_poll_cq(struct ib_cq *ibcq, int num_entries, struct ib_wc *ibwc) 208 | { 209 | int i, ret = 0; 210 | struct pib_dev *dev; 211 | struct pib_cq *cq; 212 | unsigned long flags; 213 | 214 | if (!ibcq) 215 | return -EINVAL; 216 | 217 | dev = to_pdev(ibcq->device); 218 | cq = to_pcq(ibcq); 219 | 220 | pib_trace_api(dev, IB_USER_VERBS_CMD_POLL_CQ, cq->cq_num); 221 | 222 | pib_spin_lock_irqsave(&cq->lock, flags); 223 | 224 | if (cq->state != PIB_STATE_OK) { 225 | ret = -EACCES; 226 | goto done; 227 | } 228 | 229 | for (i=0 ; (i<num_entries) && !list_empty(&cq->cqe_head) ; i++) { 230 | struct pib_cqe *cqe; 231 | 232 | cqe = list_first_entry(&cq->cqe_head, struct pib_cqe, list); 233 | list_del_init(&cqe->list); 234 | list_add_tail(&cqe->list, &cq->free_cqe_head); 235 | 236 | ibwc[i] = cqe->ib_wc; 237 | 238 | cq->nr_cqe--; 239 | ret++; 240 | } 241 | 242 | done: 243 | pib_spin_unlock_irqrestore(&cq->lock, flags); 244 | 245 | return ret; 246 | } 247 | 248 | 249 | int pib_req_notify_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags notify_flags) 250 | { 251 | struct pib_dev *dev; 252 | struct pib_cq *cq; 253 | unsigned long flags; 254 | int ret = 0; 255 | 256 | if (!ibcq) 257 | return -EINVAL; 258 | 259 | dev = to_pdev(ibcq->device); 260 | cq = to_pcq(ibcq); 261 | 262 | pib_trace_api(dev, IB_USER_VERBS_CMD_REQ_NOTIFY_CQ, cq->cq_num); 263 | 264 | pib_spin_lock_irqsave(&cq->lock, flags); 265 | 266 | if (cq->state != PIB_STATE_OK) 267 | ret = -1; 268 | else { 269 | if (notify_flags & IB_CQ_SOLICITED) 270 | cq->notify_flag = IB_CQ_SOLICITED; 271 | else if (notify_flags & IB_CQ_NEXT_COMP) 272 | cq->notify_flag = IB_CQ_NEXT_COMP; 273 | 274 | if ((notify_flags & IB_CQ_REPORT_MISSED_EVENTS) && 275 | !list_empty(&cq->cqe_head)) 276 | ret = 1; 277 | 278 | /* @note What happens when req_notify_cq is called while CQEs are still queued is implementation-dependent */ 279 | cq->has_notified = 0; 280 | } 281 | 282 | pib_spin_unlock_irqrestore(&cq->lock, flags); 283 | 284 | return ret;
285 | } 286 | 287 | 288 | /** 289 | * Detach the WCs that belong to a QP from this CQ when the QP is moved to RESET 290 | * @return the number of removed WCs 291 | */ 292 | int pib_util_remove_cq(struct pib_cq *cq, struct pib_qp *qp) 293 | { 294 | int count = 0; 295 | unsigned long flags; 296 | struct pib_cqe *cqe, *cqe_next; 297 | 298 | BUG_ON(qp == NULL); 299 | 300 | pib_spin_lock_irqsave(&cq->lock, flags); 301 | list_for_each_entry_safe(cqe, cqe_next, &cq->cqe_head, list) { 302 | if (cqe->ib_wc.qp == &qp->ib_qp) { 303 | cq->nr_cqe--; 304 | list_del_init(&cqe->list); 305 | list_add_tail(&cqe->list, &cq->free_cqe_head); 306 | count++; 307 | } 308 | } 309 | pib_spin_unlock_irqrestore(&cq->lock, flags); 310 | 311 | return count; 312 | } 313 | 314 | 315 | int pib_util_insert_wc_success(struct pib_cq *cq, const struct ib_wc *wc, int solicited) 316 | { 317 | return insert_wc(cq, wc, solicited); 318 | } 319 | 320 | 321 | int pib_util_insert_wc_error(struct pib_cq *cq, struct pib_qp *qp, u64 wr_id, enum ib_wc_status status, enum ib_wc_opcode opcode) 322 | { 323 | struct ib_wc wc = { 324 | .wr_id = wr_id, 325 | .status = status, 326 | .opcode = opcode, 327 | .qp = &qp->ib_qp, 328 | }; 329 | 330 | if (pib_get_behavior(PIB_BEHAVIOR_CORRUPT_INVALID_WC_ATTRS)) { 331 | wc.opcode = pib_random(); 332 | wc.byte_len = pib_random(); 333 | wc.ex.imm_data = pib_random(); 334 | wc.wc_flags = pib_random(); 335 | wc.pkey_index = pib_random(); 336 | wc.slid = pib_random(); 337 | wc.sl = pib_random(); 338 | wc.dlid_path_bits = pib_random(); 339 | } 340 | 341 | return insert_wc(cq, &wc, 1); 342 | } 343 | 344 | 345 | static int insert_wc(struct pib_cq *cq, const struct ib_wc *wc, int solicited) 346 | { 347 | int ret; 348 | unsigned long flags; 349 | struct pib_cqe *cqe; 350 | 351 | pib_trace_comp(to_pdev(cq->ib_cq.device), cq, wc); 352 | 353 | pib_spin_lock_irqsave(&cq->lock, flags); 354 | 355 | if (cq->state != PIB_STATE_OK) { 356 | ret = -EACCES; 357 | goto done; 358 | } 359 | 360 | if (list_empty(&cq->free_cqe_head)) { 361 | /* CQ overflow */ 362 |
cq->state = PIB_STATE_ERR; 363 | pib_queue_work(to_pdev(cq->ib_cq.device), &cq->work); 364 | 365 | ret = -ENOMEM; 366 | goto done; 367 | } 368 | 369 | cqe = list_first_entry(&cq->free_cqe_head, struct pib_cqe, list); 370 | list_del_init(&cqe->list); 371 | 372 | cqe->ib_wc = *wc; 373 | 374 | if (to_pqp(wc->qp)->qp_type == IB_QPT_SMI) 375 | cqe->ib_wc.port_num = to_pqp(wc->qp)->ib_qp_init_attr.port_num; 376 | 377 | cq->nr_cqe++; 378 | 379 | list_add_tail(&cqe->list, &cq->cqe_head); 380 | 381 | /* tell the completion channel */ 382 | if ((cq->notify_flag == IB_CQ_NEXT_COMP) || 383 | ((cq->notify_flag == IB_CQ_SOLICITED) && solicited)) { 384 | if (!cq->has_notified) { 385 | /* 386 | * cq->has_notified must be set to 1 before calling the completion handler 387 | * because pib_req_notify_cq() may be called from within the completion handler. 388 | */ 389 | cq->has_notified = 1; 390 | cq->ib_cq.comp_handler(&cq->ib_cq, cq->ib_cq.cq_context); 391 | } 392 | } 393 | 394 | ret = 0; 395 | 396 | done: 397 | pib_spin_unlock_irqrestore(&cq->lock, flags); 398 | 399 | return ret; 400 | } 401 | 402 | 403 | void pib_util_insert_async_cq_error(struct pib_dev *dev, struct pib_cq *cq) 404 | { 405 | struct ib_event ev; 406 | struct pib_qp *qp; 407 | unsigned long flags; 408 | 409 | pib_trace_async(dev, IB_EVENT_CQ_ERR, cq->cq_num); 410 | 411 | pib_spin_lock_irqsave(&cq->lock, flags); 412 | cq->state = PIB_STATE_ERR; 413 | ev.event = IB_EVENT_CQ_ERR; 414 | ev.device = cq->ib_cq.device; 415 | ev.element.cq = &cq->ib_cq; 416 | cq->ib_cq.event_handler(&ev, cq->ib_cq.cq_context); 417 | pib_spin_unlock_irqrestore(&cq->lock, flags); 418 | 419 | /* do not take the cq lock here */ 420 | 421 | list_for_each_entry(qp, &dev->qp_head, list) { 422 | pib_spin_lock(&qp->lock); 423 | if ((cq == qp->send_cq) || (cq == qp->recv_cq)) { 424 | qp->state = IB_QPS_ERR; 425 | pib_util_flush_qp(qp, 0); 426 | pib_util_insert_async_qp_error(qp, IB_EVENT_QP_FATAL); 427 | } 428 | pib_spin_unlock(&qp->lock); 429 | } 430 | } 431 | 432 |
static void cq_overflow_handler(struct pib_work_struct *work) 434 | { 435 | struct pib_cq *cq = work->data; 436 | struct pib_dev *dev = work->dev; 437 | 438 | BUG_ON(!spin_is_locked(&dev->lock)); 439 | 440 | pib_util_insert_async_cq_error(dev, cq); 441 | } 442 | -------------------------------------------------------------------------------- /driver/pib_dma.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_dma.c - DMA mapping 3 | * 4 | * Copyright (c) 2013,2014 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 7 | * 8 | * Original sources from driver/infiniband/hw/qib/qib_dma.c 9 | * (c) 2006, 2009, 2010 QLogic, Corporation. 10 | */ 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | 18 | #include "pib.h" 19 | 20 | 21 | #define BAD_DMA_ADDRESS ((u64) 0) 22 | 23 | 24 | static int pib_dma_mapping_error(struct ib_device *dev, u64 dma_addr) 25 | { 26 | return dma_addr == BAD_DMA_ADDRESS; 27 | } 28 | 29 | 30 | static u64 pib_dma_map_single(struct ib_device *dev, void *cpu_addr, 31 | size_t size, enum dma_data_direction direction) 32 | { 33 | return (u64)(uintptr_t)cpu_addr; 34 | } 35 | 36 | 37 | static void pib_dma_unmap_single(struct ib_device *dev, u64 addr, size_t size, 38 | enum dma_data_direction direction) 39 | { 40 | } 41 | 42 | 43 | static u64 pib_dma_map_page(struct ib_device *dev, struct page *page, 44 | unsigned long offset, size_t size, 45 | enum dma_data_direction direction) 46 | { 47 | u64 addr; 48 | 49 | if (offset + size > PAGE_SIZE) { 50 | addr = BAD_DMA_ADDRESS; 51 | goto done; 52 | } 53 | 54 | addr = (u64)(uintptr_t)page_address(page); 55 | if (addr) 56 | addr += offset; 57 | 58 | /* @todo handle highmem pages */ 59 | 60 | done: 61 | return addr; 62 | } 63 | 64 | 65 | static void pib_dma_unmap_page(struct ib_device *dev, u64 addr, size_t size, 66 | enum dma_data_direction direction) 67 | { 68 | } 69 | 70 | static int 
pib_dma_map_sg(struct ib_device *dev, struct scatterlist *sgl, 71 | int nents, enum dma_data_direction direction) 72 | { 73 | struct scatterlist *sg; 74 | u64 addr; 75 | int i; 76 | int ret = nents; 77 | 78 | for_each_sg(sgl, sg, nents, i) { 79 | addr = (u64)(uintptr_t) page_address(sg_page(sg)); 80 | /* TODO: handle highmem pages */ 81 | if (!addr) { 82 | ret = 0; 83 | break; 84 | } 85 | } 86 | return ret; 87 | } 88 | 89 | 90 | static void pib_dma_unmap_sg(struct ib_device *dev, 91 | struct scatterlist *sg, int nents, 92 | enum dma_data_direction direction) 93 | { 94 | } 95 | 96 | #if PIB_IB_DMA_MAPPING_VERSION < 1 97 | static u64 pib_dma_address(struct ib_device *dev, struct scatterlist *sg) 98 | { 99 | u64 addr; 100 | 101 | addr = (u64)(uintptr_t) page_address(sg_page(sg)); 102 | 103 | if (addr) 104 | addr += sg->offset; 105 | 106 | return addr; 107 | } 108 | 109 | static unsigned int pib_dma_len(struct ib_device *dev, 110 | struct scatterlist *sg) 111 | { 112 | return sg->length; 113 | } 114 | #endif 115 | 116 | static void pib_dma_sync_single_for_cpu(struct ib_device *dev, u64 addr, 117 | size_t size, enum dma_data_direction dir) 118 | { 119 | } 120 | 121 | static void pib_dma_sync_single_for_device(struct ib_device *dev, u64 addr, 122 | size_t size, 123 | enum dma_data_direction dir) 124 | { 125 | } 126 | 127 | static void *pib_dma_alloc_coherent(struct ib_device *dev, size_t size, 128 | u64 *dma_handle, gfp_t flag) 129 | { 130 | struct page *p; 131 | void *addr = NULL; 132 | 133 | p = alloc_pages(flag, get_order(size)); 134 | if (p) 135 | addr = page_address(p); 136 | if (dma_handle) 137 | *dma_handle = (u64)(uintptr_t) addr; 138 | 139 | return addr; 140 | } 141 | 142 | static void pib_dma_free_coherent(struct ib_device *dev, size_t size, 143 | void *cpu_addr, u64 dma_handle) 144 | { 145 | free_pages((unsigned long) cpu_addr, get_order(size)); 146 | } 147 | 148 | 149 | struct ib_dma_mapping_ops pib_dma_mapping_ops = { 150 | .mapping_error = 
pib_dma_mapping_error, 151 | .map_single = pib_dma_map_single, 152 | .unmap_single = pib_dma_unmap_single, 153 | .map_page = pib_dma_map_page, 154 | .unmap_page = pib_dma_unmap_page, 155 | .map_sg = pib_dma_map_sg, 156 | .unmap_sg = pib_dma_unmap_sg, 157 | #if PIB_IB_DMA_MAPPING_VERSION < 1 158 | .dma_address = pib_dma_address, 159 | .dma_len = pib_dma_len, 160 | #endif 161 | .sync_single_for_cpu = pib_dma_sync_single_for_cpu, 162 | .sync_single_for_device = pib_dma_sync_single_for_device, 163 | .alloc_coherent = pib_dma_alloc_coherent, 164 | .free_coherent = pib_dma_free_coherent 165 | }; 166 | -------------------------------------------------------------------------------- /driver/pib_mad.h: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_mad.h - Definitions of Management Datagram(MAD) and Subnet Management Packet(SMP) 3 | * 4 | * Copyright (c) 2013,2014 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 
7 | */ 8 | #ifndef PIB_MAD_H 9 | #define PIB_MAD_H 10 | 11 | #include 12 | #include 13 | 14 | 15 | #define PIB_MGMT_CLASS_VERSION (1) 16 | 17 | #define PIB_SMP_UNSUP_VERSION cpu_to_be16(0x0004) 18 | #define PIB_SMP_UNSUP_METHOD cpu_to_be16(0x0008) 19 | #define PIB_SMP_UNSUP_METH_ATTR cpu_to_be16(0x000C) 20 | #define PIB_SMP_INVALID_FIELD cpu_to_be16(0x001C) 21 | 22 | 23 | struct pib_smp_node_info { 24 | u8 base_version; 25 | u8 class_version; 26 | u8 node_type; 27 | u8 node_ports; 28 | __be64 sys_image_guid; 29 | __be64 node_guid; 30 | __be64 port_guid; 31 | __be16 partition_cap; 32 | __be16 device_id; 33 | __be32 revision; 34 | u8 local_port_num; 35 | u8 vendor_id[3]; 36 | } __attribute__ ((packed)); 37 | 38 | 39 | struct pib_smp_switch_info { 40 | __be16 linear_fdb_cap; 41 | __be16 random_fdb_cap; 42 | __be16 multicast_fdb_cap; 43 | __be16 linear_fdb_top; 44 | u8 default_port; 45 | u8 default_mcast_primary_port; 46 | u8 default_mcast_not_primary_port; 47 | 48 | /* 49 | * LifeTimeValue 5 bits 50 | * PortStateChange 1 bit 51 | * OptimizedSLtoVLMappingProgramming 2 bits 52 | */ 53 | u8 various1; 54 | 55 | __be16 lids_per_port; 56 | __be16 partition_enforcement_cap; 57 | 58 | /* 59 | * InboundEnforcementCap 1 bit 60 | * OutboundEnforcementCap 1 bit 61 | * FilterRawInboundCap 1 bit 62 | * EnhancedPort0 1 bit 63 | * Reserved 3 bits 64 | */ 65 | u8 various2; 66 | } __attribute__ ((packed)); 67 | 68 | 69 | #endif /* PIB_MAD_H */ 70 | -------------------------------------------------------------------------------- /driver/pib_mad_pma.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_mad_pma.c - Performance Management Agent 3 | * 4 | * Copyright (c) 2013-2015 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license.
7 | */ 8 | #include 9 | #include 10 | #include 11 | #include 12 | 13 | #include "pib.h" 14 | #include "pib_mad.h" 15 | 16 | #define PIB_PMA_CLASS_VERSION (1) 17 | 18 | #define PIB_PMA_STATUS_BAD_VERSION (0x1 << 2) 19 | #define PIB_PMA_STATUS_UNSUPPORTED_METHOD (0x2 << 2) /* discard response ? */ 20 | #define PIB_PMA_STATUS_UNSUPPORTED_METHOD_ATTRIB (0x3 << 2) 21 | #define PIB_PMA_STATUS_INVALID_ATTRIB_VALUE (0x7 << 2) 22 | 23 | #define PIB_PMA_PORT_RCV_ERROR_DETAILS (0x0015) 24 | #define PIB_PMA_PORT_XMIT_DISCARD_DETAILS (0x0016) 25 | #define PIB_PMA_PORT_OP_RCV_COUNTERS (0x0017) 26 | #define PIB_PMA_PORT_FLOW_CTL_COUNTERS (0x0018) 27 | #define PIB_PMA_PORT_VL_OP_PACKETS (0x0019) 28 | #define PIB_PMA_PORT_VL_OP_DATA (0x001A) 29 | #define PIB_PMA_PORT_VL_XMIT_FLOW_CTL_UPDATE_ERRORS (0x001B) 30 | #define PIB_PMA_PORT_VL_XMIT_WAIT_COUNTERS (0x001C) 31 | #define PIB_PMA_PORT_COUNTERS_EXT (0x001D) 32 | #define PIB_PMA_PORT_SAMPLES_RESULT_EXT (0x001E) 33 | #define PIB_PMA_PORT_VL_CONGESTION (0x0030) 34 | 35 | #define PIB_PMA_SAMPLE_STATUS_DONE (0x00) 36 | #define PIB_PMA_SAMPLE_STATUS_STARTED (0x01) 37 | #define PIB_PMA_SAMPLE_STATUS_RUNNING (0x02) 38 | 39 | #define PIB_PMA_SEL_PORT_RCV_SWITCH_RELAY_ERRORS cpu_to_be16(0x0020) 40 | #define PIB_PMA_SEL_PORT_XMIT_CONSTRAINT_ERRORS cpu_to_be16(0x0080) 41 | #define PIB_PMA_SEL_PORT_RCV_CONSTRAINT_ERRORS cpu_to_be16(0x0100) 42 | 43 | 44 | static u8 get_saturation4(u64 value) 45 | { 46 | if (value > 0xF) 47 | return 0xF; 48 | else 49 | return (value & 0xF); 50 | } 51 | 52 | 53 | static u8 get_saturation8(u64 value) 54 | { 55 | if (value > 0xFF) 56 | return 0xFF; 57 | else 58 | return (u8)value; 59 | } 60 | 61 | 62 | static u16 get_saturation16(u64 value) 63 | { 64 | if (value > 0xFFFF) 65 | return 0xFFFF; 66 | else 67 | return (u16)value; 68 | } 69 | 70 | 71 | static u32 get_saturation32(u64 value) 72 | { 73 | if (value >> 32) 74 | return 0xFFFFFFFF; 75 | else 76 | return (u32)value; 77 | } 78 | 79 | 80 | static int 
pma_get_method(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 81 | static int pma_set_method(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 82 | 83 | static int pma_get_class_port_info(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 84 | static int pma_get_port_samples_control(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 85 | static int pma_set_port_samples_control(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 86 | static int pma_get_port_samples_result(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 87 | static int pma_get_port_samples_result_ext(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 88 | static int pma_get_port_counters(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 89 | static int pma_set_port_counters(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 90 | static int pma_get_port_counters_ext(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 91 | static int pma_set_port_counters_ext(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num); 92 | 93 | 94 | static int reply(struct ib_mad_hdr *mad_hdr) 95 | { 96 | mad_hdr->method = IB_MGMT_METHOD_GET_RESP; 97 | 98 | return IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_REPLY; 99 | } 100 | 101 | 102 | int pib_process_pma_mad(struct pib_node *node, u8 port_num, 103 | const struct ib_mad *in_mad, struct ib_mad *out_mad) 104 | { 105 | int ret; 106 | struct ib_pma_mad *pmp = (struct ib_pma_mad *)out_mad; 107 | u8 method; 108 | 109 | *out_mad = *in_mad; 110 | 111 | pmp = (struct ib_pma_mad *)out_mad; 112 | 113 | if ((pmp->mad_hdr.base_version != IB_MGMT_BASE_VERSION) || 114 | (pmp->mad_hdr.class_version != PIB_PMA_CLASS_VERSION)) { 115 | pmp->mad_hdr.status = PIB_PMA_STATUS_BAD_VERSION; 116 | return reply(&pmp->mad_hdr); 117 | } 118 | 119 | method = pmp->mad_hdr.method; 120 | 121 | switch (method) { 122 | 123 | case IB_MGMT_METHOD_GET: 124 | ret = pma_get_method(pmp, node, port_num); 
125 | break; 126 | 127 | case IB_MGMT_METHOD_SET: 128 | ret = pma_set_method(pmp, node, port_num); 129 | break; 130 | 131 | case IB_MGMT_METHOD_TRAP: 132 | case IB_MGMT_METHOD_GET_RESP: 133 | pr_info("*** %s %u ***\n", __FUNCTION__, __LINE__); 134 | return IB_MAD_RESULT_SUCCESS; 135 | 136 | default: 137 | pr_err("pib: *** %s subn: %u ***", __func__, method); 138 | pmp->mad_hdr.status = PIB_PMA_STATUS_UNSUPPORTED_METHOD; 139 | ret = reply(&pmp->mad_hdr); 140 | break; 141 | } 142 | 143 | return ret; 144 | 145 | } 146 | 147 | 148 | static int pma_get_method(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 149 | { 150 | switch (pmp->mad_hdr.attr_id) { 151 | 152 | case IB_PMA_CLASS_PORT_INFO: 153 | return pma_get_class_port_info(pmp, node, port_num); 154 | 155 | case IB_PMA_PORT_SAMPLES_CONTROL: 156 | pr_info("pib: PerformanceGet(PORT_SAMPLES_CONTROL) attr_id=0x%04x", be16_to_cpu(pmp->mad_hdr.attr_id)); 157 | return pma_get_port_samples_control(pmp, node, port_num); 158 | 159 | case IB_PMA_PORT_SAMPLES_RESULT: 160 | pr_info("pib: PerformanceGet(PORT_SAMPLES_RESULT) attr_id=0x%04x", be16_to_cpu(pmp->mad_hdr.attr_id)); 161 | return pma_get_port_samples_result(pmp, node, port_num); 162 | 163 | case IB_PMA_PORT_SAMPLES_RESULT_EXT: 164 | pr_info("pib: PerformanceGet(PORT_SAMPLES_RESULT_EXT) attr_id=0x%04x", be16_to_cpu(pmp->mad_hdr.attr_id)); 165 | return pma_get_port_samples_result_ext(pmp, node, port_num); 166 | 167 | case IB_PMA_PORT_COUNTERS: 168 | return pma_get_port_counters(pmp, node, port_num); 169 | 170 | case IB_PMA_PORT_COUNTERS_EXT: 171 | return pma_get_port_counters_ext(pmp, node, port_num); 172 | 173 | default: 174 | pr_err("pib: PerformanceGet() attr_id=0x%04x", be16_to_cpu(pmp->mad_hdr.attr_id)); 175 | pmp->mad_hdr.status = PIB_PMA_STATUS_UNSUPPORTED_METHOD_ATTRIB; 176 | return reply(&pmp->mad_hdr); 177 | } 178 | } 179 | 180 | 181 | static int pma_set_method(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 182 | { 183 | switch 
(pmp->mad_hdr.attr_id) { 184 | 185 | case IB_PMA_PORT_SAMPLES_CONTROL: 186 | pr_info("pib: PerformanceSet(PORT_SAMPLES_CONTROL) attr_id=0x%04x", be16_to_cpu(pmp->mad_hdr.attr_id)); 187 | return pma_set_port_samples_control(pmp, node, port_num); 188 | 189 | #if 0 190 | case IB_PMA_PORT_SAMPLES_RESULT_EXT: 191 | return pma_set_port_samples_result_ext(pmp, node, port_num); 192 | #endif 193 | 194 | case IB_PMA_PORT_COUNTERS: 195 | return pma_set_port_counters(pmp, node, port_num); 196 | 197 | case IB_PMA_PORT_COUNTERS_EXT: 198 | return pma_set_port_counters_ext(pmp, node, port_num); 199 | 200 | default: 201 | pr_err("pib: PerformanceSet() attr_id=0x%04x", be16_to_cpu(pmp->mad_hdr.attr_id)); 202 | pmp->mad_hdr.status = PIB_PMA_STATUS_UNSUPPORTED_METHOD_ATTRIB; 203 | return reply(&pmp->mad_hdr); 204 | } 205 | } 206 | 207 | 208 | static int pma_get_class_port_info(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 209 | { 210 | struct ib_class_port_info *info = 211 | (struct ib_class_port_info *)pmp->data; 212 | 213 | memset(pmp->data, 0, sizeof(pmp->data)); 214 | 215 | if (pmp->mad_hdr.attr_mod != cpu_to_be16(0)) { 216 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 217 | goto bail; 218 | } 219 | 220 | info->base_version = IB_MGMT_BASE_VERSION; 221 | info->class_version = PIB_PMA_CLASS_VERSION; 222 | info->capability_mask = IB_PMA_CLASS_CAP_EXT_WIDTH; 223 | 224 | /* 225 | * Set the most significant bit of CM2 to indicate support for 226 | * congestion statistics 227 | */ 228 | /* p->reserved[0] = dd->psxmitwait_supported << 7; */ 229 | 230 | /* 231 | * Expected response time is 4.096 usec. * 2^18 == 1.073741824 sec. 
232 | */ 233 | info->resp_time_value = 18; 234 | 235 | bail: 236 | return reply(&pmp->mad_hdr); 237 | } 238 | 239 | 240 | static int pma_get_port_samples_control(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 241 | { 242 | int i; 243 | struct ib_pma_portsamplescontrol *p = 244 | (struct ib_pma_portsamplescontrol *)pmp->data; 245 | struct pib_port_perf *perf; 246 | u8 port_select; 247 | 248 | port_select = p->port_select; 249 | 250 | memset(pmp->data, 0, sizeof(pmp->data)); 251 | 252 | p->port_select = port_select; 253 | 254 | if (pmp->mad_hdr.attr_mod != cpu_to_be16(0)) { 255 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 256 | goto bail; 257 | } 258 | 259 | /* The base management port is ignored. */ 260 | if ((port_select < node->port_start) || (node->port_count <= port_select)) { 261 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 262 | goto bail; 263 | } 264 | 265 | perf = &node->ports[port_select - node->port_start].perf; 266 | 267 | p->opcode = perf->OpCode; 268 | p->tick = 1; 269 | p->counter_width = 4; /* 32-bit counters */ 270 | 271 | p->counter_mask0_9 = cpu_to_be32(0x09249249); 272 | p->counter_mask10_14 = cpu_to_be16(0x1249); 273 | 274 | p->sample_mechanisms = 0; /* one sample mechanism is available.
*/ 275 | p->sample_status = PIB_PMA_SAMPLE_STATUS_DONE; 276 | 277 | p->sample_start = cpu_to_be32(0); 278 | p->sample_interval = cpu_to_be32(0); 279 | p->tag = cpu_to_be16(perf->tag); 280 | 281 | for (i=0 ; i<ARRAY_SIZE(p->counter_select) ; i++) 282 | p->counter_select[i] = cpu_to_be16(perf->counter_select[i]); 283 | 284 | bail: 285 | return reply(&pmp->mad_hdr); 286 | } 287 | 288 | 289 | static int pma_set_port_samples_control(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 290 | { 291 | int i; 292 | struct ib_pma_portsamplescontrol *p = 293 | (struct ib_pma_portsamplescontrol *)pmp->data; 294 | struct pib_port_perf *perf; 295 | u8 port_select; 296 | 297 | port_select = p->port_select; 298 | 299 | memset(pmp->data, 0, sizeof(pmp->data)); 300 | 301 | p->port_select = port_select; 302 | 303 | if (pmp->mad_hdr.attr_mod != cpu_to_be16(0)) { 304 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 305 | goto bail; 306 | } 307 | 308 | /* The base management port is ignored. */ 309 | if ((port_select < node->port_start) || (node->port_count <= port_select)) { 310 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 311 | goto bail; 312 | } 313 | 314 | perf = &node->ports[port_select - node->port_start].perf; 315 | 316 | perf->OpCode = p->opcode; 317 | perf->tag = be16_to_cpu(p->tag); 318 | 319 | #if 0 320 | p->sample_start = cpu_to_be32(); 321 | p->sample_interval = cpu_to_be32(); 322 | #endif 323 | 324 | for (i=0 ; i<ARRAY_SIZE(p->counter_select) ; i++) 325 | perf->counter_select[i] = be16_to_cpu(p->counter_select[i]); 326 | 327 | bail: 328 | return pma_get_port_samples_control(pmp, node, port_num); 329 | } 330 | 331 | 332 | static int pma_get_port_samples_result(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 333 | { 334 | int i; 335 | struct ib_pma_portsamplesresult *p = 336 | (struct ib_pma_portsamplesresult *)pmp->data; 337 | struct pib_port_perf *perf; 338 | 339 | memset(pmp->data, 0, sizeof(pmp->data)); 340 | 341 | if (pmp->mad_hdr.attr_mod != cpu_to_be16(0)
{ 342 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 343 | goto bail; 344 | } 345 | 346 | perf = &node->ports[port_num - node->port_start].perf; 347 | 348 | p->tag = cpu_to_be16(perf->tag); 349 | p->sample_status = PIB_PMA_SAMPLE_STATUS_DONE; 350 | 351 | for (i=0 ; i<ARRAY_SIZE(p->counter) ; i++) 352 | p->counter[i] = cpu_to_be32((u32)perf->counter[i]); 353 | 354 | bail: 355 | return reply(&pmp->mad_hdr); 356 | } 357 | 358 | 359 | static int pma_get_port_samples_result_ext(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 360 | { 361 | int i; 362 | struct ib_pma_portsamplesresult_ext *p = 363 | (struct ib_pma_portsamplesresult_ext *)pmp->data; 364 | struct pib_port_perf *perf; 365 | 366 | memset(pmp->data, 0, sizeof(pmp->data)); 367 | 368 | if (pmp->mad_hdr.attr_mod != cpu_to_be16(0)) { 369 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 370 | goto bail; 371 | } 372 | 373 | perf = &node->ports[port_num - node->port_start].perf; 374 | 375 | p->tag = cpu_to_be16(perf->tag); 376 | p->sample_status = cpu_to_be16(PIB_PMA_SAMPLE_STATUS_DONE); 377 | p->extended_width = cpu_to_be32(0x80000000); /* 64-bit counters */ 378 | 379 | for (i=0 ; i<ARRAY_SIZE(p->counter) ; i++) 380 | p->counter[i] = cpu_to_be64(perf->counter[i]); 381 | 382 | bail: 383 | return reply(&pmp->mad_hdr); 384 | } 385 | 386 | 387 | static int pma_get_port_counters(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 388 | { 389 | struct ib_pma_portcounters *p = 390 | (struct ib_pma_portcounters *)pmp->data; 391 | struct pib_port_perf *perf; 392 | u8 port_select; 393 | 394 | port_select = p->port_select; 395 | 396 | memset(pmp->data, 0, sizeof(pmp->data)); 397 | 398 | p->port_select = port_select; 399 | 400 | if (pmp->mad_hdr.attr_mod != cpu_to_be16(0)) { 401 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 402 | goto bail; 403 | } 404 | 405 | /* The base management port is ignored. 
*/ 406 | if ((port_select < node->port_start) || (node->port_count <= port_select)) { 407 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 408 | goto bail; 409 | } 410 | 411 | perf = &node->ports[port_select - node->port_start].perf; 412 | 413 | p->symbol_error_counter = cpu_to_be16(get_saturation16(perf->symbol_error_counter)); 414 | p->link_error_recovery_counter = get_saturation8(perf->link_error_recovery_counter); 415 | p->link_downed_counter = get_saturation8(perf->link_downed_counter); 416 | p->port_rcv_errors = cpu_to_be16(get_saturation16(perf->rcv_errors)); 417 | p->port_rcv_remphys_errors = cpu_to_be16(get_saturation16(perf->rcv_remphys_errors)); 418 | p->port_rcv_switch_relay_errors = cpu_to_be16(get_saturation16(perf->rcv_switch_relay_errors)); 419 | p->port_xmit_discards = cpu_to_be16(get_saturation16(perf->xmit_discards)); 420 | p->port_xmit_constraint_errors = get_saturation8(perf->xmit_constraint_errors); 421 | p->port_rcv_constraint_errors = get_saturation8(perf->rcv_constraint_errors); 422 | 423 | p->link_overrun_errors = 424 | (get_saturation4(perf->local_link_integrity_errors) << 4) | 425 | get_saturation4(perf->excessive_buffer_overrun_errors); 426 | 427 | p->vl15_dropped = cpu_to_be16(get_saturation16(perf->vl15_dropped)); 428 | p->port_xmit_data = cpu_to_be32(get_saturation32(perf->xmit_data)); 429 | p->port_rcv_data = cpu_to_be32(get_saturation32(perf->rcv_data)); 430 | p->port_xmit_packets = cpu_to_be32(get_saturation32(perf->xmit_packets)); 431 | p->port_rcv_packets = cpu_to_be32(get_saturation32(perf->rcv_packets)); 432 | p->port_xmit_wait = cpu_to_be32(get_saturation32(perf->xmit_wait)); 433 | 434 | bail: 435 | return reply(&pmp->mad_hdr); 436 | } 437 | 438 | 439 | static int pma_set_port_counters(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 440 | { 441 | struct ib_pma_portcounters *p = 442 | (struct ib_pma_portcounters *)pmp->data; 443 | struct pib_port_perf *perf; 444 | u8 port_select; 445 | 446 | port_select 
= p->port_select; 447 | 448 | memset(pmp->data, 0, sizeof(pmp->data)); 449 | 450 | p->port_select = port_select; 451 | 452 | if (pmp->mad_hdr.attr_mod != cpu_to_be16(0)) { 453 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 454 | goto bail; 455 | } 456 | 457 | /* The base management port is ignored. */ 458 | if ((port_select < node->port_start) || (node->port_count <= port_select)) { 459 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 460 | goto bail; 461 | } 462 | 463 | perf = &node->ports[port_select - node->port_start].perf; 464 | 465 | if (p->counter_select & IB_PMA_SEL_SYMBOL_ERROR) 466 | perf->symbol_error_counter = be16_to_cpu(p->symbol_error_counter); 467 | 468 | if (p->counter_select & IB_PMA_SEL_LINK_ERROR_RECOVERY) 469 | perf->link_error_recovery_counter = p->link_error_recovery_counter; 470 | 471 | if (p->counter_select & IB_PMA_SEL_LINK_DOWNED) 472 | perf->link_downed_counter = p->link_downed_counter; 473 | 474 | if (p->counter_select & IB_PMA_SEL_PORT_RCV_ERRORS) 475 | perf->rcv_errors = be16_to_cpu(p->port_rcv_errors); 476 | 477 | if (p->counter_select & IB_PMA_SEL_PORT_RCV_REMPHYS_ERRORS) 478 | perf->rcv_remphys_errors = be16_to_cpu(p->port_rcv_remphys_errors); 479 | 480 | if (p->counter_select & PIB_PMA_SEL_PORT_RCV_SWITCH_RELAY_ERRORS) 481 | perf->rcv_switch_relay_errors = be16_to_cpu(p->port_rcv_switch_relay_errors); 482 | 483 | if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DISCARDS) 484 | perf->xmit_discards = be16_to_cpu(p->port_xmit_discards); 485 | 486 | if (p->counter_select & PIB_PMA_SEL_PORT_XMIT_CONSTRAINT_ERRORS) 487 | perf->xmit_constraint_errors = p->port_xmit_constraint_errors; 488 | 489 | if (p->counter_select & PIB_PMA_SEL_PORT_RCV_CONSTRAINT_ERRORS) 490 | perf->rcv_constraint_errors = p->port_rcv_constraint_errors; 491 | 492 | if (p->counter_select & IB_PMA_SEL_LOCAL_LINK_INTEGRITY_ERRORS) 493 | perf->local_link_integrity_errors = (p->link_overrun_errors >> 4) & 0xF; 494 | 495 | if (p->counter_select & 
IB_PMA_SEL_EXCESSIVE_BUFFER_OVERRUNS) 496 | perf->excessive_buffer_overrun_errors = (p->link_overrun_errors & 0xF); 497 | 498 | if (p->counter_select & IB_PMA_SEL_PORT_VL15_DROPPED) 499 | perf->vl15_dropped = be16_to_cpu(p->vl15_dropped); 500 | 501 | if (p->counter_select & IB_PMA_SEL_PORT_XMIT_DATA) 502 | perf->xmit_data = be32_to_cpu(p->port_xmit_data); 503 | 504 | if (p->counter_select & IB_PMA_SEL_PORT_RCV_DATA) 505 | perf->rcv_data = be32_to_cpu(p->port_rcv_data); 506 | 507 | if (p->counter_select & IB_PMA_SEL_PORT_XMIT_PACKETS) 508 | perf->xmit_packets = be32_to_cpu(p->port_xmit_packets); 509 | 510 | if (p->counter_select & IB_PMA_SEL_PORT_RCV_PACKETS) 511 | perf->rcv_packets = be32_to_cpu(p->port_rcv_packets); 512 | 513 | bail: 514 | return pma_get_port_counters(pmp, node, port_num); 515 | } 516 | 517 | 518 | static int pma_get_port_counters_ext(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 519 | { 520 | struct ib_pma_portcounters_ext *p = 521 | (struct ib_pma_portcounters_ext *)pmp->data; 522 | struct pib_port_perf *perf; 523 | u8 port_select; 524 | 525 | port_select = p->port_select; 526 | 527 | memset(pmp->data, 0, sizeof(pmp->data)); 528 | 529 | p->port_select = port_select; 530 | 531 | if (pmp->mad_hdr.attr_mod != cpu_to_be16(0)) { 532 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 533 | goto bail; 534 | } 535 | 536 | /* The base management port is ignored. 
*/ 537 | if ((port_select < node->port_start) || (node->port_count <= port_select)) { 538 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 539 | goto bail; 540 | } 541 | 542 | perf = &node->ports[port_select - node->port_start].perf; 543 | 544 | p->port_xmit_data = cpu_to_be64(perf->xmit_data); 545 | p->port_rcv_data = cpu_to_be64(perf->rcv_data); 546 | p->port_xmit_packets = cpu_to_be64(perf->xmit_packets); 547 | p->port_rcv_packets = cpu_to_be64(perf->rcv_packets); 548 | p->port_unicast_xmit_packets = cpu_to_be64(perf->unicast_xmit_packets); 549 | p->port_unicast_rcv_packets = cpu_to_be64(perf->unicast_rcv_packets); 550 | p->port_multicast_xmit_packets = cpu_to_be64(perf->multicast_xmit_packets); 551 | p->port_multicast_rcv_packets = cpu_to_be64(perf->multicast_rcv_packets); 552 | 553 | bail: 554 | return reply(&pmp->mad_hdr); 555 | } 556 | 557 | 558 | static int pma_set_port_counters_ext(struct ib_pma_mad *pmp, struct pib_node *node, u8 port_num) 559 | { 560 | struct ib_pma_portcounters_ext *p = 561 | (struct ib_pma_portcounters_ext *)pmp->data; 562 | struct pib_port_perf *perf; 563 | u8 port_select; 564 | 565 | port_select = p->port_select; 566 | 567 | if (pmp->mad_hdr.attr_mod != cpu_to_be16(0)) { 568 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 569 | goto bail; 570 | } 571 | 572 | /* The base management port is ignored. 
*/ 573 | if ((port_select < node->port_start) || (node->port_count <= port_select)) { 574 | pmp->mad_hdr.status = PIB_PMA_STATUS_INVALID_ATTRIB_VALUE; 575 | goto bail; 576 | } 577 | 578 | perf = &node->ports[port_select - node->port_start].perf; 579 | 580 | if (p->counter_select & IB_PMA_SELX_PORT_XMIT_DATA) 581 | perf->xmit_data = be64_to_cpu(p->port_xmit_data); 582 | 583 | if (p->counter_select & IB_PMA_SELX_PORT_RCV_DATA) 584 | perf->rcv_data = be64_to_cpu(p->port_rcv_data); 585 | 586 | if (p->counter_select & IB_PMA_SELX_PORT_XMIT_PACKETS) 587 | perf->xmit_packets = be64_to_cpu(p->port_xmit_packets); 588 | 589 | if (p->counter_select & IB_PMA_SELX_PORT_RCV_PACKETS) 590 | perf->rcv_packets = be64_to_cpu(p->port_rcv_packets); 591 | 592 | if (p->counter_select & IB_PMA_SELX_PORT_UNI_XMIT_PACKETS) 593 | p->port_unicast_xmit_packets = 0; 594 | 595 | if (p->counter_select & IB_PMA_SELX_PORT_UNI_RCV_PACKETS) 596 | p->port_unicast_rcv_packets = 0; 597 | 598 | if (p->counter_select & IB_PMA_SELX_PORT_MULTI_XMIT_PACKETS) 599 | p->port_multicast_xmit_packets = 0; 600 | 601 | if (p->counter_select & IB_PMA_SELX_PORT_MULTI_RCV_PACKETS) 602 | p->port_multicast_rcv_packets = 0; 603 | 604 | bail: 605 | return pma_get_port_counters_ext(pmp, node, port_num); 606 | } 607 | -------------------------------------------------------------------------------- /driver/pib_mr.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_mr.c - Memory Region (MR) functions 3 | * 4 | * Copyright (c) 2013-2015 Minoru NAKAMURA 5 | * 6 | * This code is licensed under the GPL version 2 or BSD license. 
7 | */ 8 | #include 9 | #include 10 | #include 11 | 12 | 13 | #include "pib.h" 14 | #include "pib_spinlock.h" 15 | #include "pib_trace.h" 16 | 17 | 18 | static struct pib_mr *create_mr(struct pib_dev *dev, struct pib_pd *pd, enum pib_mr_state init_state, bool fast_reg_mr, int max_page_list_len); 19 | static enum ib_wc_status copy_data_with_rkey(struct pib_pd *pd, u32 rkey, void *buffer, u64 address, u64 size, int access_flags, enum pib_mr_direction direction, bool check_only); 20 | static int mr_copy_data(struct pib_mr *mr, void *buffer, u64 offset, u64 size, u64 swap, u64 compare, enum pib_mr_direction direction); 21 | static bool mr_copy_data_sub(void *buffer, void *target_vaddr, u64 range, u64 swap, u64 compare, enum pib_mr_direction direction); 22 | 23 | 24 | static int 25 | reg_mr(struct pib_pd *pd, struct pib_mr *mr) 26 | { 27 | int i; 28 | unsigned long flags; 29 | 30 | /* find an empty slot in mr_table[] */ 31 | spin_lock_irqsave(&pd->lock, flags); 32 | for (i=0 ; i<PIB_MAX_MR_PER_PD ; i++) 33 | if (pd->mr_table[i] == NULL) 34 | goto generate_new_key; 35 | spin_unlock_irqrestore(&pd->lock, flags); 36 | 37 | return -1; 38 | 39 | generate_new_key: 40 | mr->ib_mr.lkey = (i + pib_random() * PIB_MAX_MR_PER_PD) << PIB_MR_INDEX_SHIFT; 41 | mr->ib_mr.rkey = (i + pib_random() * PIB_MAX_MR_PER_PD) << PIB_MR_INDEX_SHIFT; 42 | 43 | if (mr->ib_mr.lkey == PIB_LOCAL_DMA_LKEY) 44 | goto generate_new_key; 45 | 46 | #ifdef PIB_HACK_IMM_DATA_LKEY 47 | if (mr->ib_mr.lkey == PIB_IMM_DATA_LKEY) 48 | goto generate_new_key; 49 | #endif 50 | 51 | pd->mr_table[i] = mr; 52 | 53 | pd->nr_mr++; 54 | 55 | spin_unlock_irqrestore(&pd->lock, flags); 56 | 57 | return 0; 58 | } 59 | 60 | 61 | struct ib_mr * 62 | pib_get_dma_mr(struct ib_pd *ibpd, int access_flags) 63 | { 64 | struct pib_dev *dev; 65 | struct pib_pd *pd; 66 | struct pib_mr *mr; 67 | 68 | if (!ibpd) 69 | return ERR_PTR(-EINVAL); 70 | 71 | dev = to_pdev(ibpd->device); 72 | pd = to_ppd(ibpd); 73 | 74 | mr = create_mr(dev, pd, PIB_MR_VALID, false, 0); 75 | if 
(IS_ERR(mr)) 76 | return (struct ib_mr *)mr; 77 | 78 | mr->start = 0; 79 | mr->length = (u64)-1; 80 | mr->virt_addr = 0; 81 | mr->access_flags = access_flags; 82 | mr->is_dma = 1; 83 | 84 | pib_trace_api(dev, IB_USER_VERBS_CMD_REG_MR, mr->mr_num); 85 | 86 | return &mr->ib_mr; 87 | } 88 | 89 | 90 | struct ib_mr * 91 | pib_reg_user_mr(struct ib_pd *ibpd, u64 start, u64 length, 92 | u64 virt_addr, int access_flags, 93 | struct ib_udata *udata) 94 | { 95 | struct pib_dev *dev; 96 | struct pib_pd *pd; 97 | struct ib_umem *umem; 98 | struct pib_mr *mr; 99 | 100 | if (!ibpd) 101 | return ERR_PTR(-EINVAL); 102 | 103 | pd = to_ppd(ibpd); 104 | dev = to_pdev(ibpd->device); 105 | 106 | umem = ib_umem_get(ibpd->uobject->context, start, length, 107 | access_flags, 0); 108 | if (IS_ERR(umem)) 109 | return (struct ib_mr *)umem; 110 | 111 | mr = create_mr(dev, pd, PIB_MR_VALID, false, 0); 112 | if (IS_ERR(mr)) 113 | goto err_alloc_mr; 114 | 115 | mr->start = start; 116 | mr->length = length; 117 | mr->virt_addr = virt_addr; 118 | mr->access_flags = access_flags; 119 | mr->ib_umem = umem; 120 | 121 | pib_trace_api(dev, IB_USER_VERBS_CMD_REG_MR, mr->mr_num); 122 | 123 | return &mr->ib_mr; 124 | 125 | err_alloc_mr: 126 | ib_umem_release(umem); 127 | 128 | return ERR_PTR(-ENOMEM); 129 | } 130 | 131 | 132 | static struct pib_mr * 133 | create_mr(struct pib_dev *dev, struct pib_pd *pd, enum pib_mr_state init_state, 134 | bool fast_reg_mr, int max_page_list_len) 135 | { 136 | struct pib_mr *mr; 137 | unsigned long flags; 138 | u32 mr_num; 139 | void *page_list = NULL; 140 | 141 | mr = kmem_cache_zalloc(pib_mr_cachep, GFP_KERNEL); 142 | if (!mr) 143 | return ERR_PTR(-ENOMEM); 144 | 145 | if (fast_reg_mr && max_page_list_len > 0) { 146 | page_list = kzalloc(sizeof(void *) * max_page_list_len, GFP_KERNEL); 147 | if (!page_list) 148 | goto err_alloc_mr_num; 149 | } 150 | 151 | INIT_LIST_HEAD(&mr->list); 152 | getnstimeofday(&mr->creation_time); 153 | 154 | spin_lock_irqsave(&dev->lock, 
flags); 155 | mr_num = pib_alloc_obj_num(dev, PIB_BITMAP_MR_START, PIB_MAX_MR, &dev->last_mr_num); 156 | if (mr_num == (u32)-1) { 157 | spin_unlock_irqrestore(&dev->lock, flags); 158 | goto err_alloc_page_list; 159 | } 160 | dev->nr_mr++; 161 | list_add_tail(&mr->list, &dev->mr_head); 162 | mr->mr_num = mr_num; 163 | spin_unlock_irqrestore(&dev->lock, flags); 164 | 165 | if (reg_mr(pd, mr)) 166 | goto err_reg_mr; 167 | 168 | mr->state = init_state; 169 | 170 | mr->page_list = page_list; 171 | mr->max_page_list_len = max_page_list_len; 172 | 173 | return mr; 174 | 175 | err_reg_mr: 176 | spin_lock_irqsave(&dev->lock, flags); 177 | list_del(&mr->list); 178 | dev->nr_mr--; 179 | pib_dealloc_obj_num(dev, PIB_BITMAP_MR_START, mr_num); 180 | spin_unlock_irqrestore(&dev->lock, flags); 181 | 182 | err_alloc_page_list: 183 | if (page_list) 184 | kfree(page_list); 185 | 186 | err_alloc_mr_num: 187 | kmem_cache_free(pib_mr_cachep, mr); 188 | 189 | return ERR_PTR(-ENOMEM); 190 | } 191 | 192 | 193 | int 194 | pib_dereg_mr(struct ib_mr *ibmr) 195 | { 196 | int ret = 0; 197 | struct pib_dev *dev; 198 | struct pib_mr *mr, *mr_comp; 199 | struct pib_pd *pd; 200 | unsigned long flags; 201 | u32 lkey; 202 | 203 | if (!ibmr) 204 | return -EINVAL; 205 | 206 | dev = to_pdev(ibmr->device); 207 | mr = to_pmr(ibmr); 208 | pd = to_ppd(ibmr->pd); 209 | 210 | pib_trace_api(dev, IB_USER_VERBS_CMD_DEREG_MR, mr->mr_num); 211 | 212 | spin_lock_irqsave(&pd->lock, flags); 213 | lkey = (mr->ib_mr.lkey & PIB_MR_INDEX_MASK) >> PIB_MR_INDEX_SHIFT; 214 | mr_comp = pd->mr_table[lkey]; 215 | if (mr == mr_comp) { 216 | pd->mr_table[lkey] = NULL; 217 | pd->nr_mr--; 218 | } else { 219 | pr_err("pib: MR(%u) is not registered in PD(%u) (pib_dereg_mr)\n", 220 | mr->mr_num, pd->pd_num); 221 | ret = -ENOENT; 222 | } 223 | spin_unlock_irqrestore(&pd->lock, flags); 224 | 225 | if (mr->ib_umem) 226 | ib_umem_release(mr->ib_umem); 227 | 228 | spin_lock_irqsave(&dev->lock, flags); 229 | list_del(&mr->list); 230 | 
dev->nr_mr--; 231 | pib_dealloc_obj_num(dev, PIB_BITMAP_MR_START, mr->mr_num); 232 | spin_unlock_irqrestore(&dev->lock, flags); 233 | 234 | if (mr->page_list) 235 | kfree(mr->page_list); 236 | 237 | kmem_cache_free(pib_mr_cachep, mr); 238 | 239 | return ret; 240 | } 241 | 242 | #ifdef PIB_FAST_REG_MR_SUPPORT 243 | struct ib_mr * 244 | pib_alloc_fast_reg_mr(struct ib_pd *ibpd, 245 | int max_page_list_len) 246 | { 247 | struct pib_dev *dev; 248 | struct pib_pd *pd; 249 | struct pib_mr *mr; 250 | 251 | if (!ibpd) 252 | return ERR_PTR(-EINVAL); 253 | 254 | dev = to_pdev(ibpd->device); 255 | pd = to_ppd(ibpd); 256 | 257 | mr = create_mr(dev, pd, PIB_MR_FREE, true, max_page_list_len); 258 | if (IS_ERR(mr)) 259 | return (struct ib_mr *)mr; 260 | 261 | mr->start = 0; 262 | mr->length = (u64)-1; 263 | mr->virt_addr = 0; 264 | mr->access_flags = 0; 265 | 266 | pib_trace_api(dev, PIB_USER_VERBS_CMD_ALLOC_FAST_REG_MR, mr->mr_num); 267 | 268 | return &mr->ib_mr; 269 | } 270 | #endif /* PIB_FAST_REG_MR_SUPPORT */ 271 | 272 | struct ib_fast_reg_page_list * 273 | pib_alloc_fast_reg_page_list(struct ib_device *ibdev, 274 | int page_list_len) 275 | { 276 | struct pib_dev *dev; 277 | size_t size; 278 | struct ib_fast_reg_page_list *page_list; 279 | 280 | dev = to_pdev(ibdev); 281 | 282 | size = page_list_len * sizeof(u64); 283 | 284 | if (size > PAGE_SIZE) 285 | return ERR_PTR(-EINVAL); 286 | 287 | page_list = kzalloc(sizeof *page_list, GFP_KERNEL); 288 | if (!page_list) 289 | return ERR_PTR(-ENOMEM); 290 | 291 | page_list->page_list = kzalloc(size, GFP_KERNEL); 292 | if (!page_list->page_list) 293 | goto err_free; 294 | 295 | page_list->device = ibdev; 296 | page_list->max_page_list_len = page_list_len; 297 | 298 | pib_trace_api(dev, PIB_USER_VERBS_CMD_ALLOC_FAST_REG_PAGE_LIST, 0); 299 | 300 | return page_list; 301 | 302 | err_free: 303 | kfree(page_list); 304 | 305 | return ERR_PTR(-ENOMEM); 306 | } 307 | 308 | 309 | void 310 | pib_free_fast_reg_page_list(struct 
ib_fast_reg_page_list *page_list) 311 | { 312 | struct pib_dev *dev; 313 | 314 | if (!page_list) 315 | return; 316 | 317 | dev = to_pdev(page_list->device); 318 | 319 | pib_trace_api(dev, PIB_USER_VERBS_CMD_FREE_FAST_REG_PAGE_LIST, 0); 320 | 321 | kfree(page_list->page_list); 322 | kfree(page_list); 323 | } 324 | 325 | 326 | enum ib_wc_status 327 | pib_util_mr_copy_data(struct pib_pd *pd, struct ib_sge *sge_array, int num_sge, void *buffer, u64 offset, u64 size, int access_flags, enum pib_mr_direction direction) 328 | { 329 | int i; 330 | 331 | if (PIB_MAX_PAYLOAD_LEN < size) 332 | return IB_WC_LOC_LEN_ERR; 333 | 334 | for (i=0 ; i<num_sge ; i++) { 335 | struct ib_sge sge = sge_array[i]; 336 | struct pib_mr *mr; 337 | u64 range, offset_tmp, mr_base; 338 | 339 | mr = pd->mr_table[(sge.lkey & PIB_MR_INDEX_MASK) >> PIB_MR_INDEX_SHIFT]; 340 | 341 | if (!mr) 342 | return IB_WC_LOC_PROT_ERR; 343 | 344 | if (mr->state != PIB_MR_VALID) 345 | return IB_WC_LOC_PROT_ERR; /* @todo */ 346 | 347 | if (sge.lkey != mr->ib_mr.lkey) 348 | return IB_WC_LOC_PROT_ERR; 349 | 350 | if ((mr->access_flags & access_flags) != access_flags) 351 | return IB_WC_LOC_PROT_ERR; 352 | 353 | range = min_t(u64, sge.length, offset + size); 354 | 355 | offset_tmp = offset; 356 | 357 | if (0 < offset) 358 | offset = (sge.length < offset) ? 
(offset - sge.length) : 0; 359 | 360 | if ((sge.addr < mr->start) || (mr->start + mr->length <= sge.addr) || 361 | (sge.addr + range <= mr->start) || (mr->start + mr->length < sge.addr + range)) 362 | continue; 363 | 364 | mr_base = sge.addr - mr->start; 365 | 366 | if (offset_tmp < range) { 367 | u64 chunk_size = range - offset_tmp; 368 | mr_copy_data(mr, buffer, mr_base + offset_tmp, chunk_size, 0, 0, direction); 369 | buffer += chunk_size; 370 | size -= chunk_size; 371 | } 372 | 373 | if (size == 0) 374 | return IB_WC_SUCCESS; 375 | } 376 | 377 | return IB_WC_LOC_PROT_ERR; 378 | } 379 | 380 | 381 | enum ib_wc_status 382 | pib_util_mr_verify_rkey_validation(struct pib_pd *pd, u32 rkey, u64 address, u64 size, int access_flags) 383 | { 384 | return copy_data_with_rkey(pd, rkey, NULL, address, size, access_flags, PIB_MR_CHECK, true); 385 | } 386 | 387 | 388 | enum ib_wc_status 389 | pib_util_mr_copy_data_with_rkey(struct pib_pd *pd, u32 rkey, void *buffer, u64 address, u64 size, int access_flags, enum pib_mr_direction direction) 390 | { 391 | return copy_data_with_rkey(pd, rkey, buffer, address, size, access_flags, direction, false); 392 | } 393 | 394 | 395 | static enum ib_wc_status 396 | copy_data_with_rkey(struct pib_pd *pd, u32 rkey, void *buffer, u64 address, u64 size, int access_flags, enum pib_mr_direction direction, bool check_only) 397 | { 398 | struct pib_mr *mr; 399 | 400 | if (PIB_MAX_PAYLOAD_LEN < size) 401 | return IB_WC_LOC_LEN_ERR; 402 | 403 | mr = pd->mr_table[(rkey & PIB_MR_INDEX_MASK) >> PIB_MR_INDEX_SHIFT]; 404 | 405 | if (!mr) 406 | return IB_WC_LOC_PROT_ERR; 407 | 408 | if (mr->state != PIB_MR_VALID) 409 | return IB_WC_LOC_PROT_ERR; /* @todo */ 410 | 411 | if (rkey != mr->ib_mr.rkey) 412 | return IB_WC_LOC_PROT_ERR; 413 | 414 | if ((mr->access_flags & access_flags) != access_flags) 415 | return IB_WC_LOC_PROT_ERR; 416 | 417 | if (mr->is_dma) { 418 | pr_err("pib: Can't use DMA MR in copy_data_with_rkey\n"); /* @todo */ 419 | return 
IB_WC_LOC_PROT_ERR; 420 | } 421 | 422 | if ((address < mr->start) || (mr->start + mr->length <= address) || 423 | (address + size <= mr->start) || (mr->start + mr->length < address + size)) 424 | return IB_WC_LOC_PROT_ERR; 425 | 426 | if (!check_only) { 427 | if (mr_copy_data(mr, buffer, address - mr->start, size, 0, 0, direction)) 428 | return IB_WC_LOC_PROT_ERR; 429 | } 430 | 431 | return IB_WC_SUCCESS; 432 | } 433 | 434 | 435 | enum ib_wc_status 436 | pib_util_mr_atomic(struct pib_pd *pd, u32 rkey, u64 address, u64 swap, u64 compare, u64 *result, enum pib_mr_direction direction) 437 | { 438 | struct pib_mr *mr; 439 | 440 | mr = pd->mr_table[(rkey & PIB_MR_INDEX_MASK) >> PIB_MR_INDEX_SHIFT]; 441 | 442 | if (!mr) 443 | return IB_WC_LOC_PROT_ERR; 444 | 445 | if (mr->state != PIB_MR_VALID) 446 | return IB_WC_LOC_PROT_ERR; /* @todo */ 447 | 448 | if (rkey != mr->ib_mr.rkey) 449 | return IB_WC_LOC_PROT_ERR; 450 | 451 | if ((mr->access_flags & IB_ACCESS_REMOTE_ATOMIC) != IB_ACCESS_REMOTE_ATOMIC) 452 | return IB_WC_LOC_PROT_ERR; 453 | 454 | if ((address < mr->start) || (mr->start + mr->length <= address) || 455 | (address + 8 <= mr->start) || (mr->start + mr->length < address + 8)) 456 | return IB_WC_LOC_PROT_ERR; 457 | 458 | if (mr_copy_data(mr, result, address - mr->start, 8, swap, compare, 459 | (direction == PIB_MR_FETCHADD) ? 
PIB_MR_FETCHADD : PIB_MR_CAS)) 460 | return IB_WC_LOC_PROT_ERR; 461 | 462 | return IB_WC_SUCCESS; 463 | } 464 | 465 | #ifndef PIB_NO_NEED_TO_DEFINE_IB_UMEM_OFFSET 466 | static inline int ib_umem_offset(struct ib_umem *umem) 467 | { 468 | return umem->offset; 469 | } 470 | #endif 471 | 472 | static int 473 | mr_copy_data(struct pib_mr *mr, void *buffer, u64 offset, u64 size, u64 swap, u64 compare, enum pib_mr_direction direction) 474 | { 475 | u64 addr; 476 | struct ib_umem *umem; 477 | #if PIB_IB_DMA_MAPPING_VERSION >= 1 478 | struct scatterlist *sg; 479 | int entry; 480 | #else 481 | struct ib_umem_chunk *chunk; 482 | #endif 483 | 484 | if (mr->state != PIB_MR_VALID) 485 | return -EPERM; 486 | 487 | if (mr->is_dma) 488 | goto dma; 489 | 490 | if (size == 0) 491 | return 0; 492 | 493 | umem = mr->ib_umem; 494 | 495 | offset += ib_umem_offset(umem); 496 | 497 | addr = 0; 498 | 499 | if (mr->is_fast_reg_mr) 500 | goto fast_reg_mr; 501 | 502 | #if PIB_IB_DMA_MAPPING_VERSION >= 1 503 | for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { 504 | void *vaddr; 505 | 506 | vaddr = page_address(sg_page(sg)); 507 | if (!vaddr) 508 | return -EINVAL; 509 | 510 | if ((addr <= offset) && (offset < addr + umem->page_size)) { 511 | u64 range; 512 | void *target_vaddr; 513 | 514 | range = min_t(u64, (addr + umem->page_size - offset), size); 515 | target_vaddr = vaddr + (offset & (umem->page_size - 1)); 516 | 517 | if (mr_copy_data_sub(buffer, target_vaddr, range, swap, compare, direction)) 518 | return 0; 519 | 520 | offset += range; 521 | buffer += range; 522 | size -= range; 523 | } 524 | 525 | if (size == 0) 526 | return 0; 527 | 528 | addr += umem->page_size; 529 | } 530 | #else 531 | list_for_each_entry(chunk, &umem->chunk_list, list) { 532 | int i; 533 | for (i = 0; i < chunk->nents; i++) { 534 | void *vaddr; 535 | 536 | vaddr = page_address(sg_page(&chunk->page_list[i])); 537 | if (!vaddr) 538 | return -EINVAL; 539 | 540 | if ((addr <= offset) && (offset < addr + 
umem->page_size)) { 541 | u64 range; 542 | void *target_vaddr; 543 | 544 | range = min_t(u64, (addr + umem->page_size - offset), size); 545 | target_vaddr = vaddr + (offset & (umem->page_size - 1)); 546 | 547 | if (mr_copy_data_sub(buffer, target_vaddr, range, swap, compare, direction)) 548 | return 0; 549 | 550 | offset += range; 551 | buffer += range; 552 | size -= range; 553 | } 554 | 555 | if (size == 0) 556 | return 0; 557 | 558 | addr += umem->page_size; 559 | } 560 | } 561 | #endif 562 | 563 | return 0; 564 | 565 | fast_reg_mr: 566 | { 567 | int i; 568 | size_t page_size = 1UL << mr->page_shift; 569 | 570 | for (i = 0 ; i < mr->page_list_len ; i++) { 571 | void *vaddr; 572 | 573 | vaddr = mr->page_list[i]; 574 | 575 | if ((addr <= offset) && (offset < addr + page_size)) { 576 | u64 range; 577 | void *target_vaddr; 578 | 579 | range = min_t(u64, (addr + page_size - offset), size); 580 | target_vaddr = vaddr + (offset & (page_size - 1)); 581 | 582 | if (mr_copy_data_sub(buffer, target_vaddr, range, swap, compare, direction)) 583 | return 0; 584 | 585 | offset += range; 586 | buffer += range; 587 | size -= range; 588 | } 589 | 590 | if (size == 0) 591 | return 0; 592 | 593 | addr += page_size; 594 | } 595 | } 596 | 597 | return 0; 598 | 599 | dma: 600 | mr_copy_data_sub(buffer, (void*)(uintptr_t)offset, size, swap, compare, direction); 601 | 602 | return 0; 603 | } 604 | 605 | static bool 606 | mr_copy_data_sub(void *buffer, void *target_vaddr, u64 range, u64 swap, u64 compare, enum pib_mr_direction direction) 607 | { 608 | u64 res; 609 | 610 | switch (direction) { 611 | 612 | case PIB_MR_COPY_FROM: 613 | memcpy(buffer, target_vaddr, range); 614 | break; 615 | 616 | case PIB_MR_COPY_TO: 617 | memcpy(target_vaddr, buffer, range); 618 | break; 619 | 620 | case PIB_MR_CAS: 621 | *(u64*)buffer = atomic64_cmpxchg((atomic64_t*)target_vaddr, compare, swap); 622 | return true; /* return function */ 623 | 624 | case PIB_MR_FETCHADD: 625 | res = 
atomic64_add_return(compare, (atomic64_t*)target_vaddr); 626 | *(u64*)buffer = res - compare; 627 | return true; /* return function */ 628 | 629 | default: 630 | BUG(); 631 | } 632 | 633 | return false; 634 | } 635 | 636 | enum ib_wc_status 637 | pib_util_mr_invalidate(struct pib_pd *pd, u32 rkey) 638 | { 639 | struct pib_mr *mr; 640 | 641 | mr = pd->mr_table[(rkey & PIB_MR_INDEX_MASK) >> PIB_MR_INDEX_SHIFT]; 642 | 643 | if (!mr) 644 | return IB_WC_MW_BIND_ERR; 645 | 646 | if (mr->state == PIB_MR_INVALID) 647 | return IB_WC_MW_BIND_ERR; 648 | 649 | #if 0 650 | if (!mr->is_dma) 651 | return IB_WC_MW_BIND_ERR; 652 | #endif 653 | 654 | if (!mr->is_fast_reg_mr) { 655 | pr_err("pib: Invalidate operation must be performed only on an MR generated by alloc_fast_reg_mr\n"); 656 | return IB_WC_MW_BIND_ERR; 657 | } 658 | 659 | if (rkey != mr->ib_mr.rkey) 660 | return IB_WC_MW_BIND_ERR; 661 | 662 | mr->state = PIB_MR_FREE; 663 | 664 | return IB_WC_SUCCESS; 665 | } 666 | 667 | enum ib_wc_status 668 | pib_util_mr_fast_reg_pmr(struct pib_pd *pd, u32 rkey, u64 iova_start, struct ib_fast_reg_page_list *page_list, unsigned int page_shift, unsigned int page_list_len, u32 length, int access_flags) 669 | { 670 | int i; 671 | struct pib_mr *mr; 672 | size_t ps; 673 | 674 | mr = pd->mr_table[(rkey & PIB_MR_INDEX_MASK) >> PIB_MR_INDEX_SHIFT]; 675 | 676 | if (!mr) 677 | return IB_WC_MW_BIND_ERR; 678 | 679 | if ((mr->state == PIB_MR_INVALID) || (mr->state == PIB_MR_VALID)) 680 | return IB_WC_MW_BIND_ERR; 681 | 682 | #if 0 683 | if (!mr->is_dma) 684 | return IB_WC_MW_BIND_ERR; 685 | #endif 686 | 687 | if (!mr->is_fast_reg_mr) { 688 | pr_err("pib: Fast Register PMR operation must be performed only on an MR generated by alloc_fast_reg_mr\n"); 689 | return IB_WC_MW_BIND_ERR; 690 | } 691 | 692 | if (rkey != mr->ib_mr.rkey) 693 | return IB_WC_MW_BIND_ERR; 694 | 695 | ps = 1UL << page_shift; 696 | 697 | if (page_list_len > mr->max_page_list_len) 698 | return IB_WC_MW_BIND_ERR; 699 | 700 | if (page_list_len > 
page_list->max_page_list_len) 701 | return IB_WC_MW_BIND_ERR; 702 | 703 | if (length > ps * page_list_len) 704 | return IB_WC_MW_BIND_ERR; 705 | 706 | mr->start = iova_start; 707 | mr->virt_addr = iova_start; 708 | /* mr->lkey = rkey; */ 709 | mr->length = length; 710 | mr->access_flags = access_flags; 711 | 712 | mr->page_list_len = page_list_len; 713 | mr->page_shift = page_shift; 714 | 715 | for (i = 0 ; i < page_list_len ; i++) 716 | mr->page_list[i] = (void *) page_list->page_list[i]; 717 | 718 | return IB_WC_SUCCESS; 719 | } 720 | -------------------------------------------------------------------------------- /driver/pib_multicast.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_multicast.c - Multicast functions 3 | * 4 | * Copyright (c) 2013,2014 Minoru NAKAMURA 5 | * 6 | * This code is licensed under the GPL version 2 or BSD license. 7 | */ 8 | #include 9 | #include 10 | #include 11 | 12 | #include "pib.h" 13 | #include "pib_trace.h" 14 | 15 | 16 | int pib_attach_mcast(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) 17 | { 18 | int ret, count; 19 | struct pib_qp *qp; 20 | struct pib_dev *dev; 21 | unsigned long flags; 22 | struct pib_mcast_link *mcast_link; 23 | 24 | if (!ibqp) 25 | return -EINVAL; 26 | 27 | pib_debug("pib: pib_attach_mcast(qp=0x%06x, lid=0x%04x)\n", 28 | (int)ibqp->qp_num, lid); 29 | 30 | if (lid < PIB_MCAST_LID_BASE) 31 | return -EINVAL; 32 | 33 | ret = 0; 34 | 35 | dev = to_pdev(ibqp->device); 36 | qp = to_pqp(ibqp); 37 | 38 | pib_trace_api(dev, IB_USER_VERBS_CMD_ATTACH_MCAST, qp->ib_qp.qp_num); 39 | 40 | spin_lock_irqsave(&dev->lock, flags); 41 | 42 | count = 0; 43 | 44 | list_for_each_entry(mcast_link, &qp->mcast_head, qp_list) { 45 | if (mcast_link->lid == lid) 46 | goto done; 47 | count++; 48 | } 49 | 50 | if (PIB_MCAST_QP_ATTACH < count) { 51 | ret = -ENOMEM; 52 | goto done; 53 | } 54 | 55 | mcast_link = kmem_cache_zalloc(pib_mcast_link_cachep, GFP_ATOMIC); /* @todo move outside the interrupt-disabled section */ 
56 | if (!mcast_link) { 57 | ret = -ENOMEM; 58 | goto done; 59 | } 60 | 61 | mcast_link->lid = lid; 62 | mcast_link->qp_num = qp->ib_qp.qp_num; 63 | 64 | INIT_LIST_HEAD(&mcast_link->qp_list); 65 | INIT_LIST_HEAD(&mcast_link->lid_list); 66 | 67 | list_add_tail(&mcast_link->qp_list, &qp->mcast_head); 68 | list_add_tail(&mcast_link->lid_list, &dev->mcast_table[lid - PIB_MCAST_LID_BASE]); 69 | 70 | done: 71 | spin_unlock_irqrestore(&dev->lock, flags); 72 | 73 | return ret; 74 | } 75 | 76 | 77 | int pib_detach_mcast(struct ib_qp *ibqp, union ib_gid *gid, u16 lid) 78 | { 79 | int ret; 80 | struct pib_qp *qp; 81 | struct pib_dev *dev; 82 | unsigned long flags; 83 | struct pib_mcast_link *mcast_link; 84 | 85 | if (!ibqp) 86 | return -EINVAL; 87 | 88 | pib_debug("pib: pib_detach_mcast(qp=0x%06x, lid=0x%04x)\n", 89 | (int)ibqp->qp_num, lid); 90 | 91 | if (lid < PIB_MCAST_LID_BASE) 92 | return -EINVAL; 93 | 94 | ret = 0; 95 | 96 | dev = to_pdev(ibqp->device); 97 | qp = to_pqp(ibqp); 98 | 99 | pib_trace_api(dev, IB_USER_VERBS_CMD_DETACH_MCAST, qp->ib_qp.qp_num); 100 | 101 | spin_lock_irqsave(&dev->lock, flags); 102 | list_for_each_entry(mcast_link, &qp->mcast_head, qp_list) { 103 | if (mcast_link->lid == lid) { 104 | list_del(&mcast_link->qp_list); 105 | list_del(&mcast_link->lid_list); 106 | kmem_cache_free(pib_mcast_link_cachep, mcast_link); 107 | goto done; 108 | } 109 | } 110 | done: 111 | spin_unlock_irqrestore(&dev->lock, flags); 112 | 113 | return 0; 114 | } 115 | 116 | 117 | void pib_detach_all_mcast(struct pib_dev *dev, struct pib_qp *qp) 118 | { 119 | unsigned long flags; 120 | struct pib_mcast_link *mcast_link, *next_mcast_link; 121 | 122 | spin_lock_irqsave(&dev->lock, flags); 123 | list_for_each_entry_safe(mcast_link, next_mcast_link, &qp->mcast_head, qp_list) { 124 | list_del(&mcast_link->qp_list); 125 | list_del(&mcast_link->lid_list); 126 | kmem_cache_free(pib_mcast_link_cachep, mcast_link); 127 | } 128 | spin_unlock_irqrestore(&dev->lock, flags); 129 | } 130 | 
-------------------------------------------------------------------------------- /driver/pib_packet.h: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_packet.h - Structures of IB packets. 3 | * 4 | * Copyright (c) 2013-2015 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 7 | */ 8 | #ifndef PIB_PACKET_H 9 | #define PIB_PACKET_H 10 | 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | 17 | 18 | enum { 19 | PIB_OPCODE_CNP = 0x80, 20 | PIB_OPCODE_CNP_SEND_NOTIFY = 0x80 21 | }; 22 | 23 | enum { 24 | IB_OPCODE_SEND_LAST_WITH_INVALIDATE = 0x16, 25 | IB_OPCODE_SEND_ONLY_WITH_INVALIDATE = 0x17, 26 | 27 | IB_OPCODE(RC, SEND_LAST_WITH_INVALIDATE), 28 | IB_OPCODE(RC, SEND_ONLY_WITH_INVALIDATE), 29 | }; 30 | 31 | 32 | /* NAK Codes */ 33 | enum pib_syndrome { 34 | /* Major code (bit[7:5]) */ 35 | PIB_SYND_ACK_CODE = 0x00, /* ACK */ 36 | PIB_SYND_RNR_NAK_CODE = 0x20, /* RNR NAK */ 37 | PIB_SYND_NAK_CODE = 0x60, /* General NAK except RNR */ 38 | 39 | /* Major code mask */ 40 | PIB_SYND_CODE_MASK = 0xE0, 41 | 42 | /* Subcode */ 43 | PIB_SYND_NAK_CODE_PSN_SEQ_ERR = 0x60, /* PSN Sequence Error */ 44 | PIB_SYND_NAK_CODE_INV_REQ_ERR = 0x61, /* Invalid Request */ 45 | PIB_SYND_NAK_CODE_REM_ACCESS_ERR = 0x62, /* Remote Access Error */ 46 | PIB_SYND_NAK_CODE_REM_OP_ERR = 0x63, /* Remote Operational Error */ 47 | PIB_SYND_NAK_CODE_INV_RD_REQ_ERR = 0x64 /* Invalid RD Request */ 48 | }; 49 | 50 | 51 | /* Local Route Header */ 52 | struct pib_packet_lrh { 53 | __be16 dlid; 54 | 55 | /* 56 | * Virtual Lane 4 bits 57 | * Link Version 4 bits 58 | */ 59 | u8 vl_lver; 60 | 61 | /* 62 | * Service Level 4 bits 63 | * Reserved 2 bits 64 | * Link Next Header 2 bits 65 | */ 66 | u8 sl_rsv_lnh; 67 | 68 | __be16 slid; 69 | 70 | /* 71 | * Reserved 5 bits 72 | * Packet Length 11 bits 73 | */ 74 | __be16 pktlen; 75 | 76 | } __attribute__ ((packed)); 77 | 78 | 79 | static inline u16 
pib_packet_lrh_get_pktlen(const struct pib_packet_lrh *lrh) 80 | { 81 | return be16_to_cpu(lrh->pktlen) & 0x7FF; 82 | } 83 | 84 | 85 | static inline void pib_packet_lrh_set_pktlen(struct pib_packet_lrh *lrh, u16 value) 86 | { 87 | lrh->pktlen = cpu_to_be16(value & 0x7FF); 88 | } 89 | 90 | 91 | /* Base Transport Header */ 92 | struct pib_packet_bth { 93 | u8 OpCode; /* Opcode */ 94 | 95 | /* 96 | * Solicited Event 1 bit 97 | * MigReq 1 bit 98 | * Pad Count 2 bits 99 | * Transport Header Version 4 bits 100 | */ 101 | u8 se_m_padcnt_tver; 102 | 103 | __be16 pkey; /* Partition Key */ 104 | __be32 destQP; /* Destination QP (The most significant 8 bits must be zero.) */ 105 | __be32 psn; /* Packet Sequence Number (The MSB is the A bit) */ 106 | } __attribute__ ((packed)); 107 | 108 | 109 | static inline u8 pib_packet_bth_get_padcnt(const struct pib_packet_bth *bth) 110 | { 111 | return (bth->se_m_padcnt_tver >> 4) & 0x3; 112 | } 113 | 114 | 115 | static inline void pib_packet_bth_set_padcnt(struct pib_packet_bth *bth, u8 padcnt) 116 | { 117 | bth->se_m_padcnt_tver &= ~0x30; 118 | bth->se_m_padcnt_tver |= ((padcnt & 0x3) << 4); 119 | } 120 | 121 | 122 | static inline u8 pib_packet_bth_get_solicited(const struct pib_packet_bth *bth) 123 | { 124 | return (bth->se_m_padcnt_tver >> 7) & 0x1; 125 | } 126 | 127 | 128 | static inline void pib_packet_bth_set_solicited(struct pib_packet_bth *bth, int solicited) 129 | { 130 | bth->se_m_padcnt_tver &= ~0x80; 131 | bth->se_m_padcnt_tver |= ((!!solicited) << 7); 132 | } 133 | 134 | 135 | /* Datagram Extended Transport Header */ 136 | struct pib_packet_deth { 137 | __be32 qkey; /* Queue Key */ 138 | __be32 srcQP; /* Source QP (The most significant 8 bits must be zero.) 
*/ 139 | } __attribute__ ((packed)); 140 | 141 | 142 | /* RDMA Extended Transport Header */ 143 | struct pib_packet_reth { 144 | __u64 vaddr; /* Virtual Address */ 145 | __u32 rkey; /* Remote Key */ 146 | __u32 dmalen; /* DMA Length */ 147 | } __attribute__ ((packed)); 148 | 149 | 150 | /* Atomic Extended Transport Header */ 151 | struct pib_packet_atomiceth { 152 | __u64 vaddr; /* Virtual Address */ 153 | __u32 rkey; /* Remote Key */ 154 | __u64 swap_dt; /* Swap (or Add) Data */ 155 | __u64 cmp_dt; /* Compare Data */ 156 | } __attribute__ ((packed)); 157 | 158 | 159 | /* ACK Extended Transport Header */ 160 | struct pib_packet_aeth { 161 | /* 162 | * Syndrome 8 bits 163 | * Message Sequence Number 24 bits 164 | */ 165 | __u32 syndrome_msn; 166 | } __attribute__ ((packed)); 167 | 168 | 169 | /* Atomic ACK Extended Transport Header */ 170 | struct pib_packet_atomicacketh { 171 | __u64 orig_rem_dt; /* Original Remote Data */ 172 | } __attribute__ ((packed)); 173 | 174 | 175 | /* Invalidate Extended Transport Header */ 176 | struct pib_packet_ieth { 177 | __u32 rkey; /* Remote Key */ 178 | } __attribute__ ((packed)); 179 | 180 | 181 | struct pib_packet_link { 182 | __be32 cmd; 183 | } __attribute__ ((packed)); 184 | 185 | 186 | union pib_packet_footer { 187 | struct { 188 | __be16 vcrc; /* Variant CRC */ 189 | } native; 190 | struct { 191 | __be64 port_guid; 192 | } pib; 193 | } __attribute__ ((packed)); 194 | 195 | 196 | #endif /* PIB_PACKET_H */ 197 | -------------------------------------------------------------------------------- /driver/pib_pd.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_pd.c - Protection Domain (PD) functions 3 | * 4 | * Copyright (c) 2013,2014 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 
7 | */ 8 | #include 9 | #include 10 | #include 11 | 12 | #include "pib.h" 13 | #include "pib_spinlock.h" 14 | #include "pib_trace.h" 15 | 16 | 17 | struct ib_pd * 18 | pib_alloc_pd(struct ib_device *ibdev, 19 | struct ib_ucontext *ibucontext, 20 | struct ib_udata *udata) 21 | { 22 | struct pib_dev *dev; 23 | struct pib_pd *pd; 24 | unsigned long flags; 25 | u32 pd_num; 26 | 27 | if (!ibdev) 28 | return ERR_PTR(-EINVAL); 29 | 30 | dev = to_pdev(ibdev); 31 | 32 | pd = kzalloc(sizeof *pd, GFP_KERNEL); 33 | if (!pd) 34 | return ERR_PTR(-ENOMEM); 35 | 36 | INIT_LIST_HEAD(&pd->list); 37 | getnstimeofday(&pd->creation_time); 38 | 39 | spin_lock_init(&pd->lock); 40 | 41 | spin_lock_irqsave(&dev->lock, flags); 42 | pd_num = pib_alloc_obj_num(dev, PIB_BITMAP_PD_START, PIB_MAX_PD, &dev->last_pd_num); 43 | if (pd_num == (u32)-1) { 44 | spin_unlock_irqrestore(&dev->lock, flags); 45 | goto err_alloc_pd_num; 46 | } 47 | dev->nr_pd++; 48 | list_add_tail(&pd->list, &dev->pd_head); 49 | pd->pd_num = pd_num; 50 | spin_unlock_irqrestore(&dev->lock, flags); 51 | 52 | pd->mr_table = vzalloc(sizeof(struct pib_mr*) * PIB_MAX_MR_PER_PD); 53 | if (!pd->mr_table) 54 | goto err_mr_table; 55 | 56 | pib_trace_api(dev, IB_USER_VERBS_CMD_ALLOC_PD, pd_num); 57 | 58 | return &pd->ib_pd; 59 | 60 | err_mr_table: 61 | spin_lock_irqsave(&dev->lock, flags); 62 | list_del(&pd->list); 63 | dev->nr_pd--; 64 | pib_dealloc_obj_num(dev, PIB_BITMAP_PD_START, pd_num); 65 | spin_unlock_irqrestore(&dev->lock, flags); 66 | 67 | err_alloc_pd_num: 68 | kfree(pd); 69 | 70 | return ERR_PTR(-ENOMEM); 71 | } 72 | 73 | 74 | int pib_dealloc_pd(struct ib_pd *ibpd) 75 | { 76 | struct pib_dev *dev; 77 | struct pib_pd *pd; 78 | unsigned long flags; 79 | 80 | if (!ibpd) 81 | return 0; 82 | 83 | dev = to_pdev(ibpd->device); 84 | pd = to_ppd(ibpd); 85 | 86 | pib_trace_api(dev, IB_USER_VERBS_CMD_DEALLOC_PD, pd->pd_num); 87 | 88 | spin_lock_irqsave(&pd->lock, flags); 89 | if (pd->nr_mr > 0) 90 | pr_err("pib: pib_dealloc_pd: 
nr_mr=%d\n", pd->nr_mr); 91 | spin_unlock_irqrestore(&pd->lock, flags); 92 | 93 | vfree(pd->mr_table); 94 | 95 | spin_lock_irqsave(&dev->lock, flags); 96 | list_del(&pd->list); 97 | dev->nr_pd--; 98 | pib_dealloc_obj_num(dev, PIB_BITMAP_PD_START, pd->pd_num); 99 | spin_unlock_irqrestore(&dev->lock, flags); 100 | 101 | kfree(pd); 102 | 103 | return 0; 104 | } 105 | -------------------------------------------------------------------------------- /driver/pib_spinlock.h: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_spinlock.h - Recursive spinlock declarations for pib 3 | * 4 | * Copyright (c) 2015 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 7 | */ 8 | #ifndef PIB_SPINLOCK_H 9 | #define PIB_SPINLOCK_H 10 | 11 | #include 12 | #include 13 | 14 | struct pib_spinlock { 15 | spinlock_t lock; 16 | struct task_struct *owner; 17 | int depth; 18 | }; 19 | 20 | typedef struct pib_spinlock pib_spinlock_t; 21 | 22 | #define pib_spin_lock_init(lockp) \ 23 | do { \ 24 | spin_lock_init(&(lockp)->lock); \ 25 | (lockp)->owner = NULL; \ 26 | (lockp)->depth = 0; \ 27 | } while (0) 28 | 29 | #define pib_spin_lock(lockp) \ 30 | do { \ 31 | if ((lockp)->owner != current) { \ 32 | spin_lock(&(lockp)->lock); \ 33 | (lockp)->owner = current; \ 34 | } \ 35 | (lockp)->depth++; \ 36 | } while (0) 37 | 38 | #define pib_spin_unlock(lockp) \ 39 | do { \ 40 | (lockp)->depth--; \ 41 | if ((lockp)->depth == 0) { \ 42 | (lockp)->owner = NULL; \ 43 | spin_unlock(&(lockp)->lock); \ 44 | } \ 45 | } while (0) 46 | 47 | #define pib_spin_lock_irqsave(lockp, flags) \ 48 | do { \ 49 | if ((lockp)->owner != current) { \ 50 | spin_lock_irqsave(&(lockp)->lock, flags); \ 51 | (lockp)->owner = current; \ 52 | } else { \ 53 | (flags) = 0; /* keep compiler quiet */ \ 54 | } \ 55 | (lockp)->depth++; \ 56 | } while (0) \ 57 | 58 | #define pib_spin_unlock_irqrestore(lockp, flags) \ 59 | do { \ 60 | (lockp)->depth--; \ 61 | if 
((lockp)->depth == 0) { \ 62 | (lockp)->owner = NULL; \ 63 | spin_unlock_irqrestore(&(lockp)->lock, flags); \ 64 | } \ 65 | } while (0) 66 | 67 | static inline int pib_spin_is_locked(pib_spinlock_t *lockp) 68 | { 69 | return spin_is_locked(&lockp->lock); 70 | } 71 | 72 | #endif /* PIB_SPINLOCK_H */ 73 | -------------------------------------------------------------------------------- /driver/pib_srq.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_srq.c - Shared Receive Queue (SRQ) functions 3 | * 4 | * Copyright (c) 2013-2015 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 7 | */ 8 | #include 9 | #include 10 | 11 | #include "pib.h" 12 | #include "pib_spinlock.h" 13 | #include "pib_trace.h" 14 | 15 | 16 | static volatile int post_srq_recv_counter; /* this does not need to be exact */ 17 | 18 | 19 | static void srq_error_handler(struct pib_work_struct *work); 20 | 21 | 22 | static int pib_srq_attr_is_ok(const struct pib_dev *dev, const struct ib_srq_attr *attr) 23 | { 24 | if ((attr->max_wr < 1) || (dev->ib_dev_attr.max_srq_wr < attr->max_wr)) 25 | return 0; 26 | 27 | if ((attr->max_sge < 1) || (dev->ib_dev_attr.max_srq_sge < attr->max_sge)) 28 | return 0; 29 | 30 | return 1; 31 | } 32 | 33 | 34 | struct ib_srq *pib_create_srq(struct ib_pd *ibpd, 35 | struct ib_srq_init_attr *init_attr, 36 | struct ib_udata *udata) 37 | { 38 | int i; 39 | struct pib_dev *dev; 40 | struct pib_srq *srq; 41 | unsigned long flags; 42 | u32 srq_num; 43 | 44 | if (!ibpd || !init_attr) 45 | return ERR_PTR(-EINVAL); 46 | 47 | dev = to_pdev(ibpd->device); 48 | 49 | if (!pib_srq_attr_is_ok(dev, &init_attr->attr)) 50 | return ERR_PTR(-EINVAL); 51 | 52 | srq = kmem_cache_zalloc(pib_srq_cachep, GFP_KERNEL); 53 | if (!srq) 54 | return ERR_PTR(-ENOMEM); 55 | 56 | INIT_LIST_HEAD(&srq->list); 57 | getnstimeofday(&srq->creation_time); 58 | 59 | spin_lock_irqsave(&dev->lock, flags); 60 | srq_num = pib_alloc_obj_num(dev, 
PIB_BITMAP_SRQ_START, PIB_MAX_SRQ, &dev->last_srq_num); 61 | if (srq_num == (u32)-1) { 62 | spin_unlock_irqrestore(&dev->lock, flags); 63 | goto err_alloc_srq_num; 64 | } 65 | dev->nr_srq++; 66 | list_add_tail(&srq->list, &dev->srq_head); 67 | spin_unlock_irqrestore(&dev->lock, flags); 68 | 69 | srq->srq_num = srq_num; 70 | srq->state = PIB_STATE_OK; 71 | 72 | srq->ib_srq_attr = init_attr->attr; 73 | srq->ib_srq_attr.srq_limit = 0; /* srq_limit isn't set by ibv_create_srq */ 74 | 75 | pib_spin_lock_init(&srq->lock); 76 | INIT_LIST_HEAD(&srq->recv_wqe_head); 77 | INIT_LIST_HEAD(&srq->free_recv_wqe_head); 78 | PIB_INIT_WORK(&srq->work, dev, srq, srq_error_handler); 79 | 80 | for (i=0 ; i<srq->ib_srq_attr.max_wr ; i++) { 81 | struct pib_recv_wqe *recv_wqe; 82 | 83 | recv_wqe = kmem_cache_zalloc(pib_recv_wqe_cachep, GFP_KERNEL); 84 | if (!recv_wqe) 85 | goto err_alloc_wqe; 86 | 87 | INIT_LIST_HEAD(&recv_wqe->list); 88 | list_add_tail(&recv_wqe->list, &srq->free_recv_wqe_head); 89 | } 90 | 91 | pib_trace_api(dev, IB_USER_VERBS_CMD_CREATE_SRQ, srq_num); 92 | 93 | return &srq->ib_srq; 94 | 95 | err_alloc_wqe: 96 | while (!list_empty(&srq->free_recv_wqe_head)) { 97 | struct pib_recv_wqe *recv_wqe; 98 | recv_wqe = list_first_entry(&srq->free_recv_wqe_head, struct pib_recv_wqe, list); 99 | list_del_init(&recv_wqe->list); 100 | kmem_cache_free(pib_recv_wqe_cachep, recv_wqe); 101 | } 102 | 103 | spin_lock_irqsave(&dev->lock, flags); 104 | list_del(&srq->list); 105 | dev->nr_srq--; 106 | pib_dealloc_obj_num(dev, PIB_BITMAP_SRQ_START, srq_num); 107 | spin_unlock_irqrestore(&dev->lock, flags); 108 | 109 | err_alloc_srq_num: 110 | kmem_cache_free(pib_srq_cachep, srq); 111 | 112 | return ERR_PTR(-ENOMEM); 113 | } 114 | 115 | 116 | int pib_destroy_srq(struct ib_srq *ibsrq) 117 | { 118 | struct pib_dev *dev; 119 | struct pib_srq *srq; 120 | struct pib_recv_wqe *recv_wqe, *next; 121 | unsigned long flags; 122 | 123 | if (!ibsrq) 124 | return 0; 125 | 126 | dev = to_pdev(ibsrq->device); 
127 | srq = to_psrq(ibsrq); 128 | 129 | pib_trace_api(dev, IB_USER_VERBS_CMD_DESTROY_SRQ, srq->srq_num); 130 | 131 | pib_spin_lock_irqsave(&srq->lock, flags); 132 | list_for_each_entry_safe(recv_wqe, next, &srq->recv_wqe_head, list) { 133 | list_del_init(&recv_wqe->list); 134 | kmem_cache_free(pib_recv_wqe_cachep, recv_wqe); 135 | } 136 | list_for_each_entry_safe(recv_wqe, next, &srq->free_recv_wqe_head, list) { 137 | list_del_init(&recv_wqe->list); 138 | kmem_cache_free(pib_recv_wqe_cachep, recv_wqe); 139 | } 140 | srq->nr_recv_wqe = 0; 141 | pib_spin_unlock_irqrestore(&srq->lock, flags); 142 | 143 | spin_lock_irqsave(&dev->lock, flags); 144 | list_del(&srq->list); 145 | dev->nr_srq--; 146 | pib_dealloc_obj_num(dev, PIB_BITMAP_SRQ_START, srq->srq_num); 147 | spin_unlock_irqrestore(&dev->lock, flags); 148 | 149 | kmem_cache_free(pib_srq_cachep, srq); 150 | 151 | return 0; 152 | } 153 | 154 | 155 | int pib_modify_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr, 156 | enum ib_srq_attr_mask attr_mask, struct ib_udata *udata) 157 | { 158 | int ret; 159 | struct pib_dev *dev; 160 | struct pib_srq *srq; 161 | unsigned long flags; 162 | 163 | if (!ibsrq || !attr) 164 | return -EINVAL; 165 | 166 | dev = to_pdev(ibsrq->device); 167 | srq = to_psrq(ibsrq); 168 | 169 | pib_trace_api(dev, IB_USER_VERBS_CMD_MODIFY_SRQ, srq->srq_num); 170 | 171 | pib_spin_lock_irqsave(&srq->lock, flags); 172 | 173 | if (srq->state != PIB_STATE_OK) { 174 | ret = -EACCES; 175 | goto done; 176 | } 177 | 178 | if (attr_mask & IB_SRQ_MAX_WR) { 179 | struct ib_srq_attr new_attr; 180 | 181 | if (!(dev->ib_dev_attr.device_cap_flags & IB_DEVICE_SRQ_RESIZE)) { 182 | pib_debug("pib: Can't resize SRQ w/o DEVICE_SRQ_RESIZE\n"); 183 | ret = -EINVAL; 184 | goto done; 185 | } 186 | 187 | new_attr = srq->ib_srq_attr; 188 | new_attr.max_wr = attr->max_wr; 189 | new_attr.max_sge = attr->max_sge; 190 | 191 | if (!pib_srq_attr_is_ok(dev, &new_attr)) { 192 | ret = -EINVAL; 193 | goto done; 194 | } 195 | 196 | /* 
@todo grow or shrink the free_recv list here */ 197 | 198 | srq->ib_srq_attr = new_attr; 199 | } 200 | 201 | if (attr_mask & IB_SRQ_LIMIT) { 202 | srq->ib_srq_attr.srq_limit = attr->srq_limit; 203 | srq->issue_srq_limit = 0; 204 | } 205 | 206 | ret = 0; 207 | 208 | done: 209 | pib_spin_unlock_irqrestore(&srq->lock, flags); 210 | 211 | return ret; 212 | } 213 | 214 | 215 | int pib_query_srq(struct ib_srq *ibsrq, struct ib_srq_attr *attr) 216 | { 217 | int ret; 218 | struct pib_dev *dev; 219 | struct pib_srq *srq; 220 | unsigned long flags; 221 | 222 | if (!ibsrq || !attr) 223 | return -EINVAL; 224 | 225 | dev = to_pdev(ibsrq->device); 226 | srq = to_psrq(ibsrq); 227 | 228 | pib_trace_api(dev, IB_USER_VERBS_CMD_QUERY_SRQ, srq->srq_num); 229 | 230 | pib_spin_lock_irqsave(&srq->lock, flags); 231 | 232 | if (srq->state != PIB_STATE_OK) { 233 | ret = -EACCES; 234 | goto done; 235 | } 236 | 237 | *attr = srq->ib_srq_attr; 238 | 239 | ret = 0; 240 | 241 | done: 242 | pib_spin_unlock_irqrestore(&srq->lock, flags); 243 | 244 | return ret; 245 | } 246 | 247 | 248 | int pib_post_srq_recv(struct ib_srq *ibsrq, struct ib_recv_wr *ibwr, 249 | struct ib_recv_wr **bad_wr) 250 | { 251 | int i, ret = 0; 252 | struct pib_dev *dev; 253 | struct pib_recv_wqe *recv_wqe; 254 | struct pib_srq *srq; 255 | u64 total_length = 0; 256 | unsigned long flags; 257 | 258 | if (!ibsrq || !ibwr) 259 | return -EINVAL; 260 | 261 | dev = to_pdev(ibsrq->device); 262 | srq = to_psrq(ibsrq); 263 | 264 | pib_trace_api(dev, IB_USER_VERBS_CMD_POST_SRQ_RECV, srq->srq_num); 265 | 266 | pib_spin_lock_irqsave(&srq->lock, flags); 267 | 268 | /* 269 | * No state checking 270 | * 271 | * IBA Spec. Vol.1 10.2.9.5 SRQ STATES 272 | * Even if an SRQ is in the error state, the consumer may be able to 273 | * post WR to the SRQ. 
274 | */ 275 | 276 | next_wr: 277 | if ((ibwr->num_sge < 1) || (srq->ib_srq_attr.max_sge < ibwr->num_sge)) { 278 | ret = -EINVAL; 279 | goto err; 280 | } 281 | 282 | if (list_empty(&srq->free_recv_wqe_head)) { 283 | ret = -ENOMEM; 284 | goto err; 285 | } 286 | 287 | recv_wqe = list_first_entry(&srq->free_recv_wqe_head, struct pib_recv_wqe, list); 288 | list_del_init(&recv_wqe->list); /* would list_del suffice? */ 289 | 290 | recv_wqe->wr_id = ibwr->wr_id; 291 | recv_wqe->num_sge = ibwr->num_sge; 292 | 293 | for (i=0 ; i<ibwr->num_sge ; i++) { 294 | recv_wqe->sge_array[i] = ibwr->sg_list[i]; 295 | 296 | if (pib_get_behavior(PIB_BEHAVIOR_ZERO_LEN_SGE_CONSIDER_AS_MAX_LEN)) 297 | if (ibwr->sg_list[i].length == 0) 298 | ibwr->sg_list[i].length = PIB_MAX_PAYLOAD_LEN; 299 | 300 | total_length += ibwr->sg_list[i].length; 301 | } 302 | 303 | if (PIB_MAX_PAYLOAD_LEN < total_length) { 304 | ret = -EMSGSIZE; 305 | goto err; 306 | } 307 | 308 | recv_wqe->total_length = (u32)total_length; 309 | 310 | list_add_tail(&recv_wqe->list, &srq->recv_wqe_head); 311 | 312 | srq->nr_recv_wqe++; 313 | 314 | ibwr = ibwr->next; 315 | if (ibwr) 316 | goto next_wr; 317 | 318 | err: 319 | pib_spin_unlock_irqrestore(&srq->lock, flags); 320 | 321 | if (ret && bad_wr) 322 | *bad_wr = ibwr; 323 | 324 | return ret; 325 | } 326 | 327 | 328 | struct pib_recv_wqe * 329 | pib_util_get_srq(struct pib_srq *srq) 330 | { 331 | unsigned long flags; 332 | struct pib_recv_wqe *recv_wqe = NULL; 333 | 334 | pib_spin_lock_irqsave(&srq->lock, flags); 335 | 336 | if (srq->state != PIB_STATE_OK) 337 | goto skip; 338 | 339 | if (list_empty(&srq->recv_wqe_head)) 340 | goto skip; 341 | 342 | recv_wqe = list_first_entry(&srq->recv_wqe_head, struct pib_recv_wqe, list); 343 | list_del_init(&recv_wqe->list); 344 | srq->nr_recv_wqe--; 345 | 346 | if ((srq->ib_srq_attr.srq_limit != 0) && 347 | (srq->issue_srq_limit == 0) && 348 | (srq->nr_recv_wqe < srq->ib_srq_attr.srq_limit)) { 349 | struct ib_event ev; 350 | 351 | srq->issue_srq_limit = 1; 
352 | 353 | ev.event = IB_EVENT_SRQ_LIMIT_REACHED; 354 | ev.device = srq->ib_srq.device; 355 | ev.element.srq = &srq->ib_srq; 356 | 357 | srq->ib_srq.event_handler(&ev, srq->ib_srq.srq_context); 358 | } 359 | 360 | skip: 361 | pib_spin_unlock_irqrestore(&srq->lock, flags); 362 | 363 | return recv_wqe; 364 | } 365 | 366 | 367 | void pib_util_insert_async_srq_error(struct pib_dev *dev, struct pib_srq *srq) 368 | { 369 | struct ib_event ev; 370 | struct pib_qp *qp; 371 | unsigned long flags; 372 | 373 | pib_trace_async(dev, IB_EVENT_SRQ_ERR, srq->srq_num); 374 | 375 | pib_spin_lock_irqsave(&srq->lock, flags); 376 | 377 | srq->state = PIB_STATE_ERR; 378 | 379 | ev.event = IB_EVENT_SRQ_ERR; 380 | ev.device = srq->ib_srq.device; 381 | ev.element.srq = &srq->ib_srq; 382 | srq->ib_srq.event_handler(&ev, srq->ib_srq.srq_context); 383 | 384 | pib_spin_unlock_irqrestore(&srq->lock, flags); 385 | 386 | /* do not take the srq lock here */ 387 | 388 | list_for_each_entry(qp, &dev->qp_head, list) { 389 | pib_spin_lock(&qp->lock); 390 | if (srq == to_psrq(qp->ib_qp_init_attr.srq)) { 391 | qp->state = IB_QPS_ERR; 392 | pib_util_flush_qp(qp, 0); 393 | pib_util_insert_async_qp_error(qp, IB_EVENT_QP_FATAL); 394 | } 395 | pib_spin_unlock(&qp->lock); 396 | } 397 | } 398 | 399 | 400 | static void srq_error_handler(struct pib_work_struct *work) 401 | { 402 | struct pib_srq *srq = work->data; 403 | struct pib_dev *dev = work->dev; 404 | 405 | BUG_ON(!spin_is_locked(&dev->lock)); 406 | 407 | /* do not lock the srq */ 408 | 409 | pib_util_insert_async_srq_error(dev, srq); 410 | } 411 | -------------------------------------------------------------------------------- /driver/pib_trace.h: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_trace.h - Execution trace 3 | * 4 | * Copyright (c) 2013-2016 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 
7 | */ 8 | #ifndef PIB_TRACE_H 9 | #define PIB_TRACE_H 10 | 11 | #include 12 | #include 13 | #include 14 | 15 | 16 | #define PIB_TRACE_MAX_ENTRIES (65536) 17 | 18 | 19 | enum { 20 | PIB_USER_VERBS_CMD_DEALLOC_CONTEXT = 52, /* IB_USER_VERBS_CMD_THRESHOLD */ 21 | PIB_USER_VERBS_CMD_MODIFY_DEVICE, 22 | PIB_USER_VERBS_CMD_MODIFY_PORT, 23 | PIB_USER_VERBS_CMD_MODIFY_CQ, 24 | #ifdef PIB_FAST_REG_MR_SUPPORT 25 | PIB_USER_VERBS_CMD_ALLOC_FAST_REG_MR, 26 | #endif 27 | PIB_USER_VERBS_CMD_ALLOC_FAST_REG_PAGE_LIST, 28 | PIB_USER_VERBS_CMD_FREE_FAST_REG_PAGE_LIST 29 | }; 30 | 31 | 32 | struct pib_dev; 33 | 34 | extern void pib_trace_api(struct pib_dev *dev, int cmd, u32 oid); 35 | extern void pib_trace_send(struct pib_dev *dev, u8 port_num, int size); 36 | extern void pib_trace_recv(struct pib_dev *dev, u8 port_num, u8 opcode, u32 psn, int size, u16 slid, u16 dlid, u32 dqpn); 37 | extern void pib_trace_recv_ok(struct pib_dev *dev, u8 port_num, u8 opcode, u32 psn, u32 sqpn, u32 data); 38 | extern void pib_trace_retry(struct pib_dev *dev, u8 port_num, struct pib_send_wqe *send_wqe); 39 | extern void pib_trace_comp(struct pib_dev *dev, struct pib_cq *cq, const struct ib_wc *wc); 40 | extern void pib_trace_async(struct pib_dev *dev, enum ib_event_type type, u32 oid); 41 | 42 | 43 | #endif /* PIB_TRACE_H */ 44 | -------------------------------------------------------------------------------- /driver/pib_ucontext.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_ucontext.c - User Context functions 3 | * 4 | * Copyright (c) 2013,2014 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 
7 | */ 8 | #include 9 | #include 10 | 11 | #include "pib.h" 12 | #include "pib_trace.h" 13 | 14 | 15 | struct ib_ucontext * 16 | pib_alloc_ucontext(struct ib_device *ibdev, 17 | struct ib_udata *udata) 18 | { 19 | unsigned long flags; 20 | struct pib_dev *dev; 21 | struct pib_ucontext *ucontext; 22 | u32 ucontext_num; 23 | 24 | if (!ibdev) 25 | return ERR_PTR(-EINVAL); 26 | 27 | dev = to_pdev(ibdev); 28 | 29 | ucontext = kzalloc(sizeof *ucontext, GFP_KERNEL); 30 | if (!ucontext) 31 | return ERR_PTR(-ENOMEM); 32 | 33 | pib_trace_api(dev, IB_USER_VERBS_CMD_GET_CONTEXT, 0); 34 | 35 | INIT_LIST_HEAD(&ucontext->list); 36 | getnstimeofday(&ucontext->creation_time); 37 | 38 | spin_lock_irqsave(&dev->lock, flags); 39 | ucontext_num = pib_alloc_obj_num(dev, PIB_BITMAP_CONTEXT_START, PIB_MAX_CONTEXT, &dev->last_ucontext_num); 40 | if (ucontext_num == (u32)-1) { 41 | spin_unlock_irqrestore(&dev->lock, flags); 42 | goto err_alloc_ucontext_num; 43 | } 44 | dev->nr_ucontext++; 45 | list_add_tail(&ucontext->list, &dev->ucontext_head); 46 | ucontext->ucontext_num = ucontext_num; 47 | spin_unlock_irqrestore(&dev->lock, flags); 48 | 49 | memcpy(ucontext->comm, current->comm, sizeof(current->comm)); 50 | ucontext->tgid = current->tgid; 51 | 52 | return &ucontext->ib_ucontext; 53 | 54 | err_alloc_ucontext_num: 55 | kfree(ucontext); 56 | 57 | return ERR_PTR(-ENOMEM); 58 | } 59 | 60 | 61 | int pib_dealloc_ucontext(struct ib_ucontext *ibcontext) 62 | { 63 | unsigned long flags; 64 | struct pib_dev *dev; 65 | struct pib_ucontext *ucontext; 66 | 67 | if (!ibcontext) 68 | return 0; 69 | 70 | dev = to_pdev(ibcontext->device); 71 | ucontext = to_pucontext(ibcontext); 72 | 73 | pib_trace_api(dev, PIB_USER_VERBS_CMD_DEALLOC_CONTEXT, 0); 74 | 75 | spin_lock_irqsave(&dev->lock, flags); 76 | list_del(&ucontext->list); 77 | dev->nr_ucontext--; 78 | pib_dealloc_obj_num(dev, PIB_BITMAP_CONTEXT_START, ucontext->ucontext_num); 79 | spin_unlock_irqrestore(&dev->lock, flags); 80 | 81 | kfree(ucontext); 
82 | 83 | return 0; 84 | } 85 | -------------------------------------------------------------------------------- /driver/pib_ud.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pib_ud.c - Unreliable Datagram service processing 3 | * 4 | * Copyright (c) 2013-2015 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 7 | */ 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include /* for struct sock */ 23 | #include 24 | #include 25 | 26 | #include "pib.h" 27 | #include "pib_spinlock.h" 28 | #include "pib_packet.h" 29 | #include "pib_trace.h" 30 | 31 | 32 | /* 33 | * state は RTS 34 | * 35 | * Lock: qp 36 | */ 37 | int pib_process_ud_qp_request(struct pib_dev *dev, struct pib_qp *qp, struct pib_send_wqe *send_wqe) 38 | { 39 | int ret; 40 | int push_wc; 41 | struct pib_pd *pd; 42 | void *buffer; 43 | u8 port_num; 44 | struct pib_ah *ah; 45 | u16 slid, dlid; 46 | struct pib_packet_lrh *lrh; 47 | struct ib_grh *grh; 48 | struct pib_packet_bth *bth; 49 | struct pib_packet_deth *deth; 50 | u8 lnh; 51 | enum ib_wr_opcode opcode; 52 | enum ib_wc_status status = IB_WC_SUCCESS; 53 | int with_imm; 54 | unsigned long flags; 55 | u32 packet_length, fix_packet_length; 56 | 57 | opcode = send_wqe->opcode; 58 | 59 | with_imm = (opcode == IB_WR_SEND_WITH_IMM); 60 | 61 | /* Check Opcode */ 62 | switch (opcode) { 63 | case IB_WR_SEND: 64 | case IB_WR_SEND_WITH_IMM: 65 | break; 66 | 67 | default: 68 | /* Unsupported Opcode */ 69 | status = IB_WC_LOC_QP_OP_ERR; 70 | goto completion_error; 71 | } 72 | 73 | /* Check address handle */ 74 | ah = to_pah(send_wqe->wr.ud.ah); 75 | if (!ah) { 76 | status = IB_WC_LOC_QP_OP_ERR; 77 | goto completion_error; 78 | } 79 | 80 | /* Check P_Key index */ 81 | if (PIB_PKEY_TABLE_LEN <= send_wqe->wr.ud.pkey_index) { 82 | status 
= IB_WC_LOC_QP_OP_ERR; 83 | goto completion_error; 84 | } 85 | 86 | if (qp->ib_qp.pd != ah->ib_ah.pd) { 87 | /* @todo unreachable if PIB_BEHAVIOR_AH_PD_VIOLATOIN_COMP_ERR is set */ 88 | status = IB_WC_LOC_QP_OP_ERR; 89 | goto completion_error; 90 | } 91 | 92 | /* Check port_num */ 93 | port_num = ah->ib_ah_attr.port_num; 94 | if (port_num < 1 || dev->ib_dev.phys_port_cnt < port_num) { 95 | status = IB_WC_LOC_QP_OP_ERR; 96 | goto completion_error; 97 | } 98 | 99 | if (qp->qp_type == IB_QPT_UD) /* skip the port_num check for SMI and GSI QPs */ 100 | if (qp->ib_qp_attr.port_num != port_num) { 101 | status = IB_WC_LOC_QP_OP_ERR; 102 | goto completion_error; 103 | } 104 | 105 | slid = dev->ports[port_num - 1].ib_port_attr.lid; 106 | dlid = ah->ib_ah_attr.dlid; 107 | 108 | push_wc = (qp->ib_qp_init_attr.sq_sig_type == IB_SIGNAL_ALL_WR) 109 | || (send_wqe->send_flags & IB_SEND_SIGNALED); 110 | 111 | pd = to_ppd(qp->ib_qp.pd); 112 | 113 | buffer = dev->thread.send_buffer; 114 | 115 | memset(buffer, 0, sizeof(*lrh) + sizeof(*grh) + sizeof(*bth) + sizeof(*deth)); 116 | 117 | /* write IB Packet Header (LRH, GRH, BTH, DETH) */ 118 | lrh = (struct pib_packet_lrh*)buffer; 119 | buffer += sizeof(*lrh); 120 | if (ah->ib_ah_attr.ah_flags & IB_AH_GRH) { 121 | grh = (struct ib_grh*)buffer; 122 | pib_fill_grh(dev, port_num, grh, &ah->ib_ah_attr.grh); 123 | buffer += sizeof(*grh); 124 | lnh = 0x3; 125 | } else { 126 | grh = NULL; 127 | lnh = 0x2; 128 | } 129 | bth = (struct pib_packet_bth*)buffer; 130 | buffer += sizeof(*bth); 131 | deth = (struct pib_packet_deth*)buffer; 132 | buffer += sizeof(*deth); 133 | 134 | bth->OpCode = with_imm ? 
IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE : IB_OPCODE_UD_SEND_ONLY; 135 | 136 | lrh->sl_rsv_lnh = (ah->ib_ah_attr.sl << 4) | lnh; /* Transport: IBA & Next Header: BTH */ 137 | lrh->dlid = cpu_to_be16(ah->ib_ah_attr.dlid); 138 | lrh->slid = cpu_to_be16(slid); 139 | 140 | bth->pkey = dev->ports[port_num - 1].pkey_table[send_wqe->wr.ud.pkey_index]; 141 | bth->destQP = cpu_to_be32(send_wqe->wr.ud.remote_qpn); 142 | bth->psn = cpu_to_be32(qp->ib_qp_attr.sq_psn & PIB_PSN_MASK); /* A-bit is 0 */ 143 | 144 | /* 145 | * An attempt to send a Q_Key with the most significant bit set results 146 | * in using the Q_Key from the QP context instead of the Send WR context. 147 | * 148 | * @see IBA Spec. Vol.1 3.5.3 KEYS 149 | */ 150 | deth->qkey = cpu_to_be32(((s32)send_wqe->wr.ud.remote_qkey < 0) ? 151 | qp->ib_qp_attr.qkey : send_wqe->wr.ud.remote_qkey); 152 | 153 | deth->srcQP = cpu_to_be32(qp->ib_qp.qp_num); 154 | 155 | if (with_imm) { 156 | *(__be32*)buffer = send_wqe->ex.imm_data; 157 | buffer += 4; 158 | } 159 | 160 | /* Adjustments for SMPs */ 161 | if (send_wqe->wr.ud.remote_qpn == PIB_QP0) { 162 | struct ib_smp *smp = (struct ib_smp*)buffer; 163 | if (smp->mgmt_class == IB_MGMT_CLASS_SUBN_DIRECTED_ROUTE) { 164 | if (smp->dr_slid == IB_LID_PERMISSIVE) 165 | lrh->slid = IB_LID_PERMISSIVE; 166 | if (smp->dr_dlid == IB_LID_PERMISSIVE) 167 | lrh->dlid = IB_LID_PERMISSIVE; 168 | } 169 | } 170 | 171 | /* The maximum message length is constrained to fit in a single packet. 
*/ 172 | if (send_wqe->processing.all_packets != 1) { 173 | status = IB_WC_LOC_LEN_ERR; 174 | goto completion_error; 175 | } 176 | 177 | if (send_wqe->total_length == 0) { 178 | 179 | } else if (send_wqe->send_flags & IB_SEND_INLINE) { 180 | memcpy(buffer, send_wqe->inline_data_buffer, send_wqe->total_length); 181 | } else { 182 | spin_lock_irqsave(&pd->lock, flags); 183 | status = pib_util_mr_copy_data(pd, send_wqe->sge_array, send_wqe->num_sge, 184 | buffer, 0, send_wqe->total_length, 185 | 0, 186 | PIB_MR_COPY_FROM); 187 | spin_unlock_irqrestore(&pd->lock, flags); 188 | } 189 | 190 | if (status != IB_WC_SUCCESS) 191 | goto completion_error; 192 | 193 | buffer += send_wqe->total_length; 194 | 195 | /* recalculate the packet size */ 196 | packet_length = buffer - dev->thread.send_buffer; 197 | fix_packet_length = (packet_length + 3) & ~3; 198 | 199 | pib_packet_lrh_set_pktlen(lrh, (fix_packet_length + 4) / 4); /* add ICRC size */ 200 | pib_packet_bth_set_padcnt(bth, fix_packet_length - packet_length); 201 | pib_packet_bth_set_solicited(bth, send_wqe->send_flags & IB_SEND_SOLICITED); 202 | 203 | dev->thread.port_num = port_num; 204 | dev->thread.src_qp_num = qp->ib_qp.qp_num; 205 | dev->thread.slid = slid; 206 | dev->thread.dlid = dlid; 207 | dev->thread.trace_id = send_wqe->trace_id; 208 | dev->thread.ready_to_send = 1; 209 | 210 | qp->ib_qp_attr.sq_psn++; 211 | 212 | list_del_init(&send_wqe->list); 213 | qp->requester.nr_sending_swqe--; 214 | send_wqe->processing.list_type = PIB_SWQE_FREE; 215 | 216 | if (!push_wc) 217 | return 0; 218 | else { 219 | struct ib_wc wc = { 220 | .wr_id = send_wqe->wr_id, 221 | .status = IB_WC_SUCCESS, 222 | .opcode = pib_convert_wr_opcode_to_wc_opcode(send_wqe->opcode), 223 | .qp = &qp->ib_qp, 224 | }; 225 | 226 | ret = pib_util_insert_wc_success(qp->send_cq, &wc, 0); 227 | /* @todo check the return value */ 228 | } 229 | 230 | return 0; 231 | 232 | completion_error: 233 | qp->state = IB_QPS_SQE; 234 | 235 | pib_util_insert_wc_error(qp->send_cq, qp, send_wqe->wr_id, 236 | 
status, send_wqe->opcode); 237 | 238 | BUG_ON(send_wqe->processing.list_type != PIB_SWQE_SENDING); 239 | list_del_init(&send_wqe->list); 240 | qp->requester.nr_sending_swqe--; 241 | send_wqe->processing.list_type = PIB_SWQE_FREE; 242 | 243 | pib_util_flush_qp(qp, 1); 244 | 245 | return -1; 246 | } 247 | 248 | 249 | void pib_receive_ud_qp_incoming_message(struct pib_dev *dev, u8 port_num, struct pib_qp *qp, struct pib_packet_lrh *lrh, struct ib_grh *grh, struct pib_packet_bth *bth, void *buffer, int size) 250 | { 251 | struct pib_pd *pd; 252 | struct pib_recv_wqe *recv_wqe = NULL; 253 | struct pib_packet_deth *deth; 254 | u32 qkey; 255 | enum ib_wc_status status = IB_WC_SUCCESS; 256 | __be32 imm_data = 0; 257 | unsigned long flags; 258 | 259 | if (!pib_is_recv_ok(qp->state)) 260 | goto silently_drop; 261 | 262 | switch (bth->OpCode) { 263 | 264 | case IB_OPCODE_UD_SEND_ONLY: 265 | case IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE: 266 | break; 267 | 268 | default: 269 | goto silently_drop; 270 | } 271 | 272 | /* UD does not set the acknowledge request bit */ 273 | if (be32_to_cpu(bth->psn) & 0x80000000U) /* A-bit */ 274 | goto silently_drop; 275 | 276 | /* Analyze Datagram Extended Transport Header */ 277 | if (size < sizeof(struct pib_packet_deth)) 278 | goto silently_drop; 279 | 280 | deth = (struct pib_packet_deth*)buffer; 281 | 282 | buffer += sizeof(*deth); 283 | size -= sizeof(*deth); 284 | 285 | if (qp->qp_type == IB_QPT_UD) /* skip the port_num check for SMI and GSI QPs */ 286 | if (qp->ib_qp_attr.port_num != port_num) 287 | goto silently_drop; 288 | 289 | qkey = be32_to_cpu(deth->qkey); 290 | 291 | /* DETH: Q_Key check */ 292 | switch (qp->qp_type) { 293 | case IB_QPT_SMI: 294 | break; 295 | 296 | case IB_QPT_GSI: 297 | if (qkey != IB_QP1_QKEY) 298 | goto silently_drop; 299 | break; 300 | 301 | default: 302 | if (qkey != qp->ib_qp_attr.qkey) 303 | goto silently_drop; 304 | break; 305 | } 306 | 307 | pib_trace_recv_ok(dev, port_num, bth->OpCode, be32_to_cpu(bth->psn),
qp->ib_qp.qp_num, size); 308 | 309 | /* Analyze Immediate Extended Transport Header */ 310 | if (bth->OpCode == IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE) { 311 | if (size < 4) 312 | goto silently_drop; 313 | 314 | imm_data = *(__be32*)buffer; /* @todo */ 315 | 316 | buffer += 4; 317 | size -= 4; 318 | } 319 | 320 | if (qp->ib_qp_init_attr.srq) { 321 | recv_wqe = pib_util_get_srq(to_psrq(qp->ib_qp_init_attr.srq)); 322 | if (!recv_wqe) 323 | goto silently_drop; 324 | 325 | } else { 326 | 327 | if (list_empty(&qp->responder.recv_wqe_head)) 328 | goto silently_drop; 329 | 330 | recv_wqe = list_first_entry(&qp->responder.recv_wqe_head, struct pib_recv_wqe, list); 331 | list_del_init(&recv_wqe->list); 332 | qp->responder.nr_recv_wqe--; 333 | } 334 | 335 | if (recv_wqe->total_length < size) 336 | goto silently_drop; /* UD does not cause a local length error */ 337 | 338 | pd = to_ppd(qp->ib_qp.pd); 339 | 340 | spin_lock_irqsave(&pd->lock, flags); 341 | 342 | if (grh) 343 | status = pib_util_mr_copy_data(pd, recv_wqe->sge_array, recv_wqe->num_sge, 344 | grh, 0, sizeof(*grh), 345 | IB_ACCESS_LOCAL_WRITE, 346 | PIB_MR_COPY_TO); 347 | if (status == IB_WC_SUCCESS) 348 | status = pib_util_mr_copy_data(pd, recv_wqe->sge_array, recv_wqe->num_sge, 349 | buffer, sizeof(*grh), size, 350 | IB_ACCESS_LOCAL_WRITE, 351 | PIB_MR_COPY_TO); 352 | 353 | spin_unlock_irqrestore(&pd->lock, flags); 354 | 355 | if (status != IB_WC_SUCCESS) { 356 | if (status == IB_WC_LOC_LEN_ERR) { 357 | if (qp->ib_qp_init_attr.srq) 358 | goto abort_error; 359 | else 360 | goto silently_drop; 361 | } 362 | goto completion_error; 363 | } 364 | 365 | { 366 | int ret; 367 | struct ib_wc wc = { 368 | .wr_id = recv_wqe->wr_id, 369 | .status = IB_WC_SUCCESS, 370 | .opcode = IB_WC_RECV, 371 | .byte_len = size + 40, 372 | .qp = &qp->ib_qp, 373 | .ex.imm_data = imm_data, 374 | .src_qp = be32_to_cpu(deth->srcQP) & PIB_QPN_MASK, 375 | .slid = be16_to_cpu(lrh->slid), 376 | }; 377 | 378 | if (grh) 379 | wc.wc_flags |= IB_WC_GRH; 380
| 381 | if (bth->OpCode == IB_OPCODE_UD_SEND_ONLY_WITH_IMMEDIATE) 382 | wc.wc_flags |= IB_WC_WITH_IMM; 383 | 384 | ret = pib_util_insert_wc_success(qp->recv_cq, &wc, pib_packet_bth_get_padcnt(bth)); 385 | } 386 | 387 | qp->push_rcqe = 1; 388 | qp->ib_qp_attr.rq_psn++; 389 | pib_util_free_recv_wqe(qp, recv_wqe); 390 | 391 | return; 392 | 393 | silently_drop: 394 | if (recv_wqe) 395 | pib_util_free_recv_wqe(qp, recv_wqe); /* @todo should generate a WC */ 396 | 397 | return; 398 | 399 | completion_error: 400 | qp->state = IB_QPS_ERR; 401 | 402 | pib_util_insert_wc_error(qp->send_cq, qp, recv_wqe->wr_id, 403 | status, IB_WC_RECV); 404 | 405 | pib_util_flush_qp(qp, 0); 406 | qp->push_rcqe = 1; 407 | pib_util_free_recv_wqe(qp, recv_wqe); 408 | 409 | return; 410 | 411 | abort_error: 412 | pib_util_insert_wc_error(qp->send_cq, qp, recv_wqe->wr_id, 413 | IB_WC_REM_ABORT_ERR, IB_WC_RECV); 414 | qp->push_rcqe = 1; 415 | pib_util_free_recv_wqe(qp, recv_wqe); 416 | 417 | return; 418 | } 419 | -------------------------------------------------------------------------------- /libpib/.gitignore: -------------------------------------------------------------------------------- 1 | libpib-rdmav2.so 2 | -------------------------------------------------------------------------------- /libpib/AUTHORS: -------------------------------------------------------------------------------- 1 | NAKAMURA Minoru 2 | -------------------------------------------------------------------------------- /libpib/Makefile: -------------------------------------------------------------------------------- 1 | all: libpib-rdmav2.so 2 | 3 | libpib-rdmav2.so: src/pib.c 4 | gcc -g -Wall -fPIC -shared -Wl,--version-script=src/pib.map $< -o $@ 5 | 6 | clean: 7 | rm -rf libpib-rdmav2.so 8 | 9 | .PHONY: all clean 10 | -------------------------------------------------------------------------------- /libpib/README: -------------------------------------------------------------------------------- 1 | Introduction 2 | ============ 3 |
4 | libpib is a userspace driver for the Pseudo InfiniBand HCA. 5 | It is a plug-in module for libibverbs that works with the Pseudo InfiniBand driver (pib). 6 | 7 | Using libpib 8 | ============== 9 | 10 | libpib will be loaded and used automatically by programs linked with 11 | libibverbs. The pib kernel module must be loaded for HCA devices 12 | to be detected and used. 13 | 14 | Supported OS 15 | ================== 16 | 17 | libpib supports the following Linux distributions: 18 | 19 | CentOS 6.x 20 | 21 | -------------------------------------------------------------------------------- /libpib/libpib-modprobe.conf: -------------------------------------------------------------------------------- 1 | install pib /sbin/modprobe --ignore-install pib && (if [ -f /etc/rdma/setup-pib.awk -a -f /etc/rdma/pib.conf ]; then awk -f /etc/rdma/setup-pib.awk /etc/rdma/pib.conf; fi; /sbin/modprobe pib_en; /sbin/modprobe mlx4_ib) 2 | -------------------------------------------------------------------------------- /libpib/libpib-pib.conf: -------------------------------------------------------------------------------- 1 | # Config file for mlx4 hardware port settings 2 | # This file is read when the mlx4_core module is loaded and used to 3 | # set the port types for any hardware found. If a card is not listed 4 | # in this file, then its port types are left alone. 5 | # 6 | # Format: 7 | # 8 | # 9 | # Example: 10 | # 0000:0b:00.0 eth eth 11 | # 12 | # You can find the right pci device to use for any given card by loading 13 | # the mlx4_core module, then going to /sys/bus/pci/drivers/mlx4_core and 14 | # seeing what possible PCI devices are listed there. The possible values 15 | # for ports are: ib, eth, and auto. However, not all cards support all 16 | # types, so if you get messages from the kernel that your selected port 17 | # type isn't supported, there's nothing this script can do about it.
Also, 18 | # some cards don't support using different types on the two ports (aka, 19 | # both ports must be either eth or ib). Again, we can't set what the kernel 20 | # won't support. Also, on single port cards, any setting for the 21 | # second port will be ignored. 22 | -------------------------------------------------------------------------------- /libpib/libpib-setup-pib.awk: -------------------------------------------------------------------------------- 1 | BEGIN { 2 | dir="/sys/bus/pci/drivers/pib" 3 | if (system("[ -d "dir" ]") != 0) 4 | exit 1 5 | } 6 | /^[[:xdigit:]]+:[[:xdigit:]]+:[[:xdigit:]]+\.[[:xdigit:]]([[:blank:]](ib|eth|auto))+/ { 7 | device=$1 8 | port1=$2 9 | port2=$3 10 | if (system("[ -d "dir"/"device" ]") != 0) 11 | next 12 | if (system("[ -f "dir"/"device"/port_trigger ]") == 0) 13 | print "all" > dir"/"device"/port_trigger" 14 | if (system("[ -f "dir"/"device"/mlx4_port2 ]") == 0) 15 | print port2 > dir"/"device"/mlx4_port2" 16 | if (system("[ -f "dir"/"device"/mlx4_port1 ]") == 0) 17 | print port1 > dir"/"device"/mlx4_port1" 18 | } 19 | -------------------------------------------------------------------------------- /libpib/libpib.spec: -------------------------------------------------------------------------------- 1 | Name: libpib 2 | Version: 0.0.6 3 | Release: 1%{?dist} 4 | Summary: Pseudo InfiniBand HCA Userspace Driver 5 | Provides: libibverbs-driver.%{_arch} 6 | Group: System Environment/Libraries 7 | License: GPLv2 or BSD 8 | Url: http://www.nminoru.jp/ 9 | Source: %{name}-%{version}.tar.gz 10 | BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX) 11 | Provides: libpib-devel = %{version}-%{release} 12 | Requires: libibverbs > 1.1.4 13 | BuildRequires: libibverbs-devel > 1.1.4 14 | # BuildArch: x86_64 15 | # ExcludeArch: s390 s390x 16 | 17 | %description 18 | libpib provides a userspace driver for Pseudo InfiniBand HCAs for use with the libibverbs library.
19 | 20 | %prep 21 | 22 | %setup -q 23 | 24 | %build 25 | make 26 | 27 | %install 28 | rm -rf $RPM_BUILD_ROOT 29 | install -D -m 644 libpib-rdmav2.so ${RPM_BUILD_ROOT}%{_libdir}/libpib-rdmav2.so 30 | install -D -m 644 pib.driver ${RPM_BUILD_ROOT}%{_sysconfdir}/libibverbs.d/pib.driver 31 | # install -D -m 644 %{SOURCE1} ${RPM_BUILD_ROOT}%{_sysconfdir}/modprobe.d/libpib.conf 32 | # install -D -m 644 %{SOURCE2} ${RPM_BUILD_ROOT}%{_sysconfdir}/rdma/pib.conf 33 | # install -D -m 644 %{SOURCE3} ${RPM_BUILD_ROOT}%{_sysconfdir}/rdma/setup-mlx4.awk 34 | # remove unpackaged files from the buildroot 35 | rm -f $RPM_BUILD_ROOT%{_libdir}/libpib.so 36 | 37 | %clean 38 | rm -rf $RPM_BUILD_ROOT 39 | 40 | %files 41 | %defattr(-,root,root,-) 42 | %{_libdir}/libpib-rdmav2.so 43 | %{_sysconfdir}/libibverbs.d/pib.driver 44 | %doc AUTHORS COPYING README 45 | 46 | %changelog 47 | * Tue Dec 09 2013 Minoru NAKAMURA - 0.0.5 48 | - Hack for the IB/core bug to Pass imm_data from ib_uverbs_send_wr to 49 | ib_send_wr correctly when sending UD messages. 50 | 51 | * Tue Oct 30 2013 Minoru NAKAMURA - 0.0.2 52 | - Initial spec file 53 | -------------------------------------------------------------------------------- /libpib/pib.driver: -------------------------------------------------------------------------------- 1 | driver pib 2 | -------------------------------------------------------------------------------- /libpib/src/pib.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2013,2014 Minoru NAKAMURA 3 | * 4 | * This code is licenced under the GPL version 2 or BSD license. 
5 | */ 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | 15 | 16 | struct pib_ibv_device { 17 | struct ibv_device base; 18 | uint32_t imm_data_lkey; 19 | }; 20 | 21 | 22 | static int pib_query_device(struct ibv_context *context, 23 | struct ibv_device_attr *device_attr) 24 | { 25 | struct ibv_query_device cmd; 26 | uint64_t raw_fw_ver; 27 | unsigned major, minor, sub_minor; 28 | int ret; 29 | 30 | ret = ibv_cmd_query_device(context, device_attr, &raw_fw_ver, &cmd, sizeof cmd); 31 | if (ret) 32 | return ret; 33 | 34 | major = (raw_fw_ver >> 32) & 0xffff; 35 | minor = (raw_fw_ver >> 16) & 0xffff; 36 | sub_minor = raw_fw_ver & 0xffff; 37 | 38 | snprintf(device_attr->fw_ver, sizeof device_attr->fw_ver, 39 | "%d.%d.%03d", major, minor, sub_minor); 40 | 41 | return 0; 42 | } 43 | 44 | static int pib_query_port(struct ibv_context *context, uint8_t port_num, 45 | struct ibv_port_attr *port_attr) 46 | { 47 | struct ibv_query_port cmd; 48 | 49 | return ibv_cmd_query_port(context, port_num, port_attr, &cmd, sizeof cmd); 50 | } 51 | 52 | static struct ibv_pd *pib_alloc_pd(struct ibv_context *context) 53 | { 54 | struct ibv_pd *pd; 55 | struct ibv_alloc_pd cmd; 56 | struct ibv_alloc_pd_resp resp; 57 | int ret; 58 | 59 | pd = calloc(1, sizeof *pd); 60 | if (!pd) 61 | return NULL; 62 | 63 | ret = ibv_cmd_alloc_pd(context, pd, 64 | &cmd, sizeof cmd, 65 | &resp, sizeof resp); 66 | if (ret) { 67 | free(pd); 68 | errno = ret; 69 | return NULL; 70 | } 71 | 72 | return pd; 73 | } 74 | 75 | static int pib_dealloc_pd(struct ibv_pd *pd) 76 | { 77 | int ret; 78 | 79 | ret = ibv_cmd_dealloc_pd(pd); 80 | 81 | if (pd) 82 | free(pd); 83 | 84 | return ret; 85 | } 86 | 87 | static struct ibv_mr *pib_reg_mr(struct ibv_pd *pd, void *addr, size_t length, 88 | int access) 89 | { 90 | struct ibv_mr *mr; 91 | struct ibv_reg_mr cmd; 92 | struct ibv_reg_mr_resp resp; 93 | int ret; 94 | 95 | mr = calloc(1, sizeof *mr); 96 | if (!mr) 97 | 
return NULL; 98 | 99 | ret = ibv_cmd_reg_mr(pd, addr, length, 100 | (uintptr_t)addr, /* hca_va */ 101 | access, mr, &cmd, sizeof cmd, 102 | &resp, sizeof resp); 103 | if (ret) { 104 | free(mr); 105 | errno = ret; 106 | return NULL; 107 | } 108 | 109 | return mr; 110 | } 111 | 112 | struct ibv_mr *pib_rereg_mr(struct ibv_mr *mr, 113 | int flags, 114 | struct ibv_pd *pd, void *addr, 115 | size_t length, 116 | int access) 117 | { 118 | errno = ENOSYS; 119 | 120 | return NULL; 121 | } 122 | 123 | static int pib_dereg_mr(struct ibv_mr *mr) 124 | { 125 | int ret; 126 | 127 | ret = ibv_cmd_dereg_mr(mr); 128 | 129 | if (mr) 130 | free(mr); 131 | 132 | return ret; 133 | } 134 | 135 | static struct ibv_mw *pib_alloc_mw(struct ibv_pd *pd, enum ibv_mw_type type) 136 | { 137 | errno = ENOSYS; 138 | 139 | return NULL; 140 | } 141 | 142 | static int pib_bind_mw(struct ibv_qp *qp, struct ibv_mw *mw, 143 | struct ibv_mw_bind *mw_bind) 144 | { 145 | errno = ENOSYS; 146 | 147 | return -1; 148 | } 149 | 150 | static int pib_dealloc_mw(struct ibv_mw *mw) 151 | { 152 | errno = ENOSYS; 153 | 154 | return -1; 155 | } 156 | 157 | static struct ibv_cq *pib_create_cq(struct ibv_context *context, int cqe, 158 | struct ibv_comp_channel *channel, 159 | int comp_vector) 160 | { 161 | struct ibv_cq *cq; 162 | struct ibv_create_cq cmd; 163 | struct ibv_create_cq_resp resp; 164 | int ret; 165 | 166 | cq = calloc(1, sizeof *cq); 167 | if (!cq) 168 | return NULL; 169 | 170 | ret = ibv_cmd_create_cq(context, cqe, channel, comp_vector, 171 | cq, 172 | &cmd, sizeof cmd, 173 | &resp, sizeof resp); 174 | if (ret) { 175 | free(cq); 176 | errno = ret; 177 | return NULL; 178 | } 179 | 180 | return cq; 181 | } 182 | 183 | static int pib_poll_cq(struct ibv_cq *cq, int num_entries, struct ibv_wc *wc) 184 | { 185 | return ibv_cmd_poll_cq(cq, num_entries, wc); 186 | } 187 | 188 | static int pib_req_notify_cq(struct ibv_cq *cq, int solicited_only) 189 | { 190 | return ibv_cmd_req_notify_cq(cq, solicited_only); 191 
| } 192 | 193 | static int pib_resize_cq(struct ibv_cq *cq, int cqe) 194 | { 195 | struct ibv_resize_cq cmd; 196 | struct ibv_resize_cq_resp resp; 197 | 198 | return ibv_cmd_resize_cq(cq, cqe, 199 | &cmd, sizeof cmd, 200 | &resp, sizeof resp); 201 | } 202 | 203 | static int pib_destroy_cq(struct ibv_cq *cq) 204 | { 205 | int ret; 206 | 207 | ret = ibv_cmd_destroy_cq(cq); 208 | 209 | if (cq) 210 | free(cq); 211 | 212 | return ret; 213 | } 214 | 215 | static struct ibv_srq *pib_create_srq(struct ibv_pd *pd, 216 | struct ibv_srq_init_attr *srq_init_attr) 217 | { 218 | struct ibv_srq *srq; 219 | struct ibv_create_srq cmd; 220 | struct ibv_create_srq_resp resp; 221 | int ret; 222 | 223 | srq = calloc(1, sizeof *srq); 224 | if (!srq) 225 | return NULL; 226 | 227 | ret = ibv_cmd_create_srq(pd, srq, srq_init_attr, 228 | &cmd, sizeof cmd, 229 | &resp, sizeof resp); 230 | if (ret) { 231 | free(srq); 232 | errno = ret; 233 | return NULL; 234 | } 235 | 236 | return srq; 237 | } 238 | 239 | static int pib_modify_srq(struct ibv_srq *srq, 240 | struct ibv_srq_attr *srq_attr, 241 | int srq_attr_mask) 242 | { 243 | struct ibv_modify_srq cmd; 244 | 245 | return ibv_cmd_modify_srq(srq, srq_attr, srq_attr_mask, 246 | &cmd, sizeof cmd); 247 | } 248 | 249 | static int pib_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr) 250 | { 251 | struct ibv_query_srq cmd; 252 | 253 | return ibv_cmd_query_srq(srq, srq_attr, 254 | &cmd, sizeof cmd); 255 | } 256 | 257 | static int pib_destroy_srq(struct ibv_srq *srq) 258 | { 259 | int ret; 260 | 261 | ret = ibv_cmd_destroy_srq(srq); 262 | 263 | if (srq) 264 | free(srq); 265 | 266 | return ret; 267 | } 268 | 269 | static int pib_post_srq_recv(struct ibv_srq *srq, 270 | struct ibv_recv_wr *recv_wr, 271 | struct ibv_recv_wr **bad_recv_wr) 272 | { 273 | return ibv_cmd_post_srq_recv(srq, recv_wr, bad_recv_wr); 274 | } 275 | 276 | static struct ibv_qp *pib_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *attr) 277 | { 278 | struct ibv_qp 
*qp; 279 | struct ibv_create_qp cmd; 280 | struct ibv_create_qp_resp resp; 281 | int ret; 282 | 283 | qp = calloc(1, sizeof *qp); 284 | if (!qp) 285 | return NULL; 286 | 287 | ret = ibv_cmd_create_qp(pd, qp, attr, 288 | &cmd, sizeof cmd, 289 | &resp, sizeof resp); 290 | if (ret) { 291 | free(qp); 292 | errno = ret; 293 | return NULL; 294 | } 295 | 296 | return qp; 297 | } 298 | 299 | static int pib_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, 300 | int attr_mask, 301 | struct ibv_qp_init_attr *init_attr) 302 | { 303 | struct ibv_query_qp cmd; 304 | 305 | return ibv_cmd_query_qp(qp, attr, attr_mask, init_attr, 306 | &cmd, sizeof cmd); 307 | } 308 | 309 | static int pib_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, 310 | int attr_mask) 311 | { 312 | struct ibv_modify_qp cmd; 313 | 314 | return ibv_cmd_modify_qp(qp, attr, attr_mask, 315 | &cmd, sizeof cmd); 316 | } 317 | 318 | static int pib_destroy_qp(struct ibv_qp *qp) 319 | { 320 | int ret; 321 | 322 | ret = ibv_cmd_destroy_qp(qp); 323 | 324 | if (qp) 325 | free(qp); 326 | 327 | return ret; 328 | } 329 | 330 | static int ud_qp_post_send_with_imm(struct ibv_qp *qp, struct ibv_send_wr *wr, 331 | struct ibv_send_wr **bad_wr, uint32_t imm_data_lkey) 332 | { 333 | int i; 334 | struct ibv_send_wr wr_temp = *wr; 335 | 336 | wr_temp.next = NULL; 337 | 338 | if (wr_temp.opcode != IBV_WR_SEND_WITH_IMM) 339 | goto done; 340 | 341 | /* Add a special s/g entry */ 342 | 343 | wr_temp.sg_list = (struct ibv_sge*)alloca(sizeof(struct ibv_sge) * (wr_temp.num_sge + 1)); 344 | 345 | for (i=0 ; i<wr_temp.num_sge ; i++) 346 | wr_temp.sg_list[i] = wr->sg_list[i]; 347 | 348 | wr_temp.sg_list[i].addr = 0; 349 | wr_temp.sg_list[i].length = wr->imm_data; 350 | wr_temp.sg_list[i].lkey = imm_data_lkey; 351 | 352 | wr_temp.num_sge++; 353 | 354 | done: 355 | return ibv_cmd_post_send(qp, &wr_temp, bad_wr); 356 | } 357 | 358 | static int pib_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr, 359 | struct ibv_send_wr **bad_wr) 360 | { 361 | uint32_t imm_data_lkey; 362 | struct
ibv_send_wr *i; 363 | 364 | if (qp->qp_type == IBV_QPT_UD && qp->context->device) { 365 | imm_data_lkey = ((struct pib_ibv_device*)qp->context->device)->imm_data_lkey; 366 | if (imm_data_lkey) 367 | goto hack_imm_data_lkey; 368 | } 369 | 370 | return ibv_cmd_post_send(qp, wr, bad_wr); 371 | 372 | hack_imm_data_lkey: 373 | for (i = wr; i ; i = i->next) { 374 | int ret; 375 | ret = ud_qp_post_send_with_imm(qp, i, bad_wr, imm_data_lkey); 376 | if (ret) { 377 | *bad_wr = i; 378 | return ret; 379 | } 380 | } 381 | 382 | return 0; 383 | } 384 | 385 | static int pib_post_recv(struct ibv_qp *qp, struct ibv_recv_wr *wr, 386 | struct ibv_recv_wr **bad_wr) 387 | { 388 | return ibv_cmd_post_recv(qp, wr, bad_wr); 389 | } 390 | 391 | static int pib_attach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid) 392 | { 393 | return ibv_cmd_attach_mcast(qp, gid, lid); 394 | } 395 | 396 | static int pib_detach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid) 397 | { 398 | return ibv_cmd_detach_mcast(qp, gid, lid); 399 | } 400 | 401 | static struct ibv_ah *pib_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr) 402 | { 403 | struct ibv_ah *ah; 404 | int ret; 405 | 406 | ah = calloc(1, sizeof *ah); 407 | if (!ah) 408 | return NULL; 409 | 410 | ret = ibv_cmd_create_ah(pd, ah, attr); 411 | if (ret) { 412 | free(ah); 413 | errno = ret; 414 | return NULL; 415 | } 416 | 417 | return ah; 418 | } 419 | 420 | static int pib_destroy_ah(struct ibv_ah *ah) 421 | { 422 | int ret; 423 | 424 | ret = ibv_cmd_destroy_ah(ah); 425 | 426 | if (ah) 427 | free(ah); 428 | 429 | return ret; 430 | } 431 | 432 | static void pib_cq_event(struct ibv_cq *cq) 433 | { 434 | /* Backend of ibv_get_cq_event; unnecessary if channel->fd delivers events directly */ 435 | } 436 | 437 | static void pib_async_event(struct ibv_async_event *event) 438 | { 439 | /* Unnecessary if ibv_get_async_event just reads events from context->async_fd directly */ 440 | } 441 | 442 | static struct ibv_context_ops pib_ctx_ops = { 443 | .query_device = pib_query_device, 444 |
.query_port = pib_query_port, 445 | .alloc_pd = pib_alloc_pd, 446 | .dealloc_pd = pib_dealloc_pd, 447 | .reg_mr = pib_reg_mr, 448 | .dereg_mr = pib_dereg_mr, 449 | .alloc_mw = pib_alloc_mw, 450 | .bind_mw = pib_bind_mw, 451 | .dealloc_mw = pib_dealloc_mw, 452 | .create_cq = pib_create_cq, 453 | .poll_cq = pib_poll_cq, 454 | .req_notify_cq = pib_req_notify_cq, 455 | .cq_event = pib_cq_event, 456 | .resize_cq = pib_resize_cq, 457 | .destroy_cq = pib_destroy_cq, 458 | .create_srq = pib_create_srq, 459 | .modify_srq = pib_modify_srq, 460 | .query_srq = pib_query_srq, 461 | .destroy_srq = pib_destroy_srq, 462 | .post_srq_recv = pib_post_srq_recv, 463 | .create_qp = pib_create_qp, 464 | .query_qp = pib_query_qp, 465 | .modify_qp = pib_modify_qp, 466 | .destroy_qp = pib_destroy_qp, 467 | .post_send = pib_post_send, 468 | .post_recv = pib_post_recv, 469 | .create_ah = pib_create_ah, 470 | .destroy_ah = pib_destroy_ah, 471 | .attach_mcast = pib_attach_mcast, 472 | .detach_mcast = pib_detach_mcast, 473 | .async_event = pib_async_event, 474 | }; 475 | 476 | static struct ibv_context *pib_alloc_context(struct ibv_device *ibdev, int cmd_fd) 477 | { 478 | struct ibv_context *context; 479 | struct ibv_get_context cmd; 480 | struct ibv_get_context_resp resp; 481 | int ret; 482 | 483 | context = calloc(1, sizeof *context); 484 | if (!context) 485 | return NULL; 486 | 487 | context->cmd_fd = cmd_fd; 488 | 489 | ret = ibv_cmd_get_context(context, 490 | &cmd, sizeof cmd, 491 | &resp, sizeof resp); 492 | if (ret) { 493 | free(context); 494 | errno = ret; 495 | return NULL; 496 | } 497 | 498 | context->ops = pib_ctx_ops; 499 | 500 | return context; 501 | } 502 | 503 | static void pib_free_context(struct ibv_context *context) 504 | { 505 | if (context) 506 | free(context); 507 | } 508 | 509 | static struct ibv_device_ops pib_dev_ops = { 510 | .alloc_context = pib_alloc_context, 511 | .free_context = pib_free_context 512 | }; 513 | 514 | static struct ibv_device *pib_driver_init(const 
char *uverbs_sys_path, int abi_version) 515 | { 516 | char device_name[24]; 517 | struct pib_ibv_device *dev; 518 | char ibdev_path[IBV_SYSFS_PATH_MAX]; 519 | char attr[41]; 520 | 521 | if (ibv_read_sysfs_file(uverbs_sys_path, "ibdev", 522 | device_name, sizeof device_name) < 0) 523 | return NULL; 524 | 525 | if (strncmp(device_name, "pib_", 4) != 0) 526 | return NULL; 527 | 528 | dev = calloc(1, sizeof *dev); 529 | if (!dev) { 530 | return NULL; 531 | } 532 | 533 | dev->base.ops = pib_dev_ops; 534 | dev->base.node_type = IBV_NODE_CA; 535 | dev->base.transport_type = IBV_TRANSPORT_IB; 536 | 537 | snprintf(ibdev_path, sizeof ibdev_path, 538 | "%s/class/infiniband/%s", ibv_get_sysfs_path(), 539 | device_name); 540 | 541 | if (ibv_read_sysfs_file(ibdev_path, "imm_data_lkey", 542 | attr, sizeof attr) < 0) 543 | goto done; 544 | 545 | if (sscanf(attr, "0x%08x", &dev->imm_data_lkey) != 1) 546 | dev->imm_data_lkey = 0U; 547 | 548 | done: 549 | return &dev->base; 550 | } 551 | 552 | static __attribute__((constructor)) void pib_register_driver(void) 553 | { 554 | ibv_register_driver("pib", pib_driver_init); 555 | } 556 | -------------------------------------------------------------------------------- /libpib/src/pib.map: -------------------------------------------------------------------------------- 1 | { 2 | global: 3 | openib_driver_init; 4 | local: *; 5 | }; 6 | -------------------------------------------------------------------------------- /pibnetd/.gitignore: -------------------------------------------------------------------------------- 1 | *.o 2 | pibnetd 3 | pibping 4 | 5 | 6 | -------------------------------------------------------------------------------- /pibnetd/Makefile: -------------------------------------------------------------------------------- 1 | TARGET=pibnetd pibping 2 | OBJS=main.o smp.o perf.o logger.o 3 | CFLAGS=-g -Wall 4 | 5 | ALL: $(TARGET) 6 | 7 | pibnetd: $(OBJS) 8 | gcc $(CFLAGS) $^ -o $@ 9 | 10 | pibping: pibping.c 11 | gcc $(CFLAGS) $^ 
-o $@ 12 | 13 | clean: 14 | rm -f $(TARGET) $(OBJS) 15 | 16 | .PHONY: clean 17 | -------------------------------------------------------------------------------- /pibnetd/byteorder.h: -------------------------------------------------------------------------------- 1 | /* 2 | * byteorder.h - Endian convert macros 3 | * 4 | * Copyright (c) 2014 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 7 | */ 8 | #ifndef _BYTEORDER_H_ 9 | #define _BYTEORDER_H_ 10 | 11 | #include 12 | #include 13 | 14 | typedef uint8_t u8; 15 | typedef uint16_t u16; 16 | typedef uint32_t u32; 17 | typedef uint64_t u64; 18 | typedef uint16_t __be16; 19 | typedef uint32_t __be32; 20 | typedef uint64_t __be64; 21 | 22 | #if __BYTE_ORDER == __LITTLE_ENDIAN 23 | #define cpu_to_be16(x) ({ \ 24 | uint16_t _v = (uint16_t)(x); \ 25 | (uint16_t)((_v << 8) | ((_v >> 8) & 0xFF)); \ 26 | }) 27 | #define be16_to_cpu(x) cpu_to_be16(x) 28 | #define cpu_to_be32(x) __builtin_bswap32(x) 29 | #define be32_to_cpu(x) __builtin_bswap32(x) 30 | #define cpu_to_be64(x) __builtin_bswap64(x) 31 | #define be64_to_cpu(x) __builtin_bswap64(x) 32 | #elif __BYTE_ORDER == __BIG_ENDIAN 33 | #define cpu_to_be16(x) (x) 34 | #define cpu_to_be32(x) (x) 35 | #define cpu_to_be64(x) (x) 36 | #define be16_to_cpu(x) (x) 37 | #define be32_to_cpu(x) (x) 38 | #define be64_to_cpu(x) (x) 39 | #endif 40 | 41 | #endif /* _BYTEORDER_H_ */ 42 | 43 | -------------------------------------------------------------------------------- /pibnetd/logger.c: -------------------------------------------------------------------------------- 1 | /* 2 | * logger.c - Write messages to system log and log file 3 | * 4 | * Copyright (c) 2014 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 7 | */ 8 | 9 | #include 10 | #include 11 | #include 12 | 13 | #include "pibnetd.h" 14 | 15 | 16 | void __pib_report_info(const char *filename, int lineno, const char *format, ...) 
17 | { 18 | int ret; 19 | va_list arg; 20 | char buffer[1024]; 21 | 22 | va_start(arg, format); 23 | ret = vsprintf(buffer, format, arg); 24 | va_end(arg); 25 | 26 | sprintf(buffer + ret, "\n"); 27 | 28 | fputs(buffer, stdout); 29 | fflush(stdout); 30 | 31 | syslog(LOG_INFO, "%s", buffer); 32 | } 33 | 34 | 35 | void __pib_report_debug(const char *filename, int lineno, const char *format, ...) 36 | { 37 | int ret; 38 | va_list arg; 39 | char buffer[1024]; 40 | 41 | va_start(arg, format); 42 | ret = vsprintf(buffer, format, arg); 43 | va_end(arg); 44 | 45 | sprintf(buffer + ret, " at %s(%u)\n", filename, lineno); 46 | 47 | fputs(buffer, stdout); 48 | fflush(stdout); 49 | 50 | syslog(LOG_INFO, "%s", buffer); 51 | } 52 | 53 | 54 | void __pib_report_err(const char *filename, int lineno, const char *format, ...) 55 | { 56 | int ret; 57 | va_list arg; 58 | char buffer[1024]; 59 | 60 | va_start(arg, format); 61 | ret = vsprintf(buffer, format, arg); 62 | va_end(arg); 63 | 64 | sprintf(buffer + ret, " at %s(%u)\n", filename, lineno); 65 | 66 | fputs(buffer, stderr); 67 | fflush(stderr); 68 | 69 | syslog(LOG_ERR, "%s", buffer); 70 | } 71 | -------------------------------------------------------------------------------- /pibnetd/pibnetd.h: -------------------------------------------------------------------------------- 1 | /* 2 | * pibnetd.h - General definitions for pibnetd 3 | * 4 | * Copyright (c) 2014 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license.
7 | */ 8 | #ifndef _PIBNETD_H_ 9 | #define _PIBNETD_H_ 10 | 11 | #include 12 | #include 13 | #include 14 | 15 | #include "byteorder.h" 16 | #include "pibnetd_packet.h" 17 | 18 | #ifndef ARRAY_SIZE 19 | #define ARRAY_SIZE(a) (sizeof (a) / sizeof ((a)[0])) 20 | #endif 21 | 22 | 23 | #define PIB_SWITCH_DESCRIPTION "Pseudo InfiniBand HCA switch" 24 | 25 | #define PIB_VERSION_MAJOR 0 26 | #define PIB_VERSION_MINOR 4 27 | #define PIB_VERSION_REVISION 6 28 | #define PIB_DRIVER_VERSION "0.4.6" 29 | 30 | #define PIB_DRIVER_FW_VERSION \ 31 | (((u64)PIB_VERSION_MAJOR << 32) | ((u64)PIB_VERSION_MINOR << 16) | PIB_VERSION_REVISION) 32 | 33 | #define PIB_DRIVER_DEVICE_ID (1) 34 | #define PIB_DRIVER_REVISION (1) 35 | 36 | #define PIB_NETD_DEFAULT_PORT (8432) 37 | 38 | #define PIB_MAX_PORTS (32 + 1) 39 | 40 | #define PIB_MAX_LID (0x10000) 41 | #define PIB_MCAST_LID_BASE (0x0C000) 42 | 43 | #define PIB_QP0 (0) 44 | #define PIB_QP1 (1) 45 | 46 | #define PIB_QPN_MASK (0xFFFFFF) 47 | #define PIB_PSN_MASK (0xFFFFFF) 48 | #define PIB_PACKET_BUFFER (8192) 49 | #define PIB_GID_PER_PORT (16) 50 | #define PIB_MAX_PAYLOAD_LEN (0x40000000) 51 | #define PIB_MULTICAST_QPN (0xFFFFFF) 52 | 53 | #define PIB_PKEY_PER_BLOCK (32) 54 | #define PIB_PKEY_TABLE_LEN (PIB_PKEY_PER_BLOCK * 1) 55 | 56 | #define PIB_MCAST_QP_ATTACH (128) 57 | #define PIB_LID_PERMISSIVE (0xFFFF) 58 | 59 | #define PIB_DEFAULT_PKEY_FULL (0xFFFF) 60 | 61 | #define PIB_DEVICE_CAP_FLAGS (IBV_DEVICE_CHANGE_PHY_PORT |\ 62 | IBV_DEVICE_SYS_IMAGE_GUID |\ 63 | IBV_DEVICE_RC_RNR_NAK_GEN) 64 | 65 | #define PIB_PORT_CAP_FLAGS (PIB_PORT_TRAP_SUP|PIB_PORT_SYS_IMAGE_GUID_SUP|PIB_PORT_CM_SUP) 66 | 67 | #define PIB_LINK_WIDTH_SUPPORTED (PIB_WIDTH_1X | PIB_WIDTH_4X | PIB_WIDTH_8X | PIB_WIDTH_12X) 68 | #define PIB_LINK_SPEED_SUPPORTED (7) /* 2.5 or 5.0 or 10.0 Gbps */ 69 | 70 | #define PIB_PACKET_BUFFER (8192) 71 | #define PIB_GID_PER_PORT (16) 72 | #define PIB_PKEY_PER_BLOCK (32) 73 | #define PIB_PKEY_TABLE_LEN (PIB_PKEY_PER_BLOCK * 1) 74 | 75 | 
76 | enum pib_link_cmd { 77 | PIB_LINK_CMD_CONNECT = 1, 78 | PIB_LINK_CMD_CONNECT_ACK, 79 | PIB_LINK_CMD_DISCONNECT, 80 | PIB_LINK_CMD_DISCONNECT_ACK, 81 | PIB_LINK_SHUTDOWN, 82 | }; 83 | 84 | 85 | enum pib_phys_port_state { 86 | PIB_PHYS_PORT_SLEEP = 1, 87 | PIB_PHYS_PORT_POLLING = 2, 88 | PIB_PHYS_PORT_DISABLED = 3, 89 | PIB_PHYS_PORT_PORT_CONFIGURATION_TRAINNING = 4, 90 | PIB_PHYS_PORT_LINK_UP = 5, 91 | PIB_PHYS_PORT_LINK_ERROR_RECOVERY = 6, 92 | PIB_PHYS_PORT_PHY_TEST = 7 93 | }; 94 | 95 | 96 | enum pib_port_type { 97 | PIB_PORT_CA = 1, 98 | PIB_PORT_SW_EXT, 99 | PIB_PORT_BASE_SP0, 100 | PIB_PORT_ENH_SP0 101 | }; 102 | 103 | 104 | enum pib_port_cap_flags { 105 | PIB_PORT_SM = 1 << 1, 106 | PIB_PORT_NOTICE_SUP = 1 << 2, 107 | PIB_PORT_TRAP_SUP = 1 << 3, 108 | PIB_PORT_OPT_IPD_SUP = 1 << 4, 109 | PIB_PORT_AUTO_MIGR_SUP = 1 << 5, 110 | PIB_PORT_SL_MAP_SUP = 1 << 6, 111 | PIB_PORT_MKEY_NVRAM = 1 << 7, 112 | PIB_PORT_PKEY_NVRAM = 1 << 8, 113 | PIB_PORT_LED_INFO_SUP = 1 << 9, 114 | PIB_PORT_SM_DISABLED = 1 << 10, 115 | PIB_PORT_SYS_IMAGE_GUID_SUP = 1 << 11, 116 | PIB_PORT_PKEY_SW_EXT_PORT_TRAP_SUP = 1 << 12, 117 | PIB_PORT_EXTENDED_SPEEDS_SUP = 1 << 14, 118 | PIB_PORT_CM_SUP = 1 << 16, 119 | PIB_PORT_SNMP_TUNNEL_SUP = 1 << 17, 120 | PIB_PORT_REINIT_SUP = 1 << 18, 121 | PIB_PORT_DEVICE_MGMT_SUP = 1 << 19, 122 | PIB_PORT_VENDOR_CLASS_SUP = 1 << 20, 123 | PIB_PORT_DR_NOTICE_SUP = 1 << 21, 124 | PIB_PORT_CAP_MASK_NOTICE_SUP = 1 << 22, 125 | PIB_PORT_BOOT_MGMT_SUP = 1 << 23, 126 | PIB_PORT_LINK_LATENCY_SUP = 1 << 24, 127 | PIB_PORT_CLIENT_REG_SUP = 1 << 25 128 | }; 129 | 130 | 131 | enum pib_port_speed { 132 | PIB_SPEED_SDR = 1, 133 | PIB_SPEED_DDR = 2, 134 | PIB_SPEED_QDR = 4, 135 | PIB_SPEED_FDR10 = 8, 136 | PIB_SPEED_FDR = 16, 137 | PIB_SPEED_EDR = 32 138 | }; 139 | 140 | 141 | enum pib_port_width { 142 | PIB_WIDTH_1X = 1, 143 | PIB_WIDTH_4X = 2, 144 | PIB_WIDTH_8X = 4, 145 | PIB_WIDTH_12X = 8 146 | }; 147 | 148 | 149 | struct pib_port_perf { 150 | uint8_t OpCode; /*
all 0xFF */ 151 | uint16_t tag; 152 | uint16_t counter_select[16]; 153 | uint64_t counter[16]; 154 | uint64_t symbol_error_counter; 155 | uint64_t link_error_recovery_counter; 156 | uint64_t link_downed_counter; 157 | uint64_t rcv_errors; 158 | uint64_t rcv_remphys_errors; 159 | uint64_t rcv_switch_relay_errors; 160 | uint64_t xmit_discards; 161 | uint64_t xmit_constraint_errors; 162 | uint64_t rcv_constraint_errors; 163 | uint64_t local_link_integrity_errors; 164 | uint64_t excessive_buffer_overrun_errors; 165 | uint64_t vl15_dropped; 166 | uint64_t xmit_data; 167 | uint64_t rcv_data; 168 | uint64_t xmit_packets; 169 | uint64_t rcv_packets; 170 | uint64_t xmit_wait; 171 | uint64_t unicast_xmit_packets; 172 | uint64_t unicast_rcv_packets; 173 | uint64_t multicast_xmit_packets; 174 | uint64_t multicast_rcv_packets; 175 | }; 176 | 177 | 178 | struct pib_port { 179 | uint8_t port_num; 180 | struct ibv_port_attr ibv_port_attr; 181 | 182 | uint8_t mkey; 183 | uint8_t mkeyprot; 184 | uint16_t mkey_lease_period; 185 | uint8_t link_down_default_state; 186 | uint8_t link_width_enabled; 187 | uint8_t link_speed_enabled; 188 | uint8_t master_smsl; 189 | uint8_t client_reregister; 190 | uint8_t subnet_timeout; 191 | uint8_t local_phy_errors; 192 | uint8_t overrun_errors; 193 | 194 | struct pib_port_perf perf; 195 | 196 | union ibv_gid gid[PIB_GID_PER_PORT]; 197 | uint16_t pkey_table[PIB_PKEY_TABLE_LEN]; 198 | 199 | uint64_t port_guid; 200 | struct sockaddr *sockaddr; 201 | socklen_t socklen; 202 | }; 203 | 204 | 205 | struct pib_port_bits { 206 | uint16_t pm_blocks[16]; /* portmask blocks */ 207 | }; 208 | 209 | 210 | struct pib_control { 211 | void *buffer; /* buffer for sendmsg/recvmsg */ 212 | int sockfd; 213 | struct sockaddr *sockaddr; 214 | }; 215 | 216 | 217 | struct pib_switch { 218 | struct pib_control *control; 219 | uint8_t port_cnt; /* include port 0 */ 220 | struct pib_port ports[PIB_MAX_PORTS]; 221 | 222 | uint16_t linear_fdb_top; 223 | uint8_t default_port; 224 
| uint8_t default_mcast_primary_port; 225 | uint8_t default_mcast_not_primary_port; 226 | uint8_t life_time_value; 227 | uint8_t port_state_change; 228 | 229 | uint8_t *ucast_fwd_table; 230 | struct pib_port_bits *mcast_fwd_table; 231 | }; 232 | 233 | 234 | extern struct pib_control pib_control; 235 | extern uint64_t pib_hca_guid_base; 236 | 237 | struct pib_smp; 238 | 239 | extern int pib_process_smp(struct pib_smp *smp, struct pib_switch *sw, uint8_t in_port_num); 240 | extern int pib_process_pma_mad(struct pib_pma_mad *pmp, struct pib_switch *sw, uint8_t port_num); 241 | 242 | #define pib_report_debug(fmt, ...) \ 243 | do { \ 244 | __pib_report_debug(__FILE__, __LINE__, fmt, ##__VA_ARGS__); \ 245 | } while(0) 246 | 247 | #define pib_report_info(fmt, ...) \ 248 | do { \ 249 | __pib_report_info(__FILE__, __LINE__, fmt, ##__VA_ARGS__); \ 250 | } while(0) 251 | 252 | #define pib_report_err(fmt, ...) \ 253 | do { \ 254 | __pib_report_err(__FILE__, __LINE__, fmt, ##__VA_ARGS__); \ 255 | } while(0) 256 | 257 | extern void __pib_report_debug(const char *filename, int lineno, const char *format, ...); 258 | extern void __pib_report_info(const char *filename, int lineno, const char *format, ...); 259 | extern void __pib_report_err(const char *filename, int lineno, const char *format, ...); 260 | 261 | #endif /* _PIBNETD_H_ */ 262 | -------------------------------------------------------------------------------- /pibnetd/pibnetd.spec: -------------------------------------------------------------------------------- 1 | Name: pibnetd 2 | Version: 0.4.6 3 | Release: 1%{?dist} 4 | Summary: Pseudo InfiniBand Fabric emulation daemon 5 | Group: System Environment/Daemons 6 | License: GPLv2 or BSD 7 | Url: http://www.nminoru.jp/ 8 | Source: %{name}-%{version}.tar.gz 9 | BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX) 10 | Provides: libpib-devel = %{version}-%{release} 11 | Requires: libibverbs > 1.1.4 12 | BuildRequires: libibverbs-devel > 1.1.4 13 | # 
ExcludeArch: s390 s390x 14 | 15 | %description 16 | pibnetd is the Pseudo InfiniBand Fabric emulation daemon for pib. 17 | 18 | %prep 19 | 20 | %setup -q 21 | 22 | %build 23 | make 24 | 25 | %install 26 | rm -rf $RPM_BUILD_ROOT 27 | install -D -m755 pibnetd %{buildroot}%{_sbindir}/pibnetd 28 | install -D -m755 scripts/redhat-pibnetd.init %{buildroot}%{_initddir}/pibnetd 29 | 30 | %clean 31 | rm -rf $RPM_BUILD_ROOT 32 | 33 | %files 34 | %defattr(-,root,root,-) 35 | %{_sbindir}/pibnetd 36 | %{_initddir}/pibnetd 37 | 38 | %changelog 39 | * Thu Feb 12 2015 Minoru NAKAMURA - 0.4.6 40 | - Fix problem that pibnetd fails to reconnect to a node that has been shut down abnormally 41 | 42 | * Tue Nov 06 2014 Minoru NAKAMURA - 0.4.1 43 | - Fix the wrong path for pibnetd in /etc/rc.d/init.d/pibnetd 44 | 45 | * Mon Apr 07 2014 Minoru NAKAMURA - 0.4.0 46 | - Initial spec file 47 | -------------------------------------------------------------------------------- /pibnetd/pibnetd_packet.h: -------------------------------------------------------------------------------- 1 | /* 2 | * pibnetd_packet.h - Structures of IB Management Datagram(MAD) packets 3 | * 4 | * Copyright (c) 2014 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 
7 | */ 8 | #ifndef _PIBNETD_PACKET_H_ 9 | #define _PIBNETD_PACKET_H_ 10 | 11 | #include 12 | #include "byteorder.h" 13 | 14 | #define PIB_MGMT_BASE_VERSION (1) 15 | #define PIB_MGMT_CLASS_VERSION (1) 16 | 17 | /* Management classes */ 18 | #define PIB_MGMT_CLASS_SUBN_LID_ROUTED (0x01) 19 | #define PIB_MGMT_CLASS_SUBN_DIRECTED_ROUTE (0x81) 20 | #define PIB_MGMT_CLASS_SUBN_ADM (0x03) 21 | #define PIB_MGMT_CLASS_PERF_MGMT (0x04) 22 | 23 | /* Management methods */ 24 | #define PIB_MGMT_METHOD_GET (0x01) 25 | #define PIB_MGMT_METHOD_SET (0x02) 26 | #define PIB_MGMT_METHOD_GET_RESP (0x81) 27 | #define PIB_MGMT_METHOD_SEND (0x03) 28 | #define PIB_MGMT_METHOD_TRAP (0x05) 29 | #define PIB_MGMT_METHOD_REPORT (0x06) 30 | #define PIB_MGMT_METHOD_REPORT_RESP (0x86) 31 | #define PIB_MGMT_METHOD_TRAP_REPRESS (0x07) 32 | 33 | #define PIB_SMP_UNSUP_VERSION cpu_to_be16(0x0004) 34 | #define PIB_SMP_UNSUP_METHOD cpu_to_be16(0x0008) 35 | #define PIB_SMP_UNSUP_METH_ATTR cpu_to_be16(0x000C) 36 | #define PIB_SMP_INVALID_FIELD cpu_to_be16(0x001C) 37 | 38 | #define PIB_SMP_DIRECTION cpu_to_be16(0x8000) 39 | 40 | /* Subnet management attributes */ 41 | #define PIB_SMP_ATTR_NOTICE (0x0002) 42 | #define PIB_SMP_ATTR_NODE_DESC (0x0010) 43 | #define PIB_SMP_ATTR_NODE_INFO (0x0011) 44 | #define PIB_SMP_ATTR_SWITCH_INFO (0x0012) 45 | #define PIB_SMP_ATTR_GUID_INFO (0x0014) 46 | #define PIB_SMP_ATTR_PORT_INFO (0x0015) 47 | #define PIB_SMP_ATTR_PKEY_TABLE (0x0016) 48 | #define PIB_SMP_ATTR_SL_TO_VL_TABLE (0x0017) 49 | #define PIB_SMP_ATTR_VL_ARB_TABLE (0x0018) 50 | #define PIB_SMP_ATTR_LINEAR_FORWARD_TABLE (0x0019) 51 | #define PIB_SMP_ATTR_RANDOM_FORWARD_TABLE (0x001A) 52 | #define PIB_SMP_ATTR_MCAST_FORWARD_TABLE (0x001B) 53 | #define PIB_SMP_ATTR_SM_INFO (0x0020) 54 | #define PIB_SMP_ATTR_VENDOR_DIAG (0x0030) 55 | #define PIB_SMP_ATTR_LED_INFO (0x0031) 56 | #define PIB_SMP_ATTR_VENDOR_MASK (0xFF00) 57 | 58 | 59 | enum pib_mad_result { 60 | PIB_MAD_RESULT_FAILURE = 0, /* (!SUCCESS is the important 
flag) */ 61 | PIB_MAD_RESULT_SUCCESS = 1 << 0, /* MAD was successfully processed */ 62 | PIB_MAD_RESULT_REPLY = 1 << 1, /* Reply packet needs to be sent */ 63 | PIB_MAD_RESULT_CONSUMED = 1 << 2 /* Packet consumed: stop processing */ 64 | }; 65 | 66 | 67 | /* Local Route Header */ 68 | struct pib_packet_lrh { 69 | __be16 dlid; 70 | 71 | /* 72 | * Virtual Lane 4 bits 73 | * Link Version 4 bits 74 | */ 75 | u8 vl_lver; 76 | 77 | /* 78 | * Service Level 4 bits 79 | * Reserved 2 bits 80 | * Link Next Header 2 bits 81 | */ 82 | u8 sl_rsv_lnh; 83 | 84 | __be16 slid; 85 | 86 | /* 87 | * Reserved 5 bits 88 | * Packet Length 11 bits 89 | */ 90 | __be16 pktlen; 91 | 92 | } __attribute__ ((packed)); 93 | 94 | 95 | static inline u16 pib_packet_lrh_get_pktlen(const struct pib_packet_lrh *lrh) 96 | { 97 | return be16_to_cpu(lrh->pktlen) & 0x7FF; 98 | } 99 | 100 | 101 | static inline void pib_packet_lrh_set_pktlen(struct pib_packet_lrh *lrh, u16 value) 102 | { 103 | lrh->pktlen = cpu_to_be16(value & 0x7FF); 104 | } 105 | 106 | 107 | struct pib_grh { 108 | __be32 version_tclass_flow; 109 | __be16 paylen; 110 | u8 next_hdr; 111 | u8 hop_limit; 112 | union ibv_gid sgid; 113 | union ibv_gid dgid; 114 | } __attribute__ ((packed)); 115 | 116 | 117 | /* Base Transport Header */ 118 | struct pib_packet_bth { 119 | u8 OpCode; /* Opcode */ 120 | 121 | /* 122 | * Solicited Event 1 bit 123 | * MigReq 1 bit 124 | * Pad Count 2 bits 125 | * Transport Header Version 4 bits 126 | */ 127 | u8 se_m_padcnt_tver; 128 | 129 | __be16 pkey; /* Partition Key */ 130 | __be32 destQP; /* Destination QP (The most significant 8-bits must be zero.) 
*/ 131 | __be32 psn; /* Packet Sequence Number (The MSB is A bit) */ 132 | } __attribute__ ((packed)); 133 | 134 | 135 | static inline u8 pib_packet_bth_get_padcnt(const struct pib_packet_bth *bth) 136 | { 137 | return (bth->se_m_padcnt_tver >> 4) & 0x3; 138 | } 139 | 140 | 141 | static inline void pib_packet_bth_set_padcnt(struct pib_packet_bth *bth, u8 padcnt) 142 | { 143 | bth->se_m_padcnt_tver &= ~0x30; 144 | bth->se_m_padcnt_tver |= ((padcnt & 0x3) << 4); 145 | } 146 | 147 | 148 | /* Datagram Extended Transport Header */ 149 | struct pib_packet_deth { 150 | __be32 qkey; /* Queue Key */ 151 | __be32 srcQP; /* Source QP (The most significant 8-bits must be zero.) */ 152 | } __attribute__ ((packed)); 153 | 154 | 155 | struct pib_packet_link { 156 | __be32 cmd; 157 | } __attribute__ ((packed)); 158 | 159 | 160 | union pib_packet_footer { 161 | struct { 162 | __be16 vcrc; /* Variant CRC */ 163 | } native; 164 | struct { 165 | __be64 port_guid; 166 | } pib; 167 | } __attribute__ ((packed)); 168 | 169 | 170 | struct pib_mad_hdr { 171 | u8 base_version; 172 | u8 mgmt_class; 173 | u8 class_version; 174 | u8 method; 175 | __be16 status; 176 | __be16 class_specific; 177 | __be64 tid; 178 | __be16 attr_id; 179 | __be16 resv; 180 | __be32 attr_mod; 181 | }; 182 | 183 | 184 | enum { 185 | PIB_MGMT_MAD_DATA = 232 186 | }; 187 | 188 | 189 | struct pib_mad { 190 | struct pib_mad_hdr mad_hdr; 191 | u8 data[PIB_MGMT_MAD_DATA]; 192 | }; 193 | 194 | 195 | enum { 196 | PIB_SMP_DATA_SIZE = 64, 197 | PIB_SMP_MAX_PATH_HOPS = 64 198 | }; 199 | 200 | 201 | struct pib_smp { 202 | u8 base_version; 203 | u8 mgmt_class; 204 | u8 class_version; 205 | u8 method; 206 | __be16 status; 207 | u8 hop_ptr; 208 | u8 hop_cnt; 209 | __be64 tid; 210 | __be16 attr_id; 211 | __be16 resv; 212 | __be32 attr_mod; 213 | __be64 mkey; 214 | __be16 dr_slid; 215 | __be16 dr_dlid; 216 | u8 reserved[28]; 217 | u8 data[PIB_SMP_DATA_SIZE]; 218 | u8 initial_path[PIB_SMP_MAX_PATH_HOPS]; 219 | u8 
return_path[PIB_SMP_MAX_PATH_HOPS]; 220 | } __attribute__ ((packed)); 221 | 222 | 223 | struct pib_smp_node_info { 224 | u8 base_version; 225 | u8 class_version; 226 | u8 node_type; 227 | u8 node_ports; 228 | __be64 sys_image_guid; 229 | __be64 node_guid; 230 | __be64 port_guid; 231 | __be16 partition_cap; 232 | __be16 device_id; 233 | __be32 revision; 234 | u8 local_port_num; 235 | u8 vendor_id[3]; 236 | } __attribute__ ((packed)); 237 | 238 | 239 | struct pib_smp_switch_info { 240 | __be16 linear_fdb_cap; 241 | __be16 random_fdb_cap; 242 | __be16 multicast_fdb_cap; 243 | __be16 linear_fdb_top; 244 | u8 default_port; 245 | u8 default_mcast_primary_port; 246 | u8 default_mcast_not_primary_port; 247 | 248 | /* 249 | * LifeTimeValue 5 bits 250 | * PortStateChange 1 bit 251 | * OptimizedSLtoVLMappingProgramming 2bits 252 | */ 253 | u8 various1; 254 | 255 | __be16 lids_per_port; 256 | __be16 partition_enforcement_cap; 257 | 258 | /* 259 | * InboundEnforcementCap 1 bit 260 | * OutboundEnforcementCap 1 bit 261 | * FilterRawInboundCap 1 bit 262 | * EnhancedPort0 1 bit 263 | * Reserved 3 bits 264 | */ 265 | u8 various2; 266 | } __attribute__ ((packed)); 267 | 268 | 269 | struct pib_port_info { 270 | __be64 mkey; 271 | __be64 gid_prefix; 272 | __be16 lid; 273 | __be16 sm_lid; 274 | __be32 cap_mask; 275 | __be16 diag_code; 276 | __be16 mkey_lease_period; 277 | u8 local_port_num; 278 | u8 link_width_enabled; 279 | u8 link_width_supported; 280 | u8 link_width_active; 281 | u8 linkspeed_portstate; /* 4 bits, 4 bits */ 282 | u8 portphysstate_linkdown; /* 4 bits, 4 bits */ 283 | u8 mkeyprot_resv_lmc; /* 2 bits, 3, 3 */ 284 | u8 linkspeedactive_enabled; /* 4 bits, 4 bits */ 285 | u8 neighbormtu_mastersmsl; /* 4 bits, 4 bits */ 286 | u8 vlcap_inittype; /* 4 bits, 4 bits */ 287 | u8 vl_high_limit; 288 | u8 vl_arb_high_cap; 289 | u8 vl_arb_low_cap; 290 | u8 inittypereply_mtucap; /* 4 bits, 4 bits */ 291 | u8 vlstallcnt_hoqlife; /* 3 bits, 5 bits */ 292 | u8 
operationalvl_pei_peo_fpi_fpo; /* 4 bits, 1, 1, 1, 1 */ 293 | __be16 mkey_violations; 294 | __be16 pkey_violations; 295 | __be16 qkey_violations; 296 | u8 guid_cap; 297 | u8 clientrereg_resv_subnetto; /* 1 bit, 2 bits, 5 */ 298 | u8 resv_resptimevalue; /* 3 bits, 5 bits */ 299 | u8 localphyerrors_overrunerrors; /* 4 bits, 4 bits */ 300 | __be16 max_credit_hint; 301 | u8 resv; 302 | u8 link_roundtrip_latency[3]; 303 | } __attribute__ ((packed)); 304 | 305 | 306 | struct pib_pma_mad { 307 | struct pib_mad_hdr mad_hdr; 308 | u8 reserved[40]; 309 | u8 data[192]; 310 | } __attribute__ ((packed)); 311 | 312 | 313 | struct pib_class_port_info { 314 | u8 base_version; 315 | u8 class_version; 316 | __be16 capability_mask; 317 | u8 reserved[3]; 318 | u8 resp_time_value; 319 | u8 redirect_gid[16]; 320 | __be32 redirect_tcslfl; 321 | __be16 redirect_lid; 322 | __be16 redirect_pkey; 323 | __be32 redirect_qp; 324 | __be32 redirect_qkey; 325 | u8 trap_gid[16]; 326 | __be32 trap_tcslfl; 327 | __be16 trap_lid; 328 | __be16 trap_pkey; 329 | __be32 trap_hlqp; 330 | __be32 trap_qkey; 331 | } __attribute__ ((packed)); 332 | 333 | 334 | struct pib_pma_portsamplescontrol { 335 | u8 opcode; 336 | u8 port_select; 337 | u8 tick; 338 | u8 counter_width; /* resv: 7:3, counter width: 2:0 */ 339 | __be32 counter_mask0_9; /* 2, 10 3-bit fields */ 340 | __be16 counter_mask10_14; /* 1, 5 3-bit fields */ 341 | u8 sample_mechanisms; 342 | u8 sample_status; /* only lower 2 bits */ 343 | __be64 option_mask; 344 | __be64 vendor_mask; 345 | __be32 sample_start; 346 | __be32 sample_interval; 347 | __be16 tag; 348 | __be16 counter_select[15]; 349 | __be32 reserved1; 350 | __be64 samples_only_option_mask; 351 | __be32 reserved2[28]; 352 | } __attribute__ ((packed)); 353 | 354 | 355 | struct pib_pma_portsamplesresult { 356 | __be16 tag; 357 | __be16 sample_status; /* only lower 2 bits */ 358 | __be32 counter[15]; 359 | } __attribute__ ((packed)); 360 | 361 | 362 | struct pib_pma_portsamplesresult_ext { 363 
| __be16 tag; 364 | __be16 sample_status; /* only lower 2 bits */ 365 | __be32 extended_width; /* only upper 2 bits */ 366 | __be64 counter[15]; 367 | } __attribute__ ((packed)); 368 | 369 | 370 | struct pib_pma_portcounters { 371 | u8 reserved; 372 | u8 port_select; 373 | __be16 counter_select; 374 | __be16 symbol_error_counter; 375 | u8 link_error_recovery_counter; 376 | u8 link_downed_counter; 377 | __be16 port_rcv_errors; 378 | __be16 port_rcv_remphys_errors; 379 | __be16 port_rcv_switch_relay_errors; 380 | __be16 port_xmit_discards; 381 | u8 port_xmit_constraint_errors; 382 | u8 port_rcv_constraint_errors; 383 | u8 reserved1; 384 | u8 link_overrun_errors; /* LocalLink: 7:4, BufferOverrun: 3:0 */ 385 | __be16 reserved2; 386 | __be16 vl15_dropped; 387 | __be32 port_xmit_data; 388 | __be32 port_rcv_data; 389 | __be32 port_xmit_packets; 390 | __be32 port_rcv_packets; 391 | __be32 port_xmit_wait; 392 | } __attribute__ ((packed)); 393 | 394 | 395 | struct pib_pma_portcounters_ext { 396 | u8 reserved; 397 | u8 port_select; 398 | __be16 counter_select; 399 | __be32 reserved1; 400 | __be64 port_xmit_data; 401 | __be64 port_rcv_data; 402 | __be64 port_xmit_packets; 403 | __be64 port_rcv_packets; 404 | __be64 port_unicast_xmit_packets; 405 | __be64 port_unicast_rcv_packets; 406 | __be64 port_multicast_xmit_packets; 407 | __be64 port_multicast_rcv_packets; 408 | } __attribute__ ((packed)); 409 | 410 | 411 | struct pib_trap { 412 | /* 413 | * - IsGeneric 414 | * - Type 415 | * - ProducerType / VendorID 416 | */ 417 | __be32 generice_type_prodtype; /* 1 bit, 7 bits, 24 bits */ 418 | __be16 trapnum; 419 | 420 | /* IssuerLID */ 421 | __be16 issuerlid; 422 | 423 | /* 424 | * - NoticeToggle 425 | * - NoticeCount 426 | */ 427 | __be16 toggle_count; /* 1bit, 15 bits */ 428 | 429 | union { 430 | struct { 431 | u8 details[54]; 432 | } raw_data; 433 | 434 | struct { 435 | __be16 lidaddr; 436 | } __attribute__ ((packed)) ntc_128; 437 | } details; 438 | } __attribute__ ((packed)); 439 
| 440 | #endif /* _PIBNETD_PACKET_H_ */ 441 | -------------------------------------------------------------------------------- /pibnetd/pibping.c: -------------------------------------------------------------------------------- 1 | /* 2 | * pibping.c - Test tool for pibnetd 3 | * 4 | * Copyright (c) 2014 Minoru NAKAMURA 5 | * 6 | * This code is licenced under the GPL version 2 or BSD license. 7 | */ 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | 18 | #include "pibnetd.h" 19 | 20 | static int verbose; 21 | static uint32_t port_num = PIB_NETD_DEFAULT_PORT; 22 | 23 | int main(int argc, char** argv) 24 | { 25 | struct option longopts[] = { 26 | {"port", required_argument, NULL, 'p' }, 27 | {"verbose", no_argument, NULL, 'v' }, 28 | }; 29 | 30 | int ch, option_index; 31 | 32 | while ((ch = getopt_long(argc, argv, "p:v", longopts, &option_index)) != -1) { 33 | switch (ch) { 34 | 35 | case 'p': 36 | port_num = atoi(optarg); 37 | assert((0 < port_num) && (port_num < 65536)); 38 | break; 39 | 40 | case 'v': // verbose 41 | verbose = 1; 42 | break; 43 | 44 | default: 45 | break; 46 | } 47 | } 48 | 49 | int sockfd; 50 | sockfd = socket(AF_INET, SOCK_DGRAM, 0); 51 | if (sockfd < 0) { 52 | int eno = errno; 53 | fprintf(stderr, "pibping: socket(errno=%d)\n", eno); 54 | exit(EXIT_FAILURE); 55 | } 56 | 57 | struct sockaddr_in sockaddr; 58 | memset(&sockaddr, 0, sizeof(sockaddr)); 59 | sockaddr.sin_family = AF_INET; 60 | sockaddr.sin_addr.s_addr = htonl(INADDR_ANY); 61 | sockaddr.sin_port = 0; 62 | 63 | int ret = bind(sockfd, (struct sockaddr*)&sockaddr, (socklen_t)sizeof(sockaddr)); 64 | if (ret != 0) { 65 | int eno = errno; 66 | fprintf(stderr, "pibping: bind(errno=%d)\n", eno); 67 | exit(EXIT_FAILURE); 68 | } 69 | 70 | char buffer[4096]; 71 | struct msghdr msghdr; 72 | struct iovec iovec; 73 | 74 | sockaddr.sin_port = htons(port_num); 75 | 76 | iovec.iov_base = buffer; 77 | iovec.iov_len = 
sizeof(buffer); 78 | msghdr.msg_name = &sockaddr; 79 | msghdr.msg_namelen = sizeof(sockaddr); 80 | 81 | msghdr.msg_iov = &iovec; 82 | msghdr.msg_iovlen = 1; 83 | 84 | ret = sendmsg(sockfd, &msghdr, 0); 85 | printf("sendmsg: ret=%d\n", ret); 86 | 87 | return 0; 88 | } 89 | -------------------------------------------------------------------------------- /pibnetd/scripts/redhat-pibnetd.init: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | # 3 | # Bring up/down pibnetd 4 | # 5 | # chkconfig: - 15 84 6 | # description: Pseudo InfiniBand HCA switch 7 | # 8 | ### BEGIN INIT INFO 9 | # Provides: pibnetd 10 | ### END INIT INFO 11 | # 12 | # Copyright (c) 2014 Minoru NAKAMURA 13 | # 14 | # This Software is licensed under one of the following licenses: 15 | # 16 | # 1) under the terms of the "Common Public License 1.0" a copy of which is 17 | # available from the Open Source Initiative, see 18 | # http://www.opensource.org/licenses/cpl.php. 19 | # 20 | # 2) under the terms of the "The BSD License" a copy of which is 21 | # available from the Open Source Initiative, see 22 | # http://www.opensource.org/licenses/bsd-license.php. 23 | # 24 | # 3) under the terms of the "GNU General Public License (GPL) Version 2" a 25 | # copy of which is available from the Open Source Initiative, see 26 | # http://www.opensource.org/licenses/gpl-license.php. 27 | # 28 | # Licensee has the right to choose one of the above licenses. 29 | # 30 | # Redistributions of source code must retain the above copyright 31 | # notice and one of the license notices. 32 | # 33 | # Redistributions in binary form must reproduce both the above copyright 34 | # notice, one of the license notices in the documentation 35 | # and/or other materials provided with the distribution. 
36 | # 37 | # processname: ${exec_prefix}/sbin/pibnetd 38 | # config: ${prefix}/etc/sysconfig/pibnetd 39 | # pidfile: /var/run/pibnetd.pid 40 | 41 | prefix=/usr 42 | exec_prefix=${prefix} 43 | 44 | . /etc/rc.d/init.d/functions 45 | 46 | CONFIG=${prefix}/etc/sysconfig/pibnetd 47 | if [ -f $CONFIG ]; then 48 | . $CONFIG 49 | fi 50 | 51 | prog=${exec_prefix}/sbin/pibnetd 52 | bin=${prog##*/} 53 | 54 | ACTION=$1 55 | 56 | # Setting pibnetd start parameters 57 | PID_FILE=/var/run/${bin}.pid 58 | touch $PID_FILE 59 | 60 | ######################################################################### 61 | 62 | start() 63 | { 64 | local OSM_PID= 65 | 66 | pid="" 67 | 68 | if [ -f $PID_FILE ]; then 69 | local line p 70 | read line < $PID_FILE 71 | for p in $line ; do 72 | [ -z "${p//[0-9]/}" -a -d "/proc/$p" ] && pid="$pid $p" 73 | done 74 | fi 75 | 76 | if [ -z "$pid" ]; then 77 | pid=`pidof -o $$ -o $PPID -o %PPID -x $bin` 78 | fi 79 | 80 | if [ -n "${pid:-}" ] ; then 81 | echo $"${bin} (pid $pid) is already running..." 82 | else 83 | 84 | # Start pibnetd 85 | echo -n "Starting Pseudo InfiniBand HCA switch" 86 | $prog --daemon ${OPTIONS} > /dev/null 87 | cnt=0; alive=0 88 | while [ $cnt -lt 6 -a $alive -ne 1 ]; do 89 | echo -n "."; 90 | sleep 1 91 | alive=0 92 | OSM_PID=`pidof $prog` 93 | if [ "$OSM_PID" != "" ]; then 94 | alive=1 95 | fi 96 | let cnt++; 97 | done 98 | 99 | echo $OSM_PID > $PID_FILE 100 | checkpid $OSM_PID 101 | RC=$? 
102 | [ $RC -eq 0 ] && echo_success || echo_failure 103 | [ $RC -eq 0 ] && touch /var/lock/subsys/pibnetd 104 | echo 105 | 106 | fi 107 | return $RC 108 | } 109 | 110 | stop() 111 | { 112 | local pid= 113 | local pid1= 114 | local pid2= 115 | 116 | if [ -f $PID_FILE ]; then 117 | local line p 118 | read line < $PID_FILE 119 | for p in $line ; do 120 | [ -z "${p//[0-9]/}" -a -d "/proc/$p" ] && pid1="$pid1 $p" 121 | done 122 | fi 123 | 124 | pid2=`pidof -o $$ -o $PPID -o %PPID -x $bin` 125 | 126 | pid=`echo "$pid1 $pid2" | sed -e 's/\ /\n/g' | sort -n | uniq | sed -e 's/\n/\ /g'` 127 | 128 | if [ -n "${pid:-}" ] ; then 129 | # Kill pibnetd 130 | echo -n "Stopping Pseudo InfiniBand HCA switch" 131 | kill -15 $pid > /dev/null 2>&1 132 | cnt=0; alive=1 133 | while [ $cnt -lt 6 -a $alive -ne 0 ]; do 134 | echo -n "."; 135 | alive=0 136 | for p in $pid; do 137 | if checkpid $p ; then alive=1; echo -n "-"; fi 138 | done 139 | let cnt++; 140 | sleep $alive 141 | done 142 | 143 | for p in $pid 144 | do 145 | while checkpid $p ; do 146 | kill -KILL $p > /dev/null 2>&1 147 | echo -n "+" 148 | sleep 1 149 | done 150 | done 151 | checkpid $pid 152 | RC=$? 153 | [ $RC -eq 0 ] && echo_failure || echo_success 154 | echo 155 | RC=$((! $RC)) 156 | else 157 | echo -n "Stopping Pseudo InfiniBand HCA switch" 158 | echo_failure 159 | echo 160 | RC=1 161 | fi 162 | 163 | # Remove pid file if any. 164 | rm -f $PID_FILE 165 | rm -f /var/lock/subsys/pibnetd 166 | return $RC 167 | } 168 | 169 | status() 170 | { 171 | local pid 172 | 173 | # First try "pidof" 174 | pid=`pidof -o $$ -o $PPID -o %PPID -x ${bin}` 175 | if [ -n "$pid" ]; then 176 | echo $"${bin} (pid $pid) is running..." 
177 | return 0 178 | fi 179 | 180 | # Next try "/var/run/pibnetd.pid" files 181 | if [ -f $PID_FILE ] ; then 182 | read pid < $PID_FILE 183 | if [ -n "$pid" ]; then 184 | echo $"${bin} dead but pid file $PID_FILE exists" 185 | return 1 186 | fi 187 | fi 188 | echo $"${bin} is stopped" 189 | return 3 190 | } 191 | 192 | 193 | 194 | case $ACTION in 195 | start) 196 | start 197 | ;; 198 | stop) 199 | stop 200 | ;; 201 | restart) 202 | stop 203 | start 204 | ;; 205 | status) 206 | status 207 | ;; 208 | condrestart) 209 | pid=`pidof -o $$ -o $PPID -o %PPID -x $bin` 210 | if [ -n "$pid" ]; then 211 | stop 212 | sleep 1 213 | start 214 | fi 215 | ;; 216 | *) 217 | echo 218 | echo "Usage: `basename $0` {start|stop|restart|status}" 219 | echo 220 | exit 1 221 | ;; 222 | esac 223 | 224 | RC=$? 225 | exit $RC 226 | -------------------------------------------------------------------------------- /test/.gitignore: -------------------------------------------------------------------------------- 1 | *.o 2 | comp_vector 3 | show_mem_reg 4 | qp-roundrobin 5 | query_pkey 6 | test-ib_reg_mr-01 7 | test-ipoib-01 8 | -------------------------------------------------------------------------------- /test/Makefile: -------------------------------------------------------------------------------- 1 | TARGETS = \ 2 | test-ipoib-01 \ 3 | test-ib_reg_mr-01 \ 4 | comp_vector \ 5 | show_mem_reg \ 6 | qp-roundrobin \ 7 | query_pkey 8 | 9 | CFLAGS = -g -O1 -Wall -D_GNU_SOURCE 10 | 11 | LIBS = -libverbs -lpthread -lrt -lm 12 | 13 | ALL: $(TARGETS) 14 | 15 | $(TARGETS): %: %.c 16 | gcc $(CFLAGS) $(LIBS) $^ -o $@ 17 | 18 | clean: 19 | rm -f $(TARGETS) *.o *~ 20 | 21 | .PHONY: clean 22 | -------------------------------------------------------------------------------- /test/comp_vector.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Show the number of completion vectors 3 | * 4 | * Copyright (c) 2014 Minoru NAKAMURA 5 | */ 6 | 7 | #include 8 | #include 9 
| #include 10 | #include 11 | #include 12 | #include 13 | 14 | 15 | static void usage(const char *argv0) 16 | { 17 | printf("Usage: [-d ] [-i ]\n"); 18 | printf(" -d, --ib-dev= use IB device (default first device found)\n"); 19 | printf(" -i, --ib-port= use port of IB device (default 1)\n"); 20 | } 21 | 22 | 23 | int main(int argc, char *argv[]) 24 | { 25 | struct ibv_device *ib_dev; 26 | char *ib_devname = NULL; 27 | int ib_port = 1; 28 | 29 | while (1) { 30 | int c; 31 | 32 | static struct option long_options[] = { 33 | { .name = "ib-dev", .has_arg = 1, .val = 'd' }, 34 | { .name = "ib-port", .has_arg = 1, .val = 'i' }, 35 | { 0 } 36 | }; 37 | 38 | c = getopt_long(argc, argv, "d:i:", long_options, NULL); 39 | if (c == -1) 40 | break; 41 | 42 | switch (c) { 43 | 44 | case 'd': 45 | ib_devname = strdupa(optarg); 46 | break; 47 | 48 | case 'i': 49 | ib_port = strtol(optarg, NULL, 0); 50 | if (ib_port < 0) { 51 | usage(argv[0]); 52 | return 1; 53 | } 54 | break; 55 | 56 | default: 57 | usage(argv[0]); 58 | return 1; 59 | } 60 | } 61 | 62 | struct ibv_device **dev_list = ibv_get_device_list(NULL); 63 | if (!dev_list) { 64 | fprintf(stderr, "Failed to get IB devices list: errno=%d\n", errno); 65 | return 1; 66 | } 67 | 68 | if (!ib_devname) { 69 | ib_dev = *dev_list; 70 | if (!ib_dev) { 71 | fprintf(stderr, "No IB devices found\n"); 72 | return 1; 73 | } 74 | } else { 75 | int i; 76 | for (i = 0; dev_list[i]; ++i) 77 | if (!strcmp(ibv_get_device_name(dev_list[i]), ib_devname)) 78 | break; 79 | ib_dev = dev_list[i]; 80 | if (!ib_dev) { 81 | fprintf(stderr, "IB device %s not found\n", ib_devname); 82 | return 1; 83 | } 84 | } 85 | 86 | struct ibv_context* context = ibv_open_device(ib_dev); 87 | if (!context) { 88 | fprintf(stderr, "Couldn't get context for %s: errno=%d\n", 89 | ibv_get_device_name(ib_dev), errno); 90 | return 1; 91 | } 92 | 93 | printf("num_comp_vectors: %d\n", context->num_comp_vectors); 94 | 95 | if (ibv_close_device(context)) { 96 | fprintf(stderr, 
"Couldn't release context\n"); 97 | return 1; 98 | } 99 | 100 | ibv_free_device_list(dev_list); 101 | 102 | return 0; 103 | } 104 | -------------------------------------------------------------------------------- /test/qp-roundrobin.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Check whether the sequence of QP numbers is monotonically increasing. 3 | * 4 | * Copyright (c) 2014 Minoru NAKAMURA 5 | */ 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | 17 | 18 | enum { 19 | SERVICE_LEVEL = 0 20 | }; 21 | 22 | 23 | static struct { 24 | int phys_port_cnt; 25 | struct { 26 | struct ibv_pd *pd; 27 | } ibv_port_data[2]; 28 | } ibv_device_data[4]; 29 | 30 | 31 | static void do_device(struct ibv_device *ib_dev, int dev_id); 32 | static void do_port(struct ibv_context *context, int dev_id, uint8_t port_num, int max_wqe); 33 | 34 | int main(int argc, char** argv) 35 | { 36 | struct ibv_device **dev_list, **dev_it; 37 | 38 | dev_list = ibv_get_device_list(NULL); 39 | if (!dev_list) { 40 | fprintf(stderr, "Failed to get IB devices list"); 41 | return 1; 42 | } 43 | 44 | int dev_id = 0; 45 | for (dev_it = dev_list ; *dev_it ; dev_it++) { 46 | do_device(*dev_it, dev_id++); 47 | } 48 | 49 | ibv_free_device_list(dev_list); 50 | 51 | return 0; 52 | } 53 | 54 | 55 | static void do_device(struct ibv_device *ib_dev, int dev_id) 56 | { 57 | struct ibv_context *context; 58 | 59 | context = ibv_open_device(ib_dev); 60 | if (!context) { 61 | fprintf(stderr, "Couldn't get context for %s\n", 62 | ibv_get_device_name(ib_dev)); 63 | return; 64 | } 65 | 66 | struct ibv_device_attr device_attr; 67 | if (ibv_query_device(context, &device_attr)) { 68 | return; 69 | } 70 | 71 | ibv_device_data[dev_id].phys_port_cnt = device_attr.phys_port_cnt; 72 | 73 | uint8_t port_num; 74 | for (port_num = 1 ; port_num < device_attr.phys_port_cnt + 1 ; port_num++) { 75 | 
printf("%s %d\n", ibv_get_device_name(ib_dev), port_num); 76 | do_port(context, dev_id, port_num, device_attr.max_qp_wr); 77 | } 78 | 79 | if (ibv_close_device(context)) { 80 | fprintf(stderr, "Couldn't release context\n"); 81 | return; 82 | } 83 | } 84 | 85 | 86 | static void do_port(struct ibv_context *context, int dev_id, uint8_t port_num, int max_wqe) 87 | { 88 | struct ibv_pd *pd; 89 | pd = ibv_alloc_pd(context); 90 | assert(pd); 91 | 92 | struct ibv_cq *cq1, *cq2; 93 | cq1 = ibv_create_cq(context, max_wqe + 100, NULL, NULL, 0); 94 | assert(cq1); 95 | 96 | cq2 = ibv_create_cq(context, max_wqe + 100, NULL, NULL, 0); 97 | assert(cq2); 98 | 99 | ibv_device_data[dev_id].ibv_port_data[port_num - 1].pd = pd; 100 | 101 | uint64_t counter = 0; 102 | uint32_t prev_qp_num = 0; 103 | 104 | for (;;) { 105 | struct ibv_qp *qp; 106 | struct ibv_qp_init_attr qp_attr = { 107 | .send_cq = cq1, 108 | .recv_cq = cq2, 109 | .cap = { 110 | .max_send_wr = max_wqe, 111 | .max_recv_wr = max_wqe, 112 | .max_send_sge = 1, 113 | .max_recv_sge = 2, 114 | }, 115 | .qp_type = IBV_QPT_UD, 116 | }; 117 | 118 | qp = ibv_create_qp(pd, &qp_attr); 119 | if (qp == NULL) { 120 | printf("ERROR: ibv_create_qp errno=%d\n", errno); 121 | printf("counter=%lu\n", counter); 122 | exit(EXIT_FAILURE); 123 | } 124 | 125 | if (qp->qp_num <= prev_qp_num) { 126 | printf("%lu: %06x -> %06x\n", 127 | counter, prev_qp_num, qp->qp_num); 128 | } 129 | 130 | prev_qp_num = qp->qp_num; 131 | 132 | ibv_destroy_qp(qp); 133 | 134 | counter++; 135 | } 136 | } 137 | -------------------------------------------------------------------------------- /test/query_pkey.c: -------------------------------------------------------------------------------- 1 | /* 2 | * Copyright (c) 2014 Minoru NAKAMURA 3 | */ 4 | 5 | #include 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | 13 | int main(int argc, char **argv) 14 | { 15 | struct ibv_device **dev_list; 16 | 17 | dev_list = ibv_get_device_list(NULL); 
18 | 19 | for (; *dev_list ; dev_list++) { 20 | int ret, port_index; 21 | struct ibv_context *context; 22 | 23 | printf("%s\n", ibv_get_device_name(*dev_list)); 24 | 25 | context = ibv_open_device(*dev_list); 26 | assert(context != NULL); 27 | 28 | struct ibv_device_attr device_attr; 29 | ret = ibv_query_device(context, &device_attr); 30 | assert(ret == 0); 31 | 32 | for (port_index = 0 ; port_index < device_attr.phys_port_cnt ; port_index++) { 33 | int i, snip; 34 | struct ibv_port_attr port_attr; 35 | 36 | printf("\tport_num = %d\n", port_index + 1); 37 | 38 | ret = ibv_query_port(context, port_index + 1, &port_attr); 39 | assert(ret == 0); 40 | 41 | assert(port_attr.pkey_tbl_len > 0); 42 | assert(port_attr.gid_tbl_len > 0); 43 | 44 | uint16_t pkey, prev_pkey; 45 | 46 | ret = ibv_query_pkey(context, port_index + 1, 0, &pkey); 47 | assert(ret == 0); 48 | printf("\t\tindex = %3d, pkey = %4x\n", 0, pkey); 49 | 50 | snip = 0; 51 | for (i = 1 ; i < port_attr.pkey_tbl_len ; i++) { 52 | prev_pkey = pkey; 53 | ret = ibv_query_pkey(context, port_index + 1, i, &pkey); 54 | if ((pkey != prev_pkey) || (i == port_attr.pkey_tbl_len - 1)) { 55 | printf("\t\tindex = %3d, pkey = %4x\n", i, pkey); 56 | snip = 0; 57 | } else if (snip == 0) { 58 | printf("\t\t\t(snip)\n"); 59 | snip = 1; 60 | } 61 | } 62 | printf("\n"); 63 | 64 | union ibv_gid gid, prev_gid; 65 | ret = ibv_query_gid(context, port_index + 1, 0, &gid); 66 | assert(ret == 0); 67 | 68 | printf("\t\tindex = %3d, GID: %016" PRIx64 ":%016" PRIx64 "\n", 69 | 0, ntohll(gid.global.subnet_prefix), ntohll(gid.global.interface_id)); 70 | 71 | snip = 0; 72 | for (i = 1 ; i < port_attr.gid_tbl_len ; i++) { 73 | prev_gid = gid; 74 | 75 | ret = ibv_query_gid(context, port_index + 1, i, &gid); 76 | assert(ret == 0); 77 | 78 | if ((memcmp(&gid, &prev_gid, sizeof(gid)) != 0) || 79 | (i == port_attr.gid_tbl_len - 1)) { 80 | printf("\t\tindex = %3d, GID: %016" PRIx64 ":%016" PRIx64 "\n", 81 | i, ntohll(gid.global.subnet_prefix), 
ntohll(gid.global.interface_id));
 82 |                     snip = 0;
 83 |                 } else if (snip == 0) {
 84 |                     printf("\t\t\t(snip)\n");
 85 |                     snip = 1;
 86 |                 }
 87 |             }
 88 |             printf("\n");
 89 |         }
 90 | 
 91 |         ret = ibv_close_device(context);
 92 |         assert(ret == 0);
 93 |     }
 94 | 
 95 |     return 0;
 96 | }
 97 | 
--------------------------------------------------------------------------------
/test/show_mem_reg.c:
--------------------------------------------------------------------------------
  1 | /*
  2 |  * Copyright (c) 2014 Minoru NAKAMURA
  3 |  */
  4 | 
  5 | #include <stdio.h>
  6 | #include <stdlib.h>
  7 | #include <string.h>
  8 | #include <getopt.h>
  9 | #include <assert.h>
 10 | #include <errno.h>
 11 | #include <infiniband/verbs.h>
 12 | 
 13 | enum {
 14 |     MAX_MR = 65536
 15 | };
 16 | 
 17 | struct ibv_mr *mr_array[MAX_MR];
 18 | 
 19 | static void usage(const char *argv0)
 20 | {
 21 |     printf("Usage: [-d <dev>] [-i <port>] [# of MRs]\n");
 22 |     printf("  -d, --ib-dev=<dev>     use IB device <dev> (default first device found)\n");
 23 |     printf("  -i, --ib-port=<port>   use port <port> of IB device (default 1)\n");
 24 |     printf("  -s, --size=<size>      allocating size of memory region (default 4096)\n");
 25 | }
 26 | 
 27 | 
 28 | int main(int argc, char *argv[])
 29 | {
 30 |     int num_mr = 16;
 31 |     size_t size = 4096;
 32 |     struct ibv_device *ib_dev;
 33 |     char *ib_devname = NULL;
 34 |     int ib_port = 1;
 35 | 
 36 |     while (1) {
 37 |         int c;
 38 | 
 39 |         static struct option long_options[] = {
 40 |             { .name = "ib-dev",  .has_arg = 1, .val = 'd' },
 41 |             { .name = "ib-port", .has_arg = 1, .val = 'i' },
 42 |             { .name = "size",    .has_arg = 1, .val = 's' },
 43 |             { 0 }
 44 |         };
 45 | 
 46 |         c = getopt_long(argc, argv, "d:i:s:", long_options, NULL);
 47 |         if (c == -1)
 48 |             break;
 49 | 
 50 |         switch (c) {
 51 | 
 52 |         case 'd':
 53 |             ib_devname = strdupa(optarg);
 54 |             break;
 55 | 
 56 |         case 'i':
 57 |             ib_port = strtol(optarg, NULL, 10);
 58 |             if (ib_port < 0) {
 59 |                 usage(argv[0]);
 60 |                 return 1;
 61 |             }
 62 |             break;
 63 | 
 64 |         case 's':
 65 |             size = strtol(optarg, NULL, 10);
 66 |             break;
 67 | 
 68 |         default:
 69 |             usage(argv[0]);
 70 |             return 1;
 71 |         }
 72 |     }
 73 | 
 74 |     if (optind < argc) {
 75 |         num_mr =
strtol(argv[optind], NULL, 10);
 76 |         assert((0 <= num_mr) && (num_mr <= MAX_MR));
 77 |     }
 78 | 
 79 |     struct ibv_device **dev_list = ibv_get_device_list(NULL);
 80 |     if (!dev_list) {
 81 |         fprintf(stderr, "Failed to get IB devices list: errno=%d\n", errno);
 82 |         return 1;
 83 |     }
 84 | 
 85 |     if (!ib_devname) {
 86 |         ib_dev = *dev_list;
 87 |         if (!ib_dev) {
 88 |             fprintf(stderr, "No IB devices found\n");
 89 |             return 1;
 90 |         }
 91 |     } else {
 92 |         int i;
 93 |         for (i = 0; dev_list[i]; ++i)
 94 |             if (!strcmp(ibv_get_device_name(dev_list[i]), ib_devname))
 95 |                 break;
 96 |         ib_dev = dev_list[i];
 97 |         if (!ib_dev) {
 98 |             fprintf(stderr, "IB device %s not found\n", ib_devname);
 99 |             return 1;
100 |         }
101 |     }
102 | 
103 |     struct ibv_context *context = ibv_open_device(ib_dev);
104 |     if (!context) {
105 |         fprintf(stderr, "Couldn't get context for %s: errno=%d\n",
106 |                 ibv_get_device_name(ib_dev), errno);
107 |         return 1;
108 |     }
109 | 
110 |     struct ibv_pd *pd = ibv_alloc_pd(context);
111 |     if (!pd) {
112 |         fprintf(stderr, "Couldn't allocate protection domain: errno=%d\n",
113 |                 errno);
114 |         return 1;
115 |     }
116 | 
117 |     int i;
118 |     for (i = 0 ; i < num_mr ; i++) {
119 |         char *buffer = malloc(size);
120 | 
121 |         mr_array[i] = ibv_reg_mr(pd, buffer, size, IBV_ACCESS_LOCAL_WRITE);
122 |         if (!mr_array[i]) {
123 |             fprintf(stderr, "Couldn't register memory region: errno=%d\n",
124 |                     errno);
125 |             return 1;
126 |         }
127 |         printf("[%2d] %08x %08x\n", i, mr_array[i]->lkey, mr_array[i]->rkey);
128 |     }
129 | 
130 |     for (i = num_mr - 1 ; i >= 0 ; i--)
131 |         ibv_dereg_mr(mr_array[i]);
132 | 
133 |     if (ibv_dealloc_pd(pd)) {
134 |         fprintf(stderr, "Couldn't deallocate PD\n");
135 |         return 1;
136 |     }
137 | 
138 |     if (ibv_close_device(context)) {
139 |         fprintf(stderr, "Couldn't release context\n");
140 |         return 1;
141 |     }
142 | 
143 |     ibv_free_device_list(dev_list);
144 | 
145 |     return 0;
146 | }
147 | 
--------------------------------------------------------------------------------
/test/show_mem_reg.txt:
--------------------------------------------------------------------------------
  1 | [ 0] 54078000 5407b000
  2 | [ 1] 2547a001 d5475001
  3 | [ 2] 5586b002 45868002
  4 | [ 3] c2f69003 c2f6e003
  5 | [ 4] a326f004 a326c004
  6 | [ 5] c386d005 c3892005
  7 | [ 6] 30c92006 30c91006
  8 | [ 7] 71090007 71097007
  9 | [ 8] 4e496008 4e495008
 10 | [ 9] beb94009 aeb9b009
 11 | [10] 7fe9900a 7fe9a00a
 12 | [11] 4d19b00b 4d19c00b
 13 | [12] 4a49d00c 4a49e00c
 14 | [13] 7ab9e00d 7ab8100d
 15 | [14] 9be8000e 8be8300e
 16 | [15] 7858200f 7858500f
 17 | 
--------------------------------------------------------------------------------
/test/test-ib_reg_mr-01.c:
--------------------------------------------------------------------------------
  1 | /*
  2 |  * Copyright (c) 2014 Minoru NAKAMURA
  3 |  */
  4 | 
  5 | #include <stdio.h>
  6 | #include <stdlib.h>
  7 | #include <string.h>
  8 | #include <getopt.h>
  9 | #include <assert.h>
 10 | #include <errno.h>
 11 | #include <infiniband/verbs.h>
 12 | 
 13 | static void usage(const char *argv0)
 14 | {
 15 |     printf("Usage: [-d <dev>] [-i <port>]\n");
 16 |     printf("  -d, --ib-dev=<dev>     use IB device <dev> (default first device found)\n");
 17 |     printf("  -i, --ib-port=<port>   use port <port> of IB device (default 1)\n");
 18 | }
 19 | 
 20 | int main(int argc, char *argv[])
 21 | {
 22 |     struct ibv_device *ib_dev;
 23 |     char *ib_devname = NULL;
 24 |     int ib_port = 1;
 25 | 
 26 |     while (1) {
 27 |         int c;
 28 | 
 29 |         static struct option long_options[] = {
 30 |             { .name = "ib-dev",  .has_arg = 1, .val = 'd' },
 31 |             { .name = "ib-port", .has_arg = 1, .val = 'i' },
 32 |             { 0 }
 33 |         };
 34 | 
 35 |         c = getopt_long(argc, argv, "d:i:", long_options, NULL);
 36 |         if (c == -1)
 37 |             break;
 38 | 
 39 |         switch (c) {
 40 | 
 41 |         case 'd':
 42 |             ib_devname = strdupa(optarg);
 43 |             break;
 44 | 
 45 |         case 'i':
 46 |             ib_port = strtol(optarg, NULL, 10);
 47 |             if (ib_port < 0) {
 48 |                 usage(argv[0]);
 49 |                 return 1;
 50 |             }
 51 |             break;
 52 | 
 53 |         default:
 54 |             usage(argv[0]);
 55 |             return 1;
 56 |         }
 57 |     }
 58 | 
 59 |     struct ibv_device **dev_list = ibv_get_device_list(NULL);
 60 |     if (!dev_list) {
 61 |         fprintf(stderr, "Failed to get IB devices list:
errno=%d\n", errno);
 62 |         return 1;
 63 |     }
 64 | 
 65 |     if (!ib_devname) {
 66 |         ib_dev = *dev_list;
 67 |         if (!ib_dev) {
 68 |             fprintf(stderr, "No IB devices found\n");
 69 |             return 1;
 70 |         }
 71 |     } else {
 72 |         int i;
 73 |         for (i = 0; dev_list[i]; ++i)
 74 |             if (!strcmp(ibv_get_device_name(dev_list[i]), ib_devname))
 75 |                 break;
 76 |         ib_dev = dev_list[i];
 77 |         if (!ib_dev) {
 78 |             fprintf(stderr, "IB device %s not found\n", ib_devname);
 79 |             return 1;
 80 |         }
 81 |     }
 82 | 
 83 |     struct ibv_context *context = ibv_open_device(ib_dev);
 84 |     if (!context) {
 85 |         fprintf(stderr, "Couldn't get context for %s: errno=%d\n",
 86 |                 ibv_get_device_name(ib_dev), errno);
 87 |         return 1;
 88 |     }
 89 | 
 90 |     struct ibv_pd *pd = ibv_alloc_pd(context);
 91 |     if (!pd) {
 92 |         fprintf(stderr, "Couldn't allocate protection domain: errno=%d\n",
 93 |                 errno);
 94 |         return 1;
 95 |     }
 96 | 
 97 |     size_t size;
 98 |     for (size = 1 ; size < 256 * 1024 * 1024 ; size *= 2) {
 99 |         char *buffer = malloc(size);
100 |         struct ibv_mr *mr;
101 | 
102 |         mr = ibv_reg_mr(pd, buffer, size, IBV_ACCESS_LOCAL_WRITE);
103 |         if (mr) {
104 |             printf("ibv_reg_mr: size = %zu OK\n", size);
105 |         } else {
106 |             printf("ibv_reg_mr: size = %zu NG\n", size);
107 |             fprintf(stderr, "Couldn't register memory region: errno=%d\n",
108 |                     errno);
109 |             return 1;
110 |         }
111 |         ibv_dereg_mr(mr);
112 |     }
113 | 
114 |     if (ibv_dealloc_pd(pd)) {
115 |         fprintf(stderr, "Couldn't deallocate PD\n");
116 |         return 1;
117 |     }
118 | 
119 |     if (ibv_close_device(context)) {
120 |         fprintf(stderr, "Couldn't release context\n");
121 |         return 1;
122 |     }
123 | 
124 |     ibv_free_device_list(dev_list);
125 | 
126 |     return 0;
127 | }
128 | 
--------------------------------------------------------------------------------
/test/test-ipoib-01.c:
--------------------------------------------------------------------------------
  1 | /*
  2 |  * Copyright (c) 2014 Minoru NAKAMURA
  3 |  */
  4 | 
  5 | #include <stdio.h>
  6 | #include <stdlib.h>
  7 | #include <string.h>
  8 | #include <unistd.h>
  9 | #include <assert.h>
 10 | #include <errno.h>
 11 | #include <sys/types.h>
 12 | 
#include <sys/socket.h>
 13 | #include <netinet/in.h>
 14 | #include <arpa/inet.h>
 15 | #include <net/if.h>
 16 | #include <ifaddrs.h>
 17 | #include <sys/time.h>
 18 | 
 19 | 
 20 | enum {
 21 |     MAX_IPOIB_NETDEV = 8,
 22 |     MAX_RETRY_COUNT = 100
 23 | };
 24 | 
 25 | 
 26 | static int num_ipoib_netdev;
 27 | 
 28 | static struct {
 29 |     int sockfd;
 30 |     struct sockaddr_in sockaddr;
 31 | } ipoib_netdevs[MAX_IPOIB_NETDEV];
 32 | 
 33 | 
 34 | static void setup_sockets(void);
 35 | static void run_test(void);
 36 | static void test_one_iteration(int from, int to);
 37 | 
 38 | int main(int argc, char **argv)
 39 | {
 40 | 
 41 |     printf(
 42 |         "Before running this program, enter the following commands\n"
 43 |         "\techo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore\n"
 44 |         "\techo 2 > /proc/sys/net/ipv4/conf/all/arp_announce\n\n");
 45 | 
 46 |     setup_sockets();
 47 | 
 48 |     if (num_ipoib_netdev == 0) {
 49 |         fprintf(stderr, "No IPoIB netdev found.\n");
 50 |         exit(EXIT_FAILURE);
 51 |     }
 52 | 
 53 |     run_test();
 54 | 
 55 |     printf("OK\n");
 56 | 
 57 |     return 0;
 58 | }
 59 | 
 60 | 
 61 | static void setup_sockets(void)
 62 | {
 63 |     struct ifaddrs *ifaddr, *ifa;
 64 |     int family;
 65 | 
 66 |     if (getifaddrs(&ifaddr) < 0) {
 67 |         perror("getifaddrs");
 68 |         exit(EXIT_FAILURE);
 69 |     }
 70 | 
 71 |     for (ifa = ifaddr; ifa != NULL; ifa = ifa->ifa_next) {
 72 |         int sockfd;
 73 |         if (ifa->ifa_addr == NULL)
 74 |             continue;
 75 | 
 76 |         family = ifa->ifa_addr->sa_family;
 77 | 
 78 |         if (family != AF_INET)
 79 |             continue;
 80 | 
 81 |         if (strncmp(ifa->ifa_name, "ib", 2) != 0)
 82 |             continue;
 83 | 
 84 |         sockfd = socket(AF_INET, SOCK_DGRAM | SOCK_NONBLOCK, 0);
 85 |         if (sockfd < 0) {
 86 |             perror("socket");
 87 |             exit(EXIT_FAILURE);
 88 |         }
 89 | 
 90 |         struct sockaddr_in sockaddr;
 91 | 
 92 |         memcpy(&sockaddr, ifa->ifa_addr, sizeof(struct sockaddr_in));
 93 |         sockaddr.sin_port = 0;
 94 | 
 95 |         int ret;
 96 | 
 97 |         ret = bind(sockfd, (struct sockaddr*)&sockaddr, (socklen_t)sizeof(sockaddr));
 98 |         if (ret < 0) {
 99 |             perror("bind");
100 |             exit(EXIT_FAILURE);
101 |         }
102 | 
103 | #if 0
104 |         ret = setsockopt(sockfd, SOL_SOCKET, SO_BINDTODEVICE, ifa->ifa_name,
strlen(ifa->ifa_name) + 1);
105 |         if (ret < 0) {
106 |             perror("setsockopt(SO_BINDTODEVICE)");
107 |             exit(EXIT_FAILURE);
108 |         }
109 | #endif
110 | 
111 |         ipoib_netdevs[num_ipoib_netdev].sockfd = sockfd;
112 | 
113 |         socklen_t socklen = sizeof(ipoib_netdevs[num_ipoib_netdev].sockaddr);
114 |         ret = getsockname(sockfd, (struct sockaddr*)&ipoib_netdevs[num_ipoib_netdev].sockaddr,
115 |                           &socklen);
116 |         if (ret < 0) {
117 |             perror("getsockname");
118 |             exit(EXIT_FAILURE);
119 |         }
120 | 
121 |         printf("setup %s %s:%u\n",
122 |                ifa->ifa_name,
123 |                inet_ntoa(ipoib_netdevs[num_ipoib_netdev].sockaddr.sin_addr),
124 |                ntohs(ipoib_netdevs[num_ipoib_netdev].sockaddr.sin_port));
125 | 
126 |         num_ipoib_netdev++;
127 |     }
128 | 
129 |     freeifaddrs(ifaddr);
130 | }
131 | 
132 | 
133 | static void run_test(void)
134 | {
135 |     int j, k;
136 | 
137 |     for (j = 0 ; j 0)
230 |         assert(memcmp(send_buf, recv_buf, send_size) == 0);
231 |     }
232 | }
--------------------------------------------------------------------------------