Commit 4549117
Optimization of custom_reduce_over_group function.
Previously, the function performed the custom reduction sequentially
in a single work-item (the leader of the work-group).
It now reduces cooperatively across work-items for a few iterations,
and then processes the remaining non-reduced elements sequentially
in the leading work-item.
This makes custom_reduce_over_group about 3x faster.
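
The two-phase scheme described above can be sketched as a host-side Python simulation. This is not the SYCL code from the commit: the function names, the `n_coop_iters` parameter, and the halving pattern used for the cooperative rounds are assumptions made for illustration, and the operator is assumed to be associative and commutative (as `min` is) with a non-empty input.

```python
from functools import reduce


def sequential_leader_reduce(values, op):
    """Old scheme (simulated): the leader folds every value one by one."""
    return reduce(op, values)


def cooperative_then_leader_reduce(values, op, n_coop_iters=2):
    """New scheme (simulated): a few cooperative rounds, then a leader fold.

    n_coop_iters is a hypothetical knob; the commit only says "a few".
    """
    local = list(values)   # stands in for work-group local memory
    active = len(local)    # number of elements not yet reduced
    for _ in range(n_coop_iters):
        if active < 2:
            break
        half = (active + 1) // 2
        # Each simulated work-item i combines slot i with slot i + half.
        # On a device these updates would run in parallel, followed by
        # a work-group barrier before the next round.
        for i in range(active - half):
            local[i] = op(local[i], local[i + half])
        active = half
    # The leader finishes the remaining `active` elements sequentially.
    return reduce(op, local[:active])
```

In this sketch each cooperative round roughly halves the number of elements left for the leader, so for a group of n values the final sequential pass covers about n / 2**n_coop_iters elements instead of n.
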
The following command now shows the timing of the float32 reduction kernel
```
unitrace -d -v -i 20 python -c "import dpctl.tensor as dpt; dpt.min(dpt.ones(10**7, dtype=dpt.float32)).sycl_queue.wait()"
```
on par with (less than 10% slower than) the int32 kernel, which uses
built-in sycl::reduce_over_group:
```
unitrace -d -v -i 20 python -c "import dpctl.tensor as dpt; dpt.min(dpt.ones(10**7, dtype=dpt.int32)).sycl_queue.wait()"
```

1 parent 0bcd635, commit 4549117
1 file changed: 59 additions, 5 deletions (original lines 132-158, new lines 132-212)