Skip to content

Replace the use of OpenCL kernels with API calls for copies of rectangular buffers#1169

Merged
rok-cesnovar merged 6 commits intostan-dev:developfrom
bstatcomp:feature_use_api_calls_for_rectangular_copies
Mar 21, 2019
Merged

Replace the use of OpenCL kernels with API calls for copies of rectangular buffers#1169
rok-cesnovar merged 6 commits intostan-dev:developfrom
bstatcomp:feature_use_api_calls_for_rectangular_copies

Conversation

@rok-cesnovar
Copy link
Member

Summary

This replaces the use of OpenCL kernels with the API calls for rectangular buffers. For triangular copies the kernels are still faster than API+zeros() kernel. See #1168 for benchmark results. Will post the benchmarks for triangular copies in a reply below.

Tests

/

Side Effects

/

Checklist

@rok-cesnovar rok-cesnovar self-assigned this Mar 20, 2019
ncols, A.rows(), A.cols(), this->rows(),
this->cols(), triangular_view);
if (triangular_view == TriangularViewCL::Entire) {
cl::size_t<3> src_offset;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could find a way to initialize this cl::size_t vectors in a shorter manner :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah. this is a silly problem with their API it seems. I think the general purpose solution is something like:

template <typename Arg, typename... Ts, typename std::enable_if<std::is_integral<Arg>::value>::type * = nullptr>
auto to_size_t(Arg a, Ts... all) {
  constexpr size_t len = sizeof...(Ts)+1;
  int values[len] = {a, all...};
  cl::size_t<len> s;
  for (size_t i = 0; i < len; i++)
    s[i] = values[i];
  return s;
}

(we do all the variadic weirdness to make sure we're getting ints at compile time).

this one is a little hackier and if you wanted to pass more than three, you'd have to specify a template parameter. But it's much less confusing imo and is probably fine forever for us:

template <int len=3>
cl::size_t<len> to_size_t(std::vector<int> values) {
  assert(values.size() ==len);
  cl::size_t<len> s;
  for (size_t i = 0; i < len; i++)
    s[i] = values[i];
  return s;
}

Called like this:

int main() {
  cl::size_t<3> s = to_size_t({1, 2, 3});
  return s[0];
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Sean!! I am going to try and incorporate the second one today, I thinks this one is going to be fine forever, for sure :)

@rok-cesnovar
Copy link
Member Author

The benchmark file: https://gist.github.com/rok-cesnovar/a9e7196bcd0730f29ca2591f54fd3a10
The benchmark times for the triangular copies to confirm its not faster:
triangular_copies.txt

seantalts
seantalts previously approved these changes Mar 21, 2019
Copy link
Member

@seantalts seantalts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Just put a comment for a possible semi-improvement on specifying cl::size_t; up to you if you think something like that is worth implementing here.

@rok-cesnovar
Copy link
Member Author

I added the to_size_t to the context file as it maybe come in handy in other places. If that is OK with you, please approve again and I will merge once it passes. Thanks again for the help!

Copy link
Member

@seantalts seantalts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you!

@rok-cesnovar rok-cesnovar merged commit ca2d77f into stan-dev:develop Mar 21, 2019
@rok-cesnovar rok-cesnovar deleted the feature_use_api_calls_for_rectangular_copies branch March 21, 2019 18:20
@rok-cesnovar
Copy link
Member Author

rok-cesnovar commented Mar 21, 2019

@seantalts sooo, this one is weird.

on a routine ./runTests.py test/unit -f opencl after I made some unrelated changes I got this message:

In file included from test/unit/math/opencl/opencl_context_test.cpp:2:0:
./stan/math/opencl/opencl_context.hpp: In function ‘cl::size_t<len> stan::math::opencl::to_size_t(std::vector<long unsigned int>)’:
./stan/math/opencl/opencl_context.hpp:43:3: error: there are no arguments to ‘assert’ that depend on a template parameter, so a declaration of ‘assert’ must be available [-fpermissive]
   assert(values.size() == len);
   ^~~~~~
./stan/math/opencl/opencl_context.hpp:43:3: note: (if you use ‘-fpermissive’, G++ will accept your code, but allowing the use of an undeclared name is deprecated)

This also happens on develop, although not on Jenkins apparently. So I thought it must be g++. clang then gave this error:

In file included from test/unit/math/opencl/opencl_context_test.cpp:2:
./stan/math/opencl/opencl_context.hpp:43:3: error: use of undeclared identifier 'assert'
  assert(values.size() == len);

Google told me to add #include <cassert> and after I did that, no more errors.

So the extra weird is that this passed Jenkins, and this exact same code compiles with every other test
that includes the same file except for opencl_context_test.

Am I missing something? If not, what is the process here? Revert this PR or open a new one that fixes this? Anyways, I apologize...

@bob-carpenter
Copy link
Member

I had thought we'd decided not to include C-style asserts in any of our code.

If that test is to catch user errors, it should throw rather than assert. If it's there to catch programmer errors, it should be replaced with unit tests.

Hopefully, @syclik and @seantalts will weigh in on policy and correct me if I'm misremembering.

@seantalts
Copy link
Member

I haven't heard about the assert policy. This one is to help developers and I'm not sure how to replace it with unit tests.

@seantalts
Copy link
Member

Am I missing something? If not, what is the process here? Revert this PR or open a new one that fixes this? Anyways, I apologize...

I think we should roll forward since the build isn't broken. Bob, what would you do here? We're just creating a simple helper function that creates a cl::size_t, which is kind of an annoying type to create. We talked about two solutions here: #1169 (comment) one of which is fairly complicated templating that is always correct and doesn't have an assert, and the other is simpler but has an assert. Is there a way to get rid of the assert there? Or should we go with the full template solution? or some other thing?

@bob-carpenter
Copy link
Member

bob-carpenter commented Mar 21, 2019 via email

@seantalts
Copy link
Member

I don't anticipate it happening at runtime but I do anticipate it happening during development...

@syclik
Copy link
Member

syclik commented Mar 21, 2019 via email

@seantalts
Copy link
Member

Do we have an exception we use for Stan developer error? I guess we could use that? Or else just code it the dumb way for the specific case of 3 inputs:

auto to_size_t(int a, int b, int c) {
  cl::size_t<3> s;
  s[0] = a;
  s[1] = b;
  s[2] = c;
  return s;
}

int main() {
  cl::size_t<3> s = to_size_t(1, 2, 3);
  return s[0];
}

@syclik
Copy link
Member

syclik commented Mar 22, 2019

So far, we've been squeezing our exceptions into things that subclass std::exception: https://en.cppreference.com/w/cpp/error/exception

It's been ok, but there's no reason we can't define our own subclass. I haven't followed the content of where this came up from, so I don't have a good suggestion. Hopefully one of them makes sense. If not, create one that's sensible?

@seantalts
Copy link
Member

I'm going to put up a PR changing it to the correct-but-complicated templated version, above. If we have a policy on assert, maybe we can add a test for it as well similar to the math dependencies one?

@seantalts
Copy link
Member

Still weirded out about why it passed on Jenkins before:

So the extra weird is that this passed Jenkins, and this exact same code compiles with every other test that includes the same file except for opencl_context_test.

Rok did you figure out why? The "Full Unit with GPU" tests run on every pull request now, so I don't think that's it...?

@syclik
Copy link
Member

syclik commented Mar 22, 2019

Let me know if you want extra eyes on it. (Just tag me if you need it.)

@rok-cesnovar
Copy link
Member Author

@seantalts unfortunately no.

this doesnt error out on this branch

/usr/local/opt/llvm@6/bin/clang++ -Werror -std=c++1y -Wno-unknown-warning-option -Wno-tautological-compare -Wno-sign-compare    -I lib/opencl_1.2.8  -O3  -I . -I lib/eigen_3.3.3 -I lib/boost_1.69.0 -I lib/sundials_4.1.0/include -I lib/gtest_1.8.1/include -I lib/gtest_1.8.1 -I lib/gtest_1.8.1/include -I lib/gtest_1.8.1      -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -DSTAN_OPENCL -DOPENCL_DEVICE_ID=1 -DOPENCL_PLATFORM_ID=0 -DCL_USE_DEPRECATED_OPENCL_1_2_APIS  -DGTEST_USE_OWN_TR1_TUPLE -DGTEST_USE_OWN_TR1_TUPLE  -c -o test/unit/math/opencl/sub_block_test.o test/unit/math/opencl/sub_block_test.cpp

This failed on Sebastian's PR:

clang++-6.0 -Werror -std=c++1y  -Wno-sign-compare   -I lib/opencl_1.2.8  -O3  -I . -I lib/eigen_3.3.3 -I lib/boost_1.69.0 -I lib/sundials_4.1.0/include -I lib/gtest_1.8.1/include -I lib/gtest_1.8.1 -I lib/gtest_1.8.1/include -I lib/gtest_1.8.1      -DBOOST_RESULT_OF_USE_TR1 -DBOOST_NO_DECLTYPE -DBOOST_DISABLE_ASSERTS -DBOOST_PHOENIX_NO_VARIADIC_EXPRESSION  -DSTAN_OPENCL -DOPENCL_DEVICE_ID=0 -DOPENCL_PLATFORM_ID=0 -DCL_USE_DEPRECATED_OPENCL_1_2_APIS  -DGTEST_USE_OWN_TR1_TUPLE -DGTEST_USE_OWN_TR1_TUPLE  -c -o test/unit/math/opencl/sub_block_test.o test/unit/math/opencl/sub_block_test.cpp

The other difference that in this PR it was on the Always run part 1, on the other PR it was Always run part 2. Not sure if that makes any difference.

@serban-nicusor-toptal serban-nicusor-toptal modified the milestone: 2.19.2 Jul 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants