Split Travis jobs into multiple stages #396
whisperity wants to merge 2 commits into Ericsson:master
|
IMHO this is kind of a misuse of build stages as a CI concept; however, I have no better suggestion if we are hitting the time limit. Well, there would be one: don't use the free tier, but pay for it... we can drop that I guess 😃
|
Another option would be to switch from Travis to another CI provider. Of course, if we don't want to hit similar limits, we either have to pay for it or use a CI solution which can be installed on our own servers, e.g. GitLab CI. However, this goes quite far, so I would only recommend it if we cannot solve this with Travis.
|
You can check one of the recent runs, where ODB had to be compiled for 20.04: 44 minutes. Just for Build2-ODB-Thrift. https://travis-ci.com/github/Ericsson/CodeCompass/builds/168855806 However, even for a cached build, it took a bit more time to obtain the cache and figure out that it is valid on 20.04. https://travis-ci.com/github/Ericsson/CodeCompass/builds/168860585 3 minutes versus 1 minute.
Why would it be? A fully fledged CI for CodeCompass would be like this, assuming we had tests for each parser and service:
None of these steps can reasonably take place if the previous one fails. The easiest solution would be ODB providing upstream packages. Even if they only gave out DEBs, not necessarily in the upstream repositories, that would be a plus. The problem is that build2 takes an absolutely abhorrent amount of time to download and install, for a build system. And there isn't a packaged version of Build2 in the official repositories either... And ODB, by the looks of it, dropped their Makefile/CMake support...
|
Another way would be for us to host the pre-compiled ODB and Thrift as a tarball somewhere, given that these are part of the environment and not directly related to CodeCompass. The CodeCompass-specific CI run could then just fetch them from the remote... maybe a GitHub repository where changes to the environment are reflected by a separate Travis job updating the release?
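A rough sketch of what the fetching side of this could look like in `.travis.yml`. Everything named here is an assumption for illustration only: the `DEPS_URL` variable, the `example.com` address, the archive name and the `$HOME/cc-deps` prefix are hypothetical, and no such release exists yet.

```yaml
env:
  global:
    # Hypothetical location of the pre-compiled ODB + Thrift archive.
    - DEPS_URL=https://example.com/codecompass-deps-ubuntu-20.04.tar.gz

before_install:
  # Unpack the pre-built dependencies into a (hypothetical) prefix.
  - mkdir -p "$HOME/cc-deps"
  - curl -fsSL "$DEPS_URL" | tar -xz -C "$HOME/cc-deps"
  # Make the unpacked tools and libraries visible to CMake and the shell.
  - export CMAKE_PREFIX_PATH="$HOME/cc-deps:$CMAKE_PREFIX_PATH"
  - export PATH="$HOME/cc-deps/bin:$PATH"
```

With something like this, the CodeCompass job would not depend on Travis' cache at all, at the price of having to rebuild and re-upload the archive whenever the environment changes.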
- Boost: Work around removal of a deprecated header in 1.68.0.
- [gitservice] Decouple the Thrift API type from Libgit2 macros, fixing API break in v0.28.
- Update the user guide.
- Add Travis job for 20.04 testing.
|
@whisperity You wouldn't put an APT install, an NPM dependency install, etc. into a separate stage, although they must precede the building of the actual project, because we are not building a CI for testing whether the dependencies can be built. And they do not differ that much: for the node modules, for example, you should also cache them to boost CI performance. You are only moving the ODB compilation into a separate stage because it takes a hell of a lot of time to compile and the free tier of Travis cannot handle it. From the viewpoint of the CI process there is no real reason to separate these stages (build of dependencies and build of project), in my opinion. Maybe we can look for a PPA to fetch Build2 and also ODB. If there are none (for the beta version of ODB it would be absolutely no surprise), the tarball would be our best shot for Travis (in terms of performance), but then we have to maintain it 😃 (Container images could also work, but Travis does not support custom images unfortunately.)
|
Yeah, and unfortunately it (that is, using Travis' cache) doesn't seem to work on 20.04. The postgres job isn't finding Thrift, and the sqlite job isn't always finding it. Which is a joke, because literally 5 lines above the point where CMake raises the error, the paths are set and both However, we have three ways of going forward:
The third option seems the most deterministic and safest. And it does not sound like that much stuff to maintain, honestly. We're already maintaining the If we had a rolling binary release, we could help the users even further and go "hey, download this binary".
|
I have an idea how we could significantly speed up our Travis build. Travis has a built-in Docker feature: it can rely on pre-built images while also being able to build and push them. We only need the first part though. If we uploaded the development dependency image to Docker Hub, it could be pulled from there directly into the Travis build, so there would be no need to build the build2 toolchain and ODB each time. And not just ODB: all the other dependencies would be pulled directly as well. For this to work we would need an image for each supported version of Ubuntu, not just 18.04 like we have now. But I believe that this wouldn't be too hard.
|
That would be great @filbeofITK, but as far as I know Travis does not support custom container images the way e.g. GitLab does. Of course you can launch Docker inside the Travis VM, but that is another story.
|
Sorry, this took longer than it should have. What I meant was that we use the docker service. That way any container image on Docker Hub can be pulled and then used. If you check the link below, you can see there is an actual docker pull there, and not some workaround on Travis's part, so it should work for any public image. So I'm quite optimistic about this. One big problem, of course, is testing the Travis script locally.
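To make the docker service idea a bit more concrete, here is a minimal sketch of the relevant `.travis.yml` parts. The image name `example/codecompass-dev:ubuntu-20.04` and the script `./ci/build.sh` are made up for illustration; no such image has been published, and the real build commands would go in place of the script.

```yaml
services:
  - docker

before_install:
  # Hypothetical image holding build2, ODB, Thrift and the other dependencies.
  - docker pull example/codecompass-dev:ubuntu-20.04

script:
  # Mount the checked-out sources into the container and build there.
  # ./ci/build.sh is a placeholder for the actual CodeCompass build steps.
  - docker run --rm -v "$TRAVIS_BUILD_DIR:/src" -w /src example/codecompass-dev:ubuntu-20.04 ./ci/build.sh
```

This still runs Docker inside the Travis VM rather than using the image as the job environment itself, so the build commands have to be wrapped in `docker run`.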
|
Right. There is something really wrong with the determinism of the caching logic on Travis. Put simply, caches from a previous stage are not always used properly in later stages of the same build... I'll come up with a better solution.
Unfortunately, self-compiling ODB with only 2 cores available in the VM of Travis generally results in a job timeout at the 50-minute mark.
Hopefully this patch will change this. It splits the CI process into two steps: first, configuring the dependencies - checking if the dependencies can be built as described in the script. The second stage focuses on re-using the cache of built dependencies and executing only CodeCompass' build.
As per https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts:
It's easy to see that with the self-compiled ODB in patch #385, the timeouts are hit. It is weird that in some cases Travis isn't enforcing the timeout, like here: https://travis-ci.com/github/Ericsson/CodeCompass/builds/167752949 where jobs ran for 1:10. But here: https://travis-ci.com/github/Ericsson/CodeCompass/jobs/341696845 the ODB build took up the majority of the time, in contrast with other jobs that, if the cache is used, take only 5-10 minutes. Here is another one: https://travis-ci.com/github/Ericsson/CodeCompass/builds/168759981, where the 50-minute mark killed the job during the "build CodeCompass" phase, because ODB's build took too much time.
Builds (i.e. a full CI run) have no limits, only individual jobs.
With a multi-stage configuration, we have a more reliable way of warming the cache with all self-compiled dependencies (w.r.t the respective requirements and guides - i.e. no ODB on 16.04) already "installed", and thus the "build and test CodeCompass" is its own unique phase, which should use the warm cache to run.
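For readers unfamiliar with build stages, the shape of such a configuration is roughly the following. This is only an illustrative sketch and not the actual content of the patch: the stage names, the `$HOME/deps` prefix and the two `./ci/*.sh` scripts are hypothetical.

```yaml
os: linux
dist: focal
language: cpp

cache:
  directories:
    - $HOME/deps   # hypothetical install prefix of the self-compiled dependencies

stages:
  - dependencies
  - build

jobs:
  include:
    # Stage 1: compile ODB, Thrift, etc. and leave them in the cached prefix.
    - stage: dependencies
      name: "Build dependencies"
      script: ./ci/build_dependencies.sh "$HOME/deps"
    # Stage 2: starts only if stage 1 succeeded and reuses the warm cache.
    - stage: build
      name: "Build and test CodeCompass"
      script: ./ci/build_codecompass.sh "$HOME/deps"
```

Since the time limit applies per job, each stage gets its own budget. One caveat, as far as I understand Travis' caching: the two jobs presumably only see the same cache if their cache-relevant settings (OS, distribution, environment variables and so on) match, so the jobs in both stages would have to be kept in sync.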