Split Travis jobs into multiple stages#396

Closed
whisperity wants to merge 2 commits into Ericsson:master from whisperity:multistage-CI

@whisperity
Contributor

@whisperity whisperity commented May 30, 2020

Depends on #393.

Unfortunately, self-compiling ODB with only 2 cores available in the VM of Travis generally results in a job timeout at the 50-minute mark.
Hopefully this patch will change this. It splits the CI process into two stages. The first configures the dependencies, which also checks whether they can be built as described in the script. The second re-uses the cache of built dependencies and executes only CodeCompass' build.

As per https://docs.travis-ci.com/user/customizing-the-build/#build-timeouts:

Travis CI has specific time limits for each job, and will stop the build and add an error message to the build log in the following situations:

  • When a job produces no log output for 10 minutes.
  • When a job on a public repository takes longer than 50 minutes.

It's easy to see that with the self-compiling ODB in patch #385, the timeouts are hit. Oddly, Travis doesn't always enforce the timeout: in https://travis-ci.com/github/Ericsson/CodeCompass/builds/167752949 jobs ran for 1:10. In https://travis-ci.com/github/Ericsson/CodeCompass/jobs/341696845 the ODB build took up the majority of the time, whereas jobs that use the cache take only 5-10 minutes. Here is another one: https://travis-ci.com/github/Ericsson/CodeCompass/builds/168759981, where the 50-minute limit killed the job during the "build CodeCompass" phase, because ODB's build took too much time.

Builds (i.e. a full CI run) have no limits, only individual jobs.
With a multi-stage configuration, we have a more reliable way of warming the cache with all self-compiled dependencies (w.r.t the respective requirements and guides - i.e. no ODB on 16.04) already "installed", and thus the "build and test CodeCompass" is its own unique phase, which should use the warm cache to run.
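
To make the per-job arithmetic concrete, here is a minimal sketch. The 50-minute limit comes from the Travis documentation quoted above; the durations (45 and 10 minutes) and the job names are illustrative assumptions, not measurements from this PR.

```python
# Per-job limit for public repositories, from the Travis documentation quoted above.
JOB_LIMIT_MINUTES = 50


def jobs_over_limit(job_minutes, limit=JOB_LIMIT_MINUTES):
    """Return the names of jobs whose duration exceeds the per-job limit."""
    return [name for name, minutes in job_minutes.items() if minutes > limit]


# Hypothetical durations in minutes, for illustration only.
single_job = {"dependencies + CodeCompass": 45 + 10}
two_stages = {"dependencies": 45, "CodeCompass": 10}

print(jobs_over_limit(single_job))  # the combined job exceeds the limit
print(jobs_over_limit(two_stages))  # each job fits on its own
```

Because only individual jobs are limited, the same total amount of work can pass once it is split across jobs.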

@whisperity whisperity added the Target: Developer environment Developer environment issues consist of CodeCompass or 3rd-party build tooling, configuration or CI. label May 30, 2020
@whisperity whisperity force-pushed the multistage-CI branch 2 times, most recently from 8b73b4c to 15bb9e8 Compare May 30, 2020 15:00
@mcserep
Collaborator

mcserep commented May 30, 2020

IMHO this is kind of a misuse of build stages in the concept of CI; however, I have no better suggestion if we are hitting the time limit. Well, there would be one: don't use the free tier, but pay for it... we can drop that I guess 😃

@mcserep
Collaborator

mcserep commented May 30, 2020

Another option would be to switch from Travis to another CI provider. Of course if we don't want to hit similar boundaries we either have to pay for it or use a CI solution which can be installed on our servers, e.g. GitLab CI.

However, this goes far, so I would only recommend it if we cannot solve this with Travis.

@whisperity
Contributor Author

You can check one of the recent runs, where ODB had to be compiled for 20.04: 44 minutes. Just for Build2-ODB-Thrift. https://travis-ci.com/github/Ericsson/CodeCompass/builds/168855806

However, even for a cached build, it took a bit more time to obtain the cache and figure out that it is valid on 20.04. https://travis-ci.com/github/Ericsson/CodeCompass/builds/168860585 3 minutes versus 1 minute.

> IMHO this is kind of a misuse of build stages in the concept of CI

Why would it be? A fully fledged CI for CodeCompass would be like this, assuming we had tests for each parser and service:

  • obtain dependencies -- this is a necessary part of the "CodeCompass experience" due to lack of packages...
  • build and test CodeCompass core
  • unit test each parser individually and then test the services working for the parsers
  • system test the project as a whole, i.e. all parsers and services on a big enough dummy to see if the individual plugins mess up each other
  • +1: frontend testing (one wishes...)
  • +2: publish deployed image to a central demo location
    • parse large enough demo projects
    • restart the running server on the demo

None of these steps can reasonably take place if the previous one fails.
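
The fail-fast ordering described above can be sketched as follows. This is only an illustration: the stage names are shortened from the list, and the lambdas are hypothetical stand-ins for the real build and test commands.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_stages(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that actually ran.
    """
    ran = []
    for name, action in stages:
        ran.append(name)
        if not action():
            break  # later stages depend on this one, so skip them
    return ran


# Hypothetical stand-ins for the real stages; each returns True on success.
pipeline = [
    ("obtain dependencies", lambda: True),
    ("build and test core", lambda: False),  # simulated failure
    ("unit test parsers", lambda: True),
    ("system test", lambda: True),
]

print(run_stages(pipeline))
```

With the simulated failure in the second stage, the parser and system tests never start, which is the behaviour that separate stages would give.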


The easiest solution would be ODB providing upstream packages. Even if they only published DEBs, not necessarily in the upstream repositories, that would be a plus.

Problem is: build2 takes an absolutely abhorrent time to download and install, for a build system. And there isn't a package manager version in the official repositories for Build2 either... And ODB, by the looks of it, dropped their Makefile/CMake support...

@whisperity
Contributor Author

Another way could be for us to host the pre-compiled ODB and Thrift as a tarball somewhere, given that these are environmental things, not directly related to CodeCompass. The CodeCompass-specific CI run could then just fetch these from the remote... maybe a GitHub repository where changes to the environment are reflected by a separate Travis job updating the release?

 - Boost: Work around removal of a deprecated header in 1.68.0.
 - [gitservice] Decouple the Thrift API type from Libgit2 macros, fixing
   API break in v0.28.
 - Update the user guide.
 - Add Travis job for 20.04 testing.
@mcserep
Collaborator

mcserep commented May 30, 2020

@whisperity You wouldn't put an APT install, an NPM dependency install, etc. into a separate stage, although they must precede the building of the actual project, because we are not building a CI for testing whether the dependencies can be built. And they do not differ so much, e.g. for the node modules, you should also cache them to boost the CI performance.

You are only separating the ODB compilation into a separate stage, because it takes a hell lot of time to compile it and the free tier of Travis cannot handle it. From the viewpoint of the CI process there is no real reason to separate these stages (build of dependencies and build of project) in my opinion.
But it is not such a great issue, and I would have done the same with the options given. I just hadn't realized this issue, as somehow I got lucky in #385 and the timeout was not applied.

Maybe we can look for a PPA to fetch Build2 and also ODB. If there are none (for the beta version of ODB it would be absolutely no surprise), the tarball would be our best shot for Travis (in terms of performance), but then we have to maintain it 😃 (Container images could also work, but Travis does not support custom images unfortunately.)

@whisperity
Contributor Author

Yeah, and unfortunately it (that is, using Travis' cache) doesn't seem to work on 20.04. The postgres job isn't finding Thrift, and the sqlite job isn't always finding it. Which is a joke, because literally 5 lines above the cmake error, the paths are set and both "which thrift" and "thrift --version" report correctly... I'm thinking maybe there's an issue with the cache itself, or with cmake?

However, we have three ways of going forward:

  • making the CI an optional thing and accepting that there might be timeouts
  • while hoping that we barely scrape it and manage to fit into the time limits, deciding that we will never increase the test complexity of CodeCompass, i.e. never add new tests that slow the job down even further
  • we start maintaining a binary release of the dependencies, purely for testing purposes, not for the users but for only the CI to use.

The third option seems the most deterministic and safest. And it does not sound like that much stuff to maintain, honestly. We're already maintaining the build_odb "script" in the CI config. All that would happen is this script moving to another repository, and being replaced by a "curl github.com/releases/blabla" in CodeCompass' CI.

If we had rolling binary releases, we could help the users even further by saying "hey, download this binary".
The CI for that other repo could take care of pushing the new binary release to GitHub whenever our dependencies change... Which, for the most part, should not happen often, especially for the self-compiled section.
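
A minimal sketch of what the fetching side could look like, assuming the other repository publishes a gzip-compressed tarball together with a SHA-256 checksum that the CodeCompass CI pins. The function names are hypothetical, and no such release or URL exists yet; `download` takes whatever URL the caller supplies.

```python
import hashlib
import io
import tarfile
import urllib.request


def download(url: str) -> bytes:
    """Fetch the release archive from a caller-supplied URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise ValueError if the archive does not match the pinned checksum."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex:
        raise ValueError(f"checksum mismatch: expected {expected_hex}, got {actual}")


def install_archive(data: bytes, expected_hex: str, destination: str) -> list:
    """Verify the archive, unpack it into destination, return member names."""
    verify_sha256(data, expected_hex)
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        names = archive.getnames()
        archive.extractall(destination)
    return names
```

Pinning the checksum in the CodeCompass CI configuration means that a changed or corrupted release fails loudly instead of silently producing a different build environment.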

@filbeofITK
Contributor

I have an idea how we could significantly speed up our Travis build. Travis has a built-in Docker feature: it can rely on pre-built images while also being able to build and push them. We only need the first part, though. If we uploaded the development dependency image to Docker Hub, it could be pulled from there directly into the Travis build, so there would be no need to build the build2 toolchain and ODB each time. And not just ODB, but all the other dependencies would be pulled directly.
This would solve our timeout issue entirely, and would make the maintenance of the CI easier since we wouldn't have to manage dependencies there.

For this to work we would have to have an image for each supported version of Ubuntu, not just 18.04 like we have now. But I believe that this wouldn't be too hard.
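
One image per Ubuntu version could be addressed by a simple tagging scheme, sketched below. The repository name is a placeholder rather than a real image, and the version list is an assumption based on the releases mentioned in this thread.

```python
import subprocess

# Ubuntu releases mentioned in this thread (assumed to be the supported set).
SUPPORTED_UBUNTU = ("16.04", "18.04", "20.04")
# Placeholder repository name, not a real image on Docker Hub.
IMAGE_REPOSITORY = "example/codecompass-dev"


def image_tag(ubuntu_version: str) -> str:
    """Build the image reference for one supported Ubuntu version."""
    if ubuntu_version not in SUPPORTED_UBUNTU:
        raise ValueError(f"unsupported Ubuntu version: {ubuntu_version}")
    return f"{IMAGE_REPOSITORY}:ubuntu-{ubuntu_version}"


def pull_command(ubuntu_version: str) -> list:
    """Return the docker CLI invocation that pulls the matching image."""
    return ["docker", "pull", image_tag(ubuntu_version)]


def pull(ubuntu_version: str) -> None:
    """Run the pull; requires a working docker installation."""
    subprocess.run(pull_command(ubuntu_version), check=True)
```

Each CI job would then call `pull` with its own Ubuntu version instead of building the dependencies itself.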

@mcserep
Collaborator

mcserep commented Jun 1, 2020

That would be great @filbeofITK, but as far as I know Travis does not support custom container images the way GitLab does. Of course you can launch Docker inside the Travis VM, but that is another story.
Maybe I am just missing something, can you link the relevant documentation on this?

@filbeofITK
Contributor

Sorry, this took longer than it should.

What I meant was that we use the docker service. That way any containers on Docker Hub can be pulled, and used after. If you check the link below you can see there is an actual docker pull there, and not some workaround from Travis's part, so it should work for any public container.
https://docs.travis-ci.com/user/docker/

So I'm quite optimistic about this. One big problem, of course, is testing the Travis script locally.

@whisperity
Contributor Author

Right. There is something really wrong with the caching logic's determinism on Travis. Put simply, caches from a previous stage are not always used properly in later stages of the same build...

I'll come up with a better solution.

@whisperity whisperity closed this Jul 13, 2020
@whisperity whisperity deleted the multistage-CI branch July 14, 2020 14:23
@whisperity whisperity mentioned this pull request Jul 26, 2020
5 tasks
@mcserep mcserep added this to the Release Flash milestone Aug 3, 2020