The sbt assembly task is slow after adding some dependencies

I'm fairly new to deploying Scala. I configured the sbt-assembly plugin, and everything worked fine.

A few days ago I added Hadoop, Spark, and some other dependencies, and the assembly task became extremely slow (8 to 10 minutes); before that it took less than 30 seconds. Most of the time is spent creating the assembly jar (it takes a few seconds for the jar to grow by 1 MB).

I noticed that there are many merge conflicts, which are resolved by the first strategy. Does that affect build speed?

I played with the -Xmx option for sbt (adding -Xmx4096m), but that didn't help.

I am using sbt 0.12.4 and sbt-assembly. Any suggestions or pointers for optimizing this task?

1 answer

So 0__'s comment is right on:

Have you read the Readme? It specifically states that you can change the cacheUnzip and cacheOutput settings. I would give that a try.

cacheUnzip is an optimization feature, but cacheOutput is not. The purpose of cacheOutput is that you get an identical jar when your source has not changed. For some people, it is important that the output jars do not change unnecessarily. The caveat is that it checks the SHA-1 hash of all *.class files. Hence the readme says:

If there are a large number of class files, this can take a long time.

From what I can tell, unzipping and applying the merge strategy takes about a minute or two, but the SHA-1 check seems to take forever. Here is an assembly.sbt that disables the output cache:

    import AssemblyKeys._ // put this at the top of the file

    assemblySettings

    mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
      {
        case PathList("javax", "servlet", xs @ _*) => MergeStrategy.first
        case PathList("org", "apache", "commons", xs @ _*) => MergeStrategy.first // commons-beanutils-core-1.8.0.jar vs commons-beanutils-1.7.0.jar
        case PathList("com", "esotericsoftware", "minlog", xs @ _*) => MergeStrategy.first // kryo-2.21.jar vs minlog-1.2.jar
        case "about.html" => MergeStrategy.rename
        case x => old(x)
      }
    }

    assemblyCacheOutput in assembly := false

The assembly task completed in 58 seconds after a clean, and a second run without a clean took 15 seconds. Some runs, however, took 200+ seconds.

Looking at the source, I could probably optimize cacheOutput, but for now, turning it off should make the build much faster.
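To give a rough feel for why a SHA-1 check over every class file gets expensive, here is a minimal sketch. It is not sbt-assembly's actual code: `ClassHashSketch`, `listClassFiles`, and `sha1Hex` are hypothetical names made up for illustration, and the default `target` directory is only an assumption. The point is that every byte of every .class file has to be read and digested, so the cost grows with the number and size of the unzipped dependency classes.

```scala
import java.io.File
import java.nio.file.Files
import java.security.MessageDigest

// Hypothetical illustration only; this is NOT how sbt-assembly is implemented.
object ClassHashSketch {
  // Recursively collect every .class file under the given directory.
  def listClassFiles(dir: File): Seq[File] = {
    val children = Option(dir.listFiles).map(_.toSeq).getOrElse(Seq.empty)
    children.flatMap { file =>
      if (file.isDirectory) listClassFiles(file)
      else if (file.getName.endsWith(".class")) Seq(file)
      else Seq.empty
    }
  }

  // Feed the full contents of each file into one SHA-1 digest.
  // Files are sorted by path so the result does not depend on listing order.
  def sha1Hex(files: Seq[File]): String = {
    val md = MessageDigest.getInstance("SHA-1")
    files.sortBy(_.getPath).foreach { file =>
      md.update(Files.readAllBytes(file.toPath))
    }
    md.digest().map(b => f"$b%02x").mkString
  }

  def main(args: Array[String]): Unit = {
    // "target" is an assumed default; pass another directory as the first argument.
    val root = new File(args.headOption.getOrElse("target"))
    val files = listClassFiles(root)
    val start = System.nanoTime()
    val hash = sha1Hex(files)
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"hashed ${files.size} class files in $elapsedMs ms: $hash")
  }
}
```

Running this against a directory holding the unzipped classes of Hadoop and Spark should show the time climbing with the file count, which matches what the readme warns about.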

Edit:

I opened #96 ("Performance degradation when adding library dependencies") based on this question, and added some fixes in sbt-assembly 0.10.1 for sbt 0.13.

sbt-assembly 0.10.1 avoids hashing the contents of the unzipped items of dependency library jars. It also skips the jar caching done by sbt, since sbt-assembly already caches the output.

These changes make the assembly task run more consistently. Using the dependency-heavy Spark as a sample project, the assembly task was run 15 times after a small source change. sbt-assembly 0.10.0 took 19 +/- 157 seconds (mostly within 20 seconds, but going over 150 seconds in 26% of the runs). sbt-assembly 0.10.1, on the other hand, took 16 +/- 1 seconds.


Source: https://habr.com/ru/post/956563/

