Why is the Kubernetes source code an order of magnitude larger than that of other orchestrators?

Question

Consider other orchestration tools such as dokku, dcos, deis, flynn, docker swarm, etc. Kubernetes is nowhere near them in terms of lines of code; on average, these tools have about 100 to 200 thousand lines of code.

Intuitively, it seems strange that managing containers (checking health, scaling containers up and down, killing them, restarting them, and so on) should take 2.4M+ lines of code, which is the scale of an entire operating system code base. I feel there is something more to it.

What is different about Kubernetes compared with other orchestration solutions that makes it so big?

I have no experience running more than 5-6 servers. Please explain why it is so large and which features account for most of it.

+5

docker containers docker-swarm kubernetes flynn

user3713466 Jan 11 '17 at 8:58

2 answers

In addition to the reasons given by @abronan, the Kubernetes code base contains many duplicates and generated files that artificially increase the size of the code. The actual size of the code that does the “real work” is much smaller.

For example, check out the staging directory. This directory is 500,000 LOC, but none of it is original code; it is all copied from elsewhere in the Kubernetes repository and rearranged. This artificially inflates the total LOC.

There are also things like Swagger API generation, which produces auto-generated files that describe the Kubernetes API in OpenAPI format. I found these files in several places in the repository.

Together, these files comprise ~116,000 LOC, and all they do is describe the Kubernetes API in OpenAPI format!

And these are just the OpenAPI definition files: the total LOC needed to support OpenAPI is probably much higher. For example, I found a ~12,000 LOC file and a ~13,000 LOC file that are related to Swagger / OpenAPI support. I am sure there are many more files associated with this feature.

The point is that the code that does the actual heavy lifting behind the scenes can be a small fraction of the supporting code needed to make Kubernetes a maintainable and scalable project.
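One way to separate hand-written from generated code is to look for a marker near the top of each file. The sketch below is only an illustration: it assumes generated Go files carry the conventional `// Code generated ... DO NOT EDIT.` header, which not every generator emits and which files in the Kubernetes tree of that time may not use. The function names `is_generated` and `split_loc` are hypothetical.

```python
import re

# Assumption: generated Go files carry the conventional header line.
# Other generators (and older files) may use a different marker.
GENERATED_RE = re.compile(r"^// Code generated .* DO NOT EDIT\.$", re.MULTILINE)


def is_generated(source_text, head_lines=20):
    """Return True if the marker appears in the first few lines of the file."""
    head = "\n".join(source_text.splitlines()[:head_lines])
    return GENERATED_RE.search(head) is not None


def split_loc(files):
    """files maps path -> source text; returns (handwritten, generated) line counts."""
    handwritten = generated = 0
    for text in files.values():
        n = len(text.splitlines())
        if is_generated(text):
            generated += n
        else:
            handwritten += n
    return handwritten, generated
```

This counts physical lines only, so the result is a rough split rather than a cloc-style count of code lines.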

+4

Pixel elephant Jan 11 '17 at 18:17

abronan · Accepted Answer · 2017-01-11T10:52:26+0000

First of all: don't be misled by the raw line count. Most of those lines are dependencies in the vendor folder (utilities, client libraries, gRPC, etcd, etc.), which are not part of the core logic.
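To see how much vendored dependencies affect a count, a line counter that skips named directories is enough. This is a minimal sketch, not a replacement for cloc: it counts physical lines (blank and comment lines included), and the function name `count_lines` and its defaults are illustrative assumptions.

```python
import os


def count_lines(root, exclude_dirs=("vendor", "_vendor"), suffixes=(".go",)):
    """Count physical lines in matching files under root, skipping excluded directory names."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in exclude_dirs]
        for name in filenames:
            if name.endswith(suffixes):
                with open(os.path.join(dirpath, name), "rb") as fh:
                    total += sum(1 for _ in fh)
    return total
```

Calling it once with the default `exclude_dirs` and once with `exclude_dirs=()` on the same checkout gives a quick sense of how much of the total comes from dependencies.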

Raw LoC analysis with cloc

To put things in perspective, for Kubernetes:

    $ cloc kubernetes --exclude-dir=vendor,_vendor,build,examples,docs,Godeps,translations
        7072 text files.
        6728 unique files.
        1710 files ignored.

    github.com/AlDanial/cloc v 1.70  T=38.72 s (138.7 files/s, 39904.3 lines/s)
    --------------------------------------------------------------------
    Language                   files       blank     comment        code
    --------------------------------------------------------------------
    Go                          4485      115492      139041     1043546
    JSON                          94           5           0      118729
    HTML                           7         509           1       29358
    Bourne Shell                 322        5887       10884       27492
    YAML                         244         374         508       10434
    JavaScript                    17        1550        2271        9910
    Markdown                      75        1468           0        5111
    Protocol Buffers              43        2715        8933        4346
    CSS                            3           0           5        1402
    make                          45         346         868         976
    Python                        11         202         305         958
    Bourne Again Shell            13         127         213         655
    sed                            6           5          41         152
    XML                            3           0           0          88
    Groovy                         1           2           0          16
    --------------------------------------------------------------------
    SUM:                        5369      128682      163070     1253173
    --------------------------------------------------------------------

For Docker (not Swarm or Swarm mode, since those involve more features, such as volumes, networking, and plugins, that are not included in this repository). Projects such as Machine, Compose, and libnetwork are not counted either, so in practice the whole Docker platform may include many more LoC:

    $ cloc docker --exclude-dir=vendor,_vendor,build,docs
        2165 text files.
        2144 unique files.
         255 files ignored.

    github.com/AlDanial/cloc v 1.70  T=8.96 s (213.8 files/s, 30254.0 lines/s)
    --------------------------------------------------------------------
    Language                   files       blank     comment        code
    --------------------------------------------------------------------
    Go                          1618       33538       21691      178383
    Markdown                     148        3167           0       11265
    YAML                           6         216         117        7851
    Bourne Again Shell            66         838         611        5702
    Bourne Shell                  46         768         612        3795
    JSON                          10          24           0        1347
    PowerShell                     2          87         120         292
    make                           4          60          22         183
    C                              8          27          12         179
    Windows Resource File          3          10           3          32
    Windows Message File           1           7           0          32
    vim script                     2           9           5          18
    Assembly                       1           0           0           7
    --------------------------------------------------------------------
    SUM:                        1915       38751       23193      209086
    --------------------------------------------------------------------

Please note that these are very rough estimates made with cloc. A deeper analysis may be worthwhile.

Roughly speaking, the project itself seems to account for about half of the LoC mentioned in the question (~1,250K LoC). Whether you count dependencies or not is subjective.
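The proportions can be checked directly from the two cloc summaries above. A minimal sketch, using only the figures quoted in this answer and the 2.4M figure from the question; the variable names are illustrative:

```python
# Figures copied from the cloc output in this answer and from the question.
K8S_TOTAL = 1_253_173       # Kubernetes, all languages, vendor etc. excluded
K8S_GO = 1_043_546          # Kubernetes, Go only
DOCKER_TOTAL = 209_086      # Docker, all languages, vendor etc. excluded
DOCKER_GO = 178_383         # Docker, Go only
QUESTION_TOTAL = 2_400_000  # the "2.4M+" quoted in the question

share_of_question = K8S_TOTAL / QUESTION_TOTAL
total_ratio = K8S_TOTAL / DOCKER_TOTAL
go_ratio = K8S_GO / DOCKER_GO

print(f"non-vendored share of 2.4M: {share_of_question:.0%}")
print(f"Kubernetes / Docker (all languages): {total_ratio:.2f}x")
print(f"Kubernetes / Docker (Go only): {go_ratio:.2f}x")
```

On these numbers the non-vendored Kubernetes tree is a little over half of the 2.4M figure and roughly six times the size of the Docker repository, so the gap is real but smaller than the raw count suggests.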

What is included in Kubernetes that makes it so big?

Most of the bloat comes from libraries that support the various cloud providers, either to ease bootstrapping on their platforms or to support specific features (volumes, etc.) through plugins. It also ships a lot of examples, which should be discounted from the line count. A fair estimate of LoC should exclude a lot of unnecessary documentation and example directories.

It also has far more features than Docker Swarm, Nomad or Dokku, to name a few. It supports advanced networking scenarios, has built-in load balancing, and includes PetSets, Cluster Federation, volume plugins and other features that other projects do not yet support.

It supports several container engines, so it is not limited to running docker containers and can run other engines (for example, rkt).

Most of the core logic involves interacting with other components (key-value stores, client libraries, plugins, etc.), which extends far beyond simple scenarios.

Distributed systems are known to be complex, and Kubernetes seems to support most of the tooling from the key players in the container industry without compromise (where other solutions do make such compromises). As a result, the project can look artificially inflated and too large for its core mission (deploying containers at scale). In reality, these statistics are not that surprising.

Main idea

Comparing Kubernetes with Docker or Dokku does not really work. The project is much larger and includes many more features, as it is not limited to the Docker family of tools.

While Docker has many features scattered across several libraries, Kubernetes has everything under its main repository (which significantly increases the number of lines, but also explains the popularity of the project).

Given this, LoC statistics are not surprising.
