Listing the files actually used from the Docker build context

Let's say I have a repository with several projects structured as follows:

Root
β”œβ”€β”€ bar
β”‚   β”œβ”€β”€ Dockerfile
β”‚   └── index.js
β”œβ”€β”€ baz
β”‚   β”œβ”€β”€ Dockerfile
β”‚   └── index.js
β”œβ”€β”€ foo
β”‚   β”œβ”€β”€ Dockerfile
β”‚   └── index.js
└── shared
    β”œβ”€β”€ utils.js
    └── shared.js

The foo, bar and baz projects share some libraries from the shared folder. I currently send the root folder as the build context for all three Docker images so that the shared folder is available.

To reduce the build time and the deployment time of my Docker images, I want to send the smallest possible context for each image.

To do this, I plan to create a temporary folder for each image and use it as the build context. The catch is that I need to know which shared files each image actually uses.

In this example it is easy to tell, because there are only a few shared files and a few projects. In reality, though, there are hundreds of shared files and about 20 projects, and I don't want to check by hand which shared files are used by which projects.
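For illustration, here is a minimal shell sketch of that plan, assuming each project keeps a list of the shared files it needs (deps.txt and the copy layout are hypothetical):

  #!/bin/sh
  # Sketch: assemble a minimal build context for one project.
  PROJECT=foo
  CTX=$(mktemp -d)

  cp -r "$PROJECT"/. "$CTX"/             # project sources + Dockerfile
  mkdir -p "$CTX/shared"
  while read -r f; do                    # deps.txt: one shared file per line (hypothetical)
    cp "shared/$f" "$CTX/shared/$f"
  done < "$PROJECT/deps.txt"

  docker build -t "$PROJECT:latest" "$CTX"
  rm -rf "$CTX"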

Here is an example of my Dockerfile:

  FROM node:boron
  RUN mkdir /app
  WORKDIR /app
  COPY package.json package.json
  RUN yarn
  COPY . .
  RUN yarn release
  CMD node release/server.js

And I create a Docker image with:

 docker build -t foo:latest .. 

Note the .. pointing to the Root folder. This causes all shared files to be sent as part of the context, even those that are not needed.

Is there an easy way to find out which of the files sent in the Docker context are actually used and which are not?

+5
5 answers

Before I begin, let me clear up a few misconceptions and define some terminology. First, Docker images are more or less snapshots of container configurations. Everything from the file system to the network configuration is contained in the image and can be used to quickly create new instances (containers) of that image.

Containers are running instances of a particular image, and that is where all the magic happens. Docker containers can be thought of as tiny virtual machines, but unlike VMs, system resources are shared with the host, and containers have several other properties that VMs do not. You can find more on this in other articles.

An image is created either by saving a container ( docker commit *container* *repoTag* ) or by building from a Dockerfile, which is an automated set of build instructions, as if you were making the changes to a container yourself. It also gives the end user all the commands needed to assemble and run your application.
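For example (the image and container names below are only illustrative), the two paths look like this:

  # Path 1: snapshot an existing container into an image
  docker commit my-container myrepo/myapp:snapshot

  # Path 2: build an image from a Dockerfile in the current directory
  docker build -t myrepo/myapp:1.0 .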

Reduce build time ... of my Docker images

Correct me if I'm wrong, but it sounds like you are building a new image for each new container. Docker images exist only so that containers can be spun up from them. Yes, building them takes some time, but once they are built it takes a trivial amount of time to deploy a container running the application you actually need. Again, a Docker image is a saved state of a previous container configuration, and loading a saved state does not and should not take much time, so you really should not worry too much about Dockerfile build time.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~

That said, reducing Dockerfile build time and final image size is still a legitimate concern, and reaching for automatic dependency resolution is a common instinct. In fact, I asked a similar question almost 2 years ago, so it may contain some information that helps in this endeavor. But...

To reduce the build time and the deployment time of my Docker images, I need to minimize the size of the context sent for these images.

To which Taco, the person who answered my earlier question, would reply:

Docker is not going to offer you painless builds. Docker does not know what you want.

Yes, it would be less trouble if Docker knew what you wanted from the get-go, but the fact remains: you have to tell it exactly what you want if you are aiming for the best size and the best build time. That said, there is more than one way to get there.

  • One frankly obvious approach, as mentioned by Andreas Wederbrand in
    this very post, is to use application logs from a previous run to check
    what the app does or does not need. Suppose you built one of your
    project applications with every possible dependency dropped into it.
    You could then systematically remove all the dependencies, run the
    application, check the logs for failures, add one dependency back, and
    check how the output differs. If the output is the same, remove that
    dependency; otherwise, keep it.

If I were to script this in a Dockerfile, it might look something like the following (assuming a Linux-based container):

  #ASSUMING LINUX CONTAINER!
  ...
  WORKDIR path/to/place/project
  RUN mkdir dependencyTemp
  COPY path/to/project/and/dependencies/ .
  #Next part is still pseudocode for the time being
  RUN mv dependencies/* dependencyTemp/ \
   && run app and store state and logs \
   && while [ "$appState" != "running" ]; do \
        move one dependency back from dependencyTemp/ && run app and store new logs; \
        if [ "$logsOriginal" == "$logsNew" ]; then \
          remove that dependency again; \
        else \
          keep the dependency && logsOriginal=$logsNew; \
        fi; \
      done

However, this is terribly inefficient: you are starting and stopping your application inside the build just to discover its dependencies, which results in an awfully long build time. It does paper over the problem of finding the dependencies by hand and shaves off some size, but it may not work 100% of the time, and it would probably take less time to work out the needed dependencies yourself while developing the code than to build this machinery to avoid it.

  • Another, more involved alternative is to link containers across a network. Networked containers have been a challenge for me, but their simplicity may be exactly what you want here. Say you deploy 3 containers: 2 project containers and 1 dependency container. Over the network, a project container can talk to the dependency container and fetch all the dependencies it needs, similar to your current configuration. Unlike your setup, however, the dependencies do not live inside the application image, which means your other applications can be built with minimal size and time.

However, if the dependency container goes down, the other applications go down with it, which may not make for a stable system in the long run. You would also have to stop and restart containers whenever you add a new dependency or project. A rough sketch of this setup follows.
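As a sketch of the networking idea (the network name and the deps-server image are made up; any mechanism that serves files over the network would do):

  # One user-defined network shared by all containers
  docker network create shared-net

  # A container that holds/serves the shared dependencies (hypothetical image)
  docker run -d --name deps --network shared-net \
    -v $(pwd)/shared:/shared deps-server

  # Project containers join the same network and reach "deps" by name
  docker run -d --name foo --network shared-net foo:latest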

  • Finally, if your containers are going to run locally, take a look at volumes . Volumes are a great way to mount file systems into running containers, so that applications in containers can reference files that are not explicitly baked into the image. This makes for a more elegant setup, since all dependencies can legitimately be β€œshared” without being explicitly included.

Because the mount is live, adding a dependency or file updates it for all of your applications at once, as an added bonus. However, volumes do not translate well once you scale your projects beyond your local system, and they are exposed to local interference. A minimal sketch follows.
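A minimal sketch, assuming the shared folder sits next to where you run the command:

  # Mount the shared libraries read-only instead of copying them in
  docker run -d --name foo \
    -v $(pwd)/shared:/app/shared:ro \
    foo:latest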

~~~~~~~~~~~~~~~~~~~

The bottom line is that Docker cannot automatically resolve dependencies for you, and the workarounds are too complicated and/or time-consuming to be remotely viable candidates for the solution you want, since it would be much faster to work out the dependencies yourself. If you want to go off and develop such a strategy on your own, go right ahead.

+3

The only way to find out whether an application inside a Docker image uses a specific file is to know the application or to analyze a previous run of it.

I suggest another way to solve your problem. It will reduce build time and image size, but not necessarily deployment time.

Create a base image containing the shared libraries, to be used by all your other images:

  FROM node:boron
  COPY shared /shared

and build it with:

 docker build -t erazihel/base:1.0 . 

Then base all the other images on this one:

  FROM erazihel/base:1.0
  RUN mkdir /app
  WORKDIR /app
  COPY package.json package.json
  RUN yarn
  RUN yarn release
  CMD node release/server.js

Since Docker images are layered, the base image exists only once on each deployment server, and the extra layer each new image adds is very small. Build time should also drop, since there is no COPY/ADD of the shared libraries in each project's build.
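Concretely, with the tree from the question, the builds could then look like this (a sketch; it assumes the base Dockerfile sits in the Root folder, and the tags follow this answer's example):

  # Build the shared base image once, from the Root folder
  docker build -t erazihel/base:1.0 .

  # Build each project with only its own folder as the context
  docker build -t foo:latest ./foo
  docker build -t bar:latest ./bar
  docker build -t baz:latest ./baz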

There is no real cost to having one large base image, since all the images derived from it are much smaller. In fact, you are likely to save space.
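If you want to verify the layer sharing, Docker's own tooling can show it (a quick check, not specific to this answer):

  # Layers of the base image and of a child image; the base layers
  # (node:boron + COPY shared) are stored only once on disk
  docker history erazihel/base:1.0
  docker history foo:latest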

+1

What you can do is use inotify . It is a kernel feature for sniffing, on the fly, what is happening at the file-system level.

It goes something like this:

Use this script inotify.sh (don't forget chmod +x inotify.sh ):

  #!/bin/sh
  DIRTOMONITOR=/src
  apk add --update inotify-tools || (apt-get update && apt-get install -y inotify-tools)
  inotifywait -mr --timefmt '%H:%M' --format '%T %w %e %f' -e ACCESS $DIRTOMONITOR &
  "$@"

Launch the application, for example:

  docker run \
    -v $(pwd)/inotify.sh:/inotify.sh \
    --entrypoint /inotify.sh \
    <your-image> \
    node server.js
  Watches established.
  12:34 /src/ ACCESS server.js    <---------
  Server running at http://localhost:4000

Each file that is read or written shows up as an ACCESS event.
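To turn that event stream into a deduplicated list of used files, you could post-process the container's output; a small sketch (the container name foo is an assumption):

  # Fields from '%T %w %e %f': time, watched dir, event, file name
  docker logs foo 2>&1 \
    | awk '$3 == "ACCESS" { print $2 $4 }' \
    | sort -u > used-files.txt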

+1
  • Create a base image that holds the shared files.
  • Build the other images from this base image.

If a β€œshared” file is not used by any child image, then it does not belong in the shared folder in the first place.

By creating a base image with the shared files, you can run each image's build from its own folder/context, without the problems you mention.

+1

fuser and lsof can be used inside the container to monitor open files.

lsof does not require a target filename, so it is the better fit for your purpose. For lsof usage, refer to these examples: linux-lsof-usage
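For example, to see which of the shared files a running container has open (the container name foo is illustrative, and lsof must be installed in the image):

  # List open files under /app/shared inside the running container
  docker exec foo lsof +D /app/shared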

If you run into problems using lsof , you can work around them by putting Docker's AppArmor profile into complain mode on the host: sudo aa-complain /etc/apparmor.d/docker

References: How to use fuser and lsof

0

Source: https://habr.com/ru/post/1269874/

