Change-based regeneration in static site generators

It seems that all of the static site generators I have found completely regenerate the whole site every time a single file changes.

For example, one of the most popular site generators is Jekyll, which powers GitHub Pages. Each time the author makes a change (say, fixing the grammar in a post file, or changing the layout of about.html) and needs to regenerate the output, Jekyll offers no choice but to regenerate the entire site, even if there are hundreds of files whose output is unaffected by the recent changes.

The time taken to regenerate large sites seems to be a common complaint about most static site generators.

Is there any technical reason (from the point of view of the design or implementation of static site generators) that prevents someone from writing a generator that is "smart" about its content: one that can work out which files changed and which output files depend on them (or vice versa), and regenerate only the files that need it?

Since most users (especially on Jekyll / GH Pages) keep their sites in a git repository, it seems the generator could use commit information to track changes, and rely on that information to work out which files need to be regenerated and which can be left alone. Thoughts?
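The git-based idea can be sketched in a few lines. Below is a hypothetical helper (not taken from any existing generator) that asks git which source files changed between the last built commit and the current one; a real tool would also have to consider uncommitted and untracked files, e.g. via `git status --porcelain`:

```python
import subprocess

def changed_files(repo_dir: str, since: str = "HEAD~1") -> set[str]:
    """Return the set of files changed in the repository since `since`.

    Minimal sketch: runs `git diff --name-only <since> HEAD` and
    collects the reported paths. Assumes `since` names a valid commit.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", since, "HEAD"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return {line for line in out.splitlines() if line}
```

In practice the generator would store the commit hash of the last successful build and pass it as `since`, rather than assuming the previous commit.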

1 answer

Short answer: because it's hard.

The hard part is not knowing which files have been modified. The hard part is knowing which output files are affected by the files that changed. For example, if you change the title of a blog post, the main blog index needs to be updated. So do the tag pages. So does any page that lists that post as a "related post". If you show excerpts on your home page, same deal.

This can be dealt with, though. You can maintain a directed acyclic graph that tracks the dependencies of each page, and rebuild the pages that include bits of other pages that changed. This adds complexity to the code, as well as computation time, but it is likely worth the effort.
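The DAG idea could look something like the following minimal sketch (class and method names are illustrative, not taken from any real generator). Edges point from a source item to each output page that includes its content, and invalidation walks the graph transitively, since a rebuilt page can itself feed other pages:

```python
from collections import defaultdict

class DependencyGraph:
    """Tracks which output pages include content from which items."""

    def __init__(self):
        self._dependents = defaultdict(set)  # item -> pages built from it

    def add_dependency(self, page: str, item: str) -> None:
        """Record that `page` includes content from `item`."""
        self._dependents[item].add(page)

    def pages_to_rebuild(self, changed: set[str]) -> set[str]:
        """Transitively collect every page affected by the changed items."""
        dirty, stack = set(), list(changed)
        while stack:
            item = stack.pop()
            for page in self._dependents[item]:
                if page not in dirty:
                    dirty.add(page)
                    stack.append(page)  # a rebuilt page may feed others
        return dirty
```

For example, if index.html excerpts post1.md and feed.xml is built from index.html, then a change to post1.md dirties both index.html and feed.xml, but leaves unrelated pages untouched.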

What is harder still, however, is knowing which pages need to be regenerated as a result of changes to items they are not yet associated with. What happens if you add a new tag to a blog post? Now the tag page for that new tag needs updating. If you use tags to generate "related posts", every post on the site may need regenerating, since the "best" related posts for any given post may now be different. What happens when you add a new post? To avoid unnecessary compilation, the static site generator would have to know which pages would have included this post had it existed, and regenerate those as well.

Note that in all these cases, false positives (pages that have not changed but are recompiled anyway) are acceptable, while false negatives (pages that should be recompiled but are not) are completely unacceptable. So in every case the site generator must err on the side of caution: if there is any possibility that a page's output has changed, it must be recompiled.
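That conservative rule can be made concrete with a small sketch (function and file names are hypothetical). When a new post appears, the index and its tag pages are always dirty; with a "related posts" feature there is no cheap way to know whose sidebar changes, so every post page is rebuilt, accepting false positives to rule out false negatives:

```python
def invalidated_by_new_post(tags: set[str],
                            all_post_pages: set[str],
                            related_posts_enabled: bool) -> set[str]:
    """Return the output pages a brand-new post forces us to rebuild.

    Illustrative sketch of the conservative policy: prefer rebuilding
    too much (false positives) over missing a stale page (false
    negatives).
    """
    dirty = {"index.html"}                      # index lists every post
    dirty |= {f"tags/{t}.html" for t in tags}   # one page per tag
    if related_posts_enabled:
        # Any post's "related" list could change, so rebuild them all.
        dirty |= all_post_pages
    return dirty
```

A smarter generator could shrink the last step (e.g. only rebuild posts sharing a tag with the new one), but only if it can prove the remaining pages are unaffected.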

Nanoc, for example, tracks changes in the way you describe. It maintains a directed acyclic graph of pages that depend on other pages, and caches it between compilations to limit the number of recompilations. It does not regenerate every page every time, but it does often recompile pages that did not need it. There is still plenty of room for improvement.

Source: https://habr.com/ru/post/1491047/

