Skip to content

Conversation

@benstepp
Copy link
Contributor

If you want to see incremental progress I put the time output on each commit. Here is the overall result:

# gatsbyjs/gatsby-starter-blog BEFORE
Command being timed: "gatsby build"
User time (seconds): 31.49
System time (seconds): 2.56
Percent of CPU this job got: 107%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.72

# gatsbyjs/gatsby-starter-blog AFTER
Command being timed: "gatsby build"
User time (seconds): 23.78
System time (seconds): 2.80
Percent of CPU this job got: 110%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:24.05
# kyleamathews/blog BEFORE
Command being timed: "gatsby build"
User time (seconds): 38.03
System time (seconds): 3.02
Percent of CPU this job got: 106%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:38.47

# kyleamathews/blog AFTER
Command being timed: "gatsby build"
User time (seconds): 26.70
System time (seconds): 2.79
Percent of CPU this job got: 112%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:26.30

benstepp added 4 commits May 22, 2016 14:26
This allows the caching of babel-loader across rebuilds of the same
site. The user can override this by passing the cacheDirectory param to
their babel config. A falsey value will disable the cacheDirectory.

This change also copies the normalized plugins and presets over the
babelConfig. This allows Gatsby to support additional babel options.

---

Building `gatsby-starer-blog` on a Macbook

Before cache implemented:
```
Command being timed: "gatsby build"
User time (seconds): 31.49
System time (seconds): 2.56
Percent of CPU this job got: 107%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:31.72
```

After cache implemented on rebuild:
```
Command being timed: "gatsby build"
User time (seconds): 26.80
System time (seconds): 2.41
Percent of CPU this job got: 107%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.11
```
This gets called 6 times over the course of the build process and hits
the filesystem to glob every time. This change makes it such that the
result of the first glob is cached in memory and returned on subsequent
calls. Sites with more pages will see a greater performance gain with
this change.

---
`gatsby-starter-blog`

Before:
```
Command being timed: "gatsby build"
User time (seconds): 26.80
System time (seconds): 2.41
Percent of CPU this job got: 107%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:27.11
```

After:
```
Command being timed: "gatsby build"
User time (seconds): 25.29
System time (seconds): 2.47
Percent of CPU this job got: 106%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:26.17
```

---
`kyleamathews/blog`

Before:
```
Command being timed: "gatsby build"
User time (seconds): 38.03
System time (seconds): 3.02
Percent of CPU this job got: 106%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:38.47
```

After:
```
Command being timed: "gatsby build"
User time (seconds): 32.10
System time (seconds): 2.57
Percent of CPU this job got: 107%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:32.31
```
This is a performance optimization recommended in gatsbyjs#280

---
`gatsby-starter-blog`

Before:
```
Command being timed: "gatsby build"
User time (seconds): 25.29
System time (seconds): 2.47
Percent of CPU this job got: 106%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:26.17
```

After:
```
Command being timed: "gatsby build"
User time (seconds): 24.29
System time (seconds): 2.83
Percent of CPU this job got: 110%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:24.56
```

---
`kyleamathews/blog`

Before:
```
Command being timed: "gatsby build"
User time (seconds): 32.10
System time (seconds): 2.57
Percent of CPU this job got: 107%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:32.31
```

After:
```
Command being timed: "gatsby build"
User time (seconds): 29.84
System time (seconds): 2.81
Percent of CPU this job got: 110%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.59
```
This is a performance optimization recommended in gatsbyjs#280

There are a lot less loaders than general modules, so the performance
gain is minimal, but every second counts.

---
`gatsby-starter-blog`

Before:
```
Command being timed: "gatsby build"
User time (seconds): 24.29
System time (seconds): 2.83
Percent of CPU this job got: 110%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:24.56
```

After:
```
Command being timed: "gatsby build"
User time (seconds): 23.78
System time (seconds): 2.80
Percent of CPU this job got: 110%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:24.05
```

---
`kyleamathews/blog`

Before:
```
Command being timed: "gatsby build"
User time (seconds): 29.84
System time (seconds): 2.81
Percent of CPU this job got: 110%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.59

After:
```
Command being timed: "gatsby build"
User time (seconds): 26.70
System time (seconds): 2.79
Percent of CPU this job got: 112%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:26.30
```
@KyleAMathews
Copy link
Contributor

Ummm wow! 👏 👏 👏 I hadn't realized there were such big perf gains sitting around esp for such easy fixes. Awesome! Code looks great but won't have time to merge / create new release until tomorrow probably.

BTW, curious if you've thought about parallelizing the various builds? I think there could be some big perf gains there as the builds are almost 100% CPU-bound.

@benstepp
Copy link
Contributor Author

I tried it and actually it got slower for me. I have a Macbook at home so I think the CoreM processor just doesn't cope well with multi-threaded stuff. I'm going to run a few benchmarks at work to see if the forked processes makes it faster given a real CPU.

@KyleAMathews
Copy link
Contributor

hmmm... odd, CoreM has two cores right? http://www.laptopmag.com/articles/intel-core-m-details

I have an I7 in my Mac Book Pro — is your code public? I'd love to test it out as well.

@benstepp
Copy link
Contributor Author

I just pushed to origin/bs-parrallelbuilds so you should be able to fetch and checkout

A lot of i/o from ProgressPlugin, but you can delete them.

@cusspvz
Copy link

cusspvz commented May 22, 2016

I'm not sure about what I'm about to say, but couldn't this cause some issues regarding file writing?
One of the performance bottlenecks I've spotted on my builds are the consequent image processing (I'm using sharp to resize images on different parts of the gatsby site), so performance is something I have interest to see improved.

Could you both please set me in and tell me what are the things that doesn't allow us to achieve the public resources on an unique build?

@benstepp
Copy link
Contributor Author

I'm not sure about what I'm about to say, but couldn't this cause some issues regarding file writing?

I don't see this being a problem. You typically file lock to prevent writing the same file that is being read by multiple processes, but the webpack output for each stage is explicitly set to different files.

Could you both please set me in and tell me what are the things that doesn't allow us to achieve the public resources on an unique build?

I'm sorry, I don't understand what you are asking.

@KyleAMathews
Copy link
Contributor

KyleAMathews commented May 23, 2016

Ran some quick tests on my blog:

Current master:

real    0m33.742s
user    0m28.331s
sys     0m2.840s

This PR:

real    0m23.164s
user    0m22.353s
sys     0m1.806s

Forking fork:

real    0m25.560s
user    0m31.893s
sys     0m3.229s

So oddly when forking the user time goes up a lot while the real time goes up a little. Don't have a lot of experience with node.js + child processes but this strikes me as odd.

I ran these all on my ~2015 Mac Book Pro

EDITED

@benstepp
Copy link
Contributor Author

The parrallelbuilds is really rough and doesn't take advantage of the memoized glob-pages here which gives a pretty big jump in performance. Especially right now when there is a lot of blocking i/o with fs.readFileSync inside of glob pages, running it more than once is very expensive.

I'd imagine modifying glob pages to request pages from main process could benefit from the memoized result. Something like a if (typeof process.send !== 'undefined) should do it. Then the child could request the cached pages from the parent process. I'd want to test it with a huge site to see if there are any limitations to using process.send as I would imagine that the object piped to the child processes could get pretty big.


One thing I noticed though is that when we separated the build-css from javascript, webpack still needs to go and compile every module that was in your bundle.js. You can see this with the ProgressPlugin counting up the modules in both child processes.

I'm not sure if separating them was the best solution, but parallelizing build-css and build-javascript will probably be faster than not doing so. I'm not too sure if parallelizing will be faster than just doing both at the same time in one compilation though.

@KyleAMathews
Copy link
Contributor

Sounds great 👍 I'll do a quick release tomorrow morning of this existing PR and then if child stuff proves out, release that too. Thanks!

@benstepp
Copy link
Contributor Author

My best time with process.send to use the cached glob pages still doesn't touch this PR by a pretty significant margin on a randomly generated 500 markdown post site.

this pr
        Command being timed: "gatsby build"
        User time (seconds): 23.35
        System time (seconds): 1.45
        Percent of CPU this job got: 104%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:23.72
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2908930048
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 233963
        Voluntary context switches: 17
        Involuntary context switches: 64948
        Swaps: 0
        File system inputs: 0
        File system outputs: 136
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

forked fork
        Command being timed: "gatsby build"
        User time (seconds): 26.77
        System time (seconds): 1.75
        Percent of CPU this job got: 133%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:21.31
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2326249472
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 316868
        Voluntary context switches: 14
        Involuntary context switches: 68012
        Swaps: 0
        File system inputs: 0
        File system outputs: 60
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 2
        Page size (bytes): 4096
        Exit status: 0


forked fork with process.send
       Command being timed: "gatsby build"
        User time (seconds): 26.26
        System time (seconds): 1.74
        Percent of CPU this job got: 133%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:20.95
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2320269312
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 310233
        Voluntary context switches: 15
        Involuntary context switches: 67928
        Swaps: 0
        File system inputs: 0
        File system outputs: 2
        Socket messages sent: 80
        Socket messages received: 80
        Signals delivered: 2
        Page size (bytes): 4096
        Exit status: 0

@KyleAMathews
Copy link
Contributor

@benstepp am I reading that wrong? Both the forked versions win on clock time (which is what we care about) by 2 & 2.5 secs correct?

@benstepp
Copy link
Contributor Author

It looks like you are correct. Elapsed time is the important one in this output.

There are definite advantages to multi processing the build stages, especially with large sites shaving off 20%+ of the total build time. I'll try and clean up the code by the weekend because right now I turned it into a complete mess.

[ben@thebest:gatsby-benchmark] git:(master|✚1 $ make huge
5000 posts generated
Generating CSS
Generating Static HTML

Gatsby is currently using the default template for HTML. You can override
this functionality by creating a React component at "/html.js"

You can see what this default template does by visiting:
https://github.com/gatsbyjs/gatsby/blob/master/lib/isomorphic/html.js

Compiling production bundle.js
Copying assets
Done
        Command being timed: "node --max-old-space-size=8192 node_modules/.bin/gatsby build"
        User time (seconds): 197.43
        System time (seconds): 8.40
        Percent of CPU this job got: 101%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 3:22.93
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 11846746112
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 1686089
        Voluntary context switches: 2138
        Involuntary context switches: 329010
        Swaps: 0
        File system inputs: 108
        File system outputs: 790
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

[ben@thebest:gatsby-benchmark] git:(master|✚1 $ make huge
5000 posts generated
build-javascript 36248
build-css 36247
Generating Static HTML

Gatsby is currently using the default template for HTML. You can override
this functionality by creating a React component at "/html.js"

You can see what this default template does by visiting:
https://github.com/gatsbyjs/gatsby/blob/master/lib/isomorphic/html.js

Copying assets
Done
        Command being timed: "node --max-old-space-size=8192 node_modules/.bin/gatsby build"
        User time (seconds): 181.53
        System time (seconds): 11.76
        Percent of CPU this job got: 122%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 2:37.92
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 11000496128
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 10
        Minor (reclaiming a frame) page faults: 2331751
        Voluntary context switches: 14848
        Involuntary context switches: 369450
        Swaps: 0
        File system inputs: 163
        File system outputs: 948
        Socket messages sent: 772
        Socket messages received: 772
        Signals delivered: 2
        Page size (bytes): 4096
        Exit status: 0

@KyleAMathews KyleAMathews merged commit 6387915 into gatsbyjs:master May 23, 2016
@KyleAMathews
Copy link
Contributor

👍

@KyleAMathews
Copy link
Contributor

Released 0.11.1 with this.

@benstepp benstepp deleted the bs-webpackperf branch May 23, 2016 21:12
ChristopherBiscardi pushed a commit to ChristopherBiscardi/gatsby that referenced this pull request Jun 27, 2019
The latest version of this library doesn't currently work on Windows
and we suspect it has to do with the loaders being handed a path. Using
path.join ensures that proper Windows pathing will be returned if on
a Windows system.

Related gatsbyjs#293
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants