Commit d145574

cassiebeckley authored and KyleAMathews committed
Add gatsby-transformer-screenshot (#3526)
* Add gatsby-plugin-screenshot
* Rename to gatsby-transformer-screenshot
* Rename
* Fix prepublish error
* Expand on documentation
* Run format-packages
* Use API Gateway
* Use official deployed Lambda
* Update README.md
* Format
1 parent 8261ac3 commit d145574

15 files changed: +608 -0 lines changed
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+/*.js
+!index.js
+yarn.lock
+lambda-package.zip
+lambda-dist
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+# Logs
+logs
+*.log
+
+# Runtime data
+pids
+*.pid
+*.seed
+
+# Directory for instrumented libs generated by jscoverage/JSCover
+lib-cov
+
+# Coverage directory used by tools like istanbul
+coverage
+
+# Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files)
+.grunt
+
+# node-waf configuration
+.lock-wscript
+
+# Compiled binary addons (http://nodejs.org/api/addons.html)
+build/Release
+
+# Dependency directory
+# https://www.npmjs.org/doc/misc/npm-faq.html#should-i-check-my-node_modules-folder-into-git
+node_modules
+*.un~
+yarn.lock
+src
+flow-typed
+coverage
+decls
+examples
+
+# Lambda-related
+lambda
+lambda-dist
+chrome
+lambda-package.zip
Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
+# gatsby-transformer-screenshot
+
+Plugin for creating screenshots of website URLs using an AWS Lambda
+Function. This plugin looks for `SitesYaml` nodes with a `url`
+property, and creates `Screenshot` child nodes with a `screenshotFile` field.
+
+[Live demo](https://thatotherperson.github.io/gatsby-screenshot-demo/)
+([source](https://github.com/ThatOtherPerson/gatsby-screenshot-demo))
+
+Data should be in a YAML file named `sites.yml` and look like:
+
+```yaml
+- url: https://reactjs.org/
+  name: React
+- url: https://about.sourcegraph.com/
+  name: Sourcegraph
+- url: https://simply.co.za/
+  name: Simply
+```
+
+## Install
+
+`npm install gatsby-transformer-screenshot`
+
+## How to use
+
+```javascript
+// in your gatsby-config.js
+module.exports = {
+  plugins: [`gatsby-transformer-screenshot`],
+};
+```
+
+## How to query
+
+You can query for screenshot files as shown below:
+
+```graphql
+{
+  allSitesYaml {
+    edges {
+      node {
+        url
+        childScreenshot {
+          screenshotFile {
+            id
+          }
+        }
+      }
+    }
+  }
+}
+```
+
+`screenshotFile` is a PNG file like any other loaded from your filesystem, so you can use this plugin in combination with `gatsby-image`.
+
+## Lambda setup
+
+Gatsby provides a hosted screenshot service for you to use; however, you can also run the service yourself on AWS Lambda.
+
+AWS Lambda is a "serverless" computing platform that lets you run code in response to events, without needing to set up a server. This plugin uses a Lambda function to take screenshots and store them in an AWS S3 bucket.
+
+First, you will need to [create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html) for storing screenshots. Once you have done that, create a [Lifecycle Policy](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html) for the bucket that sets a number of days before files in the bucket expire. Screenshots will be cached until this date.
+
+To build the Lambda package, run `npm run build-lambda-package` in this directory. A file called `lambda-package.zip` will be generated; upload this as the source of your AWS Lambda. Finally, you will need to set `S3_BUCKET` as an environment variable for the Lambda.
+
+To set up the HTTP interface, you will need to use AWS API Gateway. Create a new API, create a new resource under `/`, select "Configure as proxy resource", and leave all the settings with their defaults. Create a method on the new resource, selecting "Lambda Function Proxy" as the integration type, and fill in the details of your Lambda.
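With the "Lambda Function Proxy" integration, API Gateway hands the HTTP request body to the function as a string in `event.body` and expects an object with a `statusCode` and a string `body` in return. The following is a minimal sketch of those request and response shapes, not part of the plugin itself. The helper names `buildRequestBody` and `parseScreenshotResponse` and the example bucket URL are hypothetical; the field names (`url`, `width`, `height`, `success`, `error`, `expires`) follow the Lambda source in this commit.

```javascript
// Hypothetical helpers illustrating the request/response shapes used by the
// screenshot Lambda behind an API Gateway proxy integration.

// Build the JSON body the Lambda reads from `event.body`.
// `width` and `height` are optional; the Lambda defaults them to 1024x768.
const buildRequestBody = (url, width, height) => {
  if (!url) throw new Error(`a url is required`)
  const payload = { url }
  if (width) payload.width = width
  if (height) payload.height = height
  return JSON.stringify(payload)
}

// Interpret a proxy-style response ({ statusCode, body }) from the Lambda.
const parseScreenshotResponse = response => {
  const body = JSON.parse(response.body)
  if (response.statusCode !== 200 || !body.success) {
    throw new Error(body.error || `screenshot request failed`)
  }
  return {
    url: body.url,
    // `expires` arrives as an ISO date string, or is absent.
    expires: body.expires ? new Date(body.expires) : undefined,
  }
}

// Example with a hand-written response, so no network access is needed.
// The bucket name and object key are made up for illustration.
const exampleResponse = {
  statusCode: 200,
  body: JSON.stringify({
    url: `https://s3-eu-west-1.amazonaws.com/example-bucket/abc.png`,
    expires: `2018-02-01T00:00:00.000Z`,
    success: true,
  }),
}
const parsed = parseScreenshotResponse(exampleResponse)
console.log(parsed.url)
```

In a real deployment the string from `buildRequestBody` would be sent as the body of an HTTP POST to the API Gateway endpoint you created, and the HTTP status and body of the reply would be passed to `parseScreenshotResponse`.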
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+# build headless chrome on EC2
+# https://github.com/adieuadieu/serverless-chrome/blob/master/chrome/README.md
+
+# sudo su
+
+yum install -y git redhat-lsb python bzip2 tar pkgconfig atk-devel alsa-lib-devel bison binutils brlapi-devel bluez-libs-devel bzip2-devel cairo-devel cups-devel dbus-devel dbus-glib-devel expat-devel fontconfig-devel freetype-devel gcc-c++ GConf2-devel glib2-devel glibc.i686 gperf glib2-devel gtk2-devel gtk3-devel java-1.*.0-openjdk-devel libatomic libcap-devel libffi-devel libgcc.i686 libgnome-keyring-devel libjpeg-devel libstdc++.i686 libX11-devel libXScrnSaver-devel libXtst-devel libxkbcommon-x11-devel ncurses-compat-libs nspr-devel nss-devel pam-devel pango-devel pciutils-devel pulseaudio-libs-devel zlib.i686 httpd mod_ssl php php-cli python-psutil wdiff --enablerepo=epel
+
+cd ~
+git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
+echo "export PATH=$PATH:$HOME/depot_tools" >> ~/.bash_profile
+source ~/.bash_profile
+
+mkdir Chromium
+cd Chromium
+fetch --no-history chromium
+cd src
+
+# use /tmp instead of /dev/shm
+# https://groups.google.com/a/chromium.org/forum/#!msg/headless-dev/qqbZVZ2IwEw/CPInd55OBgAJ
+sed -i -e "s/use_dev_shm = true;/use_dev_shm = false;/g" base/files/file_util_posix.cc
+
+mkdir -p out/Headless
+echo 'import("//build/args/headless.gn")' > out/Headless/args.gn
+echo 'is_debug = false' >> out/Headless/args.gn
+echo 'symbol_level = 0' >> out/Headless/args.gn
+echo 'is_component_build = false' >> out/Headless/args.gn
+echo 'remove_webcore_debug_symbols = true' >> out/Headless/args.gn
+echo 'enable_nacl = false' >> out/Headless/args.gn
+gn gen out/Headless
+ninja -C out/Headless headless_shell
+
+cd out/Headless
+tar -zcvf /home/ec2-user/headless_shell.tar.gz headless_shell
+
+# scp [email protected]:~/headless_shell.tar.gz .
Binary file not shown.

packages/gatsby-transformer-screenshot/index.js

Whitespace-only changes.
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+{
+  "presets": [
+    ["env",
+      {
+        "targets": {
+          "node": "6.10"
+        }
+      }]
+  ]
+}
Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
+const setup = require(`./starter-kit/setup`)
+
+const crypto = require(`crypto`)
+
+const AWS = require(`aws-sdk`)
+const s3 = new AWS.S3({
+  apiVersion: `2006-03-01`,
+})
+
+exports.handler = async (event, context, callback) => {
+  // Return as soon as the callback fires, so the launched browser can stay open
+  context.callbackWaitsForEmptyEventLoop = false
+
+  let request = {}
+  if (event.body) {
+    request = JSON.parse(event.body)
+  }
+
+  const url = request.url
+
+  if (!url) {
+    callback(null, proxyError(`no url provided`))
+    return
+  }
+
+  const width = request.width || 1024
+  const height = request.height || 768
+
+  const browser = await setup.getBrowser()
+  exports
+    .run(browser, url, width, height)
+    .then(result => {
+      callback(null, proxyResponse(result))
+    })
+    .catch(err => {
+      callback(null, proxyError(err))
+    })
+}
+
+exports.run = async (browser, url, width, height) => {
+  console.log(`Invoked: ${url} (${width}x${height})`)
+
+  if (!process.env.S3_BUCKET) {
+    throw new Error(
+      `Provide the S3 bucket to use by adding an S3_BUCKET` +
+        ` environment variable to this Lambda's configuration`
+    )
+  }
+
+  const region = await s3GetBucketLocation(process.env.S3_BUCKET)
+
+  if (!region) {
+    throw new Error(`invalid bucket ${process.env.S3_BUCKET}`)
+  }
+
+  const keyBase = `${url}-(${width},${height})`
+  const digest = crypto
+    .createHash(`md5`)
+    .update(keyBase)
+    .digest(`hex`)
+  const key = `${digest}.png`
+
+  const screenshotUrl = `https://s3.${region}.amazonaws.com/${
+    process.env.S3_BUCKET
+  }/${key}`
+
+  const metadata = await s3HeadObject(key)
+
+  const now = new Date()
+  if (metadata) {
+    if (metadata.Expiration) {
+      const expires = getDateFromExpiration(metadata.Expiration)
+      if (now < expires) {
+        console.log(`Returning cached screenshot`)
+        return { url: screenshotUrl, expires }
+      }
+    } else {
+      throw new Error(`no expiration date set`)
+    }
+  }
+
+  console.log(`Taking new screenshot`)
+
+  const page = await browser.newPage()
+
+  await page.setViewport({ width, height })
+  await page.goto(url, { waitUntil: [`load`, `networkidle0`] })
+
+  const screenshot = await page.screenshot()
+  const up = await s3PutObject(key, screenshot)
+
+  await page.close()
+
+  let expires
+
+  if (up && up.Expiration) {
+    expires = getDateFromExpiration(up.Expiration)
+  }
+
+  return { url: screenshotUrl, expires }
+}
+
+const proxyResponse = body => {
+  body.success = true
+
+  return {
+    statusCode: 200,
+    body: JSON.stringify(body),
+  }
+}
+
+const proxyError = err => {
+  let msg = err
+
+  if (err instanceof Error) {
+    msg = err.message
+  }
+
+  return {
+    statusCode: 400,
+    body: JSON.stringify({
+      success: false,
+      error: msg,
+    }),
+  }
+}
+
+const s3PutObject = async (key, body) => {
+  const params = {
+    ACL: `public-read`,
+    Bucket: process.env.S3_BUCKET,
+    Key: key,
+    Body: body,
+    ContentType: `image/png`,
+  }
+
+  return new Promise((resolve, reject) => {
+    s3.putObject(params, (err, data) => {
+      if (err) reject(err)
+      else resolve(data)
+    })
+  })
+}
+
+const s3GetBucketLocation = bucket => {
+  const params = {
+    Bucket: bucket,
+  }
+
+  return new Promise((resolve, reject) => {
+    s3.getBucketLocation(params, (err, data) => {
+      if (err) resolve(null)
+      else resolve(data.LocationConstraint || `us-east-1`) // empty means us-east-1
+    })
+  })
+}
+
+const s3HeadObject = key => {
+  const params = {
+    Bucket: process.env.S3_BUCKET,
+    Key: key,
+  }
+
+  return new Promise((resolve, reject) => {
+    s3.headObject(params, (err, data) => {
+      if (err) resolve(null)
+      else resolve(data)
+    })
+  })
+}
+
+const expiryPattern = /expiry-date="([^"]*)"/
+const getDateFromExpiration = expiration =>
+  new Date(expiryPattern.exec(expiration)[1])
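The caching in `run` rests on two small pieces: a deterministic object key derived from the URL and viewport size, and the expiry date parsed from the `Expiration` value that S3 returns for objects covered by a lifecycle rule. The sketch below isolates both so they can be tried without AWS access. The names `cacheKey` and `parseExpiration` are hypothetical, and returning `null` for a value that does not match is an assumption made here, not the behavior of the committed helper (which would throw a `TypeError` in that case).

```javascript
const crypto = require(`crypto`)

// Same derivation as in `run`: hash the URL plus viewport size so that each
// (url, width, height) combination maps to one stable S3 object key.
const cacheKey = (url, width, height) => {
  const keyBase = `${url}-(${width},${height})`
  const digest = crypto
    .createHash(`md5`)
    .update(keyBase)
    .digest(`hex`)
  return `${digest}.png`
}

// S3 reports lifecycle expiry as a string such as
//   expiry-date="Fri, 21 Dec 2012 00:00:00 GMT", rule-id="..."
// Returning null when the pattern does not match is an assumption of this
// sketch; the committed getDateFromExpiration does not guard against it.
const expiryPattern = /expiry-date="([^"]*)"/
const parseExpiration = expiration => {
  const match = expiryPattern.exec(expiration || ``)
  return match ? new Date(match[1]) : null
}

// The rule id below is made up for illustration.
const key = cacheKey(`https://reactjs.org/`, 1024, 768)
const expires = parseExpiration(
  `expiry-date="Fri, 21 Dec 2012 00:00:00 GMT", rule-id="example-rule"`
)
console.log(key, expires)
```

Because the key depends only on the URL and the viewport, a repeated request for the same page and size resolves to the same object, which is what lets `run` answer from the bucket until the lifecycle rule expires the file.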
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+{
+  "dependencies": {
+    "puppeteer": "0.10.2",
+    "tar": "^4.2.0"
+  },
+  "devDependencies": {
+    "aws-sdk": "^2.181.0"
+  }
+}
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+The MIT License (MIT)
+
+Copyright (c) 2017 Taiki Sakamoto
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
