
Commit 3d65a1c

breaking(gatsby-plugin-sitemap): vNext rewrite (#25670)
Co-authored-by: Ward Peeters <[email protected]>
1 parent 2267632 commit 3d65a1c


14 files changed, +833 -594 lines changed


packages/gatsby-plugin-sitemap/README.md

Lines changed: 163 additions & 60 deletions
````diff
@@ -21,89 +21,192 @@ plugins: [`gatsby-plugin-sitemap`]
 Above is the minimal configuration required to have it work. By default, the
 generated sitemap will include all of your site's pages, except the ones you exclude.
 
+## Recommended usage
+
+You probably do not want to use the defaults in this plugin. Here's an example of the default output:
+
+```xml
+<?xml version="1.0" encoding="UTF-8"?>
+<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
+  <url>
+    <loc>https://example.net/blog/</loc>
+    <changefreq>daily</changefreq>
+    <priority>0.7</priority>
+  </url>
+  <url>
+    <loc>https://example.net/</loc>
+    <changefreq>daily</changefreq>
+    <priority>0.7</priority>
+  </url>
+</urlset>
+```
+
+See the `changefreq` and `priority` fields? Those will be the same for every page, no matter how important it is or how often it gets updated. They will most likely be wrong. But wait, there's more: in their [docs](https://support.google.com/webmasters/answer/183668?hl=en), Google says:
+
+> - Google ignores `<priority>` and `<changefreq>` values, so don't bother adding them.
+> - Google reads the `<lastmod>` value, but if you misrepresent this value, we will stop reading it.
+
+You really want to customize this plugin config to include an accurate `lastmod` date. Check out the [example](#example) to see how to do this.
+
````

````diff
 ## Options
 
-The `defaultOptions` [here](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-plugin-sitemap/src/internals.js#L71) can be overridden.
+The [`default config`](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-plugin-sitemap/src/options-validation.js) can be overridden.
 
 The options are as follows:
 
-- `query` (GraphQL Query) The query for the data you need to generate the sitemap. It's required to get the site's URL, if you are not fetching it from `site.siteMetadata.siteUrl`, you will need to set a custom `resolveSiteUrl` function. If you override the query, you probably will also need to set a `serializer` to return the correct data for the sitemap. Due to how this plugin was built it is currently expected/required to fetch the page paths from `allSitePage`, but you may use the `allSitePage.edges.node` or `allSitePage.nodes` query structure.
-- `output` (string) The filepath and name. Defaults to `/sitemap.xml`.
-- `exclude` (array of strings) An array of paths to exclude from the sitemap.
-- `createLinkInHead` (boolean) Whether to populate the `<head>` of your site with a link to the sitemap.
-- `serialize` (function) Takes the output of the data query and lets you return an array of sitemap entries.
-- `resolveSiteUrl` (function) Takes the output of the data query and lets you return the site URL.
+- `output` (string = `/sitemap`) Folder path where sitemaps are stored.
+- `createLinkInHead` (boolean = true) Whether to populate the `<head>` of your site with a link to the sitemap.
+- `entryLimit` (number = 45000) Number of entries per sitemap file. A sitemap index and multiple sitemaps are created if you have more entries.
+- `exclude` (string[] = []) An array of paths to exclude from the sitemap. While this is usually an array of strings, it is possible to enter other data types into this array for custom filtering. Doing so will require customization of the [`filterPages`](#filterPages) function.
+- `query` (GraphQL Query) The query for the data you need to generate the sitemap. It's required to get the site's URL; if you are not fetching it from `site.siteMetadata.siteUrl`, you will need to set a custom [`resolveSiteUrl`](#resolveSiteUrl) function. If you override the query, you may need to pass in a custom [`resolvePagePath`](#resolvePagePath) and/or [`resolvePages`](#resolvePages) function to keep everything working. If you fetch pages without using the `allSitePage.nodes` query structure, you will definitely need to customize the [`resolvePages`](#resolvePages) function.
+- [`resolveSiteUrl`](#resolveSiteUrl) (function) Takes the output of the data query and lets you return the site URL. Sync or async functions allowed.
+- [`resolvePagePath`](#resolvePagePath) (function) Takes a page object and returns the URI of the page (no domain or protocol).
+- [`resolvePages`](#resolvePages) (function) Takes the output of the data query and expects an array of page objects to be returned. Sync or async functions allowed.
+- [`filterPages`](#filterPages) (function) Takes the current page and a string (or other object) from the `exclude` array and expects a boolean to be returned. `true` excludes the path, `false` keeps it.
+- [`serialize`](#serialize) (function) Takes each page returned by `filterPages` and lets you return a sitemap entry. Sync or async functions allowed.
 
-We _ALWAYS_ exclude the following pages: `/dev-404-page`,`/404` &`/offline-plugin-app-shell-fallback`, this cannot be changed.
+The following pages are **always** excluded: `/dev-404-page`, `/404`, and `/offline-plugin-app-shell-fallback`. This cannot be changed, even by customizing the [`filterPages`](#filterPages) function.
 
````
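
As a rough illustration of what `entryLimit` means for the output, the sketch below splits a list of sitemap entries into groups of at most `entryLimit` and names one file per group plus an index. It is not the plugin's code: `chunkEntries` is a hypothetical helper, and the `sitemap-N.xml` / `sitemap-index.xml` names are assumptions based on the file names mentioned in the removed README text and the `sitemap-index.xml` link in this commit's SSR snapshot.

```javascript
// Hypothetical helper (not part of gatsby-plugin-sitemap): group sitemap entries
// into files holding at most `entryLimit` entries each, plus one index file.
function chunkEntries(entries, entryLimit = 45000) {
  if (!Number.isInteger(entryLimit) || entryLimit < 1) {
    throw new RangeError("entryLimit must be a positive integer")
  }

  const files = []
  for (let start = 0; start < entries.length; start += entryLimit) {
    files.push({
      // Assumed naming scheme: sitemap-0.xml, sitemap-1.xml, ...
      name: `sitemap-${files.length}.xml`,
      entries: entries.slice(start, start + entryLimit),
    })
  }

  // The index lists every file so crawlers can discover them all.
  return { index: "sitemap-index.xml", files }
}

// Usage: five entries with a limit of two yield three files (2 + 2 + 1 entries).
const entries = ["/a/", "/b/", "/c/", "/d/", "/e/"].map(url => ({ url }))
const { index, files } = chunkEntries(entries, 2)
console.log(index, files.map(file => `${file.name}: ${file.entries.length}`))
```
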

````diff
-Example:
+## Example
 
 ```javascript
+const siteUrl = process.env.URL || `https://fallback.net`
+
 // In your gatsby-config.js
-siteMetadata: {
-  siteUrl: `https://www.example.com`,
-},
-plugins: [
-  {
-    resolve: `gatsby-plugin-sitemap`,
-    options: {
-      output: `/some-other-sitemap.xml`,
-      // Exclude specific pages or groups of pages using glob parameters
-      // See: https://github.com/isaacs/minimatch
-      // The example below will exclude the single `path/to/page` and all routes beginning with `category`
-      exclude: [`/category/*`, `/path/to/page`],
-      query: `
+module.exports = {
+  plugins: [
+    {
+      resolve: "gatsby-plugin-sitemap",
+      options: {
+        query: `
         {
-          wp {
-            generalSettings {
-              siteUrl
-            }
-          }
-
           allSitePage {
             nodes {
               path
             }
           }
-        }`,
-      resolveSiteUrl: ({site, allSitePage}) => {
-        //Alternatively, you may also pass in an environment variable (or any location) at the beginning of your `gatsby-config.js`.
-        return site.wp.generalSettings.siteUrl
-      },
-      serialize: ({ site, allSitePage }) =>
-        allSitePage.nodes.map(node => {
+          allWpContentNode(filter: {nodeType: {in: ["Post", "Page"]}}) {
+            nodes {
+              ... on WpPost {
+                uri
+                modifiedGmt
+              }
+              ... on WpPage {
+                uri
+                modifiedGmt
+              }
+            }
+          }
+        }
+        `,
+        resolveSiteUrl: () => siteUrl,
+        resolvePages: ({
+          allSitePage: { nodes: allPages },
+          allWpContentNode: { nodes: allWpNodes },
+        }) => {
+          const wpNodeMap = allWpNodes.reduce((acc, node) => {
+            const { uri } = node
+            acc[uri] = node
+
+            return acc
+          }, {})
+
+          return allPages.map(page => {
+            return { ...page, ...wpNodeMap[page.path] }
+          })
+        },
+        serialize: ({ path, modifiedGmt }) => {
           return {
-            url: `${site.wp.generalSettings.siteUrl}${node.path}`,
-            changefreq: `daily`,
-            priority: 0.7,
+            url: path,
+            lastmod: modifiedGmt,
           }
-        })
-    }
-  }
-]
+        },
+      },
+    },
+  ],
+}
 ```
 
````
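
To check what the `resolvePages` and `serialize` options in the README example produce, the sketch below runs the same merge logic outside Gatsby on invented sample data. The paths and timestamps are made up, and the GraphQL result is assumed to have exactly the shape the example query requests. Pages with no matching WordPress node keep only their `path`, so their `lastmod` comes out `undefined`.

```javascript
// The merge logic from the example's `resolvePages` option, extracted to run outside Gatsby.
function resolvePages({
  allSitePage: { nodes: allPages },
  allWpContentNode: { nodes: allWpNodes },
}) {
  // Index WordPress nodes by their URI for constant-time lookups.
  const wpNodeMap = allWpNodes.reduce((acc, node) => {
    acc[node.uri] = node
    return acc
  }, {})

  // Merge each Gatsby page with the WordPress node whose `uri` equals the page `path`.
  // Spreading `undefined` is a no-op, so unmatched pages pass through unchanged.
  return allPages.map(page => ({ ...page, ...wpNodeMap[page.path] }))
}

// The example's serializer: only `url` and `lastmod` are emitted.
const serialize = ({ path, modifiedGmt }) => ({ url: path, lastmod: modifiedGmt })

// Invented sample data shaped like the result of the example query.
const data = {
  allSitePage: { nodes: [{ path: "/" }, { path: "/hello-world/" }, { path: "/contact/" }] },
  allWpContentNode: {
    nodes: [
      { uri: "/hello-world/", modifiedGmt: "2020-07-10T12:00:00" },
      { uri: "/", modifiedGmt: "2020-07-01T08:30:00" },
    ],
  },
}

const entries = resolvePages(data).map(serialize)
console.log(entries)
```
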

````diff
-## Sitemap Index
+## API Reference
+
+<a id="resolveSiteUrl"></a>
+
+## resolveSiteUrl ⇒ <code>string</code>
+
+Sync or async functions allowed.
+
+**Returns**: <code>string</code> - The site URL; this can come from the GraphQL query or another scope.
+
+| Param | Type                | Description                  |
+| ----- | ------------------- | ---------------------------- |
+| data  | <code>object</code> | Results of the GraphQL query |
+
````
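
Since `resolveSiteUrl` may be sync or async, here is a minimal sketch of both forms. The sync one assumes the default `site.siteMetadata.siteUrl` query shape; `fetchSiteUrlFromCms` is a hypothetical stand-in for any asynchronous lookup, and `getSiteUrl` only illustrates how a caller can treat both forms alike with `await` (the plugin's own internals may differ).

```javascript
// Sync form: read the URL from the default `site.siteMetadata.siteUrl` query result.
const resolveSiteUrlSync = data => data.site.siteMetadata.siteUrl

// Hypothetical async lookup (e.g. a CMS settings endpoint); returns a trailing slash on purpose.
const fetchSiteUrlFromCms = async () => "https://cms.example.net/"

// Async form: await the lookup, then strip trailing slashes so paths can be appended cleanly.
const resolveSiteUrlAsync = async () => {
  const url = await fetchSiteUrlFromCms()
  return url.replace(/\/+$/, "")
}

// `await` accepts plain values and promises alike, so one code path covers both forms.
async function getSiteUrl(resolveSiteUrl, data) {
  return await resolveSiteUrl(data)
}

const data = { site: { siteMetadata: { siteUrl: "https://www.example.com" } } }

getSiteUrl(resolveSiteUrlSync, data).then(url => console.log(url))
getSiteUrl(resolveSiteUrlAsync, data).then(url => console.log(url))
```
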

````diff
+<a id="resolvePagePath"></a>
+
+## resolvePagePath ⇒ <code>string</code>
+
+If your page objects do not store the URI in `path`, you need
+a custom `resolvePagePath`.
 
-We also support generating `sitemap index`.
+**Returns**: <code>string</code> - URI of the page without domain or protocol
 
-- [Split up your large sitemaps](https://support.google.com/webmasters/answer/75712?hl=en)
-- [Using Sitemap index files (to group multiple sitemap files)](https://www.sitemaps.org/protocol.html#index)
+| Param | Type                                       | Description                             |
+| ----- | ------------------------------------------ | --------------------------------------- |
+| page  | <code>object</code> \| <code>string</code> | Array item returned from `resolvePages` |
+
````
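
For instance, if your `resolvePages` returns objects that keep the URI under `uri` (the field the README example queries on WordPress nodes) instead of `path`, a custom `resolvePagePath` only has to return that field. A minimal sketch: the page objects are invented, and the `new URL(...)` join at the end merely previews the final `<loc>` values; it is not how the plugin builds URLs.

```javascript
// Invented page objects whose URI lives in `uri` rather than `path`.
const pages = [
  { uri: "/blog/first-post/", modifiedGmt: "2020-06-01T00:00:00" },
  { uri: "/about/" },
]

// Custom `resolvePagePath`: return only the URI (no protocol or domain),
// failing loudly if a page has no usable URI.
const resolvePagePath = page => {
  if (typeof page.uri !== "string" || !page.uri.startsWith("/")) {
    throw new TypeError(`Page has no usable uri: ${JSON.stringify(page)}`)
  }
  return page.uri
}

// Illustration only: join with a site URL to preview the final <loc> values.
const siteUrl = "https://example.net"
const locs = pages.map(page => new URL(resolvePagePath(page), siteUrl).href)
console.log(locs)
```
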

````diff
+<a id="resolvePages"></a>
+
+## resolvePages ⇒ <code>Array</code>
+
+This allows custom resolution of the array of pages.
+This is also where users could merge multiple sources into
+a single array if needed. Sync or async functions allowed.
+
+**Returns**: <code>object[]</code> - Array of objects representing each page
+
+| Param | Type                | Description                  |
+| ----- | ------------------- | ---------------------------- |
+| data  | <code>object</code> | Results of the GraphQL query |
+
````
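
Merging several sources is where an async `resolvePages` helps. The sketch below combines `allSitePage` nodes with pages from `loadExternalPages`, a hypothetical async source, and de-duplicates them by `path` with a `Map`. Apart from `allSitePage.nodes` and `path`, the field names are invented for illustration.

```javascript
// Hypothetical async source of extra pages (e.g. an external API); not part of the plugin.
const loadExternalPages = async () => [
  { path: "/docs/", updatedAt: "2020-07-01" },
  { path: "/", updatedAt: "2020-07-15" },
]

// An async `resolvePages` that merges GraphQL pages with the external list, keyed by
// `path` so a page present in both sources appears only once.
const resolvePages = async ({ allSitePage: { nodes: allPages } }) => {
  const external = await loadExternalPages()
  const byPath = new Map()
  for (const page of [...allPages, ...external]) {
    // Later sources win on conflicting fields; a Map keeps first-insertion order.
    byPath.set(page.path, { ...byPath.get(page.path), ...page })
  }
  return [...byPath.values()]
}

const data = { allSitePage: { nodes: [{ path: "/" }, { path: "/blog/" }] } }
resolvePages(data).then(pages => console.log(pages))
```
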
````diff
+<a id="filterPages"></a>
+
+## filterPages ⇒ <code>boolean</code>
+
+This allows filtering any data in any way.
+
+This function is executed via:
 
 ```javascript
-// In your gatsby-config.js
-siteMetadata: {
-  siteUrl: `https://www.example.com`,
-},
-plugins: [
-  {
-    resolve: `gatsby-plugin-sitemap`,
-    options: {
-      sitemapSize: 5000
-    }
-  }
-]
+allPages.filter(
+  page => !excludes.some(excludedRoute => thisFunc(page, excludedRoute, tools))
+)
+```
+
+`allPages` is the result of the [`resolvePages`](#resolvePages) function.
+
+**Returns**: <code>Boolean</code> - `true` excludes the path, `false` keeps it.
+
+| Param         | Type                | Description                                                                         |
+| ------------- | ------------------- | ----------------------------------------------------------------------------------- |
+| page          | <code>object</code> | A single element from the results of the `resolvePages` function                    |
+| excludedRoute | <code>string</code> | Element from the `exclude` array in the plugin config                               |
+| tools         | <code>object</code> | Contains tools for filtering `{ minimatch, withoutTrailingSlash, resolvePagePath }` |
+
````
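
Because `exclude` may hold non-string entries once `filterPages` is customized, here is a minimal sketch that mixes strings with object entries. The `{ prefix }` and `{ exact }` shapes are assumptions invented for this example, and `tools` is a stub exposing only `resolvePagePath` (the table above also lists `minimatch` and `withoutTrailingSlash`). `applyFilter` reproduces the documented call pattern.

```javascript
// Stub of the `tools` argument; the real one also carries `minimatch` and `withoutTrailingSlash`.
const tools = { resolvePagePath: page => page.path }

// Custom `filterPages`: return true to EXCLUDE the page, false to keep it.
// Entry shapes (assumed for this sketch): a plain string, { prefix }, or { exact }.
const filterPages = (page, excludedRoute, { resolvePagePath }) => {
  const path = resolvePagePath(page)
  if (typeof excludedRoute === "string") return path === excludedRoute
  if (excludedRoute.prefix) return path.startsWith(excludedRoute.prefix)
  if (excludedRoute.exact) return path === excludedRoute.exact
  return false // unknown shape: keep the page
}

// The documented call pattern: a page is dropped if any `exclude` entry matches it.
const applyFilter = (allPages, excludes) =>
  allPages.filter(
    page => !excludes.some(excludedRoute => filterPages(page, excludedRoute, tools))
  )

const allPages = ["/", "/drafts/a/", "/drafts/b/", "/secret/", "/old/", "/blog/"].map(path => ({ path }))
const kept = applyFilter(allPages, [{ prefix: "/drafts/" }, { exact: "/secret/" }, "/old/"])
console.log(kept.map(page => page.path))
```
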
````diff
+<a id="serialize"></a>
+
+## serialize ⇒ <code>object</code>
+
+This function is executed by:
+
+```javascript
+allPages.map(page => thisFunc(page, tools))
 ```
 
-Above is the minimal configuration to split a large sitemap.
-When the number of URLs in a sitemap is more than 5000, the plugin will create sitemap (e.g. `sitemap-0.xml`, `sitemap-1.xml`) and index (e.g. `sitemap.xml`) files.
+`allPages` is the result of the [`filterPages`](#filterPages) function. Sync or async functions allowed.
+
+**Kind**: global variable
+
+| Param | Type                | Description                                                      |
+| ----- | ------------------- | ---------------------------------------------------------------- |
+| page  | <code>object</code> | A single element from the results of the `resolvePages` function |
+| tools | <code>object</code> | Contains tools for serializing `{ resolvePagePath }`             |
````
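
Following the Recommended usage advice, a `serialize` can emit just `url` and `lastmod`. A minimal sketch, assuming page objects with `uri` and `modifiedGmt` fields (as in the README example query) and a stub `tools.resolvePagePath`. It treats `modifiedGmt` as a UTC timestamp without a zone suffix, which is an assumption about the data, and it uses `Promise.all` to settle the async results; the plugin's own handling may differ.

```javascript
// Stub of the `tools` argument; per the table above it provides `resolvePagePath`.
const tools = { resolvePagePath: page => page.uri }

// Emits only `url` and `lastmod`. Declared async to show that form; a plain function also works.
const serialize = async (page, { resolvePagePath }) => {
  const entry = { url: resolvePagePath(page) }
  if (page.modifiedGmt) {
    // Assumption: `modifiedGmt` is a UTC timestamp without a zone suffix, so append "Z".
    entry.lastmod = new Date(`${page.modifiedGmt}Z`).toISOString()
  }
  return entry
}

// Mirrors `allPages.map(page => thisFunc(page, tools))`, then waits for the async results.
const allPages = [
  { uri: "/hello-world/", modifiedGmt: "2020-07-10T12:00:00" },
  { uri: "/contact/" },
]
Promise.all(allPages.map(page => serialize(page, tools))).then(entries => console.log(entries))
```
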

packages/gatsby-plugin-sitemap/package.json

Lines changed: 6 additions & 4 deletions
```diff
@@ -10,14 +10,14 @@
     "@babel/runtime": "^7.12.5",
     "common-tags": "^1.8.0",
     "minimatch": "^3.0.4",
-    "pify": "^3.0.0",
-    "sitemap": "^1.13.0"
+    "sitemap": "^6.3.0"
   },
   "devDependencies": {
     "@babel/cli": "^7.12.1",
     "@babel/core": "^7.12.3",
     "babel-preset-gatsby-package": "^1.4.0-next.0",
-    "cross-env": "^7.0.3"
+    "cross-env": "^7.0.3",
+    "gatsby-plugin-utils": "1.4.0-next.0"
   },
   "homepage": "https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby-plugin-sitemap#readme",
   "keywords": [
@@ -39,7 +39,9 @@
   "scripts": {
     "build": "babel src --out-dir . --ignore \"**/__tests__\"",
     "prepare": "cross-env NODE_ENV=production npm run build",
-    "watch": "babel -w src --out-dir . --ignore \"**/__tests__\""
+    "watch": "babel -w src --out-dir . --ignore \"**/__tests__\"",
+    "test": "jest",
+    "test:watch": "jest --watch"
   },
   "engines": {
     "node": ">=12.13.0"
```

packages/gatsby-plugin-sitemap/src/__tests__/__snapshots__/gatsby-node.js.snap

Lines changed: 21 additions & 20 deletions
```diff
@@ -1,25 +1,26 @@
 // Jest Snapshot v1, https://goo.gl/fbAQLP
 
-exports[`Test plugin sitemap custom query runs 1`] = `
-"<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?>
-<urlset xmlns=\\"http://www.sitemaps.org/schemas/sitemap/0.9\\" xmlns:news=\\"http://www.google.com/schemas/sitemap-news/0.9\\" xmlns:xhtml=\\"http://www.w3.org/1999/xhtml\\" xmlns:mobile=\\"http://www.google.com/schemas/sitemap-mobile/1.0\\" xmlns:image=\\"http://www.google.com/schemas/sitemap-image/1.1\\" xmlns:video=\\"http://www.google.com/schemas/sitemap-video/1.1\\">
-<url> <loc>http://dummy.url/post/page-1</loc> <changefreq>weekly</changefreq> <priority>0.8</priority> </url>
-</urlset>"
+exports[`gatsby-plugin-sitemap Node API should accept a custom query 1`] = `
+Array [
+  Object {
+    "changefreq": "weekly",
+    "priority": 0.8,
+    "url": "http://dummy.url/page-1",
+  },
+]
 `;
 
-exports[`Test plugin sitemap default settings work properly 1`] = `
-"<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?>
-<urlset xmlns=\\"http://www.sitemaps.org/schemas/sitemap/0.9\\" xmlns:news=\\"http://www.google.com/schemas/sitemap-news/0.9\\" xmlns:xhtml=\\"http://www.w3.org/1999/xhtml\\" xmlns:mobile=\\"http://www.google.com/schemas/sitemap-mobile/1.0\\" xmlns:image=\\"http://www.google.com/schemas/sitemap-image/1.1\\" xmlns:video=\\"http://www.google.com/schemas/sitemap-video/1.1\\">
-<url> <loc>http://dummy.url/page-1</loc> <changefreq>daily</changefreq> <priority>0.7</priority> </url>
-<url> <loc>http://dummy.url/page-2</loc> <changefreq>daily</changefreq> <priority>0.7</priority> </url>
-</urlset>"
+exports[`gatsby-plugin-sitemap Node API should succeed with default options 1`] = `
+Array [
+  Object {
+    "changefreq": "daily",
+    "priority": 0.7,
+    "url": "http://dummy.url/page-1",
+  },
+  Object {
+    "changefreq": "daily",
+    "priority": 0.7,
+    "url": "http://dummy.url/page-2",
+  },
+]
 `;
-
-exports[`Test plugin sitemap sitemap index set sitemap size and urls are less than it. 1`] = `
-"<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?>
-<urlset xmlns=\\"http://www.sitemaps.org/schemas/sitemap/0.9\\" xmlns:news=\\"http://www.google.com/schemas/sitemap-news/0.9\\" xmlns:xhtml=\\"http://www.w3.org/1999/xhtml\\" xmlns:mobile=\\"http://www.google.com/schemas/sitemap-mobile/1.0\\" xmlns:image=\\"http://www.google.com/schemas/sitemap-image/1.1\\" xmlns:video=\\"http://www.google.com/schemas/sitemap-video/1.1\\">
-<url> <loc>http://dummy.url/page-1</loc> <changefreq>daily</changefreq> <priority>0.7</priority> </url>
-<url> <loc>http://dummy.url/page-2</loc> <changefreq>daily</changefreq> <priority>0.7</priority> </url>
-</urlset>"
-`;
-
```

packages/gatsby-plugin-sitemap/src/__tests__/__snapshots__/gatsby-ssr.js.snap

Lines changed: 5 additions & 5 deletions
```diff
@@ -1,12 +1,12 @@
 // Jest Snapshot v1, https://goo.gl/fbAQLP
 
-exports[`Adds <Link> for site to head creates Link href with path prefix when __PATH_PREFIX__ sets 1`] = `
+exports[`gatsby-plugin-sitemap SSR API creates Link href with path prefix when __PATH_PREFIX__ sets 1`] = `
 [MockFunction] {
   "calls": Array [
     Array [
       Array [
         <link
-          href="/hogwarts/sitemap.xml"
+          href="/hogwarts/test-folder/sitemap-index.xml"
           rel="sitemap"
           type="application/xml"
         />,
@@ -22,13 +22,13 @@ exports[`Adds <Link> for site to head creates Link href with path prefix when __
 }
 `;
 
-exports[`Adds <Link> for site to head creates Link if createLinkInHead is true 1`] = `
+exports[`gatsby-plugin-sitemap SSR API should create a Link if createLinkInHead is true 1`] = `
 [MockFunction] {
   "calls": Array [
     Array [
       Array [
         <link
-          href="/sitemap.xml"
+          href="/test-folder/sitemap-index.xml"
           rel="sitemap"
           type="application/xml"
         />,
@@ -44,4 +44,4 @@ exports[`Adds <Link> for site to head creates Link if createLinkInHead is true 1
 }
 `;
 
-exports[`Adds <Link> for site to head does not create Link if createLinkInHead is false 1`] = `[MockFunction]`;
+exports[`gatsby-plugin-sitemap SSR API should not create Link if createLinkInHead is false 1`] = `[MockFunction]`;
```

Lines changed: 12 additions & 0 deletions
```diff
@@ -0,0 +1,12 @@
+// Jest Snapshot v1, https://goo.gl/fbAQLP
+
+exports[`gatsby-plugin-sitemap internals tests pageFilter should filter correctly 1`] = `
+Array [
+  Object {
+    "path": "/to/keep/1",
+  },
+  Object {
+    "path": "/to/keep/2",
+  },
+]
+`;
```
