
Commit 4501bdb

Normalize unicode internally using NFD
Previously, the path reservation system, which defends against unicode path name collisions (the subject of a handful of past CVE issues), used NFKD normalization internally to determine whether two paths were likely to reference the same file on disk. This had the surprising effect of normalizing characters like `℀` into their compatibility decompositions, for example `a/c`. These decompositions can contain slashes and double-dot sections, which means that the path reservations could end up reserving more (or different) paths than intended. Thankfully, tar was already *extracting* properly even when path reservations collided, and these collisions made tar *more* aggressive than it needed to be in restricting parallel extraction, rather than less. That is the right direction to err in for security, but it also made tar less efficient than it could be in some edge cases.

With NFD normalization, unicode characters are not decomposed in compatibility mode, but canonically equivalent paths still produce matching path reservation keys as intended. This does not change any observed behavior, other than allowing some files to be extracted in parallel where it is provably safe to do so.

Credit: discovered by @Sim4n6. This did not result in a juicy security vulnerability, but it sure looked like one at first. They were extremely patient, thorough, and persistent in trying to pin this down to a POC and CVE. There is very little reward or visibility when a security researcher finds a bug that doesn't result in a security disclosure, but the attempt often results in improvements to the project.
1 parent 24efc74 commit 4501bdb
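
For illustration only (not part of the commit): a minimal Node.js sketch of the difference between the two normalization forms. `℀` (U+2100) has a compatibility decomposition containing a slash, so NFKD rewrites it to `a/c`, while NFD leaves it intact; both forms still equate the two byte encodings of `é`, which is what the collision defense relies on.

// Sketch: NFKD vs NFD on strings relevant to path reservation keys.
const ac = '\u2100'                // '℀' ACCOUNT OF
console.log(ac.normalize('NFKD'))  // 'a/c' -- compatibility decomposition introduces a slash
console.log(ac.normalize('NFD'))   // '℀'   -- canonical decomposition leaves it intact

const cafe1 = 'caf\u00e9'          // precomposed é
const cafe2 = 'cafe\u0301'         // 'e' + combining acute accent
console.log(cafe1.normalize('NFD') === cafe2.normalize('NFD'))  // true -- equivalent paths still collide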

File tree

5 files changed: +60 −3 lines changed

lib/normalize-unicode.js

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ const normalizeCache = Object.create(null)
 const { hasOwnProperty } = Object.prototype
 module.exports = s => {
   if (!hasOwnProperty.call(normalizeCache, s)) {
-    normalizeCache[s] = s.normalize('NFKD')
+    normalizeCache[s] = s.normalize('NFD')
   }
   return normalizeCache[s]
 }
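
Usage sketch (assumption: the module is loaded from the repository root; this is not code from the commit): the cache is keyed by the exact input string, so repeated lookups skip re-normalization, and canonically equivalent spellings map to the same output.

// Sketch: two spellings of 'café' normalize to the same NFD string;
// repeat calls with either spelling are served from normalizeCache.
const normalize = require('./lib/normalize-unicode.js')
const precomposed = 'caf\u00e9'   // é as a single code point
const decomposed = 'cafe\u0301'   // 'e' + combining acute accent
console.log(normalize(precomposed) === normalize(decomposed))  // true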

lib/path-reservations.js

Lines changed: 1 addition & 1 deletion
@@ -123,7 +123,7 @@ module.exports = () => {
     // effectively removing all parallelization on windows.
     paths = isWindows ? ['win32 parallelization disabled'] : paths.map(p => {
       // don't need normPath, because we skip this entirely for windows
-      return normalize(stripSlashes(join(p))).toLowerCase()
+      return stripSlashes(join(normalize(p))).toLowerCase()
     })
 
     const dirs = new Set(
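
A standalone sketch (simplified stand-ins, not the library's internals) of why the old arrangement over-reserved: with NFKD applied as the last step, a compatibility character could inject a `/` into the reservation key after the slash handling had already run, whereas normalizing first with NFD keeps the key a single path component.

// Standalone sketch; stripSlashes here is a simplified stand-in for
// lib/strip-trailing-slashes.js, used only to illustrate the ordering.
const { join } = require('path')
const stripSlashes = s => s.replace(/\/+$/, '')

const p = '\u2100foo.txt'  // '℀foo.txt'

// old: NFKD applied last -- the key sprouts a '/' that was never in the entry
const oldKey = stripSlashes(join(p)).normalize('NFKD').toLowerCase()
// new: NFD applied first -- the key remains a single path component
const newKey = stripSlashes(join(p.normalize('NFD'))).toLowerCase()

console.log(oldKey)  // 'a/cfoo.txt'
console.log(newKey)  // '℀foo.txt'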

lib/unpack.js

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ const uint32 = (a, b, c) =>
 // Note that on windows, we always drop the entire cache whenever a
 // symbolic link is encountered, because 8.3 filenames are impossible
 // to reason about, and collisions are hazards rather than just failures.
-const cacheKeyNormalize = path => normalize(stripSlash(normPath(path)))
+const cacheKeyNormalize = path => stripSlash(normPath(normalize(path)))
   .toLowerCase()
 
 const pruneCache = (cache, abs) => {
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+/* IMPORTANT
+ * This snapshot file is auto-generated, but designed for humans.
+ * It should be checked into source control and tracked carefully.
+ * Re-generate by setting TAP_SNAPSHOT=1 and running tests.
+ * Make sure to inspect the output below. Do not ignore changes!
+ */
+'use strict'
+exports[`test/normalize-unicode.js TAP normalize with strip slashes "1/4foo.txt" > normalized 1`] = `
+1/4foo.txt
+`
+
+exports[`test/normalize-unicode.js TAP normalize with strip slashes "\\\\a\\\\b\\\\c\\\\d\\\\" > normalized 1`] = `
+/a/b/c/d
+`
+
+exports[`test/normalize-unicode.js TAP normalize with strip slashes "¼foo.txt" > normalized 1`] = `
+¼foo.txt
+`
+
+exports[`test/normalize-unicode.js TAP normalize with strip slashes "﹨aaaa﹨dddd﹨" > normalized 1`] = `
+﹨aaaa﹨dddd﹨
+`
+
+exports[`test/normalize-unicode.js TAP normalize with strip slashes "\bbb\eee\" > normalized 1`] = `
+\bbb\eee\
+`
+
+exports[`test/normalize-unicode.js TAP normalize with strip slashes "\\\\\eee\\\\\\" > normalized 1`] = `
+\\\\\eee\\\\\\
+`

test/normalize-unicode.js

Lines changed: 27 additions & 0 deletions
@@ -1,5 +1,8 @@
+process.env.TESTING_TAR_FAKE_PLATFORM = 'win32'
 const t = require('tap')
 const normalize = require('../lib/normalize-unicode.js')
+const stripSlash = require('../lib/strip-trailing-slashes.js')
+const normPath = require('../lib/normalize-windows-path.js')
 
 // café
 const cafe1 = Buffer.from([0x63, 0x61, 0x66, 0xc3, 0xa9]).toString()
@@ -10,3 +13,27 @@ const cafe2 = Buffer.from([0x63, 0x61, 0x66, 0x65, 0xcc, 0x81]).toString()
 t.equal(normalize(cafe1), normalize(cafe2), 'matching unicodes')
 t.equal(normalize(cafe1), normalize(cafe2), 'cached')
 t.equal(normalize('foo'), 'foo', 'non-unicode string')
+
+t.test('normalize with strip slashes', t => {
+  const paths = [
+    '\\a\\b\\c\\d\\',
+    '﹨aaaa﹨dddd﹨',
+    '\bbb\eee\',
+    '\\\\\eee\\\\\\',
+    '¼foo.txt',
+    '1/4foo.txt',
+  ]
+
+  t.plan(paths.length)
+
+  for (const path of paths) {
+    t.test(JSON.stringify(path), t => {
+      const a = normalize(stripSlash(normPath(path)))
+      const b = stripSlash(normPath(normalize(path)))
+      t.matchSnapshot(a, 'normalized')
+      t.equal(a, b, 'order should not matter')
+      t.end()
+    })
+  }
+  t.end()
+})
