Compare optimal g on a per-length-l basis. #158

rianhunter · 2016-06-22T18:50:44Z

Before this change, most_guessable_match_sequence() could return a
different result for the same set of matches depending on their
order.

For example, consider two terminating matches for the k prefix,
match A: MA with l = LA and g = GA, and match B: MB with l = LB and g = GB,
with GA < GB and LA != LB.

Before this change, if update(MA, LA) was called before update(MB,
LB), then optimal.m[k][LB] will not contain MB. If update(MB, LB) was
called before update(MA, LA) then optimal.m[k][LB] will contain MB.
In both cases optimal.m[k][LA] contains MA.

This affects computation of bruteforce_update(k + 1), where we iterate
through optimal.m[k] and update the optimal state based on the
contents of optimal.m[k].

This change modifies update() so that it decides whether or not to
store the match in optimal.m[k][l] by only comparing g against other
optimal gs for the same length. Thus, the state of optimal.m[k][l]
is not dependent on the order of the input.

Before this change, most_guessable_match_sequence() could return a different result for the same set of matches depending on their order. For example, consider two terminating matches for the k prefix, match A: MA with l = LA and g = GA, and match B: MB with l = LB and g = GB, with GA < GB and LA != LB. Before this change, if update(MA, LA) was called before update(MB, LB), then optimal.m[k][LB] will not contain MB. If update(MB, LB) was called before update(MA, LA) then optimal.m[k][LB] will contain MB. In both cases optimal.m[k][LA] contains MA. This affects computation of bruteforce_update(k + 1), where we iterate through optimal.m[k] and update the optimal state based on the contents of optimal.m[k]. This change modifies update() so that it decides whether or not to store the match in optimal.m[k][l] by only comparing g against other optimal gs for the same length. Thus, the state of optimal.m[k][l] is not dependent on the order of the input.

lowe · 2016-07-11T00:28:48Z

But, this would only give potentially different outputs with the same g, right? If that's the case, and you're worried about non-deterministic output, a simpler change would be to sort the input matches by i before further processing (in this case, the next step is partitioning by j into matches_by_j).

I'm just trying to see if there's a simpler one-line change instead of your current diff.

If you can think of an example where a non-optimal sequence is returned, (vs different equally optimal sequences), that's a more series issue and i'd be interested to get an example input so i could walk through it myself.

rianhunter · 2016-07-11T03:54:01Z

The different non-deterministic outputs actually end up with the different g values.

Sorting matches_by_j fixes the non-determinism but it doesn't fix the correctness. The correct result depends on the incremental l value being computed and I don't think there is a matches_by_j sort that has a connection to the correct result. What actually should be sorted is the iteration over for l, last_m of optimal.m[k - 1] (in bruteforce_update) but it should be sorted by last_m.g, this will ensure update() is called in the right order.

I actually ran into this with the passphrase "horse_correct_battery_staple" [sic]. For one reason or another, zxcvbn-cpp provoked a different (& incorrect) result than upstream
zxcvbn.

To verify this yourself, randomize the iteration of for l, last_m of optimal.m[k - 1] for different runs of "horse_correct_battery_staple" and you'll get different result sequences with different g values.

rianhunter · 2016-07-11T04:01:25Z

Actually I'm not sure that sorting by last_m.g works either, it really needs to be sorted by the g that update() computes. Maybe there is a proof that sorting by last_m.g gives the same sort but I'm not sure at the moment.

rianhunter · 2016-07-19T03:40:08Z

Okay here is a patch that makes it easy to reproduce this bug. I believe this shows that the output of zxcvbn is currently sensitive to the ordering of the optimal.m[k] iteration. The test case is "correct_horse_battery_staple".
patch.txt

lowe · 2016-09-24T23:52:09Z

There was definitely a bug here -- I fixed it a slightly different way and applied your patch to make sure I now get identical output regardless of search order.

lowe added a commit that referenced this pull request Sep 24, 2016

bug fix: compare g on a per-length-l basis. #158

82c460f

lowe closed this Sep 24, 2016

EKOVEEVOKE1 mentioned this pull request Jul 18, 2025

The different non-deterministic outputs actually end up with the different g values. dropbox/hackpad#77

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Compare optimal g on a per-length-l basis. #158

Compare optimal g on a per-length-l basis. #158

Uh oh!

rianhunter commented Jun 22, 2016

Uh oh!

lowe commented Jul 11, 2016

Uh oh!

rianhunter commented Jul 11, 2016 •

edited

Loading

Uh oh!

rianhunter commented Jul 11, 2016

Uh oh!

rianhunter commented Jul 19, 2016 •

edited

Loading

Uh oh!

lowe commented Sep 24, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Compare optimal g on a per-length-l basis. #158

Compare optimal g on a per-length-l basis. #158

Uh oh!

Conversation

rianhunter commented Jun 22, 2016

Uh oh!

lowe commented Jul 11, 2016

Uh oh!

rianhunter commented Jul 11, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rianhunter commented Jul 11, 2016

Uh oh!

rianhunter commented Jul 19, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

lowe commented Sep 24, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rianhunter commented Jul 11, 2016 •

edited

Loading

rianhunter commented Jul 19, 2016 •

edited

Loading