Skip to content

Conversation

@rianhunter
Copy link
Contributor

Before this change, most_guessable_match_sequence() could return a
different result for the same set of matches depending on their
order.

For example, consider two terminating matches for the k prefix,
match A: MA with l = LA and g = GA, and match B: MB with l = LB and g = GB,
with GA < GB and LA != LB.

Before this change, if update(MA, LA) was called before update(MB,
LB), then optimal.m[k][LB] will not contain MB. If update(MB, LB) was
called before update(MA, LA) then optimal.m[k][LB] will contain MB.
In both cases optimal.m[k][LA] contains MA.

This affects computation of bruteforce_update(k + 1), where we iterate
through optimal.m[k] and update the optimal state based on the
contents of optimal.m[k].

This change modifies update() so that it decides whether or not to
store the match in optimal.m[k][l] by only comparing g against other
optimal gs for the same length. Thus, the state of optimal.m[k][l]
is not dependent on the order of the input.

Before this change, most_guessable_match_sequence() could return a
different result for the same set of matches depending on their
order.

For example, consider two terminating matches for the k prefix,
match A: MA with l = LA and g = GA, and match B: MB with l = LB and g = GB,
with GA < GB and LA != LB.

Before this change, if update(MA, LA) was called before update(MB,
LB), then optimal.m[k][LB] will not contain MB. If update(MB, LB) was
called before update(MA, LA) then optimal.m[k][LB] will contain MB.
In both cases optimal.m[k][LA] contains MA.

This affects computation of bruteforce_update(k + 1), where we iterate
through optimal.m[k] and update the optimal state based on the
contents of optimal.m[k].

This change modifies update() so that it decides whether or not to
store the match in optimal.m[k][l] by only comparing g against other
optimal gs for the same length. Thus, the state of optimal.m[k][l]
is not dependent on the order of the input.
@lowe
Copy link
Collaborator

lowe commented Jul 11, 2016

But, this would only give potentially different outputs with the same g, right? If that's the case, and you're worried about non-deterministic output, a simpler change would be to sort the input matches by i before further processing (in this case, the next step is partitioning by j into matches_by_j).

I'm just trying to see if there's a simpler one-line change instead of your current diff.

If you can think of an example where a non-optimal sequence is returned, (vs different equally optimal sequences), that's a more series issue and i'd be interested to get an example input so i could walk through it myself.

@rianhunter
Copy link
Contributor Author

rianhunter commented Jul 11, 2016

The different non-deterministic outputs actually end up with the different g values.

Sorting matches_by_j fixes the non-determinism but it doesn't fix the correctness. The correct result depends on the incremental l value being computed and I don't think there is a matches_by_j sort that has a connection to the correct result. What actually should be sorted is the iteration over for l, last_m of optimal.m[k - 1] (in bruteforce_update) but it should be sorted by last_m.g, this will ensure update() is called in the right order.

I actually ran into this with the passphrase "horse_correct_battery_staple" [sic]. For one reason or another, zxcvbn-cpp provoked a different (& incorrect) result than upstream
zxcvbn.

To verify this yourself, randomize the iteration of for l, last_m of optimal.m[k - 1] for different runs of "horse_correct_battery_staple" and you'll get different result sequences with different g values.

@rianhunter
Copy link
Contributor Author

Actually I'm not sure that sorting by last_m.g works either, it really needs to be sorted by the g that update() computes. Maybe there is a proof that sorting by last_m.g gives the same sort but I'm not sure at the moment.

@rianhunter
Copy link
Contributor Author

rianhunter commented Jul 19, 2016

Okay here is a patch that makes it easy to reproduce this bug. I believe this shows that the output of zxcvbn is currently sensitive to the ordering of the optimal.m[k] iteration. The test case is "correct_horse_battery_staple".
patch.txt

lowe added a commit that referenced this pull request Sep 24, 2016
@lowe
Copy link
Collaborator

lowe commented Sep 24, 2016

There was definitely a bug here -- I fixed it a slightly different way and applied your patch to make sure I now get identical output regardless of search order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants