feat: Privacy Preserving Learning #3485

manavsinghal157 · 2021-11-24T19:37:41Z

Part of the Empirical Analysis of Privacy Preserving Learning Project.

This PR adds a command line argument that enables aggregated learning: when the model is saved, only features that have been seen by more than a threshold number of users are kept, which protects the privacy of individual users.

Methodology:

For each feature, a 32-bit vector is defined. (vowpalwabbit/array_parameters.h and vowpalwabbit/array_parameters_dense.h)
We calculate a 5-bit hash of the tag of the example. (vowpalwabbit/parser.cc)
For each feature weight updated by a non-zero value, we use the 5-bit hash to look up a bit in the 32-bit vector and set it to 1. (vowpalwabbit/gd_predict.h -> vowpalwabbit/array_parameters.h and vowpalwabbit/array_parameters_dense.h)
When saving the weights to a file, we count the number of bits set to 1 for each feature. If the count is greater than the threshold, the weights for that feature are saved. (vowpalwabbit/gd.cc -> vowpalwabbit/array_parameters.h and vowpalwabbit/array_parameters_dense.h)

(The default value of the threshold is 10)
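The steps above can be sketched as a small standalone class. This is only an illustration under stated assumptions: `feature_user_tracker`, `tag_bucket`, `record_update`, and `should_save` are hypothetical names, and `std::hash` stands in for VW's own hash function and seed.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the PR's per-feature bookkeeping; not VW's actual class.
class feature_user_tracker
{
public:
  explicit feature_user_tracker(std::size_t threshold = 10) : _threshold(threshold) {}

  // Reduce an example tag to a 5-bit bucket in [0, 32).
  // std::hash is an assumption here; the PR hashes the tag with VW's own hash.
  static std::uint32_t tag_bucket(const std::string& tag)
  {
    return static_cast<std::uint32_t>(std::hash<std::string>{}(tag) % 32);
  }

  // Record that the user in `bucket` changed this feature's weight by a non-zero amount.
  void record_update(std::uint64_t feature_index, std::uint32_t bucket, float delta)
  {
    if (delta != 0.f) { _seen[feature_index].set(bucket); }
  }

  // A feature is saved only when strictly more than `threshold` bits are set.
  bool should_save(std::uint64_t feature_index) const
  {
    auto it = _seen.find(feature_index);
    return it != _seen.end() && it->second.count() > _threshold;
  }

private:
  std::size_t _threshold;
  std::unordered_map<std::uint64_t, std::bitset<32>> _seen;
};
```

Because different tags can hash to the same bucket, the number of set bits never exceeds the number of distinct tags seen, so the check errs on the side of withholding a feature.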

This PR includes:

Command line arguments to activate privacy preservation and set the threshold. (vowpalwabbit/parse_args.cc)
Runtests that check the expected output on a small dataset. (test/core.vwtest.json)
Unit tests that check the output both when a feature exceeds the threshold and when it does not. (test/unit_test/weights_test.cc)
Benchmarks that measure learning time in privacy-preserving mode. (test/benchmarks/standalone/benchmark_text_input.cc)

Implementation details:

--privacy_activation : To activate the feature
--privacy_activation_threshold arg (=10) : To set the threshold

Future Work:

Implement the feature for save_resume.
Work on aggregations in the online setting.

Wiki page: https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Privacy-Preserving-Learning

lalo · 2021-11-29T17:55:32Z

vowpalwabbit/array_parameters_dense.h

  uint32_t _stride_shift;
  bool _seeded;  // whether the instance is sharing model state with others
+  size_t _privacy_activation_threshold;
+  std::unordered_map<uint64_t, std::bitset<32>> _feature_bitset;  // define the bitset for each feature


this could be a unique_ptr set to nullptr unless the privacy mode is on, to be a bit explicit and avoid extra memory allocations

I like that suggestion. Making it a shared_ptr because of the shallow copy function.
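A minimal sketch of the idea discussed in this thread, assuming a simplified container: `weights_sketch`, `enable_privacy`, and `shallow_copy` are hypothetical names, not the actual members of the PR's classes. The map stays unallocated unless privacy mode is turned on, and a shallow copy shares it.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

using feature_bitset_map = std::unordered_map<std::uint64_t, std::bitset<32>>;

// Hypothetical, simplified stand-in for a weights container; not VW's dense_parameters.
struct weights_sketch
{
  // nullptr unless privacy mode is on, so the default path allocates nothing.
  std::shared_ptr<feature_bitset_map> feature_bitset;

  void enable_privacy()
  {
    if (!feature_bitset) { feature_bitset = std::make_shared<feature_bitset_map>(); }
  }

  bool privacy_enabled() const { return feature_bitset != nullptr; }

  // A shallow copy shares the same map; a unique_ptr member would make this copy ill-formed.
  weights_sketch shallow_copy() const { return *this; }
};
```

With a `std::unique_ptr` member the struct would no longer be copyable, which is one plausible reading of why the shallow copy function pushed the choice toward `std::shared_ptr`.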

jackgerrits · 2021-11-29T19:09:52Z

vowpalwabbit/array_parameters_dense.h

 {
 private:
+  // struct to store the tag hash and if it is set or not
+  struct tag_hash_info


I wish it could be an optional
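For illustration only, here is what the "hash plus is-set flag" pair could look like with `std::optional`, assuming a C++17 toolchain. `tag_holder` and `tag_or` are hypothetical names; `set_tag` and `unset_tag` mirror the calls that appear elsewhere in this PR.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical replacement for a {hash, is_set} struct; requires C++17.
class tag_holder
{
public:
  void set_tag(std::uint64_t tag_hash) { _tag_hash = tag_hash; }
  void unset_tag() { _tag_hash.reset(); }
  bool has_tag() const { return _tag_hash.has_value(); }

  // Returns the stored hash, or `fallback` when no tag is set.
  std::uint64_t tag_or(std::uint64_t fallback) const { return _tag_hash.value_or(fallback); }

private:
  std::optional<std::uint64_t> _tag_hash;  // empty means "no tag set"
};
```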

jackgerrits · 2021-11-29T19:13:38Z

vowpalwabbit/audit_regressor.cc

      {
        INTERACTIONS::generate_interactions<audit_regressor_data, const uint64_t, audit_regressor_feature, true,
-            audit_regressor_interaction, sparse_parameters>(rd.all->interactions, rd.all->extent_interactions,
+            audit_regressor_interaction, sparse_parameters, true>(rd.all->interactions, rd.all->extent_interactions,


Suggested change

-            audit_regressor_interaction, sparse_parameters, true>(rd.all->interactions, rd.all->extent_interactions,
+            audit_regressor_interaction, sparse_parameters, true /*privacy_activation*/>(rd.all->interactions, rd.all->extent_interactions,

jackgerrits · 2021-11-29T19:15:22Z

vowpalwabbit/cb_explore_adf_rnd.cc

-  GD::foreach_feature<std::pair<float, float>, float, vec_add_with_norm, LazyGaussian>(w, all->ignore_some_linear,
-      all->ignore_linear, all->interactions, all->extent_interactions, all->permutations, *ec, dotwithnorm,
-      all->_generate_interactions_object_cache);
+  if (all->privacy_activation)


It seems not great that this is new global state

jackgerrits · 2021-11-29T19:15:31Z

vowpalwabbit/example.h

  float weight = 1.f;  // a relative importance weight for the example, default = 1
  v_array<char> tag;   // An identifier for the example.
  size_t example_counter = 0;
+  uint64_t tag_hash;  // Storing the hash of the tag for privacy preservation learning


jackgerrits · 2021-11-29T19:16:29Z

vowpalwabbit/ftrl.cc

+  if (b.all->weights.sparse && privacy_activation)
+  {
+    b.all->weights.sparse_weights.set_tag(
+        hashall(ec.tag.begin(), ec.tag.size(), b.all->hash_seed) % b.all->feature_bitset_size);
+    GD::foreach_feature<ftrl_update_data, inner_update_proximal>(*b.all, ec, b.data);
+    b.all->weights.sparse_weights.unset_tag();
+  }
+  else if (!b.all->weights.sparse && privacy_activation)
+  {
+    b.all->weights.dense_weights.set_tag(
+        hashall(ec.tag.begin(), ec.tag.size(), b.all->hash_seed) % b.all->feature_bitset_size);
+    GD::foreach_feature<ftrl_update_data, inner_update_proximal>(*b.all, ec, b.data);
+    b.all->weights.dense_weights.unset_tag();
+  }
+  else
+  {
+    GD::foreach_feature<ftrl_update_data, inner_update_proximal>(*b.all, ec, b.data);
+  }


This is a massive uptick in complexity when just doing an update. Can this be abstracted?
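One possible way to abstract the branches, sketched with hypothetical names (`weights_stub`, `scoped_tag`, `update_with_tag`): an RAII guard sets the tag when privacy mode is on and unsets it on scope exit, so the update call is written once. The callable stands in for the `GD::foreach_feature` call; this is not the refactoring the PR authors chose.

```cpp
#include <cstdint>

// Hypothetical minimal interface; the PR's weight classes expose set_tag/unset_tag.
struct weights_stub
{
  bool tag_set = false;
  std::uint64_t tag = 0;
  void set_tag(std::uint64_t t)
  {
    tag = t;
    tag_set = true;
  }
  void unset_tag() { tag_set = false; }
};

// RAII guard: sets the tag on construction when enabled, unsets it on scope exit.
template <typename WeightsT>
class scoped_tag
{
public:
  scoped_tag(WeightsT& weights, bool enabled, std::uint64_t tag) : _weights(weights), _enabled(enabled)
  {
    if (_enabled) { _weights.set_tag(tag); }
  }
  ~scoped_tag()
  {
    if (_enabled) { _weights.unset_tag(); }
  }
  scoped_tag(const scoped_tag&) = delete;
  scoped_tag& operator=(const scoped_tag&) = delete;

private:
  WeightsT& _weights;
  bool _enabled;
};

// A single update path; `update` stands in for the GD::foreach_feature call.
template <typename WeightsT, typename UpdateFn>
void update_with_tag(WeightsT& weights, bool privacy_activation, std::uint64_t tag, UpdateFn update)
{
  scoped_tag<WeightsT> guard(weights, privacy_activation, tag);
  update();
}
```

The sparse versus dense choice would still have to be made somewhere, but only once, by passing the matching weights object to the helper.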

jackgerrits · 2021-11-29T19:20:57Z

vowpalwabbit/gd_predict.h

 {
 // iterate through one namespace (or its part), callback function FuncT(some_data_R, feature_value_x, feature_index)
-template <class DataT, void (*FuncT)(DataT&, float feature_value, uint64_t feature_index), class WeightsT>
+template <class DataT, void (*FuncT)(DataT&, float feature_value, uint64_t feature_index), bool privacy_activation>


privacy_activation seems unused?

olgavrou · 2021-11-29T21:57:52Z

Replaced by #3334

manavsinghal157 added 4 commits November 25, 2021 00:18

Privacy Preserving Learning

d57be31

Files added for tests

ed1986b

Corrupt_weights_Line

616c7ae

Missed benchmark addition

fe81b8c

manavsinghal157 mentioned this pull request Nov 24, 2021

feat: Privacy Preserving Learning #3327

Closed

olgavrou added this to the VW 9.0 milestone Nov 24, 2021

olgavrou added 6 commits November 29, 2021 08:42

Merge branch 'master' into RLOS_21_Privacy_Preserving_Learning

70a60fd

add privacy_activation on generated interactions path

2351b31

Merge branch 'RLOS_21_Privacy_Preserving_Learning' of github.com:mana…

c22ddbb

…vsinghal157/vowpal_wabbit into RLOS_21_Privacy_Preserving_Learning

formatting

540641c

formatting

bc201cd

fix unit test

7b58464

lalo reviewed Nov 29, 2021

Merge branch 'master' into RLOS_21_Privacy_Preserving_Learning

c5a44f0

jackgerrits reviewed Nov 29, 2021

olgavrou added 2 commits November 29, 2021 11:11

bit map made into sharedptr

bf0dedc

Merge branch 'RLOS_21_Privacy_Preserving_Learning' of github.com:mana…

9a7c4b7

…vsinghal157/vowpal_wabbit into RLOS_21_Privacy_Preserving_Learning

jackgerrits reviewed Nov 29, 2021

olgavrou closed this Nov 29, 2021
