@@ -311,3 +311,160 @@ $> hdfs dfs -find /hbase -name \
311311 d41d8cd98f00b204e9800998ecf8427e19700118ffd9c244fe69488bbc9f2c77d24a3e6a
312312/hbase/mobdir/data/default/some_table/372c1b27e3dc0b56c3a031926e5efbe9/foo/d41d8cd98f00b204e9800998ecf8427e19700118ffd9c244fe69488bbc9f2c77d24a3e6a
313313----
314+
315+ ==== Moving a column family out of MOB
316+
317+ Before you disable MOB on a column family, you must instruct HBase to migrate the data out of the
318+ MOB system. If you skip this step, HBase will return the internal MOB metadata to applications
319+ instead of the actual values, because it will no longer know that the MOB references need to be
320+ resolved.
321+
322+ The following procedure will safely migrate the underlying data without requiring a cluster outage.
323+ Clients will see a number of retries when configuration settings are applied and regions are
324+ reloaded.
325+
326+ .Procedure: Stop MOB maintenance, change MOB threshold, rewrite data via compaction
327+ . Ensure the MOB compaction chore in the Master is off by setting
328+ `hbase.mob.file.compaction.chore.period` to `0`. Applying this configuration change will require a
329+ rolling restart of HBase Masters. That will require at least one fail-over of the active master,
330+ which may cause retries for clients doing HBase administrative operations.
331+ . Ensure no MOB compactions are issued for the table via the HBase shell for the duration of this
332+ migration.
333+ . Use the HBase shell to change the MOB size threshold for the column family you are migrating to a
334+ value that is larger than the largest cell present in the column family. E.g. given a table named
335+ 'some_table' and a column family named 'foo' we can pick one gigabyte as an arbitrary "bigger than
336+ what we store" value:
337+ +
338+ ----
339+ hbase(main):011:0> alter 'some_table', {NAME => 'foo', MOB_THRESHOLD => '1000000000'}
340+ Updating all regions with the new schema...
341+ 9/25 regions updated.
342+ 25/25 regions updated.
343+ Done.
344+ 0 row(s) in 3.4940 seconds
345+ ----
346+ +
347+ Note that if you are still ingesting data you must ensure this threshold is larger than any cell
348+ value you might write; the maximum 32-bit signed integer value (2147483647) would be a safe choice.
349+
350+ . Perform a major compaction on the table. This must be a "normal" major compaction, not a MOB
351+ compaction.
352+ +
353+ ----
354+ hbase(main):012:0> major_compact 'some_table'
355+ 0 row(s) in 0.2600 seconds
356+ ----
357+
358+ . Monitor for the end of the major compaction. Since compaction is handled asynchronously you'll
359+ need to use the shell to first see the compaction start and then see it end.
360+ +
361+ HBase should first say that a "MAJOR" compaction is happening.
362+ +
363+ ----
364+ hbase(main):015:0> @hbase.admin(@formatter).instance_eval do
365+ hbase(main):016:1* p @admin.get_compaction_state('some_table').to_string
366+ hbase(main):017:2* end
367+ "MAJOR"
368+ ----
369+ +
370+ When the compaction has finished the result should print out "NONE".
371+ +
372+ ----
373+ hbase(main):015:0> @hbase.admin(@formatter).instance_eval do
374+ hbase(main):016:1* p @admin.get_compaction_state('some_table').to_string
375+ hbase(main):017:2* end
376+ "NONE"
377+ ----
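+
The start-then-end check above can be scripted. The following is a minimal sketch and is not part of HBase: `wait_for_major_compaction` is a hypothetical helper that repeatedly runs whatever state-reporting command you pass it and reports success only after it has seen a state containing `MAJOR` followed by one containing `NONE`. It gives up after a fixed number of polls, because a compaction that finishes before the first poll would never show `MAJOR`.
+
```shell
# Hypothetical helper: wait for a major compaction to start and then finish.
# $1: seconds to sleep between polls
# $2: maximum number of polls before giving up
# remaining arguments: a command that prints the current compaction state
wait_for_major_compaction() {
  interval="$1"
  max_polls="$2"
  shift 2
  seen_major=0
  polls=0
  state=""
  while [ "$polls" -lt "$max_polls" ]; do
    state="$("$@")"
    case "$state" in
      *MAJOR*)
        # The compaction has started (this also matches MAJOR_AND_MINOR).
        seen_major=1
        ;;
      *NONE*)
        # Only treat NONE as "finished" once MAJOR has been observed.
        if [ "$seen_major" -eq 1 ]; then
          echo "compaction finished"
          return 0
        fi
        ;;
    esac
    polls=$((polls + 1))
    sleep "$interval"
  done
  echo "gave up after $max_polls polls (last state: $state)" >&2
  return 1
}
```
+
To use it against a cluster, wrap a state query in a function and pass it in, for example `get_state() { echo "compaction_state 'some_table'" | hbase shell -n; }` followed by `wait_for_major_compaction 30 240 get_state`. This assumes your HBase shell version provides the `compaction_state` command; if it does not, have `get_state` feed the `instance_eval` snippet shown above to the shell instead.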
378+ . Run the _mobrefs_ utility to ensure there are no MOB cells. Specifically, the tool will launch a
379+ Hadoop MapReduce job that will show a job counter of 0 input records when we've successfully
380+ rewritten all of the data.
381+ +
382+ ----
383+ $> HADOOP_CLASSPATH=/etc/hbase/conf:$(hbase mapredcp) yarn jar \
384+ /some/path/to/hbase-shaded-mapreduce.jar mobrefs mobrefs-report-output some_table foo
385+ ...
386+ 19/12/10 11:38:47 INFO impl.YarnClientImpl: Submitted application application_1575695902338_0004
387+ 19/12/10 11:38:47 INFO mapreduce.Job: The url to track the job: https://rm-2.example.com:8090/proxy/application_1575695902338_0004/
388+ 19/12/10 11:38:47 INFO mapreduce.Job: Running job: job_1575695902338_0004
389+ 19/12/10 11:38:57 INFO mapreduce.Job: Job job_1575695902338_0004 running in uber mode : false
390+ 19/12/10 11:38:57 INFO mapreduce.Job: map 0% reduce 0%
391+ 19/12/10 11:39:07 INFO mapreduce.Job: map 7% reduce 0%
392+ 19/12/10 11:39:17 INFO mapreduce.Job: map 13% reduce 0%
393+ 19/12/10 11:39:19 INFO mapreduce.Job: map 33% reduce 0%
394+ 19/12/10 11:39:21 INFO mapreduce.Job: map 40% reduce 0%
395+ 19/12/10 11:39:22 INFO mapreduce.Job: map 47% reduce 0%
396+ 19/12/10 11:39:23 INFO mapreduce.Job: map 60% reduce 0%
397+ 19/12/10 11:39:24 INFO mapreduce.Job: map 73% reduce 0%
398+ 19/12/10 11:39:27 INFO mapreduce.Job: map 100% reduce 0%
399+ 19/12/10 11:39:35 INFO mapreduce.Job: map 100% reduce 100%
400+ 19/12/10 11:39:35 INFO mapreduce.Job: Job job_1575695902338_0004 completed successfully
401+ 19/12/10 11:39:35 INFO mapreduce.Job: Counters: 54
402+ ...
403+ Map-Reduce Framework
404+ Map input records=0
405+ ...
406+ 19/12/09 22:41:28 INFO mapreduce.MobRefReporter: Finished creating report for 'some_table', family='foo'
407+ ----
408+ +
409+ If the data has not successfully been migrated out, this report will show both a non-zero number
410+ of input records and a non-zero count of MOB cells (the `NUM_CELLS` counter).
411+ +
412+ ----
413+ $> HADOOP_CLASSPATH=/etc/hbase/conf:$(hbase mapredcp) yarn jar \
414+ /some/path/to/hbase-shaded-mapreduce.jar mobrefs mobrefs-report-output some_table foo
415+ ...
416+ 19/12/10 11:44:18 INFO impl.YarnClientImpl: Submitted application application_1575695902338_0005
417+ 19/12/10 11:44:18 INFO mapreduce.Job: The url to track the job: https://busbey-2.gce.cloudera.com:8090/proxy/application_1575695902338_0005/
418+ 19/12/10 11:44:18 INFO mapreduce.Job: Running job: job_1575695902338_0005
419+ 19/12/10 11:44:26 INFO mapreduce.Job: Job job_1575695902338_0005 running in uber mode : false
420+ 19/12/10 11:44:26 INFO mapreduce.Job: map 0% reduce 0%
421+ 19/12/10 11:44:36 INFO mapreduce.Job: map 7% reduce 0%
422+ 19/12/10 11:44:45 INFO mapreduce.Job: map 13% reduce 0%
423+ 19/12/10 11:44:47 INFO mapreduce.Job: map 27% reduce 0%
424+ 19/12/10 11:44:48 INFO mapreduce.Job: map 33% reduce 0%
425+ 19/12/10 11:44:50 INFO mapreduce.Job: map 40% reduce 0%
426+ 19/12/10 11:44:51 INFO mapreduce.Job: map 53% reduce 0%
427+ 19/12/10 11:44:52 INFO mapreduce.Job: map 73% reduce 0%
428+ 19/12/10 11:44:54 INFO mapreduce.Job: map 100% reduce 0%
429+ 19/12/10 11:44:59 INFO mapreduce.Job: map 100% reduce 100%
430+ 19/12/10 11:45:00 INFO mapreduce.Job: Job job_1575695902338_0005 completed successfully
431+ 19/12/10 11:45:00 INFO mapreduce.Job: Counters: 54
432+ ...
433+ Map-Reduce Framework
434+ Map input records=1
435+ ...
436+ MOB
437+ NUM_CELLS=1
438+ ...
439+ 19/12/10 11:45:00 INFO mapreduce.MobRefReporter: Finished creating report for 'some_table', family='foo'
440+ ----
441+ +
442+ If this happens you should verify that MOB compactions are disabled, verify that you have picked
443+ a sufficiently large MOB threshold, and redo the major compaction step.
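+
Reading the counters by eye is easy to get wrong in a long job log. As a rough sketch, the check described in this step can be automated by scanning the captured job output for the two counters shown above. `mob_migration_done` is a hypothetical helper name, and the parsing assumes the counters appear exactly as `Map input records=N` and `NUM_CELLS=N`, as they do in the sample output.
+
```shell
# Hypothetical helper: read captured mobrefs job output on stdin and decide
# whether any MOB cells remain.
# Exit status: 0 = no MOB cells left, 1 = MOB cells remain, 2 = no counters seen.
mob_migration_done() {
  awk '
    /Map input records=/ { split($0, kv, "="); records = kv[2] + 0; seen = 1 }
    /NUM_CELLS=/         { split($0, kv, "="); cells = kv[2] + 0 }
    END {
      if (!seen) { print "no counters found"; exit 2 }
      if (records == 0 && cells == 0) { print "clean"; exit 0 }
      printf "remaining: records=%d mob_cells=%d\n", records, cells
      exit 1
    }'
}
```
+
One way to use it is to save the job output to a file and run `mob_migration_done < mobrefs.log`, where `mobrefs.log` is a placeholder name. Job counters are often written to standard error, so you may need `2>&1` when capturing the output.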
444+ . When the _mobrefs_ report shows that no more data is stored in the MOB system then you can safely
445+ alter the column family configuration so that the MOB feature is disabled.
446+ +
447+ ----
448+ hbase(main):017:0> alter 'some_table', {NAME => 'foo', IS_MOB => 'false'}
449+ Updating all regions with the new schema...
450+ 8/25 regions updated.
451+ 25/25 regions updated.
452+ Done.
453+ 0 row(s) in 2.9370 seconds
454+ ----
455+ . After the column family no longer shows the MOB feature enabled, it is safe to start MOB
456+ maintenance chores again. You can allow the default to be used for
457+ `hbase.mob.file.compaction.chore.period` by removing it from your configuration files or restore
458+ it to whatever custom value you had prior to starting this process.
459+ . Once the MOB feature is disabled for the column family there will be no internal HBase process
460+ looking for data in the MOB storage area specific to this column family. There will still be data
461+ present there from prior to the compaction process that rewrote the values into HBase's data area.
462+ You can check for this residual data directly in HDFS as an HBase superuser.
463+ +
464+ ----
465+ $> hdfs dfs -count /hbase/mobdir/data/default/some_table
466+ 4 54 9063269081 /hbase/mobdir/data/default/some_table
467+ ----
468+ +
469+ This data is spurious and may be reclaimed. You should sideline it, verify your application’s view
470+ of the table, and then delete it.
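+
The columns printed by `hdfs dfs -count` are directory count, file count, content size in bytes, and path. As a small illustration, the hypothetical helper below turns that line into a readable summary so you can see how much space the residual MOB data occupies before you sideline it.
+
```shell
# Hypothetical helper: summarize one or more lines of `hdfs dfs -count` output.
# Expected input columns: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
summarize_mob_residue() {
  awk '{
    printf "%s: %d dirs, %d files, %.2f GiB\n", $4, $1, $2, $3 / (1024 * 1024 * 1024)
  }'
}
```
+
For the sample output above, `hdfs dfs -count /hbase/mobdir/data/default/some_table | summarize_mob_residue` would report 4 directories, 54 files, and about 8.44 GiB. Sidelining can then be done with `hdfs dfs -mv`, moving the directory to a location outside the HBase root directory; the destination is your choice and is not prescribed by HBase.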