Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.FSDataInputStreamWrapper;
import org.apache.hadoop.hbase.regionserver.DataTieringManager;
import org.apache.yetus.audience.InterfaceAudience;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
Expand All @@ -40,8 +41,13 @@ public HFilePreadReader(ReaderContext context, HFileInfo fileInfo, CacheConfig c

final MutableBoolean shouldCache = new MutableBoolean(true);

// Initialize HFileInfo object with metadata for caching decisions
fileInfo.initMetaAndIndex(this);
DataTieringManager dataTieringManager = DataTieringManager.getInstance();
if (dataTieringManager != null) {
// Initialize HFileInfo object with metadata for caching decisions.
// Initialize the metadata only if the data-tiering is enabled.
// If not, the metadata will be initialized later.
fileInfo.initMetaAndIndex(this);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this part is needed. We decided not to use Data Tiering logic here and restrict it to Bucket Cache, which I will take care of in the refactoring JIRA. Additionally, if the concern is to avoid executing initMetaAndIndex twice, then I have modified initMetaAndIndex to only execute once.

Copy link
Contributor Author

@jhungund jhungund Apr 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is to avoid any changes to the old functionality. We had observed that this line does introduce some performance regression leading to a unit test failure. We keep the old behaviour to initialise the metadata at the original location when data-tiering is not enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us please avoid spaghetti code and keep components cohesive. We shall not pollute the reader with data tiering logic. As @vinayakphegde mentioned, initiMetaAndIndex already knows it should perform init logic only once in the flow, so this call here won't add extra execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I will remove the check here. The intention was to keep the old behaviour when the feature is disabled.


cacheConf.getBlockCache().ifPresent(cache -> {
Optional<Boolean> result = cache.shouldCacheFile(fileInfo, conf);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -985,11 +985,10 @@ void freeSpace(final String why) {

// Check the list of files to determine the cold files which can be readily evicted.
Map<String, String> coldFiles = null;
try {
DataTieringManager dataTieringManager = DataTieringManager.getInstance();

DataTieringManager dataTieringManager = DataTieringManager.getInstance();
if (dataTieringManager != null) {
coldFiles = dataTieringManager.getColdFilesList();
} catch (IllegalStateException e) {
LOG.warn("Data Tiering Manager is not set. Ignore time-based block evictions.");
}
// Scan entire map putting bucket entry into appropriate bucket entry
// group
Expand Down Expand Up @@ -2195,17 +2194,19 @@ public Optional<Boolean> blockFitsIntoTheCache(HFileBlock block) {
@Override
public Optional<Boolean> shouldCacheFile(HFileInfo hFileInfo, Configuration conf) {
String fileName = hFileInfo.getHFileContext().getHFileName();
try {
DataTieringManager dataTieringManager = DataTieringManager.getInstance();
DataTieringManager dataTieringManager = DataTieringManager.getInstance();
if (dataTieringManager != null) {
if (!dataTieringManager.isHotData(hFileInfo, conf)) {
LOG.debug("Data tiering is enabled for file: '{}' and it is not hot data", fileName);
return Optional.of(false);
} else {
LOG.debug("Data tiering is enabled for file: '{}' and it is hot data", fileName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no need for this extra else block, just do the logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack!

}
} catch (IllegalStateException e) {
LOG.error("Error while getting DataTieringManager instance: {}", e.getMessage());
} else {
LOG.debug("Data tiering feature is not enabled. "
+ " The file: '{}' will be loaded if not already loaded", fileName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no need for this extra else block, just do the logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack!

}

// if we don't have the file in fullyCachedFiles, we should cache it
// if we don't have the file in fullyCachedFiles, we should cache it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: avoid useless line changes in the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack!

return Optional.of(!fullyCachedFiles.containsKey(fileName));
}

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,8 @@
@InterfaceAudience.Private
public class DataTieringManager {
private static final Logger LOG = LoggerFactory.getLogger(DataTieringManager.class);
public static final String DATA_TIERING_ENABLED_KEY = "hbase.hstore.datatiering.enable";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it is a cluster-wide (or region server-level) configuration, I think the format should be something like hbase.regionserver.*, although I'm not entirely sure. Additionally, could we use a better name for the variable to indicate that this is a cluster-wide configuration? like, GLOBAL_DATA_TIERING_ENABLED_KEY?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack!

public static final boolean DEFAULT_DATA_TIERING_ENABLED = false; // disabled by default
public static final String DATATIERING_KEY = "hbase.hstore.datatiering.type";
public static final String DATATIERING_HOT_DATA_AGE_KEY =
"hbase.hstore.datatiering.hot.age.millis";
Expand All @@ -61,25 +63,25 @@ private DataTieringManager(Map<String, HRegion> onlineRegions) {
* Initializes the DataTieringManager instance with the provided map of online regions.
* @param onlineRegions A map containing online regions.
*/
public static synchronized void instantiate(Map<String, HRegion> onlineRegions) {
if (instance == null) {
instance = new DataTieringManager(onlineRegions);
LOG.info("DataTieringManager instantiated successfully.");
public static synchronized void instantiate(Configuration conf,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this Configuration parameter? We only enter here when isDataTieringFeatureEnabled(conf) is true, so why do we need to check it again?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an additional check to avoid an unintentional use if someone makes a call to this function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are enforcing the check here, than there's no point in exposing isDataTieringFeatureEnabled as a public method, as you are not giving any flexibility to clients on what to do upon the return of isDataTieringFeatureEnabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the function isDataTieringFeatureEnabled private function of the class. We do not want to give the undue flexiblity to the user to instantiate the DataTieringManager even when the feature key is disabled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, would it make sense to return True if instantiated successfully and false otherwise?

Map<String, HRegion> onlineRegions) {
if (isDataTieringFeatureEnabled(conf)) {
if (instance == null) {
instance = new DataTieringManager(onlineRegions);
LOG.info("DataTieringManager instantiated successfully.");
} else {
LOG.warn("DataTieringManager is already instantiated.");
}
} else {
LOG.warn("DataTieringManager is already instantiated.");
LOG.info("Data-Tiering feature is not enabled.");
}
}

/**
* Retrieves the instance of DataTieringManager.
* @return The instance of DataTieringManager.
* @throws IllegalStateException if DataTieringManager has not been instantiated.
*/
public static synchronized DataTieringManager getInstance() {
if (instance == null) {
throw new IllegalStateException(
"DataTieringManager has not been instantiated. Call instantiate() first.");
}
return instance;
}

Expand Down Expand Up @@ -308,4 +310,9 @@ public Map<String, String> getColdFilesList() {
}
return coldFiles;
}

public static boolean isDataTieringFeatureEnabled(Configuration conf) {
return conf.getBoolean(DataTieringManager.DATA_TIERING_ENABLED_KEY,
DataTieringManager.DEFAULT_DATA_TIERING_ENABLED);
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -534,7 +534,9 @@ public HRegionServer(final Configuration conf) throws IOException {
regionServerAccounting = new RegionServerAccounting(conf);

blockCache = BlockCacheFactory.createBlockCache(conf);
DataTieringManager.instantiate(onlineRegions);
if (DataTieringManager.isDataTieringFeatureEnabled(conf)) {
DataTieringManager.instantiate(conf, onlineRegions);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are not instantiating the DataTieringManager itself when global configuration is not enabled, do we need to restart every time we change that global parameter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we will require server restart since we are initialising the DataTieringManager during server restart.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this if check, since you do it internally on DataTieringManager.instantiate. If you want to keep the check inside DataTieringManager.instantiate, then no need for this check here.

mobFileCache = new MobFileCache(conf);

rsSnapshotVerifier = new RSSnapshotVerifier(conf);
Expand Down
Loading