-
Notifications
You must be signed in to change notification settings - Fork 4.1k
ARROW-2034: [C++] Filesystem implementation for Azure Blob Storage #12914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 9 commits
Commits
Show all changes
38 commits
Select commit
Hold shift + click to select a range
b60c7f4
ARROW-2034: [C++] Filesystem implementation for AzureBlobFileSystem
1e2d0a3
ARROW-2034: [C++] Fixed formatting issues
d3cffa2
ARROW-2034: [C++] Fixed formatting issues
af13444
Added -DARROW_AZURE in ci
1026e15
Added CXX_STANDARD and CXX_STANDARD_REQUIRED
5f8b82a
Added mocked test file
b53a834
Merge remote-tracking branch 'upstream/master' into ARROW-2034-azurefs
5bd8210
Turned -DARROW_AZURE=OFF in appveyor-cpp-build
eead673
Changed default C++ version
f99fad5
Changed LibXml2 target
e2008d8
Fixing CMake styling issues
bb49f62
Enabling ARROW_AZURE flag
323b394
Added OpenSSL dependency
95cc602
Disabling ARROW_AZURE in windows-mingw
9350b4c
Fixing lint issues
9cd1a1a
Fixing azurefs_test
8ba75ae
Added Azurite
ca9a6fc
Added azurefs_objlib
f067ba9
Reverting azure object library changes
1f26725
Added permissions to install_azurite.sh
c16f853
chmod +x ci/scripts/install_azurite.sh
kou 14267c2
Don't specify CMAKE_CXX_STANDARD by default
kou 11ce11f
Fix system detection
kou a428a2b
Fix syntax
kou a62d104
Fix style
kou 488e223
Fix style
kou 3831a88
Running azurite through boost::process
8248c48
Fixed naming in azurefs_test.cc
dcd6e30
Fixed naming in azurefs.cc
b15a6b1
Fixed OpenOutputStream
a06c480
Merge remote-tracking branch 'upstream/master' into ARROW-2034-azurefs
a40a316
Added uri.Parse()
8600b6b
Updated versions.txt
18dc625
Merge remote-tracking branch 'upstream/master' into ARROW-2034-azurefs
b532701
Fixed ARROW_AZURE_STORAGE_BLOBS_URL
200592b
Added libxml2-dev
fe5b311
Merge remote-tracking branch 'upstream/master' into ARROW-2034-azurefs
3ea2d7f
Fixed build errors
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -432,6 +432,46 @@ else() | |
| "${THIRDPARTY_MIRROR_URL}/aws-sdk-cpp-${ARROW_AWSSDK_BUILD_VERSION}.tar.gz") | ||
| endif() | ||
|
|
||
| if(DEFINED ENV{ARROW_AZURE_CORE_URL}) | ||
| set(AZURE_CORE_SOURCE_URL "$ENV{ARROW_AZURE_CORE_URL}") | ||
| else() | ||
| set_urls(AZURE_CORE_SOURCE_URL | ||
| "https://github.com/Azure/azure-sdk-for-cpp/archive/azure-core_${ARROW_AZURE_CORE_BUILD_VERSION}.tar.gz" | ||
| ) | ||
| endif() | ||
|
|
||
| if(DEFINED ENV{ARROW_AZURE_IDENTITY_URL}) | ||
| set(AZURE_IDENTITY_SOURCE_URL "$ENV{ARROW_AZURE_IDENTITY_URL}") | ||
| else() | ||
| set_urls(AZURE_IDENTITY_SOURCE_URL | ||
| "https://github.com/Azure/azure-sdk-for-cpp/archive/azure-identity_${ARROW_AZURE_IDENTITY_BUILD_VERSION}.tar.gz" | ||
| ) | ||
| endif() | ||
|
|
||
| if(DEFINED ENV{ARROW_AZURE_STORAGE_BLOB_URL}) | ||
| set(AZURE_STORAGE_BLOB_SOURCE_URL "$ENV{ARROW_AZURE_STORAGE_BLOB_URL}") | ||
| else() | ||
| set_urls(AZURE_STORAGE_BLOB_SOURCE_URL | ||
| "https://github.com/Azure/azure-sdk-for-cpp/archive/azure-storage-blobs_${ARROW_AZURE_STORAGE_BLOB_BUILD_VERSION}.tar.gz" | ||
| ) | ||
| endif() | ||
|
|
||
| if(DEFINED ENV{ARROW_AZURE_STORAGE_COMMON_URL}) | ||
| set(AZURE_STORAGE_COMMON_SOURCE_URL "$ENV{ARROW_AZURE_STORAGE_COMMON_URL}") | ||
| else() | ||
| set_urls(AZURE_STORAGE_COMMON_SOURCE_URL | ||
| "https://github.com/Azure/azure-sdk-for-cpp/archive/azure-storage-common_${ARROW_AZURE_STORAGE_COMMON_BUILD_VERSION}.tar.gz" | ||
| ) | ||
| endif() | ||
|
|
||
| if(DEFINED ENV{ARROW_AZURE_STORAGE_FILES_DATALAKE_URL}) | ||
| set(AZURE_STORAGE_FILES_DATALAKE_SOURCE_URL "$ENV{ARROW_AZURE_STORAGE_FILES_DATALAKE_URL}") | ||
| else() | ||
| set_urls(AZURE_STORAGE_FILES_DATALAKE_SOURCE_URL | ||
| "https://github.com/Azure/azure-sdk-for-cpp/archive/azure-storage-files-datalake_${ARROW_AZURE_STORAGE_FILES_DATALAKE_BUILD_VERSION}.tar.gz" | ||
| ) | ||
| endif() | ||
|
|
||
| if(DEFINED ENV{ARROW_BOOST_URL}) | ||
| set(BOOST_SOURCE_URL "$ENV{ARROW_BOOST_URL}") | ||
| else() | ||
|
|
@@ -4553,6 +4593,98 @@ if(ARROW_S3) | |
| endif() | ||
| endif() | ||
|
|
||
| macro(build_azuresdk) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you build Azure C++ SDK by |
||
| message(STATUS "Building Azure C++ SDK from source") | ||
|
|
||
| set(AZURESDK_PREFIX "${CMAKE_CURRENT_BINARY_DIR}/azuresdk_ep-install") | ||
| set(AZURESDK_INCLUDE_DIR "${AZURESDK_PREFIX}/include") | ||
|
|
||
| set(AZURESDK_CMAKE_ARGS | ||
| ${EP_COMMON_CMAKE_ARGS} | ||
| -DBUILD_TESTING=OFF | ||
| -DCMAKE_INSTALL_LIBDIR=lib | ||
| "-DCMAKE_INSTALL_PREFIX=${AZURESDK_PREFIX}" | ||
| -DCMAKE_PREFIX_PATH=${AZURESDK_PREFIX}) | ||
|
|
||
| file(MAKE_DIRECTORY ${AZURESDK_INCLUDE_DIR}) | ||
|
|
||
| # Azure C++ SDK related libraries to link statically | ||
| set(_AZURESDK_LIBS | ||
| azure-core | ||
| azure-identity | ||
| azure-storage-blobs | ||
| azure-storage-common | ||
| azure-storage-files-datalake) | ||
| set(AZURESDK_LIBRARIES) | ||
| set(AZURESDK_LIBRARIES_CPP) | ||
| foreach(_AZURESDK_LIB ${_AZURESDK_LIBS}) | ||
| string(TOUPPER ${_AZURESDK_LIB} _AZURESDK_LIB_UPPER) | ||
| string(REPLACE "-" "_" _AZURESDK_LIB_NAME_PREFIX ${_AZURESDK_LIB_UPPER}) | ||
| list(APPEND AZURESDK_LIBRARIES_CPP "${_AZURESDK_LIB}-cpp") | ||
| set(_AZURESDK_TARGET_NAME Azure::${_AZURESDK_LIB}) | ||
| set(_AZURESDK_STATIC_LIBRARY | ||
| "${AZURESDK_PREFIX}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}${_AZURESDK_LIB}${CMAKE_STATIC_LIBRARY_SUFFIX}" | ||
| ) | ||
| add_library(${_AZURESDK_TARGET_NAME} STATIC IMPORTED) | ||
| set_target_properties(${_AZURESDK_TARGET_NAME} | ||
| PROPERTIES IMPORTED_LOCATION ${_AZURESDK_STATIC_LIBRARY} | ||
| INTERFACE_INCLUDE_DIRECTORIES | ||
| "${AZURESDK_INCLUDE_DIR}") | ||
| set("${_AZURESDK_LIB_NAME_PREFIX}_STATIC_LIBRARY" ${_AZURESDK_STATIC_LIBRARY}) | ||
| list(APPEND AZURESDK_LIBRARIES ${_AZURESDK_TARGET_NAME}) | ||
| endforeach() | ||
|
|
||
| externalproject_add(azure_core_ep | ||
| ${EP_LOG_OPTIONS} | ||
| URL ${AZURE_CORE_SOURCE_URL} | ||
| URL_HASH "SHA256=${ARROW_AZURE_CORE_BUILD_SHA256_CHECKSUM}" | ||
| CMAKE_ARGS ${AZURESDK_CMAKE_ARGS} | ||
| BUILD_BYPRODUCTS ${AZURE_CORE_STATIC_LIBRARY}) | ||
| add_dependencies(Azure::azure-core azure_core_ep) | ||
|
|
||
| externalproject_add(azure_identity_ep | ||
| ${EP_LOG_OPTIONS} | ||
| URL ${AZURE_IDENTITY_SOURCE_URL} | ||
| URL_HASH "SHA256=${ARROW_AZURE_IDENTITY_BUILD_SHA256_CHECKSUM}" | ||
| CMAKE_ARGS ${AZURESDK_CMAKE_ARGS} | ||
| BUILD_BYPRODUCTS ${AZURE_IDENTITY_STATIC_LIBRARY}) | ||
| add_dependencies(Azure::azure-identity azure_identity_ep) | ||
|
|
||
| externalproject_add(azure_storage_blobs_ep | ||
| ${EP_LOG_OPTIONS} | ||
| URL ${AZURE_STORAGE_BLOB_SOURCE_URL} | ||
| URL_HASH "SHA256=${ARROW_AZURE_STORAGE_BLOB_BUILD_SHA256_CHECKSUM}" | ||
| CMAKE_ARGS ${AZURESDK_CMAKE_ARGS} | ||
| BUILD_BYPRODUCTS ${AZURE_STORAGE_BLOBS_STATIC_LIBRARY}) | ||
| add_dependencies(Azure::azure-storage-blobs azure_storage_blobs_ep) | ||
|
|
||
| externalproject_add(azure_storage_common_ep | ||
| ${EP_LOG_OPTIONS} | ||
| URL ${AZURE_STORAGE_COMMON_SOURCE_URL} | ||
| URL_HASH "SHA256=${ARROW_AZURE_STORAGE_COMMON_BUILD_SHA256_CHECKSUM}" | ||
| CMAKE_ARGS ${AZURESDK_CMAKE_ARGS} | ||
| BUILD_BYPRODUCTS ${AZURE_STORAGE_COMMON_STATIC_LIBRARY}) | ||
| add_dependencies(Azure::azure-storage-common azure_storage_common_ep) | ||
|
|
||
| externalproject_add(azure_storage_files_datalake_ep | ||
| ${EP_LOG_OPTIONS} | ||
| URL ${AZURE_STORAGE_FILES_DATALAKE_SOURCE_URL} | ||
| URL_HASH "SHA256=${ARROW_AZURE_STORAGE_FILES_DATALAKE_BUILD_SHA256_CHECKSUM}" | ||
| CMAKE_ARGS ${AZURESDK_CMAKE_ARGS} | ||
| BUILD_BYPRODUCTS ${AZURE_STORAGE_FILES_DATALAKE_STATIC_LIBRARY}) | ||
| add_dependencies(Azure::azure-storage-files-datalake azure_storage_files_datalake_ep) | ||
|
|
||
| set(AZURESDK_LINK_LIBRARIES ${AZURESDK_LIBRARIES}) | ||
| endmacro() | ||
|
|
||
| if(ARROW_AZURE) | ||
| build_azuresdk() | ||
| find_curl() | ||
| find_package(LibXml2 REQUIRED) | ||
| message(STATUS "Found Azure SDK headers: ${AZURESDK_INCLUDE_DIR}") | ||
| message(STATUS "Found Azure SDK libraries: ${AZURESDK_LINK_LIBRARIES}") | ||
| endif() | ||
|
|
||
| # ---------------------------------------------------------------------- | ||
| # ucx - communication framework for modern, high-bandwidth and low-latency networks | ||
|
|
||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -469,6 +469,12 @@ if(ARROW_FILESYSTEM) | |
| filesystem/path_util.cc | ||
| filesystem/util_internal.cc) | ||
|
|
||
| if(ARROW_AZURE) | ||
| list(APPEND ARROW_SRCS filesystem/azurefs.cc filesystem/azurefs_mock.cc) | ||
| set_source_files_properties(filesystem/azurefs.cc filesystem/azurefs_mock.cc | ||
| PROPERTIES SKIP_PRECOMPILE_HEADERS ON | ||
| SKIP_UNITY_BUILD_INCLUSION ON) | ||
| endif() | ||
| if(ARROW_GCS) | ||
| list(APPEND ARROW_SRCS filesystem/gcsfs.cc filesystem/gcsfs_internal.cc) | ||
| set_source_files_properties(filesystem/gcsfs.cc filesystem/gcsfs_internal.cc | ||
|
|
@@ -573,6 +579,7 @@ add_arrow_lib(arrow | |
| SHARED_INSTALL_INTERFACE_LIBS | ||
| ${ARROW_SHARED_INSTALL_INTERFACE_LIBS}) | ||
|
|
||
| target_link_libraries(arrow_shared PUBLIC LibXml2::LibXml2) | ||
|
||
| add_dependencies(arrow ${ARROW_LIBRARIES}) | ||
|
|
||
| if(ARROW_BUILD_STATIC AND WIN32) | ||
|
|
||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I don't think we should do that over the entire codebase, only
azurefs.cc?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to set
CMAKE_CXX_STANDARDto 14 only for the Azure component given thatazurefs.cc/azurefs_test.ccis a part of the Arrow target? If we make a separate target for compiling Azure files then it might be possible to useset_target_properties(azurefs_objlib PROPERTIES CXX_STANDARD 14 CXX_STANDARD_REQUIRED ON). Initially, we had a separate Azure target but this change was suggested by @kouThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea what CMake allows but hopefully we are not obliged to make this a project-wide decision. @kou What is your take?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that it's reasonable to use project global
CMAKE_CXX_STANDARD=14with-DARROW_AZURE=ON. It'll reduce our CMake related maintenance cost. Here are reasons:CXX_STANDARDhttps://cmake.org/cmake/help/latest/prop_tgt/CXX_STANDARD.html target property to control C++ version but it's only for CMake target not for source. It means that we need to create a target forazurefs.ccto useCXX_STANDARD. In general, we should use object library https://cmake.org/cmake/help/latest/command/add_library.html#object-libraries for this but CMake 3.5 (our minimum required version) doesn't provide enough convenient features for object library. For example, we can't use object library fortarget_link_libraries(). ("New in version 3.12: Object libraries can be linked to with target_link_libraries().") I don't want to maintain such CMake code for this case...CMAKE_CXX_STANDARD=14doesn't cause a new problem. Our code base should work withcmake -DCMAKE_CXX_STANDARD=14 ...andcmake -DCMAKE_CXX_STANDARD=17 .... For example, we have tests for these cases: https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml#L1189-L1198There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we can require CMake 3.12 or later for
-DARROW_AZURE=ONand use object library instead of project globalCMAKE_CXX_STANDARD=14.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be better. Upgrading CMake is easier than changing compilers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that we need a C++14 compatible compiler for
-DARROW_AZURE=ONeven if we choose either theCMAKE_CXX_STANDARD=14approach or the object library approach.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, yes, you're right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we may as well keep it like that then.