
Conversation

@charlesbluca
Collaborator

Looks like we're seeing failures in CI due to a ModuleNotFoundError for sasl:

___________________ ERROR at setup of test_select_partitions ___________________
[gw0] linux -- Python 3.7.10 /usr/share/miniconda3/envs/test/bin/python

    @pytest.fixture(scope="session")
    def hive_cursor():
        """
        Getting a hive setup up and running is a bit more complicated.
        We need three running docker containers:
        * a postgres database to store the metadata
        * the metadata server itself
        * and a server to answer SQL queries
    
        They are all started one after the other to check,
        if they are up and running.
        We "fake" a network filesystem (instead of using hdfs),
        by mounting a temporary folder from the host to the
        docker container, which can be accessed both by hive
        and the dask-sql client.
    
        We just need to make sure, to remove all containers,
        the network and the temporary folders correctly again.
    
        The ideas for the docker setup are taken from the docker-compose
        hive setup described by bde2020.
        """
        client = docker.from_env()
    
        network = None
        hive_server = None
        hive_metastore = None
        hive_postgres = None
    
        tmpdir = tempfile.mkdtemp()
        tmpdir_parted = tempfile.mkdtemp()
    
        try:
            network = client.networks.create("dask-sql-hive", driver="bridge")
    
            hive_server = client.containers.create(
                "bde2020/hive:2.3.2-postgresql-metastore",
                hostname="hive-server",
                name="hive-server",
                network="dask-sql-hive",
                volumes=[f"{tmpdir}:{tmpdir}", f"{tmpdir_parted}:{tmpdir_parted}"],
                environment={
                    "HIVE_CORE_CONF_javax_jdo_option_ConnectionURL": "jdbc:postgresql://hive-metastore-postgresql/metastore",
                    **DEFAULT_CONFIG,
                },
            )
    
            hive_metastore = client.containers.create(
                "bde2020/hive:2.3.2-postgresql-metastore",
                hostname="hive-metastore",
                name="hive-metastore",
                network="dask-sql-hive",
                environment=DEFAULT_CONFIG,
                command="/opt/hive/bin/hive --service metastore",
            )
    
            hive_postgres = client.containers.create(
                "bde2020/hive-metastore-postgresql:2.3.0",
                hostname="hive-metastore-postgresql",
                name="hive-metastore-postgresql",
                network="dask-sql-hive",
            )
    
            # Wait for it to start
            hive_postgres.start()
            hive_postgres.exec_run(["bash"])
            for l in hive_postgres.logs(stream=True):
                if b"ready for start up." in l:
                    break
    
            hive_metastore.start()
            hive_metastore.exec_run(["bash"])
            for l in hive_metastore.logs(stream=True):
                if b"Starting hive metastore" in l:
                    break
    
            hive_server.start()
            hive_server.exec_run(["bash"])
            for l in hive_server.logs(stream=True):
                if b"Starting HiveServer2" in l:
                    break
    
            # The server needs some time to start.
            # It is easier to check for the first access
            # on the metastore than to wait some
            # arbitrary time.
            for l in hive_metastore.logs(stream=True):
                if b"get_multi_table" in l:
                    break
    
            time.sleep(2)
    
            hive_server.reload()
            address = hive_server.attrs["NetworkSettings"]["Networks"]["dask-sql-hive"][
                "IPAddress"
            ]
            port = 10000
>           cursor = sqlalchemy.create_engine(f"hive://{address}:{port}").connect()

tests/integration/test_hive.py:133: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/engine/base.py:3194: in connect
    return self._connection_cls(self, close_with_result=close_with_result)
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/engine/base.py:96: in __init__
    else engine.raw_connection()
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/engine/base.py:3273: in raw_connection
    return self._wrap_pool_connect(self.pool.connect, _connection)
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/engine/base.py:3240: in _wrap_pool_connect
    return fn()
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/pool/base.py:310: in connect
    return _ConnectionFairy._checkout(self)
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/pool/base.py:868: in _checkout
    fairy = _ConnectionRecord.checkout(pool)
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/pool/base.py:476: in checkout
    rec = pool._do_get()
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/pool/impl.py:146: in _do_get
    self._dec_overflow()
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py:72: in __exit__
    with_traceback=exc_tb,
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/util/compat.py:207: in raise_
    raise exception
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/pool/impl.py:143: in _do_get
    return self._create_connection()
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/pool/base.py:256: in _create_connection
    return _ConnectionRecord(self)
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/pool/base.py:371: in __init__
    self.__connect()
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/pool/base.py:666: in __connect
    pool.logger.debug("Error on connect(): %s", e)
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py:72: in __exit__
    with_traceback=exc_tb,
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/util/compat.py:207: in raise_
    raise exception
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/pool/base.py:661: in __connect
    self.dbapi_connection = connection = pool._invoke_creator(self)
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/engine/create.py:590: in connect
    return dialect.connect(*cargs, **cparams)
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/sqlalchemy/engine/default.py:584: in connect
    return self.dbapi.connect(*cargs, **cparams)
/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/pyhive/hive.py:104: in connect
    return Connection(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <pyhive.hive.Connection object at 0x7f76dc641210>, host = '172.18.0.4'
port = 10000, scheme = None, username = 'runner', database = 'default'
                # Always needs the Authorization header
                self._set_authorization_header(thrift_transport, username, password)
            elif auth == "KERBEROS" and kerberos_service_name:
                self._set_kerberos_header(thrift_transport, kerberos_service_name, host)
            else:
                raise ValueError(
                    "Authentication is not valid use one of:"
                    "BASIC, NOSASL, KERBEROS, NONE"
                )
            host, port, auth, kerberos_service_name, password = (
                None, None, None, None, None
            )
    
        username = username or getpass.getuser()
        configuration = configuration or {}
    
        if (password is not None) != (auth in ('LDAP', 'CUSTOM')):
            raise ValueError("Password should be set if and only if in LDAP or CUSTOM mode; "
                             "Remove password or use one of those modes")
        if (kerberos_service_name is not None) != (auth == 'KERBEROS'):
            raise ValueError("kerberos_service_name should be set if and only if in KERBEROS mode")
        if thrift_transport is not None:
            has_incompatible_arg = (
                host is not None
                or port is not None
                or auth is not None
                or kerberos_service_name is not None
                or password is not None
            )
            if has_incompatible_arg:
                raise ValueError("thrift_transport cannot be used with "
                                 "host/port/auth/kerberos_service_name/password")
    
        if thrift_transport is not None:
            self._transport = thrift_transport
        else:
            if port is None:
                port = 10000
            if auth is None:
                auth = 'NONE'
            socket = thrift.transport.TSocket.TSocket(host, port)
            if auth == 'NOSASL':
                # NOSASL corresponds to hive.server2.authentication=NOSASL in hive-site.xml
                self._transport = thrift.transport.TTransport.TBufferedTransport(socket)
            elif auth in ('LDAP', 'KERBEROS', 'NONE', 'CUSTOM'):
                # Defer import so package dependency is optional
>               import sasl
E               ModuleNotFoundError: No module named 'sasl'

/usr/share/miniconda3/envs/test/lib/python3.7/site-packages/pyhive/hive.py:203: ModuleNotFoundError
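As context for the ValueError branches visible in the traceback above: pyhive validates mutually dependent arguments with a boolean `!=`, which acts as an exclusive-or ("set if and only if"). A minimal stdlib sketch of that pattern (`check_iff` is a hypothetical name, not part of pyhive):

```python
def check_iff(password, auth):
    """Require a password exactly when auth is LDAP or CUSTOM.

    `!=` between two booleans is exclusive-or: it raises when one
    side is set and the other is not.
    """
    if (password is not None) != (auth in ("LDAP", "CUSTOM")):
        raise ValueError(
            "Password should be set if and only if in LDAP or CUSTOM mode"
        )

check_iff("secret", "LDAP")      # ok: password given, password-based mode
check_iff(None, "NONE")          # ok: no password, non-password mode
try:
    check_iff("secret", "NONE")  # invalid: password without LDAP/CUSTOM
except ValueError as e:
    print(e)
```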

I wouldn't expect this, as we are installing PyHive, which seems to include this package in its hive requirements. Do we maybe need to explicitly install pyhive[hive]?
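One way to surface the missing optional dependency earlier would be a stdlib importability check before the Hive fixtures run, so the suite can skip with a clear message instead of failing deep inside pyhive's `connect()`. A hypothetical sketch (`has_module` is not part of the repo):

```python
import importlib.util

def has_module(name):
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None

# Skip or warn early when the optional SASL bindings are absent.
if not has_module("sasl"):
    print("sasl missing; try installing the extras, e.g. pip install 'pyhive[hive]'")
```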

Either way, this PR should fix those failures.
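On the fixture itself: the wait-until-log-marker pattern it uses (blocking on the container's log stream rather than sleeping an arbitrary time) can be sketched with stdlib pieces only. `wait_for_marker` and `fake_logs` are illustrative names; the iterator stands in for docker's `container.logs(stream=True)`:

```python
def wait_for_marker(log_lines, marker, max_lines=10_000):
    """Consume a stream of log lines (bytes) until `marker` appears.

    Returns True once the marker is seen, False if the stream ends or
    `max_lines` lines pass without it (a bounded readiness wait).
    """
    for i, line in enumerate(log_lines):
        if marker in line:
            return True
        if i >= max_lines:
            break
    return False

# Simulated container log stream:
fake_logs = iter([b"initializing...", b"loading config", b"ready for start up."])
print(wait_for_marker(fake_logs, b"ready for start up."))  # True once the marker streams by
```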

@codecov-commenter

codecov-commenter commented Oct 1, 2021

Codecov Report

Merging #244 (f130515) into main (4d5f7dd) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##              main      #244   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           64        64           
  Lines         2590      2590           
  Branches       361       361           
=========================================
  Hits          2590      2590           

Continue to review full report at Codecov.

Powered by Codecov. Last update 4d5f7dd...f130515.

@charlesbluca
Collaborator Author

@rajagurunath do you have any idea what could be going on here?

@charlesbluca
Collaborator Author

Closing this in favor of handling this with a general CI refactor in #238

@charlesbluca charlesbluca reopened this Oct 5, 2021
@charlesbluca charlesbluca merged commit 3f9b7e3 into main Oct 5, 2021
@charlesbluca charlesbluca deleted the fix-gha-sasl branch October 5, 2021 17:55