Federico Stagni edited this page Oct 20, 2025 · 50 revisions

DIRAC v9.0

DIRAC v9 is the first version of DIRAC that needs to be deployed together with DiracX v0.0.1. This wiki details the changes needed to deploy both. This document focuses on updating from DIRAC v8 to DIRAC v9 + DiracX. New DIRAC installations can skip the database updates and several other details.

Specific notes on DIRAC v9

  • The concept of "Setup" is disappearing. The concept (explained in https://dirac.readthedocs.io/en/latest/AdministratorGuide/Introduction/diraccomponents.html) has a long history: it effectively made it possible to use a single machine as a server for multiple setups/installations (e.g. production and testing setups on the same node). This possibility is now recognized as not useful, so the functionality is being removed.

  • DISET -> JSON encoding. The default encoding becomes JSON: PR https://github.com/DIRACGrid/DIRAC/pull/6466 removes the environment variable DIRAC_USE_JSON_DECODE and flips the default of DIRAC_USE_JSON_ENCODE. Most probably nothing has to be done, unless you have overridden these values in the past. You may run into issues in particular if you have dictionaries indexed by integers, whose keys are converted into strings. To help you test your extensions, setting DIRAC_DEBUG_DENCODE_CALLSTACK will print out any such situation. Note that this already works in v8.

  • 1024-bit proxies no longer work.

  • PR https://github.com/DIRACGrid/DIRAC/pull/7439 removes the possibility to use external StorageElements as SandboxStore. Most probably there is nothing you need to do.
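The integer-key pitfall mentioned in the DISET -> JSON note above can be reproduced with plain Python's json module (this is an illustration of the behaviour, not DIRAC code):

```python
import json

# A dictionary indexed by integers, e.g. {jobID: status}
jobs = {101: "Done", 102: "Running"}

# JSON object keys must be strings, so integer keys are
# silently converted on encoding...
encoded = json.dumps(jobs)

# ...and come back as strings on decoding.
decoded = json.loads(encoded)
print(decoded)  # {'101': 'Done', '102': 'Running'}

# Code that looks up decoded[101] now fails with a KeyError;
# it has to use decoded["101"] (or convert the keys back).
assert 101 not in decoded
assert decoded["101"] == "Done"
```

Any extension code that round-trips such dictionaries through the RPC layer needs to cope with this key conversion.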

Migration from v8 to v9+DiracX

skeleton document for migration

What follows is a skeleton of the migration. We suggest copying, pasting and editing what is below, adapting it to your installation. It contains a few sections, starting from "PRE".

PRE

Things that have to be done in v8, before you even start considering a migration

  • Install the latest DIRAC v8

  • The SecurityLogging Service is no longer used by default; follow one of these two options:

    • If you use the centralized logging together with message queue and logstash, follow the instructions above to configure logstash
    • If you do not want to use centralized logging, set the CS flag /Operations/[vo]/EnableSecurityLogging = True (this is False by default)
  • JobParameters need to be stored in OpenSearch. See last bullet of this documentation

  • If you have not done it before, you'll need to install and use RSS

    • No need to go through the "Advanced Configuration"
    • to make data management operations work, the different StorageElements must be set to Status Active in the RSS
  • replace ARC and ARC6 with AREX by following these instructions:

    1. From a DIRAC client, execute the following command to get the port used by the CE to provide the AREX interface:
    $ arcinfo -c <ce name>
    Computing service: arc (production)
      Information endpoint: ldap://<ce name>:2135/Mds-Vo-Name=local,o=grid (org.nordugrid.ldapng)
      Information endpoint: ldap://<ce name>:2135/o=glue (org.nordugrid.ldapglue2)
      Information endpoint: https://<ce name>:443/arex (org.nordugrid.arcrest) <------------ get the port used here
      Information endpoint: https://<ce name>:443/arex (org.ogf.glue.emies.resourceinfo)
      Submission endpoint: https://<ce name>:443/arex (status: ok, interface: org.nordugrid.arcrest)
      Submission endpoint: https://<ce name>:443/arex (status: ok, interface: org.ogf.glue.emies.activitycreation)
      Submission endpoint: gsiftp://<ce name>:2811/jobs (status: ok, interface: org.nordugrid.gridftpjob)
    2. Replace CEType=ARC or CEType=ARC6 with CEType=AREX in the configuration
    3. If the port is not 443 (the default), you also need to add the following option to your configuration:
    Port = <AREX port>
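If you have many CEs, the port lookup from step 1 can be scripted. The following sketch parses arcinfo output of the form shown above; the helper name and the sample text are illustrative only:

```python
import re

def arex_port(arcinfo_output: str) -> int:
    """Extract the port of the org.nordugrid.arcrest endpoint
    from `arcinfo -c <ce name>` output (format as shown above)."""
    for line in arcinfo_output.splitlines():
        if "org.nordugrid.arcrest" in line:
            # Pick the port out of the https://<host>:<port>/... URL
            match = re.search(r"https://[^:/]+:(\d+)/", line)
            if match:
                return int(match.group(1))
    raise ValueError("no org.nordugrid.arcrest endpoint found")

# Sample output with a hypothetical CE name
sample = """\
  Information endpoint: https://ce.example.org:443/arex (org.nordugrid.arcrest)
  Submission endpoint: https://ce.example.org:443/arex (status: ok, interface: org.nordugrid.arcrest)
"""
print(arex_port(sample))  # 443
```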
  • if your .cfg files (e.g. dirac.cfg) are managed by puppet (or something else), prepare an update that removes the Setup and instanceName options

  • if you have a DIRAC extension, update it taking into account the many changes. Special note: Setup disappeared from many places.

  • if you have a DIRAC extension you might need to code an empty vodiracx extension, depending on what your extension does. For examples:

  • if you have a WebAppDIRAC extension, code an empty vodiracx-web extension. For examples:

  • Satisfy the minimal requirements for running DiracX

  • Add a DiracX section in the CS. Inside it, for the moment, add only the option DisabledVOs. The value of this option is the list of all Virtual Organizations that your DIRAC installation supports. This list will be reduced, and possibly removed completely, after the full DiracX installation (more info here)

  • deploy (vo)DiracX(-web) by following the instructions

  • You can add the Pilot command RegisterPilot in the list of commands run by the Pilot (this is already the default, and it only applies if you were previously running with a non-default list of commands)

  • make sure that the Hosts running TransformationSystem Agents have the ProductionManagement property. The same applies to the shifterProxy and the groups used for Transformations

  • If you have been previously using the "Elastic Job Parameters DB" (see v8 doc here, by setting /Operations/<Defaults|setup>/Services/JobMonitoring/useESForJobParametersFlag=True) then copy the section Systems/<instance>/WorkloadManagement/Databases/ElasticJobParametersDB to Systems/<instance>/WorkloadManagement/Databases/JobParametersDB.

the day before the update

  • Install the latest DIRAC v8
  • partially drain the system (it can't be fully drained) by stopping the Transformation/WorkflowTask agents

few hours before the update

  • stop Transformation/RequestTask agents
  • stop Transformation/Transformation agents
  • stop RequestManagement/RequestExecuting agent

Update phase ("deep downtime")

  • stop all DIRAC components (agents, services, executors) with the exception of:
    • Configuration/Server Services
    • Framework/SystemAdministrator (there is one of these per server)
  • update DBs with the following:
GRANT CREATE TEMPORARY TABLES ON *.* TO 'Dirac'@'%';

use JobDB;
ALTER TABLE `Jobs` ADD COLUMN `VO` VARCHAR(64);
use PilotAgentsDB;
ALTER TABLE `PilotAgents` ADD COLUMN `VO` VARCHAR(64);

use TaskQueueDB;
ALTER TABLE `tq_TaskQueues` ADD COLUMN `Owner` VARCHAR(255) NOT NULL;
ALTER TABLE `tq_TaskQueues` ADD COLUMN VO VARCHAR(64);

use SandboxMetadataDB;
ALTER TABLE `sb_Owners` ADD COLUMN `VO` VARCHAR(64);

use TransformationDB;
ALTER TABLE `Transformations` ADD COLUMN `Author` VARCHAR(255) NOT NULL;
ALTER TABLE `Transformations` MODIFY COLUMN `AuthorDN` VARCHAR(255) DEFAULT NULL;

use ReqDB;
ALTER TABLE `Request` ADD COLUMN `Owner` VARCHAR(255) NOT NULL;
  • Save the following script, which adds "VO" info to a few DBs, in a directory of a DIRAC server machine (e.g. /opt/dirac), then:
    • if you do not use the TransformationSystem nor the ProductionSystem, run it with:
    python script_name.py -o /DIRAC/Security/UseServerCertificate=yes
    
    while if you use one or both of them (adjust the flags accordingly), run it with:
    python script_name.py --transformation --production -o /DIRAC/Security/UseServerCertificate=yes
    
  • update DBs with the following SQL statements: https://gist.github.com/fstagni/d977b4f3ebe5432ee7bb2743145dc837
    • NOTE: if you are running an older version of MySQL or MariaDB, the following might fail:
      WITH xxx AS (
          SELECT MAX(OwnerId) AS badId,
                 MIN(OwnerId) AS goodId
          FROM sb_Owners
          GROUP BY Owner, OwnerGroup, VO
          HAVING COUNT(*) > 1
      )
      UPDATE sb_SandBoxes AS s
      JOIN xxx ON s.OwnerId = xxx.badId
      SET s.OwnerId = xxx.goodId;
    If that's the case, you have two options:
    • update your server
    • replace the above with
    UPDATE sb_SandBoxes AS s
    JOIN (
        SELECT 
            MAX(OwnerId) AS badId,
            MIN(OwnerId) AS goodId
        FROM sb_Owners
        GROUP BY Owner, OwnerGroup, VO
        HAVING COUNT(*) > 1
    ) AS xxx
    ON s.OwnerId = xxx.badId
    SET s.OwnerId = xxx.goodId;
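Both SQL variants implement the same deduplication: for every (Owner, OwnerGroup, VO) triple appearing more than once in sb_Owners, sandboxes pointing at the highest OwnerId are repointed to the lowest. The logic, sketched in Python with made-up IDs:

```python
from collections import defaultdict

# Hypothetical sb_Owners rows: OwnerId -> (Owner, OwnerGroup, VO)
sb_owners = {
    1: ("alice", "dirac_user", "lhcb"),
    2: ("bob", "dirac_user", "lhcb"),
    3: ("alice", "dirac_user", "lhcb"),  # duplicate of OwnerId 1
}
# Hypothetical sb_SandBoxes rows: SBId -> OwnerId
sandboxes = {10: 1, 11: 3, 12: 2}

# Group OwnerIds by identity triple, as the SQL's GROUP BY does
by_identity = defaultdict(list)
for owner_id, identity in sb_owners.items():
    by_identity[identity].append(owner_id)

# For duplicated identities, repoint sandboxes from the highest
# OwnerId (badId) to the lowest (goodId), as the UPDATE ... JOIN does
for ids in by_identity.values():
    if len(ids) > 1:
        bad_id, good_id = max(ids), min(ids)
        for sb_id, owner_id in sandboxes.items():
            if owner_id == bad_id:
                sandboxes[sb_id] = good_id

print(sandboxes)  # {10: 1, 11: 1, 12: 2}
```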
  • update the Accounting DB: several tables have names that must be altered. The following SELECT will print out the SQL commands you need to issue (be sure to replace "DIRAC-Certification" with the name of your setup).
SET group_concat_max_len=5000;
SELECT group_concat(v.name separator '; ')
 FROM (
     SELECT concat('RENAME TABLE `', t.table_name, '` TO `', replace(t.table_name, '_DIRAC-Certification_', '_'), '`') name
     FROM information_schema.tables t
     WHERE table_name like '%_DIRAC-Certification_%'
 ) v;

(you might need to run the above more than once).

Only after the above is completed, you can issue:

DELETE FROM `ac_catalog_Types` WHERE name LIKE 'DIRAC-Certification%';

(again, replace 'DIRAC-Certification%' with the name of your setup).
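As an offline sanity check, the string rewrite performed by the SELECT above can be previewed with a short Python snippet (the table names below are hypothetical):

```python
setup = "DIRAC-Certification"  # replace with your setup name

# Hypothetical accounting table names, as they would appear
# in information_schema.tables
tables = [
    "ac_bucket_DIRAC-Certification_Job",
    "ac_in_DIRAC-Certification_Pilot",
]

# Same rewrite as in the SELECT: drop the setup name from the table name
statements = [
    f"RENAME TABLE `{t}` TO `{t.replace(f'_{setup}_', '_')}`"
    for t in tables
]
for s in statements:
    print(s)
# RENAME TABLE `ac_bucket_DIRAC-Certification_Job` TO `ac_bucket_Job`
# RENAME TABLE `ac_in_DIRAC-Certification_Pilot` TO `ac_in_Pilot`
```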

  • remove agent Framework/CAUpdateAgent
  • remove agent WorkloadManagement/CloudDirector
  • remove service WorkloadManagement/VirtualMachineManager
  • convert the Systems part of CS to "NoSetup" by running https://gist.github.com/atsareg/080682ed97f329e65c2458e99eca89e5 (or do by hand if you know what you are doing)
    • and add CS option /DIRAC/NoSetup = True for backward compatibility
  • convert the local cfg file to "NoSetup" (/opt/dirac/etc/dirac.cfg) (use the puppet update previously configured, if needed)
  • convert the Operations part of CS to "NoSetup" by running https://github.com/DIRACGrid/DIRAC/pull/7218/files (or do by hand if you know what you are doing)
  • Make sure that you can, and actually are synchronizing the CS to DiracX
  • install DIRAC v9 (usual procedure)
  • install service Monitoring/WebApp
  • if existing, remove the section Systems/WorkloadManagement/Databases/ElasticJobParametersDB
  • the OpenSearch indexes used for job parameters changed name (e.g. from "lhcb-production_elasticjobparameters_index_1014.0m" to "job_parameters_lhcb_1014m"; the name is configurable, what is given is the standard naming). Update the names of the old indexes accordingly.
  • the OpenSearch indexes used for WMSHistory have an added "VO" field.
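If you have many per-VO indexes, the job parameters rename described above can be scripted. This sketch is derived only from the single example given in the text, so verify it against your actual index names before renaming anything:

```python
def new_job_parameters_index(old: str) -> str:
    """Sketch of the standard rename applied to job parameters
    indexes, based only on the one example above:
    'lhcb-production_elasticjobparameters_index_1014.0m'
     -> 'job_parameters_lhcb_1014m'."""
    prefix, _, suffix = old.partition("_elasticjobparameters_index_")
    vo = prefix.split("-")[0]            # 'lhcb-production' -> 'lhcb'
    suffix = suffix.replace(".0m", "m")  # '1014.0m' -> '1014m' (assumed)
    return f"job_parameters_{vo}_{suffix}"

old = "lhcb-production_elasticjobparameters_index_1014.0m"
print(new_job_parameters_index(old))  # job_parameters_lhcb_1014m
```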

Restart phase

  • restart the running DIRAC components
  • start all stopped DIRAC components, services before agents

Checking phase

  • DIRAC:
  • DiracX:
    • dirac diracx whoami
  • DiracX-Web:
    • should be up
    • JobMonitoring app should be there

Any time after

  • the OpenSearch index names lost the Setup name. This means that index patterns (in OpenSearch, Grafana, etc.) would need to be updated to something like *wmshistory*

Later:

(optional) enable the remote pilot logger:

PR https://github.com/DIRACGrid/DIRAC/pull/6208 introduces the possibility to store pilot log files on remote storage. A plugin is foreseen for this purpose. The PR contains a FileCacheLoggingPlugin, which sends the logs to an SE.

  • Install WorkloadManagement/TornadoPilotLoggingHandler service
  • Install WorkloadManagement/PilotLoggingAgent agent

Configuration: configuration is done on a VO-by-VO basis, in a VO-specific Pilot section in Operations. The Defaults section can be used as usual to set initial settings for all VOs.

  • Enable remote login in the Pilot section of a VO: RemoteLogging = True
  • Set the service URL: RemoteLoggerURL = https://dirac.host.name:8444/WorkloadManagement/TornadoPilotLogging
  • Set the upload SE, e.g.: UploadSE = UKI-LT2-IC-HEP-disk
  • Uploading is done by a shifter called DataManager, so a shifter with this name should be defined in the shifter section of the VO
  • Set the upload path a VO can write to, e.g.: UploadPath = /gridpp/pilotlogs

The TornadoPilotLoggingHandler service requires a plugin name to be specified under Services/TornadoPilotLogging:

  • LoggingPlugin = FileCacheLoggingPlugin
