DIRAC 9.0
DIRAC v9 is the first version of DIRAC that needs to be deployed together with DiracX v0.0.1. This wiki details the changes needed to deploy both. This document focuses on updating from DIRAC v8 to DIRAC v9 + DiracX. New DIRAC users can skip the database updates and several other details.
-
The concept of "Setup" is disappearing. The concept (explained in https://dirac.readthedocs.io/en/latest/AdministratorGuide/Introduction/diraccomponents.html) has a long history and made it possible to use a single machine as a server for multiple setups/installations (e.g. production and testing setups on the same node). This possibility is no longer considered useful, so the functionality is being removed.
-
DISET -> JSON encoding. The default encoding becomes JSON: PR https://github.com/DIRACGrid/DIRAC/pull/6466 removes the environment variable `DIRAC_USE_JSON_DECODE` and flips the default of `DIRAC_USE_JSON_ENCODE`. Most probably nothing has to be done, unless you have overwritten these values in the past. You may run into issues in particular if you have dictionaries indexed by integers, which are converted into strings. To help you test your extensions, `DIRAC_DEBUG_DENCODE_CALLSTACK` will print out any such situation. Note that this already works in v8.
-
1024-bit proxies no longer work.
-
PR https://github.com/DIRACGrid/DIRAC/pull/7439 removes the possibility to use external StorageElements as SandboxStore. Most probably you have nothing to do.
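The integer-key issue mentioned in the JSON encoding item above is a general property of JSON rather than something specific to DIRAC. The following minimal sketch uses only Python's standard `json` module (not DIRAC's own encoder) and a hypothetical payload to show what an extension might observe after a round trip:

```python
import json

# Hypothetical payload: a dictionary indexed by integers,
# as an extension service might return it.
payload = {1: "Waiting", 2: "Running"}

# JSON object keys are always strings, so integer keys come back as strings.
decoded = json.loads(json.dumps(payload))

print(decoded)         # {'1': 'Waiting', '2': 'Running'}
print(1 in decoded)    # False
print("1" in decoded)  # True

# One possible workaround on the receiving side: convert the keys back explicitly.
restored = {int(key): value for key, value in decoded.items()}
```

Code that looks up such a dictionary with integer keys after decoding will silently miss its entries, which is the kind of situation worth searching for in an extension.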
What follows is a skeleton of the migration. We suggest copying, pasting, and editing what is below, adapting it to your installation. It contains a few sections, starting from "PRE".
-
Install the latest DIRAC v8
-
The `SecurityLoggingService` is no longer used by default. Follow one of these 2 options:
- If you use centralized logging together with a message queue and `logstash`, follow the instructions above to configure `logstash`
- If you do not want to use centralized logging, set the CS flag `/Operations/[vo]/EnableSecurityLogging = True` (this is `False` by default)
-
JobParameters need to be stored in OpenSearch. See last bullet of this documentation
-
If you have not done it before, you'll need to install and use RSS
- No need to go through the "Advanced Configuration"
- To make data management operations work, the different StorageElements must be set to `StatusActive` in the RSS
-
Replace ARC and ARC6 with AREX by following these instructions:
- From a DIRAC client, execute the following command to get the port used by the CE to provide the AREX interface:
  $ arcinfo -c <ce name>
  Computing service: arc (production)
  Information endpoint: ldap://<ce name>:2135/Mds-Vo-Name=local,o=grid (org.nordugrid.ldapng)
  Information endpoint: ldap://<ce name>:2135/o=glue (org.nordugrid.ldapglue2)
  Information endpoint: https://<ce name>:443/arex (org.nordugrid.arcrest) <------------ get the port used here
  Information endpoint: https://<ce name>:443/arex (org.ogf.glue.emies.resourceinfo)
  Submission endpoint: https://<ce name>:443/arex (status: ok, interface: org.nordugrid.arcrest)
  Submission endpoint: https://<ce name>:443/arex (status: ok, interface: org.ogf.glue.emies.activitycreation)
  Submission endpoint: gsiftp://<ce name>:2811/jobs (status: ok, interface: org.nordugrid.gridftpjob)
- Simply replace `CEType=ARC` or `ARC6` with `CEType=AREX` in the configuration
- If the port is not `443` (the default one), then you also need to add the following option to your configuration:
  Port = <AREX port>
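When many CEs have to be checked, reading the port off the `arcinfo` output by eye gets tedious. The sketch below is one possible way to extract it from captured output. The helper name `find_arex_port` and the sample host `ce.example.org` are hypothetical, and the parsing only assumes the line layout shown above:

```python
import re
from typing import Optional
from urllib.parse import urlparse


def find_arex_port(arcinfo_output: str) -> Optional[int]:
    """Return the port of the first org.nordugrid.arcrest endpoint, or None."""
    for line in arcinfo_output.splitlines():
        if "org.nordugrid.arcrest" not in line:
            continue
        match = re.search(r"https://\S+", line)
        if match is None:
            continue
        url = urlparse(match.group(0))
        # An HTTPS URL without an explicit port uses 443.
        return url.port or 443
    return None


# Hypothetical captured output for a CE that exposes AREX on a non-default port.
sample = """\
Information endpoint: ldap://ce.example.org:2135/o=glue (org.nordugrid.ldapglue2)
Information endpoint: https://ce.example.org:8443/arex (org.nordugrid.arcrest)
Submission endpoint: https://ce.example.org:8443/arex (status: ok, interface: org.nordugrid.arcrest)
"""

port = find_arex_port(sample)
if port is not None and port != 443:
    # Only a non-default port needs the extra option in the configuration.
    print(f"Port = {port}")
```

For the sample above, the helper finds port 8443, so the extra `Port` option would be needed for that CE.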
-
If your `.cfg` files (e.g. `dirac.cfg`) are managed by puppet (or something else), prepare an update for removing the `Setup` and `instanceName`
-
If you have a DIRAC extension, update it considering the many changes. Special note: `Setup` disappeared from many places.
-
If you have a DIRAC extension, you might need to code an empty `vodiracx` extension, depending on what your extension does. As examples:
- First, check the gubbins extension
- Check lhcbdiracx as a "real" example
-
If you have a WebAppDIRAC extension, code an empty `vodiracx-web` extension. As examples:
- First, check the gubbins extension
- Check lhcbdiracx-web as a "real" example
-
Satisfy the minimal requirements for running DiracX
-
Add a `DiracX` section in the CS. Inside, for the moment, add only the option `DisabledVOs`. The value of this option is a list of all the Virtual Organizations that your DIRAC installation supports. This list will be reduced and possibly completely removed after the full DiracX installation (more info here)
-
Deploy (vo)DiracX(-web) by following the instructions
-
You can add the Pilot command `RegisterPilot` to the list of commands run by the Pilot (this is already the default, so it only applies if you were previously running with a non-default list of commands)
-
Make sure that the Hosts running TransformationSystem Agents have the `ProductionManagement` property. The same is true for a shifterProxy and groups that are used for Transformations
-
If you have previously been using the "Elastic Job Parameters DB" (see v8 doc here) by setting `/Operations/<Defaults|setup>/Services/JobMonitoring/useESForJobParametersFlag=True`, then copy the section `Systems/<instance>/WorkloadManagement/Databases/ElasticJobParametersDB` to `Systems/<instance>/WorkloadManagement/Databases/JobParametersDB`.
- Install the latest DIRAC v8
- partial drain of the system (can't fully drain) by stopping the `Transformation/WorkflowTask` agents
- stop `Transformation/RequestTask` agents
- stop `Transformation/Transformation` agents
- stop `RequestManagement/RequestExecuting` agent
- stop all DIRAC components (agents, services, executors) with the exception of:
  - `Configuration/Server` Services
  - `Framework/SystemAdministrator` (of these, there will be one per server)
- update DBs with the following:
GRANT CREATE TEMPORARY TABLES ON *.* TO 'Dirac'@'%';
use JobDB;
ALTER TABLE `Jobs` ADD COLUMN `VO` VARCHAR(64);
use PilotAgentsDB;
ALTER TABLE `PilotAgents` ADD COLUMN `VO` VARCHAR(64);
use TaskQueueDB;
ALTER TABLE `tq_TaskQueues` ADD COLUMN `Owner` VARCHAR(255) NOT NULL;
ALTER TABLE `tq_TaskQueues` ADD COLUMN VO VARCHAR(64);
use SandboxMetadataDB;
ALTER TABLE `sb_Owners` ADD COLUMN `VO` VARCHAR(64);
use TransformationDB;
ALTER TABLE `Transformations` ADD COLUMN `Author` VARCHAR(255) NOT NULL;
ALTER TABLE `Transformations` MODIFY COLUMN `AuthorDN` VARCHAR(255) DEFAULT NULL;
use ReqDB;
ALTER TABLE `Request` ADD COLUMN `Owner` VARCHAR(255) NOT NULL;
- Save the following script, which adds "VO" info to a few DBs, in any directory (e.g. /opt/dirac) of a DIRAC server machine, then:
  - if you use neither the TransformationSystem nor the ProductionSystem, run it with:
    python script_name.py -o /DIRAC/Security/UseServerCertificate=yes
  - if you use one or both of them, run it with (adjust the flags):
    python script_name.py --transformation --production -o /DIRAC/Security/UseServerCertificate=yes
- update DBs with the following SQL statements: https://gist.github.com/fstagni/d977b4f3ebe5432ee7bb2743145dc837
  - NOTE: if you are running an older version of MySQL or MariaDB, the following might fail:
    WITH xxx AS (
      SELECT MAX(OwnerId) AS badId, MIN(OwnerId) AS goodId
      FROM sb_Owners GROUP BY Owner, OwnerGroup, VO HAVING COUNT(*) > 1
    )
    UPDATE sb_SandBoxes AS s JOIN xxx ON s.OwnerId = xxx.badId SET s.OwnerId = xxx.goodId;
    If that's the case, you have 2 options:
    - update your server
    - replace the above with
      UPDATE sb_SandBoxes AS s JOIN (
        SELECT MAX(OwnerId) AS badId, MIN(OwnerId) AS goodId
        FROM sb_Owners GROUP BY Owner, OwnerGroup, VO HAVING COUNT(*) > 1
      ) AS xxx ON s.OwnerId = xxx.badId SET s.OwnerId = xxx.goodId;
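To make the intent of the sandbox-owner statement easier to follow, here is a small pure-Python sketch of the same logic on in-memory data. It is only an illustration: the function name and the sample rows are hypothetical, and nothing here touches a database.

```python
from collections import defaultdict


def merge_duplicate_owners(owners, sandboxes):
    """Re-point sandboxes from the highest duplicate OwnerId to the lowest one.

    owners: list of (OwnerId, Owner, OwnerGroup, VO) tuples
    sandboxes: dict mapping a sandbox id to its OwnerId (modified in place)
    """
    ids_by_identity = defaultdict(list)
    for owner_id, owner, group, vo in owners:
        ids_by_identity[(owner, group, vo)].append(owner_id)

    # Mirrors GROUP BY ... HAVING COUNT(*) > 1, with MAX as badId and MIN as goodId.
    remap = {max(ids): min(ids) for ids in ids_by_identity.values() if len(ids) > 1}

    for sandbox_id, owner_id in sandboxes.items():
        if owner_id in remap:
            sandboxes[sandbox_id] = remap[owner_id]
    return remap


# Hypothetical rows: owner ids 1 and 3 describe the same (Owner, OwnerGroup, VO).
owners = [
    (1, "alice", "lhcb_user", "lhcb"),
    (2, "bob", "lhcb_user", "lhcb"),
    (3, "alice", "lhcb_user", "lhcb"),
]
sandboxes = {10: 1, 11: 3, 12: 2}

remap = merge_duplicate_owners(owners, sandboxes)
print(remap)      # {3: 1}
print(sandboxes)  # {10: 1, 11: 1, 12: 2}
```

As in the SQL, a single pass re-points only the highest `OwnerId` of each duplicated group to the lowest one.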
- update Accounting DB: several tables need to be renamed. The following selection prints out the SQL commands that you need to issue (be sure to replace "DIRAC-Certification" with the name of your setup).
SET group_concat_max_len=5000;
SELECT group_concat(v.name separator '; ')
FROM (
SELECT concat('RENAME TABLE `', t.table_name, '` TO `', replace(t.table_name, '_DIRAC-Certification_', '_'), '`') name
FROM information_schema.tables t
WHERE table_name like '%_DIRAC-Certification_%'
) v;
(you might need to run the above more than once).
Only after the above is completed, you can issue:
DELETE FROM `ac_catalog_Types` where name LIKE 'DIRAC-Certification%';
(again, replace 'DIRAC-Certification' with the name of your setup, keeping the trailing %).
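The renaming rule applied by the selection above is simply "drop `_<setup>_` from the table name". The following sketch reproduces that rule in Python so the expected statements can be previewed before touching the database. The function name and the table names are hypothetical examples, not a list of real Accounting tables:

```python
def rename_statements(table_names, setup):
    """Build RENAME TABLE statements that drop '_<setup>_' from table names."""
    marker = f"_{setup}_"
    return [
        f"RENAME TABLE `{name}` TO `{name.replace(marker, '_')}`"
        for name in table_names
        if marker in name
    ]


# Hypothetical table names; only the first one contains the setup name.
tables = ["ac_bucket_DIRAC-Certification_Job", "ac_catalog_Types"]

for statement in rename_statements(tables, "DIRAC-Certification"):
    print(statement)
# RENAME TABLE `ac_bucket_DIRAC-Certification_Job` TO `ac_bucket_Job`
```

Comparing such a preview with the output of the SQL selection is one way to check that the setup name was substituted correctly.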
- remove agent Framework/CAUpdateAgent
- remove agent WorkloadManagement/CloudDirector
- remove service WorkloadManagement/VirtualMachineManager
- convert the Systems part of CS to "NoSetup" by running https://gist.github.com/atsareg/080682ed97f329e65c2458e99eca89e5 (or do it by hand if you know what you are doing)
  - and add CS option `/DIRAC/NoSetup = True` for backward compatibility
- convert the local cfg file to "NoSetup" (`/opt/dirac/etc/dirac.cfg`) (use the puppet update previously configured, if needed)
- convert the Operations part of CS to "NoSetup" by running https://github.com/DIRACGrid/DIRAC/pull/7218/files (or do it by hand if you know what you are doing)
- Make sure that you can synchronize the CS to DiracX, and that you are actually doing so
- install DIRAC v9 (usual procedure)
- install service Monitoring/WebApp
- if it exists, remove the section `Systems/WorkloadManagement/Databases/ElasticJobParametersDB`
- the OpenSearch indexes used for job parameters changed name (e.g. from "lhcb-production_elasticjobparameters_index_1014.0m" to "job_parameters_lhcb_1014m"; this name is configurable, and what is given is the standard naming). Update the names of the old indexes accordingly.
- the OpenSearch indexes used for WMSHistory have an added "VO" field.
- restart the running DIRAC components
- start all stopped DIRAC components, services before agents
- DIRAC:
- DiracX:
  dirac diracx whoami
- DiracX-Web:
  - should be up
  - JobMonitoring app should be there
- the OpenSearch index names lost the `Setup` name. This means that index patterns (in OpenSearch, Grafana, etc.) would need to be updated to something like `*wmshistory*`
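A wildcard pattern such as `*wmshistory*` matches index names regardless of whether they still carry the setup name. The sketch below illustrates this with Python's `fnmatch` module, whose `*` behaves like the wildcard used here. The WMSHistory index names are hypothetical, chosen only to show one name with a setup prefix and one without:

```python
from fnmatch import fnmatchcase

pattern = "*wmshistory*"

index_names = [
    "lhcb-production_wmshistory_index-2024.01",  # hypothetical old-style name (with Setup)
    "lhcb_wmshistory_index-2024.01",             # hypothetical new-style name (no Setup)
    "job_parameters_lhcb_1014m",                 # unrelated index, should not match
]

# fnmatchcase is case-sensitive on every platform.
matching = [name for name in index_names if fnmatchcase(name, pattern)]
print(matching)
```

Both hypothetical WMSHistory names match, while the job parameters index does not, so dashboards using such a pattern would keep working across the rename.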
PR https://github.com/DIRACGrid/DIRAC/pull/6208 introduces the possibility of storing pilot log files on remote storage. A plugin is foreseen for this purpose. The PR contains a FileCacheLoggingPlugin, which sends the logs to an SE.
- Install the `WorkloadManagement/TornadoPilotLoggingHandler` service
- Install the `WorkloadManagement/PilotLoggingAgent` agent
Configuration:
Configuration is done on a VO-by-VO basis, in a VO-specific Pilot section in Operations. The Defaults section can be used as usual to set up initial settings for all VOs.
- Enable remote logging in the Pilot section of a VO:
  RemoteLogging = True
- Set the service URL:
  RemoteLoggerURL = https://dirac.host.name:8444/WorkloadManagement/TornadoPilotLogging
- Set the upload SE, e.g.:
  UploadSE = UKI-LT2-IC-HEP-disk
- Uploading is done by a Shifter called `DataManager`, so a shifter of this name should be defined in a shifter section of the VO
- Set the upload path a VO can write to, e.g.:
  UploadPath = /gridpp/pilotlogs
The TornadoPilotLoggingHandler service requires a plugin name to be specified under Services/TornadoPilotLogging:
LoggingPlugin = FileCacheLoggingPlugin
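Since the remote pilot logging options are spread over several settings, a quick completeness check can help before enabling the feature. The sketch below is a hypothetical helper, not part of DIRAC: it takes the Pilot section of a VO as a plain dictionary (however you choose to export it) and reports which of the options listed above are missing or empty.

```python
# Options listed above for the VO-specific Pilot section.
REQUIRED_OPTIONS = ("RemoteLogging", "RemoteLoggerURL", "UploadSE", "UploadPath")


def missing_pilot_logging_options(pilot_section):
    """Return the required option names that are absent or empty."""
    return [name for name in REQUIRED_OPTIONS if not pilot_section.get(name)]


# Example values taken from the configuration items above.
pilot_section = {
    "RemoteLogging": "True",
    "RemoteLoggerURL": "https://dirac.host.name:8444/WorkloadManagement/TornadoPilotLogging",
    "UploadSE": "UKI-LT2-IC-HEP-disk",
    "UploadPath": "/gridpp/pilotlogs",
}

print(missing_pilot_logging_options(pilot_section))        # []
print(missing_pilot_logging_options({"UploadSE": "SE1"}))  # ['RemoteLogging', 'RemoteLoggerURL', 'UploadPath']
```

The check does not cover the `DataManager` shifter or the `LoggingPlugin` option of the service, which live in other sections of the configuration.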