Ideally I would like to avoid making a new macro specifically for setid'd UCPath tables.
(A general feature approach)
A feature could be added such that a where condition or conditions could be declared in the source.yml for each table and then surfaced in the generated model.
- name: ps_jobcode_tbl
description: jobcode validation table
meta:
effective_dated: true
partition_columns: ["jobcode", "setid"]
final_where_condition: ["setid in ('UCSHR')"]
The declared final_where_condition would surface in the final select statement of the staging model. For example,
-- Dummy Line
/* The partition columns are: ['jobcode', 'setid']
*/
with
source as (
select * from {{ source( 'ucp', 'ps_jobcode_tbl' ) }}
where
_fivetran_deleted != true
and dml_ind != 'D'
),
valid_to as (
-- Group the records so that each group belonging to one effdt gets the same valid_to date
select
jobcode,
setid,
effdt,
coalesce((lag(to_date(effdt), 1) over (
partition by
jobcode,
setid
order by effdt desc
) - 1), '2099-12-31') as valid_to
from source
group by
1,
2,
3
),
final as (
select
source._fivetran_deleted,
source._fivetran_id,
source._fivetran_synced,
source.avail_telework,
source.barg_unit,
source.can_noc_cd,
source.comp_frequency,
source.company,
source.cr_bt_dtm,
source.cr_bt_nbr,
source.currency_cd,
source.descr,
source.descrshort,
source.directly_tipped,
source.dml_ind,
source.eeo1code,
source.eeo4code,
source.eeo5code,
source.eeo6code,
source.eeo_job_group,
source.eff_status,
source.effdt,
source.eg_academic_rank,
source.eg_group,
source.encumb_sal_amt,
source.encumb_sal_optn,
source.encumber_indc,
source.flsa_status,
source.function_cd,
source.grade,
source.ipedsscode,
source.job_family,
source.job_function,
source.job_sub_func,
source.jobcode,
source.key_jobcode,
source.last_update_date,
source.lastupddttm,
source.lastupdoprid,
source.manager_level,
source.med_chkup_req,
source.ods_vrsn_nbr,
source.posn_mgmt_indc,
source.reg_region,
source.reg_temp,
source.retro_percent,
source.retro_rate,
source.sal_admin_plan,
source.sal_range_currency,
source.sal_range_freq,
source.sal_range_max_rate,
source.sal_range_mid_rate,
source.sal_range_min_rate,
source.setid,
source.setid_salary,
source.std_hours,
source.std_hrs_frequency,
source.step,
source.survey_job_code,
source.survey_salary,
source.trn_program,
source.union_cd,
source.upd_bt_dtm,
source.upd_bt_nbr,
source.us_occ_cd,
source.us_soc_cd,
source.workers_comp_cd,
-- New Objects
to_date(source.effdt) as valid_from,
valid_to.valid_to as valid_to,
case
when valid_from > {{ var("current_date_pst") }} then 'future'
when valid_to.valid_to < {{ var("current_date_pst") }} then 'past'
when valid_to.valid_to >= {{ var("current_date_pst") }} then 'current'
end as current_record_desc,
case
when current_record_desc = 'current' then true
when current_record_desc in ('future', 'past') then false
end as is_current_record
from source
left outer join valid_to
on source.jobcode = valid_to.jobcode
and source.setid = valid_to.setid
and source.effdt = valid_to.effdt
)
select * from final
where -- <-- THIS WHERE CONDITION WOULD BE VARIABLE AND ONLY POPULATE WHEN THE final_where_condition KEY IS POPULATED
setid in ('UCHSR')
(A UCPath setid specific approach)
If we choose to make a setid specific UCPath macro, then the meta information declared can be more specific (naming wise).
- name: ps_jobcode_tbl
description: jobcode validation table
meta:
effective_dated: true
partition_columns: ["jobcode", "setid"]
ucsc_setid: ["UCSHR"]
We could also have more freedom in where the condition is set. For example we can apply the condition at the source CTE,
-- Dummy Line
/* The partition columns are: ['jobcode', 'setid']
*/
with
source as (
select * from {{ source( 'ucp', 'ps_jobcode_tbl' ) }}
where
_fivetran_deleted != true
and dml_ind != 'D'
and setid in ('UCSHR') -- <-- THIS LINE WOULD NEED TO BE HARCODED IN THE MACRO. THE IN LIST VALUES WOULD BE THE ONLY VARIABLE PORTION.
),
valid_to as (
-- Group the records so that each group belonging to one effdt gets the same valid_to date
select
jobcode,
setid,
effdt,
coalesce((lag(to_date(effdt), 1) over (
partition by
jobcode,
setid
order by effdt desc
) - 1), '2099-12-31') as valid_to
from source
group by
1,
2,
3
),
final as (
select
source._fivetran_deleted,
source._fivetran_id,
source._fivetran_synced,
source.avail_telework,
source.barg_unit,
source.can_noc_cd,
source.comp_frequency,
source.company,
source.cr_bt_dtm,
source.cr_bt_nbr,
source.currency_cd,
source.descr,
source.descrshort,
source.directly_tipped,
source.dml_ind,
source.eeo1code,
source.eeo4code,
source.eeo5code,
source.eeo6code,
source.eeo_job_group,
source.eff_status,
source.effdt,
source.eg_academic_rank,
source.eg_group,
source.encumb_sal_amt,
source.encumb_sal_optn,
source.encumber_indc,
source.flsa_status,
source.function_cd,
source.grade,
source.ipedsscode,
source.job_family,
source.job_function,
source.job_sub_func,
source.jobcode,
source.key_jobcode,
source.last_update_date,
source.lastupddttm,
source.lastupdoprid,
source.manager_level,
source.med_chkup_req,
source.ods_vrsn_nbr,
source.posn_mgmt_indc,
source.reg_region,
source.reg_temp,
source.retro_percent,
source.retro_rate,
source.sal_admin_plan,
source.sal_range_currency,
source.sal_range_freq,
source.sal_range_max_rate,
source.sal_range_mid_rate,
source.sal_range_min_rate,
source.setid,
source.setid_salary,
source.std_hours,
source.std_hrs_frequency,
source.step,
source.survey_job_code,
source.survey_salary,
source.trn_program,
source.union_cd,
source.upd_bt_dtm,
source.upd_bt_nbr,
source.us_occ_cd,
source.us_soc_cd,
source.workers_comp_cd,
-- New Objects
to_date(source.effdt) as valid_from,
valid_to.valid_to as valid_to,
case
when valid_from > {{ var("current_date_pst") }} then 'future'
when valid_to.valid_to < {{ var("current_date_pst") }} then 'past'
when valid_to.valid_to >= {{ var("current_date_pst") }} then 'current'
end as current_record_desc,
case
when current_record_desc = 'current' then true
when current_record_desc in ('future', 'past') then false
end as is_current_record
from source
left outer join valid_to
on source.jobcode = valid_to.jobcode
and source.setid = valid_to.setid
and source.effdt = valid_to.effdt
)
select * from final
The reason final_where_condition is restricted to the final select statement is so devs have more freedom in what kind of conditions they can specify. Since it is going to be added to the final select statement then can apply conditions on fields that were added during the staging process.
I prefer the final_where_condition method because this can be used in other models and not just for UCPath setid tables.
For example for our effseq'd models we could final_where_condition = ["setid in ('UCHSR ')", "is_max_effseq_of_effdt = true"].
.
.
.
select * from final
where
setid in ('UCHSR')
and is_max_effseq_of_effdt = true
Ideally I would like to avoid making a new macro specifically for setid'd UCPath tables.
(A general feature approach)
A feature could be added such that a where condition or conditions could be declared in the source.yml for each table and then surfaced in the generated model.
The declared
final_where_conditionwould surface in the final select statement of the staging model. For example,(A UCPath setid specific approach)
If we choose to make a setid specific UCPath macro, then the meta information declared can be more specific (naming wise).
We could also have more freedom in where the condition is set. For example we can apply the condition at the source CTE,
The reason
final_where_conditionis restricted to the final select statement is so devs have more freedom in what kind of conditions they can specify. Since it is going to be added to the final select statement then can apply conditions on fields that were added during the staging process.I prefer the
final_where_conditionmethod because this can be used in other models and not just for UCPath setid tables.For example for our effseq'd models we could
final_where_condition = ["setid in ('UCHSR ')", "is_max_effseq_of_effdt = true"].