handling project types and category mapping when data isn't in berkeley's database #106

@andersy005

our current project type and category mappings rely on data from Berkeley's database. however, some registries (e.g., art-trees) aren't part of that database. as a result, the categories for those projects are mapped to unknown, even when we can infer a type:

In [14]: df[(df.category == 'unknown')&(df.type != 'unknown')][['project_id', 'protocol', 'category', 'type', 'type_source']]
Out[14]: 
     project_id          protocol category           type type_source
3       VCS5460          [vm0047]  unknown  reforestation  carbonplan
5       VCS5462          [vm0047]  unknown  reforestation  carbonplan
9       VCS5446          [vm0047]  unknown  reforestation  carbonplan
15      VCS5433          [vm0047]  unknown  reforestation  carbonplan
19      VCS5417          [vm0047]  unknown  reforestation  carbonplan
21      VCS5412          [vm0042]  unknown    agriculture  carbonplan
26      VCS5404          [vm0047]  unknown  reforestation  carbonplan
28      VCS5403          [vm0047]  unknown  reforestation  carbonplan
29      VCS5401          [vm0047]  unknown  reforestation  carbonplan
31      VCS5399          [vm0047]  unknown  reforestation  carbonplan
32      VCS5398          [vm0047]  unknown  reforestation  carbonplan
37      VCS5389          [vm0047]  unknown  reforestation  carbonplan
39      VCS5392          [vm0042]  unknown    agriculture  carbonplan
41      VCS5387          [vm0047]  unknown  reforestation  carbonplan
42      VCS5386          [vm0047]  unknown  reforestation  carbonplan
46      VCS5382          [vm0047]  unknown  reforestation  carbonplan
48      VCS5376          [vm0042]  unknown    agriculture  carbonplan
76      VCS5333          [vm0047]  unknown  reforestation  carbonplan
78      VCS5324          [vm0047]  unknown  reforestation  carbonplan
84      VCS5316          [vm0047]  unknown  reforestation  carbonplan
92      VCS5304          [vm0042]  unknown    agriculture  carbonplan
95      VCS5299          [vm0042]  unknown    agriculture  carbonplan
107     VCS5282  [vm0047, vm0048]  unknown  reforestation  carbonplan
138     VCS5227          [vm0042]  unknown    agriculture  carbonplan
156     VCS5197          [vm0047]  unknown  reforestation  carbonplan
160     VCS5191          [vm0047]  unknown  reforestation  carbonplan
188     VCS5148          [vm0047]  unknown  reforestation  carbonplan
269     VCS5043          [vm0042]  unknown    agriculture  carbonplan
282     VCS5027          [vm0047]  unknown  reforestation  carbonplan
1628    VCS3354          [vm0042]  unknown    agriculture  carbonplan

the majority of the types above come from our infer_project_type function:

import pandas as pd
import pandas_flavor as pf


@pf.register_dataframe_method
def infer_project_type(df: pd.DataFrame) -> pd.DataFrame:
    """
    Add project types to the DataFrame based on project characteristics.

    Parameters
    ----------
    df : pd.DataFrame
        Input DataFrame containing project data.

    Returns
    -------
    pd.DataFrame
        DataFrame with a new 'type' column indicating the project's type
        (defaults to 'unknown') and a 'type_source' column set to 'carbonplan'.
    """
    df.loc[:, 'type'] = 'unknown'
    df.loc[:, 'type_source'] = 'carbonplan'
    df.loc[df.apply(lambda x: 'art-trees' in x['protocol'], axis=1), 'type'] = 'redd+'
    df.loc[df.apply(lambda x: 'acr-non-fed' in x['protocol'], axis=1), 'type'] = (
        'improved forest management'
    )
    df.loc[df.apply(lambda x: 'vm0047' in x['protocol'], axis=1), 'type'] = 'reforestation'
    df.loc[df.apply(lambda x: 'vm0045' in x['protocol'], axis=1), 'type'] = (
        'improved forest management'
    )
    df.loc[df.apply(lambda x: 'vm0042' in x['protocol'], axis=1), 'type'] = 'agriculture'
    return df

how should we handle these cases?
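
one possible direction, sketched under assumptions: keep a small fallback table that maps an inferred type to a category, and apply it only to rows where the category is still unknown. the table name `TYPE_TO_CATEGORY_FALLBACK`, the helper `fill_unknown_categories`, and the category labels (`forest`, `agriculture`) are all hypothetical and would need to be checked against the categories the Berkeley data actually uses.

```python
import pandas as pd

# Hypothetical fallback table. The category labels are assumptions for
# illustration and are not confirmed against the Berkeley database.
TYPE_TO_CATEGORY_FALLBACK = {
    'reforestation': 'forest',
    'improved forest management': 'forest',
    'redd+': 'forest',
    'agriculture': 'agriculture',
}


def fill_unknown_categories(df: pd.DataFrame) -> pd.DataFrame:
    """Fill 'unknown' categories from the inferred type where a fallback exists."""
    df = df.copy()  # leave the caller's DataFrame untouched
    # Only touch rows whose category is unknown AND whose type has a fallback.
    needs_fallback = (df['category'] == 'unknown') & df['type'].isin(
        TYPE_TO_CATEGORY_FALLBACK
    )
    df.loc[needs_fallback, 'category'] = df.loc[needs_fallback, 'type'].map(
        TYPE_TO_CATEGORY_FALLBACK
    )
    return df


# Toy data: the last row is made up to show that unknown types stay unknown.
demo = pd.DataFrame(
    {
        'project_id': ['VCS5460', 'VCS5412', 'EXAMPLE-1'],
        'category': ['unknown', 'unknown', 'unknown'],
        'type': ['reforestation', 'agriculture', 'unknown'],
    }
)
result = fill_unknown_categories(demo)
print(result)
```

rows whose category already came from the database are left alone, so this only kicks in for the cases listed above. a companion column recording where the category came from (similar to type_source) might also be worth adding.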
