Merged
17 changes: 17 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,22 @@
# Changelog

## 44.4.0 [#1364](https://github.com/openfisca/openfisca-core/pull/1364)

#### New features

- **Entity links**: role-based and positional accessors, and dynamic population period-index helpers.
  - `Many2OneLink.get_by_role(variable_name, period, role_value=...)`, `One2ManyLink.get_by_role(...)` and `ImplicitOne2ManyLink.get_by_role(...)`.
  - `Many2OneLink.rank(variable_name, period)`, also available on the chained getter (e.g. `person.links["mother"].household.rank("age", period)`).
  - `One2ManyLink.nth(n, variable_name, period, role=..., condition=...)` for the n-th target member of each source.
  - `has_role(role_value)` now supports `Role` objects (compared by `.key`) in addition to raw values.
  - `CorePopulation.snapshot_period(period)` and `get_period_id_to_rownum(period)` for optional dynamic-population period indexing.

#### Technical changes

- Removed unused `openfisca_core.model_api` import in `tests/core/parameters_date_indexing/test_date_indexing.py`.
- `SimulationBuilder` now sets an identity `_id_to_rownum` mapping for static simulations (`build_default_simulation`, `build_from_dict` / `build_from_entities`) to support dynamic populations.
- Added a `PYTHON` variable to `tasks/lint.mk` so that `make lint PYTHON=.venv/bin/python` works; fixed style in `test_link_accessors.py` and removed an unused variable in `test_many2one.py`.

## 44.3.0

#### New Features
68 changes: 68 additions & 0 deletions docs/implementation/links-api.md
@@ -0,0 +1,68 @@
# Entity Links API

OpenFisca Core now includes a generic Entity Link system. Links let formulas query and aggregate variables computed on another entity, or on other members of the same entity.

## Declaring Links

Links are declared on `Entity` objects, typically when building the `TaxBenefitSystem`.

### 1. Many-to-One Links
A `Many2OneLink` resolves many source members (e.g., persons) to one target entity (e.g., a household, an employer, or another person).

```python
from openfisca_core.links import Many2OneLink

# Example: Intra-entity link (person to mother)
# The `mother_id` variable must be defined on `person` and contain the ID of the mother.
mother_link = Many2OneLink(
    name="mother",
    link_field="mother_id",
    target_entity_key="person",
)
# `person_entity` is the person `Entity` of your tax-benefit system
person_entity.add_link(mother_link)

# Usage in a variable formula:
# persons.mother.get("age", period)
# or chained: persons.mother.household.get("rent", period)
```
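To make `get` concrete, the following plain-Python model shows what a many-to-one lookup such as `persons.mother.get("age", period)` computes. It assumes IDs are unique and that a member whose `mother_id` matches no person (here `-1`) is unmapped and receives the default value. `resolve_many2one` is a hypothetical helper for illustration, not part of OpenFisca, whose implementation works on NumPy arrays.

```python
def resolve_many2one(link_ids, target_ids, target_values, default=0):
    """Map each source member to the value of the target it points to.

    link_ids: for each source member, the ID stored in the link field
        (e.g. ``mother_id``).
    target_ids: the ID of each row of the target population.
    target_values: the variable computed on the target population.
    Members whose ID matches no target row receive ``default``.
    """
    # Build an ID -> row lookup once, then resolve every source member.
    row_of = {target_id: row for row, target_id in enumerate(target_ids)}
    result = []
    for link_id in link_ids:
        row = row_of.get(link_id, -1)  # -1 marks an unmapped member
        result.append(target_values[row] if row >= 0 else default)
    return result


# Intra-entity example: persons 11 and 12 have person 10 as their mother.
person_ids = [10, 11, 12]
ages = [40, 15, 12]
mother_ids = [-1, 10, 10]  # -1 is assumed to mean "no mother in the data"
print(resolve_many2one(mother_ids, person_ids, ages))  # [0, 40, 40]
```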

### 2. One-to-Many Links
A `One2ManyLink` resolves one source entity (e.g., an employer) to many target members (e.g., persons). By default, OpenFisca implicitly creates a one-to-many link from every `GroupEntity` to its members (e.g., `household.persons`).

```python
from openfisca_core.links import One2ManyLink

# Example: Inter-entity link (employer to employees)
# The `employer_id` variable must be defined on `person` and contain the employer ID.
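employees_link = One2ManyLink(
    name="employees",
    link_field="employer_id",
    target_entity_key="person",  # entity whose members are returned and aggregated
)
# `employer_entity` is the employer `Entity` of your tax-benefit system
employer_entity.add_link(employees_link)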

# Usage in a variable formula:
# employers.employees.sum("salary", period)
```
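Likewise, here is a plain-Python sketch of what `employers.employees.sum("salary", period)` returns: persons are grouped by their `employer_id`, and the output has one slot per employer (the source entity), with 0 for employers without employees. `sum_one2many` is a hypothetical helper, and silently skipping persons whose `employer_id` matches no employer is an assumption of this sketch.

```python
def sum_one2many(source_ids, link_ids, target_values):
    """Sum target values per source entity.

    source_ids: the ID of each source row (e.g. each employer).
    link_ids: for each target member, the ID of the source it belongs to
        (e.g. each person's ``employer_id``).
    target_values: the variable computed on the target (e.g. ``salary``).
    """
    row_of = {source_id: row for row, source_id in enumerate(source_ids)}
    totals = [0] * len(source_ids)  # one slot per source entity
    for link_id, value in zip(link_ids, target_values):
        row = row_of.get(link_id)
        if row is not None:  # skip members linked to no known source
            totals[row] += value
    return totals


employer_ids = [100, 200, 300]
employer_of_person = [100, 100, 200, -1]  # the last person has no employer
salaries = [2000, 3000, 1500, 0]
print(sum_one2many(employer_ids, employer_of_person, salaries))  # [5000, 1500, 0]
```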

## Using Links in Formulas

Once a link is declared on an entity, the corresponding population exposes it as an attribute matching the link's `name` (and in its `links` mapping, e.g. `person.links["mother"]`).

### Many2One Methods

* **`link.get(variable_name, period)`**: Returns the target variable values mapped to each source member. Unmapped members receive the default value of the variable.
* **Syntactic sugar**: `link(variable_name, period)` is equivalent to `link.get(variable_name, period)`.
* **Chaining**: `<source>.link1.link2` returns an intermediate chained getter, so `<source>.link1.link2.get(variable_name, period)` fetches the target variable across two link hops.
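Chaining can be read as composing two resolved many-to-one links. The sketch below assumes each link has already been turned into a list of target row indices, with `-1` marking unmapped members, and models `persons.mother.household.get("rent", period)`. `compose` and `take` are hypothetical helpers; in this model a member that is unmapped at either hop stays unmapped and receives the default.

```python
def compose(first_rows, second_rows):
    """Compose two resolved many-to-one links.

    first_rows[i] is the row that source member i points to through the
    first link (-1 if unmapped); second_rows[j] is the row that member j of
    the intermediate population points to through the second link.
    """
    return [second_rows[row] if row >= 0 else -1 for row in first_rows]


def take(rows, values, default=0):
    """Read values at the resolved rows, using default for unmapped ones."""
    return [values[row] if row >= 0 else default for row in rows]


# persons.mother: persons 1 and 2 have person 0 as their mother.
mother_rows = [-1, 0, 0]
# persons.household: person 0 lives in household 1, the others in household 0.
household_rows = [1, 0, 0]
rents = [700, 950]  # rent per household

mother_household_rows = compose(mother_rows, household_rows)
print(take(mother_household_rows, rents))  # [0, 950, 950]
```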

### One2Many Methods

All One2Many aggregation methods return an array sized to the **source** entity. They all take `(variable_name, period)` plus the optional keyword arguments `role` and `condition`, which filter the target members before aggregation.

* `link.sum(...)`
* `link.count(...)`
* `link.any(...)`
* `link.all(...)`
* `link.min(...)`
* `link.max(...)`
* `link.avg(...)`
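The filters apply to the target members before the reduction, and the result keeps one slot per source entity. The plain-Python sketch below models this for `sum`, `count`, `max` and a `condition` mask. The `aggregate` function, its `reducer` argument and the value returned for groups left empty by the filters (`empty`) are assumptions made for illustration, not OpenFisca's API.

```python
def aggregate(n_source, source_rows, values, reducer, roles=None, role=None,
              condition=None, empty=0):
    """Apply ``reducer`` to the filtered target values of each source entity.

    source_rows[k] is the source row of target member k; values[k] its value.
    Members are kept only if they match ``role`` (when given) and
    ``condition`` (when given). Sources with no kept member get ``empty``.
    """
    groups = [[] for _ in range(n_source)]
    for k, (row, value) in enumerate(zip(source_rows, values)):
        if role is not None and roles[k] != role:
            continue
        if condition is not None and not condition[k]:
            continue
        groups[row].append(value)
    return [reducer(group) if group else empty for group in groups]


# Two households: persons 0-2 live in household 0, person 3 in household 1.
household_of_person = [0, 0, 0, 1]
salary = [3000, 2000, 0, 1800]
role = ["parent", "parent", "child", "parent"]
is_female = [True, False, False, True]

print(aggregate(2, household_of_person, salary, sum))  # [5000, 1800]
print(aggregate(2, household_of_person, salary, len))  # [3, 1]  (count)
print(aggregate(2, household_of_person, salary, max, roles=role, role="parent"))  # [3000, 1800]
print(aggregate(2, household_of_person, salary, sum, condition=is_female))  # [3000, 1800]
```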
95 changes: 95 additions & 0 deletions docs/implementation/transition-guide.md
@@ -0,0 +1,95 @@
# Transition Guide: Moving to the New Entity Links

With the release of the **Generic Entity Links** API, OpenFisca Core can natively model complex, graph-like relations between entities.

This guide explains the main differences between the legacy `GroupEntity` + `Projectors` approach and the new `Many2OneLink` and `One2ManyLink` models, and how to approach migration.

---

## 1. Why Transition? The "Strict Hierarchy" Problem

Historically, OpenFisca rigidly structured populations into two classes: `SingleEntity` (Persons) and `GroupEntity` (Households, Families, Tax Units).

In this model, **every person must belong to exactly one entity of each group type.**
This handles standard tax-benefit models efficiently, but rules out features like:
- **Intra-entity (horizontal) relations**: Modeling a mother/child bond, marriages, or kinship networks. *Persons couldn't map to other Persons.*
- **Unbounded inter-entity relations**: Employment networks where one `company` employs multiple `persons`, or geographical relations (people living in arbitrary administrative districts).

**The Solution:** The new Entity Links system is not tied to the group hierarchy. You can declare a `Many2OneLink` (N source members to 1 target entity) or a `One2ManyLink` (1 source entity to N target members, whose values can be aggregated) linking *any population type to any other population type.*

---

## 2. You don’t *have* to migrate existing simple groups.

**Existing models remain backward compatible.**

If you have a traditional `GroupEntity` defined for households, it works exactly as before. Under the hood, OpenFisca now implements group projections with the new link engine:
- The legacy `person.household(...)` projector is backed by an automatically injected `ImplicitMany2OneLink`.
- The legacy `household.sum(person_salaries)` corresponds to `household.persons.sum("salary", period)`.

No code change is required in any existing variable formulas!

---

## 3. From Projectors to Links: The New Syntax

If you previously used `Projectors`, you may have found chaining difficult or error-prone. The new system standardizes data lookup through `link.get()` and filtering by role or condition.

### Before: Projectors
If you wanted the value of `rent` for the household of a person:
```python
# Projector syntax
rents = person.household("rent", period)
```

### After: Link Syntax
The same syntax keeps working (it now calls `.get()` internally on the implicitly generated link), but you can also call `.get()` explicitly:
```python
# New link syntax
rents = person.household.get("rent", period)
```

**Where the new syntax shines:** deep chaining.
You can resolve a variable at the end of a chain of links:
```python
# Assuming a `mother` link on persons: person -> mother (a person) -> mother's household, then read `region`
chain = person.mother.household.get("region", period)
```

---

## 4. Transitioning Aggregations: `sum`, `count`, `min`, `max`

Previously, aggregating members required computing the full person-level array first and passing it to `GroupPopulation.sum()`.

### Before: Legacy GroupPopulation
```python
# Fetch array of all persons in simulation
salaries = persons("salary", period)
# Pass to the group entity (e.g. household) to aggregate and collapse
total_household_incomes = households.sum(salaries, role=Household.PARENT)
```

### After: Declarative Links
```python
# The logic operates directly on the `One2ManyLink` bridging the two entities.
total_household_incomes = households.persons.sum("salary", period, role=Household.PARENT)
```
The variable and period are named directly, with no intermediate array. `persons` is the plural of `person`, which the new system automatically exposes as a one-to-many link on your household.

### Conditional Aggregations
The link system also lets you filter members by an arbitrary boolean condition, not only by OpenFisca roles:
```python
is_female = persons("is_female", period)
# Sum salaries, but only for members who are `is_female`
female_incomes = households.persons.sum("salary", period, condition=is_female)
```

---

## 5. Summary Checklist for Country Packages
- [ ] You **do not** need to rewrite `GroupEntity` logic for entities whose only purpose is traditional demographic grouping (like core households).
- [ ] You **can** start using `households.persons.sum()`, `households.persons.any()` and `households.persons.avg()` for more readable aggregations in new variables.
- [ ] You **should** use `Many2OneLink` if your model needs to relate `persons` to entities outside the standard OpenFisca group hierarchy (for example, a `mother_id` pointing to another row of the `persons` dataframe).

See `links-api.md` in this directory for how to declare explicit `Many2OneLink` links in your `TaxBenefitSystem`.
40 changes: 39 additions & 1 deletion openfisca_core/links/implicit.py
@@ -58,7 +58,19 @@ def _apply_filters(self, period, values, role, condition):

         if role is not None:
             roles = self._source_population.members_role
-            mask &= roles == role
+            # roles may be an object array of Role instances, so compare by key
+            if roles.dtype == object:
+                try:
+                    keys = numpy.fromiter(
+                        (getattr(x, "key", x) for x in roles),
+                        dtype=object,
+                    )
+                except Exception:
+                    mask &= roles == role
+                else:
+                    mask &= keys == role
+            else:
+                mask &= roles == role

         if condition is not None:
             mask &= condition
@@ -69,5 +81,31 @@ def _apply_filters(self, period, values, role, condition):
         valid = source_rows >= 0
         return source_rows[valid], values[valid]

+    # override to avoid relying on ``role_field`` which is meaningless for
+    # implicit links (the role information is stored on the source population)
+    def get_by_role(
+        self,
+        variable_name: str,
+        period,
+        role_value,
+        *,
+        condition: numpy.ndarray | None = None,
+    ) -> numpy.ndarray:
+        """Fetch value for a specific role value on a one-to-many implicit link.
+
+        This mirrors :meth:`One2ManyLink.get_by_role` but uses
+        ``self._source_population.members_role`` instead of a named role field
+        on the target population.
+        """
+        values = self._target_population.simulation.calculate(variable_name, period)
+        source_rows, values = self._apply_filters(period, values, role_value, condition)
+
+        result = numpy.zeros(self._source_population.count, dtype=values.dtype)
+        # last value wins (same semantics as GroupPopulation.value_from_person)
+        for tgt_idx, src in enumerate(source_rows):
+            if src >= 0:
+                result[src] = values[tgt_idx]
+        return result
+

 __all__ = ["ImplicitMany2OneLink", "ImplicitOne2ManyLink"]
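The loop at the end of the `get_by_role` override scatters each filtered member's value into its source entity's slot. When several members of one entity match the role, the member that comes last in the target population therefore wins, and entities with no match keep 0. A plain-Python model of that loop (the name `last_value_wins` is hypothetical):

```python
def last_value_wins(n_source, source_rows, values):
    """Scatter filtered target values into one slot per source entity.

    Models the loop in the implicit ``get_by_role``: later members overwrite
    earlier ones, and sources with no matching member keep 0.
    """
    result = [0] * n_source
    for target_index, source_row in enumerate(source_rows):
        if source_row >= 0:
            result[source_row] = values[target_index]
    return result


# After filtering on role "parent": two parents in household 0, none in household 1.
print(last_value_wins(2, [0, 0], [3000, 2000]))  # [2000, 0]
```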
85 changes: 84 additions & 1 deletion openfisca_core/links/many2one.py
@@ -129,13 +129,70 @@ def role(self) -> numpy.ndarray | None:
         )

     def has_role(self, role_value) -> numpy.ndarray:
-        """Boolean mask: does each source member have the given role?"""
+        """Boolean mask: does each source member have the given role?
+
+        The ``role`` array may contain raw values (ints, strings) or
+        ``Role`` objects depending on how the population was built. When
+        ``role_value`` is a string we compare against the ``key`` of each
+        element to make the API ergonomic for callers such as
+        ``link.has_role("parent")`` or ``link.get_by_role(..., role_value="foo")``.
+        """
         r = self.role
         if r is None:
             msg = f"Link '{self.name}' has no role_field"
             raise ValueError(msg)
+
+        # if array holds object references, convert to their keys
+        if r.dtype == object:
+            try:
+                keys = numpy.fromiter(
+                    (getattr(x, "key", x) for x in r),
+                    dtype=object,
+                )
+            except Exception:
+                # fallback to direct comparison
+                return r == role_value
+            return keys == role_value
+
+        # numpy will perform elementwise comparison for numeric or string
         return r == role_value

+    # -- role-based access --------------------------------------------------
+
+    def get_by_role(
+        self,
+        variable_name: str,
+        period,
+        *,
+        role_value,
+    ) -> numpy.ndarray:
+        """Fetch a variable on the target only for members with a given role.
+
+        Parameters
+        ----------
+        variable_name : str
+            Name of the variable defined on the target entity.
+        period : Period
+            Period for which to calculate the variable.
+        role_value : object
+            The role to filter on (e.g. ``"parent"``).
+
+        Returns
+        -------
+        numpy.ndarray
+            Array of shape ``(n_source,)`` where only members whose
+            ``has_role(role_value)`` return ``True`` keep their computed
+            value; all others receive the variable's default (usually 0).
+        """
+        mask = self.has_role(role_value)
+        result = self.get(variable_name, period)
+        # zero out non-matching rows using dtype-preserving fill
+        if not mask.all():
+            # create a copy to avoid mutating cached results
+            result = result.copy()
+            result[~mask] = 0
+        return result
+
     # -- ID resolution ------------------------------------------------------

     def _get_target_ids(self, period) -> numpy.ndarray:
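The `has_role` and `get_by_role` additions in this hunk combine two steps: compare each role element by its `key` when it has one, then set non-matching members to 0. The sketch below models both in plain Python. The `Role` dataclass is a hypothetical stand-in with only a `key` field, not OpenFisca's `Role` class, and the values are assumed to have already been looked up through the link.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """Hypothetical stand-in exposing only the ``key`` attribute used here."""

    key: str


def has_role(roles, role_value):
    # Compare by ``.key`` when the element has one, else by the raw value,
    # like the ``getattr(x, "key", x)`` expression in the diff.
    return [getattr(r, "key", r) == role_value for r in roles]


def get_by_role(values, roles, role_value):
    # Keep the looked-up value for matching members, 0 for the others.
    mask = has_role(roles, role_value)
    return [v if keep else 0 for v, keep in zip(values, mask)]


parent, child = Role("parent"), Role("child")
roles = [parent, parent, child]
household_rent = [950, 950, 950]  # rent already mapped to each person
print(has_role(roles, "parent"))  # [True, True, False]
print(get_by_role(household_rent, roles, "parent"))  # [950, 950, 0]
```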
@@ -169,6 +226,28 @@ def _resolve_ids(self, target_ids: numpy.ndarray) -> numpy.ndarray:

         return rows

+    # -- ranking -----------------------------------------------------------
+
+    def rank(self, variable_name: str, period) -> numpy.ndarray:
+        """Rank each source member within its group by a variable value.
+
+        The rank is computed among all members sharing the same target
+        entity, sorted by the value of ``variable_name`` evaluated on the
+        *source* population. The lowest value receives rank ``0``.
+
+        This is essentially a thin wrapper around
+        :meth:`~openfisca_core.populations.Population.get_rank`:
+
+        >>> person = simulation.persons
+        >>> person.links['household'].rank('age', period)
+        array([...])
+        """
+        source_pop = self._source_population
+        # criteria on source population
+        criteria = source_pop.simulation.calculate(variable_name, period)
+        # let Population.get_rank handle grouping and sorting
+        return source_pop.get_rank(self, criteria)
+

 # ---------------------------------------------------------------------------
 # Chained link getter
@@ -223,5 +302,9 @@ def __getattr__(self, name: str):
         target_entity = target_pop.entity
         raise AttributeError(f"Entity '{target_entity.key}' has no link named '{name}'")

+    def rank(self, variable_name: str, period) -> numpy.ndarray:
+        # forward to outer link so that chaining keeps semantics
+        return self._outer.rank(variable_name, period)
+

 __all__ = ["Many2OneLink"]
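`Many2OneLink.rank` delegates to `Population.get_rank` with criteria computed on the source population; per its docstring, members sharing a target entity are ranked by that value, lowest first, starting at 0. The sketch below models that contract in plain Python. Tie-breaking is not stated in the diff, so keeping input order for equal values (Python's `sorted` is stable) is an assumption, and `rank_within_groups` is a hypothetical name.

```python
def rank_within_groups(group_rows, criteria):
    """Rank each member among those sharing its group, lowest value first.

    group_rows[i] is the target row (group) of member i; criteria[i] is the
    value used for ordering. The smallest value in each group gets rank 0.
    """
    ranks = [0] * len(criteria)
    members_by_group = {}
    for member, group in enumerate(group_rows):
        members_by_group.setdefault(group, []).append(member)
    for members in members_by_group.values():
        # sorted() is stable, so ties keep their original order (an assumption).
        ordered = sorted(members, key=lambda m: criteria[m])
        for position, member in enumerate(ordered):
            ranks[member] = position
    return ranks


# Persons 0-2 share household 0; person 3 is alone in household 1.
household_of_person = [0, 0, 0, 1]
age = [40, 12, 15, 30]
print(rank_within_groups(household_of_person, age))  # [2, 0, 1, 0]
```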