c4v4 (Contributor) commented Mar 28, 2025

Initial Implementation of the CFT Heuristic for Set Covering

This PR introduces the first steps toward implementing the Caprara, Fischetti, and Toth (CFT) heuristic for the Set Covering problem. It is a port and evolution of an existing implementation of the same algorithm.

This is still a work in progress. The goal of this PR is to create a shared space where we can discuss the development and refine the approach together (as we agreed with the OR-Tools developers working on the Set Cover module).

Current State

The current implementation focuses on the 3-Phase algorithm described in the paper, excluding the Refinement phase for now. Since the Refinement phase repeatedly calls the 3-Phase as a subroutine, it can be added later if needed.

Subgradient

The subgradient method (currently sequential) is implemented with flexibility in mind: it uses the SubgradientCBs interface to define the key customization points.
The main reason for this approach is that the algorithm uses two types of subgradient methods:

  1. One to improve the current dual bound (Subgradient Phase in the paper).
  2. One to generate multipliers for the Greedy algorithm (Heuristic Phase in the paper).

Thus, to avoid code duplication, the parts that differ between these two are isolated in callbacks.
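To make the callback split concrete, here is a minimal sequential sketch of a subgradient loop for the Lagrangian relaxation of set covering. `Instance`, `SubgradientCallbacks`, `RunSubgradient`, and `FixedIterations` are hypothetical names, not the PR's `SubgradientCBs` interface. The loop computes L(u) = Σ_i u_i + Σ_j min(0, c_j − Σ_{i∈I_j} u_i) and the subgradient s_i = 1 − Σ_{j covering i} x_j(u). It then takes the step λ·(UB − L(u))/‖s‖², projected onto u ≥ 0. The constant λ in `FixedIterations` is only a placeholder for a real step-size policy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column-major instance: cols[j] lists the rows covered by column j.
struct Instance {
  std::vector<double> cost;            // c_j
  std::vector<std::vector<int>> cols;  // I_j
  int num_rows = 0;
};

// Hypothetical customization points: the parts that differ between the
// dual-bound and the heuristic subgradient.
class SubgradientCallbacks {
 public:
  virtual ~SubgradientCallbacks() = default;
  // Returns true when the loop should stop.
  virtual bool ExitCondition(int iteration, double lower_bound) = 0;
  // Returns the step-size factor lambda for this iteration.
  virtual double StepFactor(int iteration, double lower_bound) = 0;
  // Hook on the current multipliers (e.g., to run a greedy on them).
  virtual void OnIteration(const std::vector<double>& /*u*/, double /*lower_bound*/) {}
};

// Runs the subgradient loop from the multipliers in `u` (updated in place)
// and returns the best Lagrangian lower bound found.
double RunSubgradient(const Instance& inst, double upper_bound,
                      std::vector<double>& u, SubgradientCallbacks& cbs) {
  assert(static_cast<int>(u.size()) == inst.num_rows);
  double best_lb = -1e300;
  std::vector<int> row_cover(inst.num_rows);
  for (int it = 0;; ++it) {
    // L(u) = sum_i u_i + sum_j min(0, c_j - sum_{i in I_j} u_i).
    double lb = 0.0;
    for (double ui : u) lb += ui;
    std::fill(row_cover.begin(), row_cover.end(), 0);
    for (std::size_t j = 0; j < inst.cols.size(); ++j) {
      double reduced = inst.cost[j];
      for (int i : inst.cols[j]) reduced -= u[i];
      if (reduced < 0.0) {  // x_j(u) = 1
        lb += reduced;
        for (int i : inst.cols[j]) ++row_cover[i];
      }
    }
    best_lb = std::max(best_lb, lb);
    cbs.OnIteration(u, lb);
    if (cbs.ExitCondition(it, lb)) break;
    // Subgradient s_i = 1 - (number of columns in x(u) covering row i).
    double norm2 = 0.0;
    for (int i = 0; i < inst.num_rows; ++i) {
      const double s = 1.0 - row_cover[i];
      norm2 += s * s;
    }
    // s = 0: x(u) covers every row exactly once, so L(u) = c(x) is optimal.
    if (norm2 == 0.0) break;
    const double step = cbs.StepFactor(it, lb) * (upper_bound - lb) / norm2;
    for (int i = 0; i < inst.num_rows; ++i) {
      u[i] = std::max(0.0, u[i] + step * (1.0 - row_cover[i]));
    }
  }
  return best_lb;
}

// Placeholder policy: constant lambda and a fixed iteration budget.
class FixedIterations : public SubgradientCallbacks {
 public:
  explicit FixedIterations(int max_iterations) : max_iterations_(max_iterations) {}
  bool ExitCondition(int iteration, double /*lower_bound*/) override {
    return iteration >= max_iterations_;
  }
  double StepFactor(int /*iteration*/, double /*lower_bound*/) override { return 0.1; }

 private:
  int max_iterations_;
};
```

A dual-bound callback and a heuristic callback (for example, one that runs a greedy inside `OnIteration`) would then differ only in these three methods.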

This design also simplifies experimentation with stabilization techniques, especially for improving the dual bound. More generally, it should simplify future work on subgradient-based algorithms for Set Covering.

Greedy

The Lagrangian multiplier-based Greedy algorithm is implemented, including its column-scoring method based on the median-finding algorithm, as described in the paper.
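Here is a minimal sketch of the multiplier-based greedy. It assumes the score σ_j = γ_j/μ_j if γ_j > 0 and σ_j = γ_j·μ_j otherwise. Here γ_j is the Lagrangian cost of column j restricted to the uncovered rows, and μ_j is the number of uncovered rows it covers. For clarity it recomputes every score at each step and uses `std::min_element`. It deliberately omits the median-finding-based bookkeeping that makes the actual algorithm efficient. `CftScore` and `LagrangianGreedy` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed score: gamma is the Lagrangian cost of the column restricted to the
// uncovered rows it covers, mu is the number of such rows (mu > 0).
double CftScore(double gamma, int mu) {
  assert(mu > 0);
  return gamma > 0.0 ? gamma / mu : gamma * mu;
}

// Hypothetical multiplier-based greedy: repeatedly adds the column with the
// smallest score until all rows are covered or no column covers the rest.
std::vector<int> LagrangianGreedy(const std::vector<double>& cost,
                                  const std::vector<std::vector<int>>& cols,
                                  const std::vector<double>& u, int num_rows) {
  std::vector<bool> covered(num_rows, false);
  int num_uncovered = num_rows;
  std::vector<int> solution;
  while (num_uncovered > 0) {
    std::vector<std::pair<double, int>> scored;  // (score, column)
    for (std::size_t j = 0; j < cols.size(); ++j) {
      double gamma = cost[j];
      int mu = 0;
      for (int i : cols[j]) {
        if (!covered[i]) {
          gamma -= u[i];
          ++mu;
        }
      }
      if (mu > 0) scored.emplace_back(CftScore(gamma, mu), static_cast<int>(j));
    }
    if (scored.empty()) break;  // remaining rows cannot be covered
    const int best = std::min_element(scored.begin(), scored.end())->second;
    solution.push_back(best);
    for (int i : cols[best]) {
      if (!covered[i]) {
        covered[i] = true;
        --num_uncovered;
      }
    }
  }
  return solution;
}
```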

One missing (small) part is the enumeration step, which the paper suggests using when there are few redundant columns to remove (fewer than 10). In our previous implementation, we noticed that this step added complexity without much benefit, so I left it out for now. It can be added later if needed.
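For reference, the simpler alternative to enumeration is a single greedy redundancy pass. The hypothetical `RemoveRedundantColumns` below scans the selected columns by decreasing cost. It drops any column whose rows are all covered by at least one other column still in the solution. Unlike enumeration, it does not guarantee the cheapest irredundant subset.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Greedy redundancy removal (not the paper's enumeration): scans the selected
// columns by decreasing cost and drops a column when every row it covers is
// also covered by another column still in the solution.
std::vector<int> RemoveRedundantColumns(std::vector<int> solution,
                                        const std::vector<double>& cost,
                                        const std::vector<std::vector<int>>& cols,
                                        int num_rows) {
  std::vector<int> cover_count(num_rows, 0);
  for (int j : solution) {
    assert(j >= 0 && j < static_cast<int>(cols.size()));
    for (int i : cols[j]) ++cover_count[i];
  }
  std::sort(solution.begin(), solution.end(),
            [&](int a, int b) { return cost[a] > cost[b]; });
  std::vector<int> kept;
  for (int j : solution) {
    const bool redundant = std::all_of(cols[j].begin(), cols[j].end(),
                                       [&](int i) { return cover_count[i] > 1; });
    if (redundant) {
      for (int i : cols[j]) --cover_count[i];  // drop j: its rows stay covered
    } else {
      kept.push_back(j);
    }
  }
  return kept;
}
```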

Core Model

The paper describes a core model technique that improves performance by around an order of magnitude. The idea is to focus on a smaller set of high-quality columns, determined through a periodic pricing procedure during the subgradient phase.
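Here is a minimal sketch of reduced-cost-based core pricing. It assumes the core is the union of two sets: (a) the `per_row` columns with the smallest Lagrangian cost covering each row, and (b) the `max_global` columns with the smallest Lagrangian cost overall. `ReducedCosts`, `PriceCore`, and both parameters are hypothetical; the paper's exact selection rule and thresholds may differ. `std::nth_element` performs the global selection in linear average time without a full sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Lagrangian (reduced) cost c_j(u) = c_j - sum_{i in I_j} u_i.
std::vector<double> ReducedCosts(const std::vector<double>& cost,
                                 const std::vector<std::vector<int>>& cols,
                                 const std::vector<double>& u) {
  assert(cost.size() == cols.size());
  std::vector<double> rc(cost);
  for (std::size_t j = 0; j < cols.size(); ++j) {
    for (int i : cols[j]) rc[j] -= u[i];
  }
  return rc;
}

// Hypothetical core pricing. rows[i] lists the columns covering row i.
// Returns the sorted indices of the core columns.
std::vector<int> PriceCore(const std::vector<double>& rc,
                           const std::vector<std::vector<int>>& rows,
                           std::size_t per_row, std::size_t max_global) {
  std::vector<bool> in_core(rc.size(), false);
  auto by_rc = [&](int a, int b) { return rc[a] < rc[b]; };
  // (a) The per_row cheapest columns of each row.
  for (const auto& row_cols : rows) {
    std::vector<int> cand(row_cols);
    const std::size_t k = std::min(per_row, cand.size());
    std::partial_sort(cand.begin(), cand.begin() + k, cand.end(), by_rc);
    for (std::size_t t = 0; t < k; ++t) in_core[cand[t]] = true;
  }
  // (b) The max_global cheapest columns overall (selection, no full sort).
  std::vector<int> all(rc.size());
  for (std::size_t j = 0; j < rc.size(); ++j) all[j] = static_cast<int>(j);
  const std::size_t k = std::min(max_global, all.size());
  if (k > 0 && k < all.size()) {
    std::nth_element(all.begin(), all.begin() + (k - 1), all.end(), by_rc);
  }
  for (std::size_t t = 0; t < k; ++t) in_core[all[t]] = true;
  std::vector<int> core;
  for (std::size_t j = 0; j < rc.size(); ++j) {
    if (in_core[j]) core.push_back(static_cast<int>(j));
  }
  return core;
}
```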

This part is already implemented, but only in a temporary form. It will need to be properly integrated with the "sub-model system", which is still to be developed.

Sub-Model

This is the main missing piece. The CFT heuristic works by gradually fixing columns that are likely to be in the optimal solution. This reduces the problem size and shifts the focus to the remaining uncovered elements and their covering columns.
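Here is a minimal sketch of the bookkeeping behind column fixing. Fixed columns contribute a constant cost, and their rows leave the residual problem. Only unfixed columns that still cover at least one uncovered row remain relevant. `FixingState`, `FixColumns`, and `ActiveColumns` are hypothetical, and the rule for choosing which columns to fix is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical state of the residual problem after fixing some columns.
struct FixingState {
  std::vector<bool> row_covered;  // rows already covered by fixed columns
  std::vector<bool> col_fixed;    // columns fixed to 1
  double fixed_cost = 0.0;        // total cost of the fixed columns
};

// Fixes `new_fixed` columns and marks the rows they cover.
void FixColumns(FixingState& st, const std::vector<int>& new_fixed,
                const std::vector<double>& cost,
                const std::vector<std::vector<int>>& cols) {
  assert(st.col_fixed.size() == cols.size());
  for (int j : new_fixed) {
    if (st.col_fixed[j]) continue;
    st.col_fixed[j] = true;
    st.fixed_cost += cost[j];
    for (int i : cols[j]) st.row_covered[i] = true;
  }
}

// Columns still relevant for the residual problem: not fixed and covering at
// least one uncovered row.
std::vector<int> ActiveColumns(const FixingState& st,
                               const std::vector<std::vector<int>>& cols) {
  assert(st.col_fixed.size() == cols.size());
  std::vector<int> active;
  for (std::size_t j = 0; j < cols.size(); ++j) {
    if (st.col_fixed[j]) continue;
    for (int i : cols[j]) {
      if (!st.row_covered[i]) {
        active.push_back(static_cast<int>(j));
        break;
      }
    }
  }
  return active;
}
```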

From past experience, this part needs to be designed carefully, since a poor design could make future extensions and maintenance tricky. I'm currently discussing the possible designs with one of the developers to decide which one to adopt.

Future Possibilities

One potential long-term improvement would be replacing the "static model" with an online column generation approach. The current structure already allows for this: instead of selecting columns, the core model pricing could be replaced with an actual column generation step.

This is beyond the current scope, but I’m keeping it in mind so that the implementation stays flexible enough for such an extension. This seems the most sensible design in the many contexts where generating a large enough set of columns is not feasible.
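One way to keep this extension open is to hide the source of columns behind a small strategy interface. In this sketch (all names hypothetical), the static model is a precomputed pool, and pricing selects the not-yet-added columns with negative reduced cost. A column-generation provider would implement the same `Price()` method with a problem-specific pricing oracle.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Column {
  double cost;
  std::vector<int> rows;  // rows covered by the column
};

// Hypothetical strategy interface: given the current multipliers u, return
// new columns worth adding to the core model.
class ColumnProvider {
 public:
  virtual ~ColumnProvider() = default;
  virtual std::vector<Column> Price(const std::vector<double>& u) = 0;
};

// Static model: columns come from a fixed, precomputed pool; pricing only
// selects the not-yet-added ones with negative reduced cost.
class StaticColumnPool : public ColumnProvider {
 public:
  explicit StaticColumnPool(std::vector<Column> pool)
      : pool_(std::move(pool)), added_(pool_.size(), false) {}

  std::vector<Column> Price(const std::vector<double>& u) override {
    std::vector<Column> out;
    for (std::size_t j = 0; j < pool_.size(); ++j) {
      if (added_[j]) continue;
      double rc = pool_[j].cost;
      for (int i : pool_[j].rows) {
        assert(i >= 0 && i < static_cast<int>(u.size()));
        rc -= u[i];
      }
      if (rc < 0.0) {
        added_[j] = true;
        out.push_back(pool_[j]);
      }
    }
    return out;
  }

 private:
  std::vector<Column> pool_;
  std::vector<bool> added_;
};

// A column-generation provider would implement the same Price() by solving a
// problem-specific pricing subproblem instead of scanning a stored pool.
```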


Let me know what you think! Any feedback is welcome :)

c4v4 changed the title from "Main" to "Initial Implementation of the CFT Heuristic for Set Covering" Mar 28, 2025
c4v4 changed the title from "Initial Implementation of the CFT Heuristic for Set Covering" to "CFT Heuristic for Set Covering" Mar 28, 2025
c4v4 (Contributor Author) commented Apr 5, 2025

As discussed privately, I’ve completed a prototype based on views over the original model to track column fixings and focus on a core model (i.e., a subset of columns).

An alternative approach, using model copies that contain only the active rows/columns, would probably be more computationally efficient, since it would iterate over small vectors instead of scanning the full lists and skipping inactive items.
However, it is not a good fit when memory is a constraint. Since fixings are incremental, early iterations would require handling nearly a full copy of the original instance, roughly doubling peak memory usage.
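The trade-off can be sketched as follows. The hypothetical `ModelView` stores only one flag per row and per column on top of the full model. Its memory therefore does not depend on how many items are active, but every iteration scans full columns. An explicit copy would store `ActiveNonZeros(view)` entries instead, which in early iterations is close to the nonzero count of the whole instance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view over a column-major model: it does not copy the model, it
// only stores one activity flag per row and per column.
struct ModelView {
  const std::vector<std::vector<int>>* cols;  // full model (not owned)
  std::vector<bool> row_active;
  std::vector<bool> col_active;

  // Calls f(i) for every active row of column j. Inactive rows are skipped,
  // so the work is proportional to the full column length.
  template <typename F>
  void ForEachActiveRow(int j, F&& f) const {
    assert(col_active[j]);
    for (int i : (*cols)[j]) {
      if (row_active[i]) f(i);
    }
  }
};

// Entries an explicit copy of the active sub-model would have to store.
std::size_t ActiveNonZeros(const ModelView& view) {
  std::size_t nnz = 0;
  for (std::size_t j = 0; j < view.cols->size(); ++j) {
    if (!view.col_active[j]) continue;
    view.ForEachActiveRow(static_cast<int>(j), [&](int) { ++nnz; });
  }
  return nnz;
}
```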

The current view-based system is in a minimal but working state and needs to be improved and hardened.
With this, the sequential 3-Phase prototype is complete, covering about 90% of the CFT logic.

Next steps:

  • Test, polish, and stabilize the current implementation. Multipliers can be erratic in edge cases, so we need to identify and handle those.
  • Experiment with the model-copy approach for the core model only. Since the core is much smaller than the original, this might not increase memory usage significantly, and could even reduce it compared to the view-based version.

Mizux added the "Solver: Set Cover" (Solver in set_cover/) and "Feature Request" (Missing Feature/Wrapper) labels Apr 8, 2025
c4v4 (Contributor Author) commented Apr 9, 2025

Current State Update

The 3-Phase algorithm is still in a prototype stage, but it now runs reasonably efficiently (still sequentially for now).

Two representations for the core model are available:

  • SubModelView: a view-based version, storing only the focused item lists and "is-focused" vectors.
  • CoreModel: an explicit SetCoverModel, wrapped with the necessary components to keep it up to date.
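Here is a minimal sketch of the explicit alternative. The selected core columns are copied into a compact, renumbered model restricted to the active rows. The sketch also keeps the maps back to full-model indices needed to keep the copy in sync. `CompactCore` and `BuildCompactCore` are hypothetical and use plain vectors rather than the `SetCoverModel` API.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical explicit core: compact copy plus maps back to the full model.
struct CompactCore {
  std::vector<double> cost;
  std::vector<std::vector<int>> cols;  // row indices in core numbering
  std::vector<int> core_to_full_col;
  std::vector<int> core_to_full_row;
};

CompactCore BuildCompactCore(const std::vector<double>& cost,
                             const std::vector<std::vector<int>>& cols,
                             const std::vector<int>& core_cols,
                             const std::vector<bool>& row_active) {
  CompactCore core;
  // Renumber active rows contiguously; inactive rows map to -1.
  std::vector<int> full_to_core_row(row_active.size(), -1);
  for (std::size_t i = 0; i < row_active.size(); ++i) {
    if (!row_active[i]) continue;
    full_to_core_row[i] = static_cast<int>(core.core_to_full_row.size());
    core.core_to_full_row.push_back(static_cast<int>(i));
  }
  for (int j : core_cols) {
    assert(j >= 0 && j < static_cast<int>(cols.size()));
    std::vector<int> rows;
    for (int i : cols[j]) {
      if (full_to_core_row[i] >= 0) rows.push_back(full_to_core_row[i]);
    }
    if (rows.empty()) continue;  // covers no active row: useless in the core
    core.cost.push_back(cost[j]);
    core.cols.push_back(std::move(rows));
    core.core_to_full_col.push_back(j);
  }
  return core;
}
```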

Special care was taken to ensure the search trajectory remains consistent regardless of the chosen representation.

Preliminary tests on the rail instances show that CoreModel is about 2x faster than SubModelView, with roughly the same memory usage (within a 1% delta, sometimes slightly better, sometimes slightly worse).

The view system has been refined. That said, if we eventually settle on a specific strategy for column fixing and core model pricing, the abstractions in set_cover_views.h could be replaced with manual filtering of focused items (similar to what one would do in a lower-level language without zero-cost abstractions). For now, they’re helpful for easily switching between design ideas at a high level without having to tweak every implementation detail.

Next steps:

  • Continue with cleanup, organization, and adding comments.
  • Start implementing tests (I'll need help integrating them with OR-Tools' testing system).
  • Begin experimenting with basic subgradient improvements and stabilization techniques.

c4v4 marked this pull request as ready for review April 9, 2025 13:15
c4v4 (Contributor Author) commented Apr 14, 2025

I’ve applied most of the suggestions from the code review (thanks again for the thorough feedback!).

A few quick notes:

SubModelView

At some point (possibly even now), it might be worth considering dropping support for some of the features currently implemented, mainly the SubModelView class and the specific views used only within it. Removing those could significantly reduce boilerplate without much loss in functionality (note that SubModelView is a less performant alternative to CoreModel).

Composable Views

Another area that could benefit from some cleanup is the handling of strongly typed indices in the full model, particularly in how the views interact with them. While things work as they are, the abstraction is a bit leaky and requires some extra handling inside FullToCoreModel. One possible improvement would be to generalize the current views to support composition, essentially replacing the current absl::Span usage with a templated view type (with absl::Span as the identity-view base case). That said, it would push us closer to template proliferation, which might not be a great fit for the codebase and probably isn’t necessary right now.
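Here is a minimal sketch of what composable views could look like. It uses a `ForEach`-style interface instead of iterators to keep it short. `IdentityView` plays the role `absl::Span` would have as the base case, and `FilterView` can wrap any view, including another `FilterView`. All names are hypothetical, and strong index types are omitted. Each composition layer adds a template instantiation, which is the proliferation concern mentioned above.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Base case: a view over all elements of a vector (the role absl::Span would play).
template <typename T>
class IdentityView {
 public:
  explicit IdentityView(const std::vector<T>& data) : data_(&data) {}
  template <typename F>
  void ForEach(F&& f) const {
    for (const T& x : *data_) f(x);
  }

 private:
  const std::vector<T>* data_;  // not owned
};

// Keeps only the elements of `Base` that satisfy `Pred`; Base can be any view
// with a ForEach method, including another FilterView.
template <typename Base, typename Pred>
class FilterView {
 public:
  FilterView(Base base, Pred pred) : base_(std::move(base)), pred_(std::move(pred)) {}
  template <typename F>
  void ForEach(F&& f) const {
    base_.ForEach([&](const auto& x) {
      if (pred_(x)) f(x);
    });
  }

 private:
  Base base_;
  Pred pred_;
};

template <typename Base, typename Pred>
FilterView<Base, Pred> Filter(Base base, Pred pred) {
  return FilterView<Base, Pred>(std::move(base), std::move(pred));
}
```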

SetCoverInvariant in Multiplier-Based Greedy

For now, I’ve avoided using SetCoverInvariant inside the CFT greedy algorithm, mainly to avoid potential overhead from unneeded calculations. However, I plan to take a closer look: rewriting the greedy logic around SetCoverInvariant could simplify the code and reduce duplication.

dourouc05 merged commit 311151a into google:main Apr 15, 2025
1 check passed