CFT Heuristic for Set Covering #4607
As discussed privately, I’ve completed a prototype based on views over the original model to track column fixings and focus on a core model (i.e., a subset of columns). An alternative approach using model copies that contain only the active rows/columns would probably be more computationally efficient, since it would work with small vectors instead of iterating over the full lists and skipping inactive items. The current view-based system is in a minimal but working state and needs to be improved and hardened. Next steps:
|
Current State Update
The 3-Phase algorithm is still at the prototype stage, but it now works reasonably efficiently (still sequential for now). Two representations for the core model are available:
Special care was taken to ensure the search trajectory remains consistent regardless of the chosen representation. Preliminary testing on
The view system has been refined. That said, if we eventually settle on a specific strategy for column fixing and core model pricing, the abstractions in
Next steps:
|
I’ve applied most of the suggestions from the code review (thanks again for the thorough feedback!). A few quick notes:
|
Initial Implementation of the CFT Heuristic for Set Covering
This PR introduces the first steps toward implementing the Caprara, Fischetti, and Toth (CFT) heuristic for the Set Covering problem. It is an evolution and port of an existing implementation of the same algorithm.
This is still a work in progress. The goal of this PR is to create a shared space where we can discuss the development and refine the approach together (as we agreed with the OR-Tools developers working on the Set Cover module).
Current State
The current implementation focuses on the 3-Phase algorithm described in the paper, excluding the Refinement phase for now. Since the Refinement phase repeatedly calls the 3-Phase as a subroutine, it can be added later if needed.
Subgradient
The subgradient method (currently sequential) is implemented with flexibility in mind: it uses the SubgradientCBs interface to define the key customization points.
The main reason for this approach is that the algorithm uses two types of subgradient methods:
Thus, to avoid code duplication, the parts that differ between these two are isolated in callbacks.
This design also simplifies experimentation with stabilization techniques, especially for improving the dual-bound phase. More generally, it should simplify future work on subgradient-based algorithms for Set Covering.
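To make the callback idea concrete, here is a minimal sketch of a subgradient loop for the Lagrangian relaxation of Set Covering, with the varying parts passed in as hooks. It is not the PR's code: `SubgradientHooks`, its three fields, and the `subgradient` function are hypothetical names chosen for illustration, and the actual SubgradientCBs interface may expose different customization points.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class SubgradientHooks:
    """Hypothetical stand-in for the customization points (not the PR's API)."""
    # (iteration, lower_bound, upper_bound, squared_norm) -> step length
    step_size: Callable[[int, float, float, float], float]
    # (iteration, multipliers, picked_columns, bound) -> None
    on_iteration: Callable[[int, List[float], List[int], float], None]
    # (iteration, best_bound) -> True to terminate
    should_stop: Callable[[int, float], bool]


def subgradient(costs: List[float], cols: List[List[int]], n_rows: int,
                upper_bound: float, hooks: SubgradientHooks) -> Tuple[float, List[float]]:
    """Return the best Lagrangian lower bound found and the final multipliers."""
    u = [0.0] * n_rows
    best = float("-inf")
    it = 0
    while not hooks.should_stop(it, best):
        # Lagrangian subproblem: take every column with negative reduced cost.
        reduced = [c - sum(u[i] for i in rows) for c, rows in zip(costs, cols)]
        picked = [j for j, r in enumerate(reduced) if r < 0.0]
        bound = sum(u) + sum(reduced[j] for j in picked)
        best = max(best, bound)
        hooks.on_iteration(it, u, picked, bound)

        # Subgradient: 1 minus the number of picked columns covering each row.
        s = [1.0] * n_rows
        for j in picked:
            for i in cols[j]:
                s[i] -= 1.0
        norm2 = sum(x * x for x in s)
        if norm2 == 0.0:
            break  # the relaxed solution covers every row exactly once
        step = hooks.step_size(it, bound, upper_bound, norm2)
        u = [max(0.0, ui + step * si) for ui, si in zip(u, s)]
        it += 1
    return best, u


# Tiny demo: 3 rows, 4 columns (three pairs plus one column covering everything).
demo_hooks = SubgradientHooks(
    step_size=lambda it, lb, ub, norm2: (ub - lb) / norm2,
    on_iteration=lambda it, u, picked, bound: None,
    should_stop=lambda it, best: it >= 20,
)
demo_bound, demo_u = subgradient(
    [2.0, 2.0, 2.0, 3.5], [[0, 1], [1, 2], [0, 2], [0, 1, 2]], 3, 3.5, demo_hooks)
print("best lower bound:", demo_bound)
```

In this arrangement, a second subgradient variant would reuse the same loop and only swap the hooks, for example an `on_iteration` that hands the current multipliers to a greedy heuristic.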
Greedy
The Lagrangian multiplier-based Greedy algorithm is implemented, including its column-scoring method based on the median-finding algorithm, as described in the paper.
One missing (small) part is the enumeration step, which the paper suggests using when there are few redundant columns to remove (fewer than 10). In our previous implementation, we noticed that this step added complexity without much benefit, so I left it out for now. It can be added later if needed.
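As a rough illustration of the multiplier-based greedy described above, the sketch below scores each column by its reduced cost over the still-uncovered rows and then drops redundant columns. Several things here are assumptions rather than the PR's code: the function names are hypothetical, the scoring rule is my reading of the one in the paper, the best column is found with a plain linear scan instead of the median-finding approach, and redundancy removal is a simple most-expensive-first pass with no enumeration step.

```python
from typing import List


def remove_redundant(costs: List[float], cols: List[List[int]], n_rows: int,
                     solution: List[int]) -> List[int]:
    """Drop columns whose rows are all covered at least twice, costliest first."""
    count = [0] * n_rows
    for j in solution:
        for i in cols[j]:
            count[i] += 1
    kept = []
    for j in sorted(solution, key=lambda j: -costs[j]):
        if all(count[i] >= 2 for i in cols[j]):
            for i in cols[j]:
                count[i] -= 1  # column j is redundant: remove it
        else:
            kept.append(j)
    return sorted(kept)


def lagrangian_greedy(costs: List[float], cols: List[List[int]], n_rows: int,
                      u: List[float]) -> List[int]:
    """Build a cover by repeatedly taking the column with the lowest score."""
    uncovered = set(range(n_rows))
    solution: List[int] = []
    while uncovered:
        best_j, best_score = None, float("inf")
        for j, rows in enumerate(cols):
            hit = [i for i in rows if i in uncovered]
            if not hit:
                continue  # column covers nothing new (or is already selected)
            gamma = costs[j] - sum(u[i] for i in hit)  # reduced cost on uncovered rows
            mu = len(hit)                              # number of newly covered rows
            score = gamma / mu if gamma > 0 else gamma * mu
            if score < best_score:
                best_j, best_score = j, score
        if best_j is None:
            raise ValueError("infeasible instance: a row has no covering column")
        solution.append(best_j)
        uncovered.difference_update(cols[best_j])
    return remove_redundant(costs, cols, n_rows, solution)


demo_cover = lagrangian_greedy(
    [2.0, 2.0, 2.0, 3.5], [[0, 1], [1, 2], [0, 2], [0, 1, 2]], 3, [1.0, 1.0, 1.0])
print("greedy cover:", demo_cover)
```

Rescanning every column at each step keeps the sketch short but is quadratic in the worst case; a real implementation would update scores incrementally.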
Core Model
The paper describes a core model technique that improves performance by around an order of magnitude. The idea is to focus on a smaller set of high-quality columns, determined through a periodic pricing procedure during the subgradient phase.
This part is already implemented, but only in a temporary form. It will need to be properly integrated with the "sub-model system", which is still to be developed.
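One way to picture the pricing step is sketched below: given the current multipliers, keep every column whose reduced cost falls below a threshold, plus the few cheapest columns (by reduced cost) for each row so that every row stays coverable. The function name `select_core_columns` and the default values of `per_row` and `threshold` are illustrative assumptions, not values taken from the PR.

```python
import heapq
from typing import List


def select_core_columns(costs: List[float], cols: List[List[int]], n_rows: int,
                        u: List[float], per_row: int = 5,
                        threshold: float = 0.1) -> List[int]:
    """Return sorted indices of the columns kept in the core model."""
    reduced = [c - sum(u[i] for i in rows) for c, rows in zip(costs, cols)]

    # Rule 1: keep every column with a small enough reduced cost.
    core = {j for j, r in enumerate(reduced) if r < threshold}

    # Rule 2: for each row, keep its `per_row` best columns by reduced cost.
    by_row: List[List[int]] = [[] for _ in range(n_rows)]
    for j, rows in enumerate(cols):
        for i in rows:
            by_row[i].append(j)
    for i in range(n_rows):
        core.update(heapq.nsmallest(per_row, by_row[i], key=lambda j: reduced[j]))
    return sorted(core)


demo_core = select_core_columns(
    [2.0, 2.0, 2.0, 3.5, 10.0],
    [[0, 1], [1, 2], [0, 2], [0, 1, 2], [0]],
    3, [1.0, 1.0, 1.0], per_row=1)
print("core columns:", demo_core)
```

Calling this periodically during the subgradient phase, with the latest multipliers, would refresh the core as the multipliers move.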
Sub-Model
This is the main missing piece. The CFT heuristic works by gradually fixing columns that are likely to be in the optimal solution. This reduces the problem size and shifts the focus to the remaining uncovered elements and their covering columns.
From past experience, this part needs to be designed carefully, since a poor design could make future extensions and maintenance tricky. I'm currently discussing possible designs with one of the developers to decide which one to pick.
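The effect of fixing columns can be sketched as building a residual instance: rows covered by the fixed columns disappear, and the remaining columns are restricted to the rows still uncovered. The sketch below takes the copy-based route (a compact new model with index maps back to the original); it is only an illustration, and `fix_columns` and its return layout are hypothetical, not the design under discussion.

```python
from typing import Dict, List, Tuple


def fix_columns(costs: List[float], cols: List[List[int]], n_rows: int,
                fixed: List[int]) -> Tuple[List[float], List[List[int]], int, List[int], float]:
    """Return (costs, cols, n_rows, original_column_ids, fixed_cost) of the residual model."""
    covered = set()
    for j in fixed:
        covered.update(cols[j])

    # Renumber the uncovered rows 0..k-1 so the residual model is compact.
    row_map: Dict[int, int] = {}
    for i in range(n_rows):
        if i not in covered:
            row_map[i] = len(row_map)

    fixed_set = set(fixed)
    res_costs: List[float] = []
    res_cols: List[List[int]] = []
    col_ids: List[int] = []  # residual column index -> original column index
    for j, rows in enumerate(cols):
        if j in fixed_set:
            continue
        remaining = [row_map[i] for i in rows if i in row_map]
        if remaining:  # columns covering no uncovered row are dropped
            res_costs.append(costs[j])
            res_cols.append(remaining)
            col_ids.append(j)

    fixed_cost = sum(costs[j] for j in fixed)
    return res_costs, res_cols, len(row_map), col_ids, fixed_cost


demo_residual = fix_columns(
    [2.0, 2.0, 2.0, 3.5], [[0, 1], [1, 2], [0, 2], [0, 1, 2]], 3, fixed=[0])
print("residual model:", demo_residual)
```

A view-based design would avoid building these new lists and instead filter the original model on the fly, trading memory for extra work skipping inactive items.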
Future Possibilities
One potential long-term improvement would be replacing the "static model" with an online column generation approach. The current structure already allows for this: instead of selecting columns, the core model pricing could be replaced with an actual column generation step.
This is beyond the current scope, but I’m keeping it in mind to ensure the implementation remains flexible enough for such an extension. It seems to be the most sensible design for the many contexts where generating a large enough set of columns up front is not feasible.
Let me know what you think! Any feedback is welcome :)