## Introduction
Gentoo's Portage package manager uses a sophisticated dependency resolver that has evolved over two decades. Unlike binary-based package managers (apt, dnf), Portage must handle source-based compilation with USE flags, slots, subslots, blockers, and complex conditional dependencies. Plenty of developers get an idea to replace python's implementation with something faster. This article explores an experimental integration of the [Z3 SMT](https://github.com/Z3Prover/z3) ([Satisfiability Modulo Theories](https://en.wikipedia.org/wiki/Satisfiability_modulo_theories)) solver as an alternative backend for Portage's dependency resolution and highlights the challenges of such an integration effort.
TL;DR: Z3 is ~2x as fast, but the integration isn't complete - adding choices to uninstall packages will add more constraints and search space. There are also a **lot** of caveats to consider.
### Why Z3?
The native Portage solver uses a greedy algorithm with backtracking. While effective, it can struggle with:
- Large dependency graphs requiring extensive backtracking
- Finding optimal solutions vs. first valid solution
- Explaining *why* a dependency cannot be satisfied
SMT solvers like Z3 approach the problem differently: encode the entire problem as a Boolean satisfiability formula, then find a satisfying assignment. More over as an SMT solver, Z3 can reason over **theroies** - for example an integer theory such that if `X + 4 > Y` then `X > Y - 4` without decomposing the equation to boolean algebra. This has potential advantages:
- **Complete search**: Z3 explores the entire solution space mathematically
- **Proof generation**: Z3 can explain why no solution exists (UNSAT cores)
- **Optimization**: Z3 is extreemely fast, compiling its equations to a representation that encourages [Conflict Driven Clause Learning](https://en.wikipedia.org/wiki/Conflict-driven_clause_learning) algorithm.
## Portage's Native Solver Architecture
### Entry Point: `select_files()`
When you run `emerge dev-libs/foo`, the entry point is `depgraph.select_files()`:
```python
# lib/_emerge/depgraph.py:4934
def select_files(self, args):
return self._select_files(args)
def _select_files(self, myfiles):
"""Given a list of .tbz2s, .ebuilds sets, and deps, populate
self._dynamic_config._initial_arg_list and call self._resolve to create the
appropriate depgraph and return a favorite list."""
self._load_vdb() # Load installed package database
# ... parse arguments into atoms ...
return self._resolve(myfavorites)
```
### The Core Loop: `_resolve()` and `_create_graph()`
The `_resolve()` method initializes the dependency stack with user-requested atoms, then enters the main solving loop:
```python
# lib/_emerge/depgraph.py:5421
def _resolve(self, myfavorites):
for arg in self._expand_set_args(args):
for atom in arg.pset.getAtoms():
dep = Dependency(atom=atom, root=myroot, parent=arg)
# Add to dep_stack and process
if not self._add_pkg(pkg, dep) or not self._create_graph():
# Handle failure, possibly backtrack
...
```
The heart of the solver is `_create_graph()`:
```python
# lib/_emerge/depgraph.py:3094
def _create_graph(self, allow_unsatisfied=False):
dep_stack = self._dynamic_config._dep_stack
dep_disjunctive_stack = self._dynamic_config._dep_disjunctive_stack
while dep_stack or dep_disjunctive_stack:
while dep_stack:
dep = dep_stack.pop()
if isinstance(dep, Package):
if not self._add_pkg_deps(dep): # Add package's dependencies
return 0
else:
if not self._add_dep(dep): # Resolve a dependency
return 0
if dep_disjunctive_stack:
if not self._pop_disjunction(): # Handle || ( ) deps
return 0
return 1
```
This is a **greedy, stack-based** algorithm:
1. Pop a dependency from the stack
2. Find the best matching package (highest version, respecting masks/keywords)
3. Add that package's dependencies to the stack
4. Repeat until stack is empty (success) or no valid choice exists (failure/backtrack)
### Backtracking
When the greedy choice leads to a dead end, Portage can backtrack:
```python
# lib/_emerge/resolver/backtracking.py:100
class Backtracker:
def __init__(self, max_depth):
self._max_depth = max_depth
self._unexplored_nodes = []
```
The backtracker maintains alternative choices that weren't taken. When resolution fails, it reverts to a previous state and tries a different path. This is controlled by `--backtrack=N` (default: 10).
### Key Data Structures
1. **`Package`**: Represents a specific version of an ebuild (e.g., `dev-libs/openssl-3.0.1`)
2. **`Atom`**: A dependency specification (e.g., `>=dev-libs/openssl-1.1:0=`)
3. **`digraph`**: The final dependency graph with merge order
4. **`_dep_stack`**: Work queue of dependencies to resolve
5. **`_slot_packages`**: Tracks packages by (category/package, slot) to allow multiple versions to be installed.
## The Z3 Integration: Encoding Dependencies as SMT
### Hooking into Portage
We can hook into `_create_graph()` with an environment variable:
```python
# lib/_emerge/depgraph.py
_USE_Z3_SOLVER = os.environ.get("PORTAGE_USE_Z3", "0") == "1"
def _create_graph(self, allow_unsatisfied=False):
if _USE_Z3_SOLVER:
# Using new function make it easier to test, overall it just creates the
# Z3DepSolver() object described below.
return self._create_graph_z3(allow_unsatisfied=allow_unsatisfied)
# ... native solver ...
```
### The SMT Encoding
I added the Z3 based solver (`lib/_emerge/resolver/z3_solver.py`) to encode dependency resolution as Boolean satisfiability:
#### 1. Package Variables
Each package version becomes a Boolean variable:
```python
from z3 import And, Bool, Implies, Not, Or, Solver, sat, unsat
class Z3DepSolver:
def __init__(self, depgraph):
# ...
# Package -> Z3 Bool variable (using Any for type hint when Z3 not available)
self._pkg_vars: Dict[Any, Any] = {}
# ...
def _get_pkg_var(self, pkg) -> Bool:
"""Get or create Z3 Bool variable for a package."""
if pkg not in self._pkg_vars:
var_name = f"p_{pkg.cpv}_{pkg.slot}"
self._pkg_vars[pkg] = Bool(var_name)
return self._pkg_vars[pkg]
```
If the variable is `True` in the solution, that package version is installed.
#### 2. Dependency Implications
Dependencies become implications: "if package A is installed, then at least one package satisfying its deps must be installed":
```python
def _encode_package_deps(self, pkg):
for dep_key in ("RDEPEND", "DEPEND", "BDEPEND", "PDEPEND", "IDEPEND"):
dep_string = pkg._metadata.get(dep_key, "")
deps = self._parse_deps(dep_string, pkg, use_enabled)
dep_constraint = self._encode_dep_list(pkg, deps)
if dep_constraint is not None:
# If pkg is installed, deps must be satisfied
self._solver.add(Implies(pkg_var, dep_constraint))
```
For an atom like `>=dev-libs/openssl-1.1`, I find all matching packages and create:
```
pkg_A => (openssl-1.1 OR openssl-1.2 OR openssl-3.0 OR ...)
```
#### 3. OR Dependencies
Portage's `|| ( dep1 dep2 )` maps directly to Z3's `Or()`:
```python
elif isinstance(item, list) and item[0] == "||":
alternatives = item[1:]
alt_constraints = []
for alt in alternatives:
constraint = self._encode_atom_constraint(alt, parent_pkg)
if constraint is not None:
alt_constraints.append(constraint)
return Or(alt_constraints)
```
#### 4. Slot Constraints
Only one package per (category/package, slot) can be installed. I encode this as pairwise mutual exclusion:
```python
def _encode_slot_constraints(self):
for (cp, slot), packages in self._slot_packages.items():
if len(packages) <= 1:
continue
pkg_vars = [self._get_pkg_var(pkg) for pkg in packages]
for i in range(len(pkg_vars)):
for j in range(i + 1, len(pkg_vars)):
# Not both can be true
self._solver.add(Not(And(pkg_vars[i], pkg_vars[j])))
```
#### 5. Blockers
Blockers (`!pkg` or `!!pkg`) become exclusions:
```python
def _encode_blocker(self, parent_pkg, blocker_atom):
# If parent is installed, blocked cannot be installed
for blocked_pkg in matched:
self._solver.add(Implies(parent_var, Not(blocked_var)))
```
#### 6. Root Constraints
User-requested packages must be satisfied:
```python
def _encode_root_atoms(self):
for atom in self._root_atoms:
constraint = self._encode_atom_constraint(atom)
self._solver.add(constraint) # Must be true, not just implied
```
### Solving and Extracting Results
```python
def solve(self, root_atoms):
# ... encode all constraints ...
result = self._solver.check()
if result == sat:
model = self._solver.model()
for pkg, var in self._pkg_vars.items():
if str(model.evaluate(var)) == "True":
if pkg not in self._installed_packages:
new_packages.append(pkg)
return True, new_packages, self._stats
```
## What Worked
### 1. Basic Dependency Resolution
Simple cases work correctly:
```
$ PORTAGE_USE_Z3=1 emerge --pretend dev-libs/A
```
The Z3 solver correctly:
- Finds packages satisfying dependencies
- Handles OR dependencies by selecting a valid alternative
- Respects slot constraints (one package per slot)
- Excludes packages with incompatible KEYWORDS
### 2. Topological Ordering
Portage unit-test require dependencies to come first in the order. To do this I tried tracking dependency edges during encoding:
```python
def _encode_atom_constraint(self, atom, parent_pkg=None):
matched = self._match_atom(atom, candidates)
# Record dependency edges for topological sorting
if parent_pkg is not None:
for dep_pkg in matched:
self._dep_edges[parent_pkg].add(dep_pkg)
```
Then sort the solution using Kahn's algorithm so dependencies come before dependents in the merge order.
### 3. Performance: Z3 is faster, though might get worse as we pass more unit-tests (see below)
**Benchmark result**: Z3 is ~2x faster than native solver
```
Native: mean=1.502s (solving the problem)
Z3: mean=0.644s (building_constraints=0.640s, solving=0.004s)
1658 variables, 3117 constraints, 1605 installed packages
```
## What Didn't Work
### 1. Package Uninstallation for Blockers
**Test failure**: `testBlocker` - expected solution requires uninstalling `dev-libs/Y-1`
The current Z3 encoding only considers *adding* packages. When a blocker like `!dev-libs/Y` exists, and Y is already installed, the solver returns UNSAT instead of realizing it could uninstall Y.
**Fix required**: Add "uninstall" variables for installed packages:
```python
# Hypothetical fix
uninstall_var = Bool(f"uninstall_{pkg.cpv}")
# Package is present if: (was installed AND not uninstalled) OR newly installed
present_var = Or(And(installed_var, Not(uninstall_var)), new_install_var)
```
### 2. World Updates and Upgrade Preferences
**Test failure**: `testOrChoices` - `@world --update --deep` should pull in `vala:0.20`
The native solver has complex heuristics for preferring newer slots during `--update`. The Z3 encoding satisfies dependencies but doesn't optimize for "newest versions":
```python
# Current: just satisfies the dependency
|| ( dev-lang/vala:0.20 dev-lang/vala:0.18 )
# Both vala:0.18 (installed) and vala:0.20 satisfy this
# Z3 picks one arbitrarily (often the already-installed one)
```
**Fix required**: Add optimization objectives:
```python
# Use Z3's optimization to prefer newer versions
from z3 import Optimize
solver = Optimize()
for pkg in candidates:
# Higher version = higher weight
solver.add_soft(pkg_var, weight=version_score(pkg))
```
### 3. Backtracking Information
The native solver maintains detailed backtracking state for:
- Runtime slot conflicts
- USE flag changes needed
- Keyword/mask changes needed
Z3 solver simply returns SAT or UNSAT. I don't extract:
- Which constraints caused UNSAT (unsat core)
- What mask/keyword changes would make it satisfiable
### 4. DEPEND vs RDEPEND Priority
**Test failure**: `testMergeOrder` - circular dependencies should be ordered by dependency type
The native solver distinguishes:
- `DEPEND` (build-time) - must be merged *before* dependent
- `RDEPEND` (run-time) - must be present *after* merge
For circular dependencies, this determines merge order. The topological sort I used above doesn't consider dependency types.
### 5. Autounmask
When no solution exists with current masks, Portage can suggest:
- USE flag changes
- Keyword unmasking (`~arch`)
- Package unmasking
This requires a two-phase solve:
1. Try with all constraints
2. If UNSAT, relax constraints and track what was relaxed
## The Path to Full Compatibility
All this is still failing on 179 tests (out of 259), after scanning the rest of the failed tests the problems seem to be:
1. **Uninstall support**: Add variables and constraints for removing installed packages
2. **Dependency type tracking**: Distinguish DEPEND/RDEPEND/BDEPEND in ordering
3. **Block resolution**: Handle `!pkg` by considering uninstalls
4. **Version preferences**: Integrate Z3 optimization for `--update`
5. **Slot upgrade heuristics**: Prefer newer slots when dependency allows both
6. **--deep semantics**: Correctly interpret depth-limited updates
7. **Relaxable constraints**: Model masks/keywords as soft constraints
8. **Minimal relaxation**: Find smallest set of changes needed
9. **Change reporting**: Extract and display required changes
10. **Circular dependency handling**: Match native solver's cycle-breaking
11. **Virtual packages**: Proper virtual/provider relationships
12. **Depclean integration**: Package removal solver mode
## Conclusion
Integrating Z3 into Portage is challenging, though the basic approach works. The core use - encoding dependencies as Boolean constraints - maps well to Portage's dependency model. However, Portage has accumulated 20 years of nuanced behaviors:
- Intelligent update heuristics
- Graceful degradation via autounmask
- Complex blocker resolution with uninstalls
- Merge ordering beyond simple topological sort
A production-ready Z3 backend would need to replicate these behaviors, not just satisfy dependencies. The proof-of-concept demonstrates the encoding is sound; and Z3 performance is scalable, the remaining work is faithfully modeling Portage's semantics.
Tuesday, December 9, 2025
Integrating Z3 SMT Solver into Gentoo Portage: A Technical Deep Dive
Subscribe to:
Post Comments (Atom)
Integrating Z3 SMT Solver into Gentoo Portage: A Technical Deep Dive
## Introduction Gentoo's Portage package manager uses a sophisticated dependency resolver that has evolved over two decades. Unli...
-
Recently I was asked to run an Inkscape / GIMP workshop for a local makerspace. Local makers were interested in using free tools available ...
-
It's been a long time since my last post. I've started and abandoned new projects, but something also stuck. I've started maki...
-
I have a sleepless two-month old, a toddler, and a full-time technical job, so I really don't want to do much home-IT when I don't ...
No comments:
Post a Comment