This PR supports the creation of securely signed binaries built from spack
develop as well as release branches and tags. Specifically:
- remove internal pr mirror url generation logic in favor of buildcache destination
on command line
- with a single mirror url specified in the spack.yaml, this makes it clearer where
binaries from various pipelines are pushed
- designate some tags as reserved: ['public', 'protected', 'notary']
- these tags are stripped from all jobs by default and provisioned internally
based on pipeline type
- update gitlab ci yaml to include pipelines on more protected branches than just
develop (so include releases and tags)
- binaries from all protected pipelines are pushed into mirrors including the
branch name so releases, tags, and develop binaries are kept separate
- update rebuild jobs running on protected pipelines to run on special runners
provisioned with an intermediate signing key
- protected rebuild jobs no longer use "SPACK_SIGNING_KEY" env var to
obtain signing key (in fact, final signing key is nowhere available to rebuild jobs)
- these intermediate signatures are verified at the end of each pipeline by a new
signing job to ensure binaries were produced by a protected pipeline
- optionallly schedule a signing/notary job at the end of the pipeline to sign all
packges in the mirror
- add signing-job-attributes to gitlab-ci section of spack environment to allow
configuration
- signing job runs on special runner (separate from protected rebuild runners)
provisioned with public intermediate key and secret signing key
Old concrete specs were slipping through in `_assign_hash`, and `package_hash` was
attempting to recompute a package hash when we could not know the package a time
of concretization.
Part of this was that the logic for `_assign_hash` was hard to understand -- it was
called twice from `_finalize_concretization` and had special cases for both args it
was called with. It's much easier to understand the logic here if we just inline it.
- [x] Get rid of `_assign_hash` and just integrate it with `_finalize_concretization`
- [x] Don't call `_package_hash` at all for already-concrete specs.
- [x] Add regression test.
This PR introduces a new build cache layout and package format, with improvements for
both efficiency and security.
## Old Format
Currently a binary package consists of a `spec.json` file at the root and a `.spack` file,
which is a `tar` archive containing a copy of the `spec.json` format, possibly a detached
signature (`.asc`) file, and a tar-gzip compressed archive containing the install tree.
```
build_cache/
# metadata (for indexing)
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.spec.json
<arch>/
<compiler>/
<name>-<ver>/
# tar archive
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.spack
# tar archive contents:
# metadata (contains sha256 of internal .tar.gz)
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.spec.json
# signature
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.spec.json.asc
# tar.gz-compressed prefix
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.tar.gz
```
After this change, the nesting has been removed so that the `.spack` file is the
compressed archive of the install tree. Now signed binary packages, will take the
form of a clearsigned `spec.json` file (a `spec.json.sig`) at the root, while unsigned
binary packages will contain a `spec.json` at the root.
## New Format
```
build_cache/
# metadata (for indexing, contains sha256 of .spack file)
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.spec.json
# clearsigned spec.json metadata
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.spec.json.sig
<arch>/
<compiler>/
<name>-<ver>/
# tar.gz-compressed prefix (may support more compression formats later)
<arch>-<compiler>-<name>-<ver>-24zvipcqgg2wyjpvdq2ajy5jnm564hen.spack
```
## Benefits
The major benefit of this change is that the signatures on binary packages can be
verified without:
1. Having to download the tarball, or
2. having to extract an unknown tarball.
(1) is an improvement in efficiency; (2) is a security fix: we now ensure that we trust the
binary before we try to run it through `tar`, which avoids potential attacks.
## Backward compatibility
Also after this change, spack should still be able to handle the previous buildcache
structure and binary mirrors with mixed layouts.
This PR builds on #28392 by adding a convenience command to create a local mirror that can be used to bootstrap Spack. This is to overcome the inconvenience in setting up this mirror manually, which has been reported when trying to setup Spack on air-gapped systems.
Using this PR the user can create a bootstrapping mirror, on a machine with internet access, by:
% spack bootstrap mirror --binary-packages /opt/bootstrap
==> Adding "clingo-bootstrap@spack+python %apple-clang target=x86_64" and dependencies to the mirror at /opt/bootstrap/local-mirror
==> Adding "gnupg@2.3: %apple-clang target=x86_64" and dependencies to the mirror at /opt/bootstrap/local-mirror
==> Adding "patchelf@0.13.1:0.13.99 %apple-clang target=x86_64" and dependencies to the mirror at /opt/bootstrap/local-mirror
==> Adding binary packages from "https://github.com/alalazo/spack-bootstrap-mirrors/releases/download/v0.1-rc.2/bootstrap-buildcache.tar.gz" to the mirror at /opt/bootstrap/local-mirror
To register the mirror on the platform where it's supposed to be used run the following command(s):
% spack bootstrap add --trust local-sources /opt/bootstrap/metadata/sources
% spack bootstrap add --trust local-binaries /opt/bootstrap/metadata/binaries
The mirror has to be moved over to the air-gapped system, and registered using the commands shown at prompt. The command has options to:
1. Add pre-built binaries downloaded from Github (default is not to add them)
2. Add development dependencies for Spack (currently the Python packages needed to use spack style)
* bootstrap: refactor bootstrap.yaml to move sources metadata out
* bootstrap: allow adding/removing custom bootstrapping sources
This operation can be performed from the command line since
new subcommands have been added to `spack bootstrap`
* Add --trust argument to spack bootstrap add
* Add a command to generate a local mirror for bootstrapping
* Add a unit test for mirror creation
Currently, environments can either be concretized fully together or fully separately. This works well for users who create environments for interoperable software and can use `concretizer:unify:true`. It does not allow environments with conflicting software to be concretized for maximal interoperability.
The primary use-case for this is facilities providing system software. Facilities provide multiple MPI implementations, but all software built against a given MPI ought to be interoperable.
This PR adds a concretization option `concretizer:unify:when_possible`. When this option is used, Spack will concretize specs in the environment separately, but will optimize for minimal differences in overlapping packages.
* Add a level of indirection to root specs
This commit introduce the "literal" atom, which comes with
a few different "arities". The unary "literal" contains an
integer that id the ID of a spec literal. Other "literals"
contain information on the requests made by literal ID. For
instance zlib@1.2.11 generates the following facts:
literal(0,"root","zlib").
literal(0,"node","zlib").
literal(0,"node_version_satisfies","zlib","1.2.11").
This should help with solving large environments "together
where possible" since later literals can be now solved
together in batches.
* Add a mechanism to relax the number of literals being solved
* Modify spack solve to display the new criteria
Since the new criteria is above all the build criteria,
we need to modify the way we display the output.
Originally done by Greg in #27964 and cherry-picked
to this branch by the co-author of the commit.
Co-authored-by: Massimiliano Culpo <massimiliano.culpo@gmail.com>
* Inject reusable specs into the solve
Instead of coupling the PyclingoDriver() object with
spack.config, inject the concrete specs that can be
reused.
A method level function takes care of reading from
the store and the buildcache.
* spack solve: show output of multi-rounds
* add tests for best-effort coconcretization
* Enforce having at least a literal being solved
Co-authored-by: Greg Becker <becker33@llnl.gov>
Previously the regex was only checking for presence of quotes as a beginning
or end character and not a matching set. This erroneously identified the
following *single* argument as being quoted:
source bashenvfile &> /dev/null && python3 -c "import os, json; print(json.dumps(dict(os.environ)))"
Add a config option to strip `-Werror*` or `-Werror=*` from compile lines everywhere.
```yaml
config:
keep_werror: false
```
By default, we strip all `-Werror` arguments out of compile lines, to avoid unwanted
failures when upgrading compilers. You can re-enable `-Werror` in your builds if
you really want to, with either:
```yaml
config:
keep_werror: all
```
or to keep *just* specific `-Werror=XXX` args:
```yaml
config:
keep_werror: specific
```
This should make swapping in newer versions of compilers much smoother when
maintainers have decided to enable `-Werror` by default.
Parse error information is kept for specs, but it doesn't seem like we propagate it
to the user when we encounter an error. This fixes that.
e.g., for this error in a package:
```python
depends_on("python@:3.8", when="0.900:")
```
Before, with no context and no clue that it's even from a particular spec:
```
==> Error: Unexpected token: ':'
```
With this PR:
```
==> Error: Unexpected token: ':'
Encountered when parsing spec:
0.900:
^
```
* Introduce concretizer:unify option to replace spack:concretization
* Deprecate concretization
* Make spack:concretization overrule concretize:unify for now
* Add environment update logic to move from spack:concretization to spack:concretizer:reuse
* Migrate spack:concretization to spack:concretize:unify in all locations
* For new environments make concretizer:unify explicit, so that defaults can be changed in 0.19
The oneapi and dpcpp compilers are essentially the same except for which
binary is used foc CXX. Spack will detect them as "mixed toolchain" and
not inject compiler optimization flags. This will be needed once
archspec has entries for the oneapi and dpcpp compilers. This PR detects
when dpcpp and oneapi are in the toolchains list and explicitly sets
`is_mixed_toolchain` to `False`.
Error messages for the clingo concretizer have proven challenging. The current messages are incredibly vague and often don't help users at all. Unsat cores in clingo are not guaranteed to be minimal, and lead to cores that are either not useful or need to be post-processed for hours to reach a minimal core.
Following up on an idea from a slack conversation with kwryankrattiger on slack, this PR takes a new approach. We eliminate most integrity constraints and minima/maxima on choice rules in clingo, and instead force invalid states to imply an error predicate. The error predicate can include context on the cause of the error (Package, Version, etc). These error predicates are then heavily optimized against, to ensure that we do not include error facts in the solution when a solution with no error facts could be generated. When post-processing the clingo solution to construct specs, any error facts cause the program to raise an exception. This leads to much more legible error messages. Each error predicate includes a priority and an error message. The error message is formatted by the remaining arguments to produce the error message. The priority is used to ensure that when clingo has a choice of which rules to violate, it chooses the one which will be most informative to the user.
Performance:
"fresh" concretizations appear to suffer a ~20% performance penalty under this branch, while "reuse" concretizations see a speedup of around 33%.
Possible optimizations if users still see unhelpful messages:
There are currently 3 levels of priority of the error messages. Additional priorities are possible, and can allow us finer granularity to ensure more informative error messages are provided in lieu of less informative ones.
Future work:
Improve tests to ensure that every possible rule implying an error message is exercised
A non-existent upstream should not be fatal: it could only mean it is
not deployed yet. In the meantime, it should not block the user to
rebuild anything it needs.
A warning is still emitted, to let the user decide if this is ok or not.
Fixes missing chgrp on symlinks in package installations, and errors on
symlinks referencing non-existent or non-writable locations.
Note: `os.chown(.., follow_symlinks=False)` is python3 only, but
`os.lchown` exists in both versions.
* Change license dir from hard-coded to a configurable item
* Change config item to be a string not an array
Co-authored-by: Todd Gamblin <tgamblin@llnl.gov>
Trying to compute `dag_hash()` or `package_hash()` on a concrete spec that doesn't have
a `_package_hash` attribute would attempt to recompute the package hash.
This most commonly manifests as a failed lookup of a namespace if you attempt to uninstall
or compute the hashes of packages in exsternal repositories that aren't registered, e.g.:
```console
> spack spec --json c/htno
==> Error: Unknown namespace: myrepo
```
While it wouldn't change the already-assigned `dag_hash` value, this behavior is
incorrect, since the package file for a previously concrete spec:
1. might have changed since concretization,
2. might not exist anymore, or
3. might just not be findable by Spack.
This PR ensures that the package hash can't be computed on older concrete specs. Instead
of calling `package_hash()` from within `to_node_dict()`, we now check for the `_package_hash`
attribute and only add the package_hash to the spec record if it's there.
This PR also handles the tricky semantics of computing `package_hash()` at concretization
time. We have to compute it *before* marking the spec concrete so that `to_node_dict` can
use it. But this means that the logic for `package_hash()` can't rely on `spec.concrete`,
as it is called *during* concretization. Instead of checking for concreteness, `package_hash()`
now checks `_patches_assigned()` to determine whether it should add them to the package
hash.
- [x] Add an assert to `package_hash()` so it can't be called on specs for which it
would be wrong.
- [x] Add an `_assign_hash()` method to handle tricky semantics of `package_hash`
and `dag_hash`.
- [x] Rework concretization to call `_assign_hash()` before and after marking specs
concrete.
- [x] Rework content hash part of package hash to check for `_patches_assigned()`
instead of `spec.concrete`.
- [x] regression test
Previously we sorted by hash values for `spack graph`, but changing hashes can make the
test brittle and the node order seem nondeterministic to users.
- [x] Sort nodes in `spack graph` by the default edge order, which takes into account
parent and child names as well as dependency types.
- [x] Update ASCII test output for new order.
The dependency check currently checks whether there are only build
dependencies left for a particular package. However, the database also
contains uninstalled packages, which can cause the check to fail.
For instance, with `bison` and `flex` having already been uninstalled,
`m4` will have the following dependents:
```
bison ('build', 'run')--> m4
flex ('build',)--> m4
libnl ('build',)--> m4
```
`bison` and `flex` should be ignored in this case because they are not
installed anymore.
Fixes#30673
#24556 merged in support for Python's .zip file support via ZipFile.
However as per #30200 ZipFile does not preserve file permissions of
the extracted contents. This PR returns to using the `unzip`
executable on non-Windows systems (as was the case before #24556)
and now uses `tar` on Windows to extract .zip files.
We previously had checks in `directory_layout` to check for build-dependency
conflicts when we weren't storing build dependencies. We don't need
those anymore; we can just rely on the DAG hash now that it includes everything
we know about each spec.
- [x] Remove vestigial code for checking installed spec against concrete spec
in `ensure_installed()`
- [x] Remove `SpecHashCollisionError` -- if specs have the same hash now, they're
the same as far as `DirectoryLayout` should be concerned.
- [x] Convert spec comparison to `dag_hash()` comparison when adding extensions.
The database now stores full hashes, so we need to adjust the criteria we use to
determine if something can be uninstalled. Specifically, it's ok to uninstall thing that
have remaining build-only dependents.
With the original DAG hash, we did not store build dependencies in the database, but
with the full DAG hash, we do. Previously, we'd never tell the concretizer about build
dependencies of things used by hash, because we never had them. Now, we have to avoid
telling the concretizer about them, or they'll unnecessarily constrain build
dependencies for new concretizations.
- [x] Make database track all dependencies included in the `dag_hash`
- [x] Modify spec_clauses so that build dependency information is optional
and off by default.
- [x] `spack diff` asks `spec_clauses` for build dependencies for completeness
- [x] Modify `concretize.lp` so that reuse optimization doesn't affect fresh
installations.
- [x] Modify concretizer setup so that it does *not* prioritize installed versions
over package versions. We don't need this with reuse, so they're low priority.
- [x] Fix `test_installed_deps` for full hash and new concretizer (does not work
for old concretizer with full hash -- leave this for later if we need it)
- [x] Move `test_installed_deps` mock packages to `builtin.mock` for easier debugging
with `spack -m`.
- [x] Fix `test_reuse_installed_packages_when_package_def_changes` for full hash
- [x] update test to use `build_hash` instead of `dag_hash`, as we're testing for
graph structure, and specifically NOT testing for package changes.
- [x] make hash descriptors callable on specs to simplify syntax for invoking them
- [x] make `Spec.spec_hash()` public
This removes all but one usage of runtime hash. The runtime hash was being used to write
historical lockfiles for tests, but we don't need it for that; we can just save those
lockfiles.
- [x] add legacy lockfiles for v1, v2, v3
- [x] fix bugs with v1 lockfile tests (the dummy lockfile we were writing was not actually
a v1 lockfile because it used the new spec file format).
- [x] remove all but one runtime_hash usage -- that one needs a small rework of the
concretizer to really fix, as it's about separate concretization of build
dependencies.
- [x] Document the history of the lockfile format in `environment/__init__.py`
Some test cases had to be modified in a kludgy way so that abstract specs made
concrete would have versions on them. We shouldn't *need* to do this, as the
only reason we care is because the content hash has to be able to get an archive
for a version.
This modifies the content hash so that it can be called on abstract specs,
including only relevant content.
This does NOT add a partial content hash to the DAG hash, as we do not really
want that -- we don't need in-memory spec hashes to need to load package files.
It just makes `Package.content_hash()` less prickly and tests easier to
understand.
`spack monitor` expects a field called `spec_full_hash`, so we shouldn't change that.
Instead, we can pass a `dag_hash` (which is now the full hash) but not change the field
name.
`hashes_final` was used to indicate when a spec was concrete but possibly lacked
`full_hash` or `build_hash` fields. This was only necessary because older Spacks
didn't generate them, and we want to avoid recomputing them, as we likely do not
have the same package files as existed at concretization time.
Now, we don't need to do that -- there is only the DAG hash and specs are either
concrete and have a `dag_hash`, or not concrete and have no `dag_hash`. There's
no middle ground.
Without some enforcement of spec ordering, python 2 produced
different results in the affected test than did python 3. This
change makes the arbitrary but reproducible decision to sort
the specs by their lockfile key alphabetically.
The full hash appears twice in the spec dict now, replacing just
the value replaces it under "hash" and "full_hash". Only replace
the one that appears after "full_hash".
I'm actually not sure what purpose this test served, so maybe it
could be removed, as it may be testing some distinction between
full and dag hash which no longer exists.
For a long time, Spack has used a coarser hash to identify packages
than it likely should. Packages are identified by `dag_hash()`, which
includes only link and run dependencies. Build dependencies are
stripped before hashing, and we have notincluded hashes of build
artifacts or the `package.py` files used to build. This means the
DAG hash actually doesn't represent all the things Spack can build,
and it reduces reproducibility.
We did this because, in the early days, users were (rightly) annoyed
when a new version of CMake, autotools, or some other build dependency
would necessitate a rebuild of their entire stack. Coarsening the hash
avoided this issue and enabled a modicum of stability when only reusing
packages by hash match.
Now that we have `--reuse`, we don't need to be so careful. Users can
avoid unnecessary rebuilds much more easily, and we can add more
provenance to the spec without worrying that frequent hash changes
will cause too many rebuilds.
This commit starts the refactor with the following major change:
- [x] Make `Spec.dag_hash()` include build, run, and link
dependencides and the package hash (it is now equivalent to
`full_hash()`).
It also adds a couple of bugfixes for problems discovered during
the switch:
- [x] Don't add a `package_hash()` in `to_node_dict()` unless
the spec is concrete (fixes breaks on abstract specs)
- [x] Don't add source ids to the package hash for packages without
a known fetch strategy (may mock packages are like this)
- [x] Change how `Spec.patches` is memoized. Using
`llnl.util.lang.memoized` on `Spec` objects causes specs to
be stored in a `dict`, which means they need a hash. But,
`dag_hash()` now includes patch `sha256`'s via the package
hash, which can lead to infinite recursion