Refactor Spack's URL parsing commands (#2938)

* Replace `spack urls` and `spack url-parse` with `spack url`
* Allow spack url list to only list incorrect parsings
* Add spack url test reporting
* Add unit tests for new URL commands
Adam J. Stewart 2017-01-31 10:14:52 -06:00 committed by Todd Gamblin
parent 2e81fe4fb3
commit 123f057089
7 changed files with 796 additions and 311 deletions


@ -300,6 +300,42 @@ Stage objects
Writing commands
----------------
Adding a new command to Spack is easy. Simply add a ``<name>.py`` file to
``lib/spack/spack/cmd/``, where ``<name>`` is the name of the subcommand.
At the bare minimum, two functions are required in this file:
^^^^^^^^^^^^^^^^^^
``setup_parser()``
^^^^^^^^^^^^^^^^^^
Unless your command doesn't accept any arguments, a ``setup_parser()``
function is required to define what arguments and flags your command takes.
See the `Argparse documentation <https://docs.python.org/2.7/library/argparse.html>`_
for more details on how to add arguments.
Some commands have a set of subcommands, like ``spack compiler find`` or
``spack module refresh``. You can add subparsers to your parser to handle
this. Check out ``spack edit --command compiler`` for an example of this.
A lot of commands take the same arguments and flags. These arguments should
be defined in ``lib/spack/spack/cmd/common/arguments.py`` so that they don't
need to be redefined in multiple commands.
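As a minimal sketch of what such a file might contain, the example below
defines a ``setup_parser()`` for a hypothetical ``hello`` command with two
subcommands. The command name, its arguments, and the stand-alone
``argparse.ArgumentParser`` used to drive it are assumptions made for
illustration; inside Spack the parser is passed to ``setup_parser()`` for you.

```python
import argparse


def setup_parser(subparser):
    """Define the arguments of a hypothetical ``hello`` command."""
    sp = subparser.add_subparsers(metavar='SUBCOMMAND', dest='subcommand')

    # ``hello greet NAME [--shout]``
    greet_parser = sp.add_parser('greet', help='greet someone by name')
    greet_parser.add_argument('name', help='who to greet')
    greet_parser.add_argument(
        '-s', '--shout', action='store_true',
        help='print the greeting in upper case')

    # ``hello version`` takes no further arguments
    sp.add_parser('version', help='print a version string')


# Stand-alone driver, only so the sketch can be run outside of Spack
parser = argparse.ArgumentParser(prog='hello')
setup_parser(parser)
args = parser.parse_args(['greet', 'world', '--shout'])
print(args.subcommand, args.name, args.shout)
```

Storing the chosen subcommand in ``dest='subcommand'`` lets the command's
main function decide which helper to call.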
^^^^^^^^^^^^
``<name>()``
^^^^^^^^^^^^
In order to run your command, Spack searches for a function with the same
name as your command in ``<name>.py``. This is the main method for your
command, and can call other helper methods to handle common tasks.
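The lookup described above can be pictured with the following sketch. It is
not Spack's own dispatch code: the ``hello`` command, the stand-in module
object, and the use of ``getattr`` are all assumptions made for illustration.
Only the ``(parser, args)`` signature mirrors the commands in this
repository.

```python
import argparse
import types


def hello(parser, args):
    """Main function of a hypothetical ``hello`` command."""
    return 'Hello, {0}!'.format(args.name)


# Stand-in for the module that would be loaded from ``hello.py``
module = types.ModuleType('hello')
module.hello = hello

# Look up the function that shares the command's name and call it
command_name = 'hello'
command = getattr(module, command_name)
args = argparse.Namespace(name='world')
result = command(None, args)
print(result)
```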
Before adding a new command, consider whether it is actually necessary.
Sometimes the functionality you want can be added to an existing command.
Also remember to add unit tests for your command. If a command isn't used
very frequently, changes to the rest of Spack can break it without anyone
noticing, and unit tests are what catch this.
----------
Unit tests
----------
@ -312,14 +348,80 @@ Unit testing
Developer commands
------------------
.. _cmd-spack-doc:
^^^^^^^^^^^^^
``spack doc``
^^^^^^^^^^^^^
.. _cmd-spack-test:
^^^^^^^^^^^^^^
``spack test``
^^^^^^^^^^^^^^
.. _cmd-spack-url:
^^^^^^^^^^^^^
``spack url``
^^^^^^^^^^^^^
A package containing a single URL can be used to download several different
versions of the package. If you've ever wondered how this works, all of the
magic is in :mod:`spack.url`. This module contains methods for extracting
the name and version of a package from its URL. The name is used by
``spack create`` to guess the name of the package. By determining the version
from the URL, Spack can replace it with other versions to determine where to
download them from.
The regular expressions in ``parse_name_offset`` and ``parse_version_offset``
are used to extract the name and version, but they aren't perfect. In order
to debug Spack's URL parsing support, the ``spack url`` command can be used.
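To give a feel for the kind of parsing involved, here is a deliberately
simplified sketch. The single pattern and the ``parse_name_and_version``
helper are hypothetical and far less thorough than the regular expressions
in :mod:`spack.url`; they only handle archives named
``<name>-<version>.<extension>``.

```python
import re

# Hypothetical pattern: "<name>-<version>.<extension>" at the end of a URL.
# Spack tries a whole list of patterns; this sketch uses just one.
ARCHIVE = re.compile(
    r'(?P<name>[A-Za-z][A-Za-z0-9_]*)-'
    r'(?P<version>\d+(?:\.\d+)*[a-z]?)'
    r'\.(?:tar\.gz|tar\.bz2|tgz|zip)$')


def parse_name_and_version(url):
    """Return (name, version) guessed from the last path component."""
    filename = url.rsplit('/', 1)[-1]
    match = ARCHIVE.search(filename)
    if match is None:
        raise ValueError('cannot parse {0}'.format(url))
    return match.group('name'), match.group('version')


name, version = parse_name_and_version(
    'http://cache.ruby-lang.org/pub/ruby/2.2/ruby-2.2.0.tar.gz')
print(name, version)
```

A URL with no version in its file name, such as
``http://www.netlib.org/voronoi/triangle.zip``, makes this sketch raise
``ValueError``.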
"""""""""""""""""""
``spack url parse``
"""""""""""""""""""
If you need to debug a single URL, you can use the following command:
.. command-output:: spack url parse http://cache.ruby-lang.org/pub/ruby/2.2/ruby-2.2.0.tar.gz
The name and version of this URL are correctly detected, and you can
even see which regular expressions matched. However, when Spack
substitutes the version number, it doesn't replace the ``2.2`` in the
directory portion of the URL with ``9.9``, as we would expect for
version ``9.9.9b``. This particular package may require a ``list_url``
attribute or a ``url_for_version`` function.
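The sketch below reproduces the problem with a naive substitution and then
shows the idea behind a custom ``url_for_version``. Both helpers are
hypothetical stand-alone functions written for this example; they are not
Spack's ``substitute_version``, and a real ``url_for_version`` would be
defined on the package class.

```python
def substitute_filename_version(url, old, new):
    """Replace ``old`` with ``new`` in the last path component only."""
    directory, filename = url.rsplit('/', 1)
    return '{0}/{1}'.format(directory, filename.replace(old, new))


def url_for_version(version):
    """Build the URL from scratch so the directory tracks the version."""
    base = 'http://cache.ruby-lang.org/pub/ruby/{0}/ruby-{1}.tar.gz'
    major_minor = '.'.join(version.split('.')[:2])
    return base.format(major_minor, version)


url = 'http://cache.ruby-lang.org/pub/ruby/2.2/ruby-2.2.0.tar.gz'
naive_url = substitute_filename_version(url, '2.2.0', '9.9.9b')
fixed_url = url_for_version('9.9.9b')
print(naive_url)  # the directory still says 2.2
print(fixed_url)  # the directory says 9.9
```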
This command also accepts a ``--spider`` flag. If provided, Spack searches
for other versions of the package and prints the matching URLs.
""""""""""""""""""
``spack url list``
""""""""""""""""""
This command lists every URL in every package in Spack. If given the
``--color`` and ``--extrapolation`` flags, it also colors the parts of
each URL that it detected as the name and version. The
``--incorrect-name`` and ``--incorrect-version`` flags restrict the
output to URLs whose name or version was parsed incorrectly; the two
flags are mutually exclusive.
""""""""""""""""""
``spack url test``
""""""""""""""""""
This command attempts to parse every URL for every package in Spack
and prints a summary of how many of them are being correctly parsed.
It also prints a histogram showing which regular expressions are being
matched and how frequently:
.. command-output:: spack url test
This command is essential for anyone adding or changing the regular
expressions that parse names and versions. By running this command
before and after the change, you can make sure that your regular
expression fixes more packages than it breaks.
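The histogram boils down to counting which pattern matched each URL. The
following sketch shows that tally on a handful of URLs; the two version
patterns and the ``summarize`` helper are hypothetical simplifications, not
the expressions or code used by ``spack url test``.

```python
from collections import defaultdict
import re

# Hypothetical, simplified version patterns, tried in order
VERSION_REGEXES = [
    r'-(\d+(?:\.\d+)+)\.tar\.gz$',   # name-1.2.3.tar.gz
    r'_v(\d+(?:\.\d+)+)\.tgz$',      # name_v1.2.3.tgz
]


def summarize(urls):
    """Count how many URLs each pattern matched, and how many none did."""
    counts = defaultdict(int)
    unmatched = 0
    for url in urls:
        for index, regex in enumerate(VERSION_REGEXES):
            if re.search(regex, url):
                counts[index] += 1
                break
        else:
            # No pattern matched this URL
            unmatched += 1
    return dict(counts), unmatched


counts, unmatched = summarize([
    'http://zlib.net/fossils/zlib-1.2.10.tar.gz',
    'http://www.mr511.de/software/libelf-0.8.13.tar.gz',
    'https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7.tgz',
    'http://www.netlib.org/voronoi/triangle.zip',
])
print(counts, unmatched)
```

Running such a tally before and after editing a pattern shows at a glance
whether matches moved between patterns or were lost altogether.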
---------
Profiling
---------


@ -712,8 +712,8 @@ is at ``http://example.com/downloads/foo-1.0.tar.gz``, Spack will look
in ``http://example.com/downloads/`` for links to additional versions.
If you need to search another path for download links, you can supply
some extra attributes that control how your package finds new
versions. See the documentation on `attribute_list_url`_ and
`attribute_list_depth`_.
versions. See the documentation on :ref:`attribute_list_url` and
:ref:`attribute_list_depth`.
.. note::
@ -728,6 +728,102 @@ versions. See the documentation on `attribute_list_url`_ and
syntax errors, or the ``import`` will fail. Use this once you've
got your package in working order.
--------------------
Finding new versions
--------------------
You've already seen the ``homepage`` and ``url`` package attributes:

.. code-block:: python
   :linenos:

   from spack import *

   class Mpich(Package):
       """MPICH is a high performance and widely portable implementation of
       the Message Passing Interface (MPI) standard."""
       homepage = "http://www.mpich.org"
       url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"

These are class-level attributes used by Spack to show users
information about the package, and to determine where to download its
source code.
Spack uses the tarball URL to extrapolate where to find other tarballs
of the same package (e.g. in :ref:`cmd-spack-checksum`), but
this does not always work. This section covers ways you can tell
Spack to find tarballs elsewhere.
.. _attribute_list_url:
^^^^^^^^^^^^
``list_url``
^^^^^^^^^^^^
When Spack tries to find available versions of packages (e.g. with
:ref:`cmd-spack-checksum`), it spiders the parent directory
of the tarball in the ``url`` attribute. For example, for libelf, the
url is:

.. code-block:: python

   url = "http://www.mr511.de/software/libelf-0.8.13.tar.gz"

Here, Spack spiders ``http://www.mr511.de/software/`` to find similar
tarball links and ultimately to make a list of available versions of
``libelf``.
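The parent directory mentioned above can be derived from the tarball URL
with plain string handling. This is only a sketch of the idea;
``default_list_url`` is a hypothetical helper, not a function in Spack.

```python
import posixpath


def default_list_url(url):
    """Return the parent directory of ``url``, with a trailing slash."""
    return posixpath.dirname(url) + '/'


listing = default_list_url('http://www.mr511.de/software/libelf-0.8.13.tar.gz')
print(listing)
```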
For many packages, the tarball's parent directory may be unlistable,
or it may not contain any links to source code archives. In fact,
many times additional package downloads aren't even available in the
same directory as the download URL.
For these, you can specify a separate ``list_url`` indicating the page
to search for tarballs. For example, ``libdwarf`` has the homepage as
the ``list_url``, because that is where links to old versions are:

.. code-block:: python
   :linenos:

   class Libdwarf(Package):
       homepage = "http://www.prevanders.net/dwarf.html"
       url = "http://www.prevanders.net/libdwarf-20130729.tar.gz"
       list_url = homepage

.. _attribute_list_depth:
^^^^^^^^^^^^^^
``list_depth``
^^^^^^^^^^^^^^
``libdwarf`` and many other packages have a listing of available
versions on a single webpage, but not all do. For example, ``mpich``
has a tarball URL that looks like this:

.. code-block:: python

   url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"

But its downloads are in many different subdirectories of
``http://www.mpich.org/static/downloads/``. So, we need to add a
``list_url`` *and* a ``list_depth`` attribute:

.. code-block:: python
   :linenos:

   class Mpich(Package):
       homepage = "http://www.mpich.org"
       url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
       list_url = "http://www.mpich.org/static/downloads/"
       list_depth = 2

By default, Spack only looks at the top-level page available at
``list_url``. ``list_depth`` tells it to follow up to 2 levels of
links from the top-level page. Note that here, this implies two
levels of subdirectories, as the ``mpich`` website is structured much
like a filesystem. But ``list_depth`` really refers to link depth
when spidering the page.
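Link depth can be illustrated with a breadth-first walk over a small,
made-up link graph. Everything here is an assumption for illustration: the
``LINKS`` dictionary stands in for a website, ``spider`` is a hypothetical
helper, and the way ``depth`` is counted (``0`` means only the starting page
is read) is a convention chosen for this sketch that may differ from
Spack's.

```python
from collections import deque

# Hypothetical link graph standing in for a website: page -> linked pages
LINKS = {
    'downloads/': ['downloads/3.0.4/', 'downloads/3.1/'],
    'downloads/3.0.4/': ['downloads/3.0.4/mpich-3.0.4.tar.gz'],
    'downloads/3.1/': ['downloads/3.1/mpich-3.1.tar.gz'],
}


def spider(start, depth):
    """Return every link found on pages at most ``depth`` links from ``start``."""
    found = set()
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        page, level = queue.popleft()
        for link in LINKS.get(page, []):
            found.add(link)
            # Only follow the link if we have not used up the depth budget
            if level < depth and link not in visited:
                visited.add(link)
                queue.append((link, level + 1))
    return found


shallow = spider('downloads/', 0)
deep = spider('downloads/', 1)
print(sorted(shallow))
print(sorted(deep))
```

With a depth of ``0`` only the subdirectory links are seen; following one
more level of links is what reveals the tarballs.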
.. _vcs-fetch:
@ -1241,103 +1337,6 @@ RPATHs in Spack are handled in one of three ways:
links. You can see how this is used in the :ref:`PySide
example <pyside-patch>` above.
--------------------
Finding new versions
--------------------
You've already seen the ``homepage`` and ``url`` package attributes:
.. code-block:: python
:linenos:
from spack import *
class Mpich(Package):
"""MPICH is a high performance and widely portable implementation of
the Message Passing Interface (MPI) standard."""
homepage = "http://www.mpich.org"
url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
These are class-level attributes used by Spack to show users
information about the package, and to determine where to download its
source code.
Spack uses the tarball URL to extrapolate where to find other tarballs
of the same package (e.g. in :ref:`cmd-spack-checksum`), but
this does not always work. This section covers ways you can tell
Spack to find tarballs elsewhere.
.. _attribute_list_url:
^^^^^^^^^^^^
``list_url``
^^^^^^^^^^^^
When spack tries to find available versions of packages (e.g. with
:ref:`cmd-spack-checksum`), it spiders the parent directory
of the tarball in the ``url`` attribute. For example, for libelf, the
url is:
.. code-block:: python
url = "http://www.mr511.de/software/libelf-0.8.13.tar.gz"
Here, Spack spiders ``http://www.mr511.de/software/`` to find similar
tarball links and ultimately to make a list of available versions of
``libelf``.
For many packages, the tarball's parent directory may be unlistable,
or it may not contain any links to source code archives. In fact,
many times additional package downloads aren't even available in the
same directory as the download URL.
For these, you can specify a separate ``list_url`` indicating the page
to search for tarballs. For example, ``libdwarf`` has the homepage as
the ``list_url``, because that is where links to old versions are:
.. code-block:: python
:linenos:
class Libdwarf(Package):
homepage = "http://www.prevanders.net/dwarf.html"
url = "http://www.prevanders.net/libdwarf-20130729.tar.gz"
list_url = homepage
.. _attribute_list_depth:
^^^^^^^^^^^^^^
``list_depth``
^^^^^^^^^^^^^^
``libdwarf`` and many other packages have a listing of available
versions on a single webpage, but not all do. For example, ``mpich``
has a tarball URL that looks like this:
.. code-block:: python
url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
But its downloads are in many different subdirectories of
``http://www.mpich.org/static/downloads/``. So, we need to add a
``list_url`` *and* a ``list_depth`` attribute:
.. code-block:: python
:linenos:
class Mpich(Package):
homepage = "http://www.mpich.org"
url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
list_url = "http://www.mpich.org/static/downloads/"
list_depth = 2
By default, Spack only looks at the top-level page available at
``list_url``. ``list_depth`` tells it to follow up to 2 levels of
links from the top-level page. Note that here, this implies two
levels of subdirectories, as the ``mpich`` website is structured much
like a filesystem. But ``list_depth`` really refers to link depth
when spidering the page.
.. _attribute_parallel:
---------------

lib/spack/spack/cmd/url.py Normal file

@ -0,0 +1,319 @@
##############################################################################
# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
# Produced at the Lawrence Livermore National Laboratory.
#
# This file is part of Spack.
# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
# LLNL-CODE-647188
#
# For details, see https://github.com/llnl/spack
# Please also see the LICENSE file for our notice and the LGPL.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License (as
# published by the Free Software Foundation) version 2.1, February 1999.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
# conditions of the GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
##############################################################################
from __future__ import division, print_function
from collections import defaultdict
import spack
from llnl.util import tty
from spack.url import *
from spack.util.web import find_versions_of_archive
description = "debugging tool for url parsing"


def setup_parser(subparser):
    sp = subparser.add_subparsers(metavar='SUBCOMMAND', dest='subcommand')

    # Parse
    parse_parser = sp.add_parser('parse', help='attempt to parse a url')

    parse_parser.add_argument(
        'url',
        help='url to parse')
    parse_parser.add_argument(
        '-s', '--spider', action='store_true',
        help='spider the source page for versions')

    # List
    list_parser = sp.add_parser('list', help='list urls in all packages')

    list_parser.add_argument(
        '-c', '--color', action='store_true',
        help='color the parsed version and name in the urls shown '
             '(versions will be cyan, name red)')
    list_parser.add_argument(
        '-e', '--extrapolation', action='store_true',
        help='color the versions used for extrapolation as well '
             '(additional versions will be green, names magenta)')

    excl_args = list_parser.add_mutually_exclusive_group()

    excl_args.add_argument(
        '-n', '--incorrect-name', action='store_true',
        help='only list urls for which the name was incorrectly parsed')
    excl_args.add_argument(
        '-v', '--incorrect-version', action='store_true',
        help='only list urls for which the version was incorrectly parsed')

    # Test
    sp.add_parser(
        'test', help='print a summary of how well we are parsing package urls')


def url(parser, args):
    action = {
        'parse': url_parse,
        'list': url_list,
        'test': url_test
    }

    action[args.subcommand](args)


def url_parse(args):
    url = args.url

    tty.msg('Parsing URL: {0}'.format(url))
    print()

    ver, vs, vl, vi, vregex = parse_version_offset(url)
    tty.msg('Matched version regex {0:>2}: r{1!r}'.format(vi, vregex))

    name, ns, nl, ni, nregex = parse_name_offset(url, ver)
    tty.msg('Matched name regex {0:>2}: r{1!r}'.format(ni, nregex))

    print()
    tty.msg('Detected:')
    try:
        print_name_and_version(url)
    except UrlParseError as e:
        tty.error(str(e))

    print(' name: {0}'.format(name))
    print(' version: {0}'.format(ver))
    print()

    tty.msg('Substituting version 9.9.9b:')
    newurl = substitute_version(url, '9.9.9b')
    print_name_and_version(newurl)

    if args.spider:
        print()
        tty.msg('Spidering for versions:')
        versions = find_versions_of_archive(url)

        max_len = max(len(str(v)) for v in versions)

        for v in sorted(versions):
            print('{0:{1}} {2}'.format(v, max_len, versions[v]))


def url_list(args):
    urls = set()

    # Gather set of URLs from all packages
    for pkg in spack.repo.all_packages():
        url = getattr(pkg.__class__, 'url', None)
        urls = url_list_parsing(args, urls, url, pkg)

        for params in pkg.versions.values():
            url = params.get('url', None)
            urls = url_list_parsing(args, urls, url, pkg)

    # Print URLs
    for url in sorted(urls):
        if args.color or args.extrapolation:
            print(color_url(url, subs=args.extrapolation, errors=True))
        else:
            print(url)

    # Return the number of URLs that were printed, only for testing purposes
    return len(urls)


def url_test(args):
    # Collect statistics on how many URLs were correctly parsed
    total_urls = 0
    correct_names = 0
    correct_versions = 0

    # Collect statistics on which regexes were matched and how often
    name_regex_dict = dict()
    name_count_dict = defaultdict(int)
    version_regex_dict = dict()
    version_count_dict = defaultdict(int)

    tty.msg('Generating a summary of URL parsing in Spack...')

    # Loop through all packages
    for pkg in spack.repo.all_packages():
        urls = set()

        url = getattr(pkg.__class__, 'url', None)
        if url:
            urls.add(url)

        for params in pkg.versions.values():
            url = params.get('url', None)
            if url:
                urls.add(url)

        # Calculate statistics
        for url in urls:
            total_urls += 1

            # Parse versions
            version = None
            try:
                version, vs, vl, vi, vregex = parse_version_offset(url)
                version_regex_dict[vi] = vregex
                version_count_dict[vi] += 1
                if version_parsed_correctly(pkg, version):
                    correct_versions += 1
            except UndetectableVersionError:
                pass

            # Parse names
            try:
                name, ns, nl, ni, nregex = parse_name_offset(url, version)
                name_regex_dict[ni] = nregex
                name_count_dict[ni] += 1
                if name_parsed_correctly(pkg, name):
                    correct_names += 1
            except UndetectableNameError:
                pass

    print()
    print(' Total URLs found: {0}'.format(total_urls))
    print(' Names correctly parsed: {0:>4}/{1:>4} ({2:>6.2%})'.format(
        correct_names, total_urls, correct_names / total_urls))
    print(' Versions correctly parsed: {0:>4}/{1:>4} ({2:>6.2%})'.format(
        correct_versions, total_urls, correct_versions / total_urls))
    print()

    tty.msg('Statistics on name regular expressions:')

    print()
    print(' Index Count Regular Expression')
    for ni in name_regex_dict:
        print(' {0:>3}: {1:>6} r{2!r}'.format(
            ni, name_count_dict[ni], name_regex_dict[ni]))
    print()

    tty.msg('Statistics on version regular expressions:')

    print()
    print(' Index Count Regular Expression')
    for vi in version_regex_dict:
        print(' {0:>3}: {1:>6} r{2!r}'.format(
            vi, version_count_dict[vi], version_regex_dict[vi]))
    print()

    # Return statistics, only for testing purposes
    return (total_urls, correct_names, correct_versions,
            name_count_dict, version_count_dict)


def print_name_and_version(url):
    """Prints a URL. Underlines the detected name with dashes and
    the detected version with tildes.

    :param str url: The url to parse
    """
    name, ns, nl, ntup, ver, vs, vl, vtup = substitution_offsets(url)
    underlines = [' '] * max(ns + nl, vs + vl)
    for i in range(ns, ns + nl):
        underlines[i] = '-'
    for i in range(vs, vs + vl):
        underlines[i] = '~'

    print(' {0}'.format(url))
    print(' {0}'.format(''.join(underlines)))


def url_list_parsing(args, urls, url, pkg):
    """Helper function for :func:`url_list`.

    :param argparse.Namespace args: The arguments given to ``spack url list``
    :param set urls: Set of URLs that have already been added
    :param url: A URL to potentially add to ``urls`` depending on ``args``
    :type url: str or None
    :param spack.package.PackageBase pkg: The Spack package
    :returns: The updated ``urls`` set
    :rtype: set
    """
    if url:
        if args.incorrect_name:
            # Only add URLs whose name was incorrectly parsed
            try:
                name = parse_name(url)
                if not name_parsed_correctly(pkg, name):
                    urls.add(url)
            except UndetectableNameError:
                urls.add(url)
        elif args.incorrect_version:
            # Only add URLs whose version was incorrectly parsed
            try:
                version = parse_version(url)
                if not version_parsed_correctly(pkg, version):
                    urls.add(url)
            except UndetectableVersionError:
                urls.add(url)
        else:
            urls.add(url)

    return urls


def name_parsed_correctly(pkg, name):
    """Determine if the name of a package was correctly parsed.

    :param spack.package.PackageBase pkg: The Spack package
    :param str name: The name that was extracted from the URL
    :returns: True if the name was correctly parsed, else False
    :rtype: bool
    """
    pkg_name = pkg.name

    # After determining a name, `spack create` determines a build system.
    # Some build systems prepend a special string to the front of the name.
    # Since this can't be guessed from the URL, it would be unfair to say
    # that these names are incorrectly parsed, so we remove them.
    if pkg_name.startswith('r-'):
        pkg_name = pkg_name[2:]
    elif pkg_name.startswith('py-'):
        pkg_name = pkg_name[3:]
    elif pkg_name.startswith('octave-'):
        pkg_name = pkg_name[7:]

    return name == pkg_name


def version_parsed_correctly(pkg, version):
    """Determine if the version of a package was correctly parsed.

    :param spack.package.PackageBase pkg: The Spack package
    :param str version: The version that was extracted from the URL
    :returns: True if the version was correctly parsed, else False
    :rtype: bool
    """
    # If the version parsed from the URL is listed in a version()
    # directive, we assume it was correctly parsed
    for pkg_version in pkg.versions:
        if str(pkg_version) == str(version):
            return True
    return False


@ -1,79 +0,0 @@
##############################################################################
# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
# Produced at the Lawrence Livermore National Laboratory.
#
# This file is part of Spack.
# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
# LLNL-CODE-647188
#
# For details, see https://github.com/llnl/spack
# Please also see the LICENSE file for our notice and the LGPL.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License (as
# published by the Free Software Foundation) version 2.1, February 1999.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
# conditions of the GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
##############################################################################
import llnl.util.tty as tty
import spack
import spack.url
from spack.util.web import find_versions_of_archive
description = "show parsing of a URL, optionally spider web for versions"
def setup_parser(subparser):
subparser.add_argument('url', help="url of a package archive")
subparser.add_argument(
'-s', '--spider', action='store_true',
help="spider the source page for versions")
def print_name_and_version(url):
name, ns, nl, ntup, ver, vs, vl, vtup = spack.url.substitution_offsets(url)
underlines = [" "] * max(ns + nl, vs + vl)
for i in range(ns, ns + nl):
underlines[i] = '-'
for i in range(vs, vs + vl):
underlines[i] = '~'
print " %s" % url
print " %s" % ''.join(underlines)
def url_parse(parser, args):
url = args.url
ver, vs, vl = spack.url.parse_version_offset(url, debug=True)
name, ns, nl = spack.url.parse_name_offset(url, ver, debug=True)
print
tty.msg("Detected:")
try:
print_name_and_version(url)
except spack.url.UrlParseError as e:
tty.error(str(e))
print ' name: %s' % name
print ' version: %s' % ver
print
tty.msg("Substituting version 9.9.9b:")
newurl = spack.url.substitute_version(url, '9.9.9b')
print_name_and_version(newurl)
if args.spider:
print
tty.msg("Spidering for versions:")
versions = find_versions_of_archive(url)
for v in sorted(versions):
print "%-20s%s" % (v, versions[v])


@ -1,59 +0,0 @@
##############################################################################
# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
# Produced at the Lawrence Livermore National Laboratory.
#
# This file is part of Spack.
# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
# LLNL-CODE-647188
#
# For details, see https://github.com/llnl/spack
# Please also see the LICENSE file for our notice and the LGPL.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License (as
# published by the Free Software Foundation) version 2.1, February 1999.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
# conditions of the GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
##############################################################################
import spack
import spack.url
description = "inspect urls used by packages in spack"
def setup_parser(subparser):
subparser.add_argument(
'-c', '--color', action='store_true',
help="color the parsed version and name in the urls shown. "
"version will be cyan, name red")
subparser.add_argument(
'-e', '--extrapolation', action='store_true',
help="color the versions used for extrapolation as well. "
"additional versions are green, names magenta")
def urls(parser, args):
urls = set()
for pkg in spack.repo.all_packages():
url = getattr(pkg.__class__, 'url', None)
if url:
urls.add(url)
for params in pkg.versions.values():
url = params.get('url', None)
if url:
urls.add(url)
for url in sorted(urls):
if args.color or args.extrapolation:
print spack.url.color_url(
url, subs=args.extrapolation, errors=True)
else:
print url


@ -0,0 +1,116 @@
##############################################################################
# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
# Produced at the Lawrence Livermore National Laboratory.
#
# This file is part of Spack.
# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
# LLNL-CODE-647188
#
# For details, see https://github.com/llnl/spack
# Please also see the LICENSE file for our notice and the LGPL.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License (as
# published by the Free Software Foundation) version 2.1, February 1999.
#
# This program is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
# conditions of the GNU Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
##############################################################################
import argparse
import pytest

from spack.cmd.url import *


@pytest.fixture(scope='module')
def parser():
    """Returns the parser for the ``url`` command"""
    parser = argparse.ArgumentParser()
    setup_parser(parser)
    return parser


class MyPackage:
    def __init__(self, name, versions):
        self.name = name
        self.versions = versions


def test_name_parsed_correctly():
    # Expected True
    assert name_parsed_correctly(MyPackage('netcdf', []), 'netcdf')
    assert name_parsed_correctly(MyPackage('r-devtools', []), 'devtools')
    assert name_parsed_correctly(MyPackage('py-numpy', []), 'numpy')
    assert name_parsed_correctly(MyPackage('octave-splines', []), 'splines')

    # Expected False
    assert not name_parsed_correctly(MyPackage('', []), 'hdf5')
    assert not name_parsed_correctly(MyPackage('hdf5', []), '')
    assert not name_parsed_correctly(MyPackage('imagemagick', []), 'ImageMagick')  # noqa
    assert not name_parsed_correctly(MyPackage('yaml-cpp', []), 'yamlcpp')
    assert not name_parsed_correctly(MyPackage('yamlcpp', []), 'yaml-cpp')
    assert not name_parsed_correctly(MyPackage('r-py-parser', []), 'parser')
    assert not name_parsed_correctly(MyPackage('oce', []), 'oce-0.18.0')  # noqa


def test_version_parsed_correctly():
    # Expected True
    assert version_parsed_correctly(MyPackage('', ['1.2.3']), '1.2.3')
    assert version_parsed_correctly(MyPackage('', ['5.4a', '5.4b']), '5.4a')
    assert version_parsed_correctly(MyPackage('', ['5.4a', '5.4b']), '5.4b')

    # Expected False
    assert not version_parsed_correctly(MyPackage('', []), '1.2.3')
    assert not version_parsed_correctly(MyPackage('', ['1.2.3']), '')
    assert not version_parsed_correctly(MyPackage('', ['1.2.3']), '1.2.4')
    assert not version_parsed_correctly(MyPackage('', ['3.4a']), '3.4')
    assert not version_parsed_correctly(MyPackage('', ['3.4']), '3.4b')
    assert not version_parsed_correctly(MyPackage('', ['0.18.0']), 'oce-0.18.0')  # noqa


def test_url_parse(parser):
    args = parser.parse_args(['parse', 'http://zlib.net/fossils/zlib-1.2.10.tar.gz'])
    url(parser, args)


@pytest.mark.xfail
def test_url_parse_xfail(parser):
    # No version in URL
    args = parser.parse_args(['parse', 'http://www.netlib.org/voronoi/triangle.zip'])
    url(parser, args)


def test_url_list(parser):
    args = parser.parse_args(['list'])
    total_urls = url_list(args)

    # The following two options should not change the number of URLs printed.
    args = parser.parse_args(['list', '--color', '--extrapolation'])
    colored_urls = url_list(args)
    assert colored_urls == total_urls

    # The following two options should print fewer URLs than the default.
    # If they print the same number of URLs, something is horribly broken.
    # If they say we missed 0 URLs, something is probably broken too.
    args = parser.parse_args(['list', '--incorrect-name'])
    incorrect_name_urls = url_list(args)
    assert 0 < incorrect_name_urls < total_urls

    args = parser.parse_args(['list', '--incorrect-version'])
    incorrect_version_urls = url_list(args)
    assert 0 < incorrect_version_urls < total_urls


def test_url_test(parser):
    args = parser.parse_args(['test'])
    (total_urls, correct_names, correct_versions,
     name_count_dict, version_count_dict) = url_test(args)

    assert 0 < correct_names <= sum(name_count_dict.values()) <= total_urls  # noqa
    assert 0 < correct_versions <= sum(version_count_dict.values()) <= total_urls  # noqa


@ -28,17 +28,17 @@
download location of the package, and figure out version and name information
from there.
Example: when spack is given the following URL:
**Example:** when spack is given the following URL:
ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p243.tar.gz
https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.12/src/hdf-4.2.12.tar.gz
It can figure out that the package name is ruby, and that it is at version
1.9.1-p243. This is useful for making the creation of packages simple: a user
It can figure out that the package name is ``hdf``, and that it is at version
``4.2.12``. This is useful for making the creation of packages simple: a user
just supplies a URL and skeleton code is generated automatically.
Spack can also figure out that it can most likely download 1.8.1 at this URL:
Spack can also figure out that it can most likely download 4.2.6 at this URL:
ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.8.1.tar.gz
https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.6/src/hdf-4.2.6.tar.gz
This is useful if a user asks for a package at a particular version number;
spack doesn't need anyone to tell it where to get the tarball even though
@ -104,24 +104,23 @@ def strip_query_and_fragment(path):
def split_url_extension(path):
"""Some URLs have a query string, e.g.:
1. https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7.tgz?raw=true
2. http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin.tar.gz
3. https://gitlab.kitware.com/vtk/vtk/repository/archive.tar.bz2?ref=v7.0.0
1. https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7.tgz?raw=true
2. http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin.tar.gz
3. https://gitlab.kitware.com/vtk/vtk/repository/archive.tar.bz2?ref=v7.0.0
In (1), the query string needs to be stripped to get at the
extension, but in (2) & (3), the filename is IN a single final query
argument.
In (1), the query string needs to be stripped to get at the
extension, but in (2) & (3), the filename is IN a single final query
argument.
This strips the URL into three pieces: prefix, ext, and suffix.
The suffix contains anything that was stripped off the URL to
get at the file extension. In (1), it will be '?raw=true', but
in (2), it will be empty. In (3) the suffix is a parameter that follows
after the file extension, e.g.:
This strips the URL into three pieces: ``prefix``, ``ext``, and ``suffix``.
The suffix contains anything that was stripped off the URL to
get at the file extension. In (1), it will be ``'?raw=true'``, but
in (2), it will be empty. In (3) the suffix is a parameter that follows
after the file extension, e.g.:
1. ('https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7', '.tgz', '?raw=true')
2. ('http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin',
'.tar.gz', None)
3. ('https://gitlab.kitware.com/vtk/vtk/repository/archive', '.tar.bz2', '?ref=v7.0.0')
1. ``('https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7', '.tgz', '?raw=true')``
2. ``('http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin', '.tar.gz', None)``
3. ``('https://gitlab.kitware.com/vtk/vtk/repository/archive', '.tar.bz2', '?ref=v7.0.0')``
"""
prefix, ext, suffix = path, '', ''
@ -149,7 +148,7 @@ def determine_url_file_extension(path):
"""This returns the type of archive a URL refers to. This is
sometimes confusing because of URLs like:
(1) https://github.com/petdance/ack/tarball/1.93_02
(1) https://github.com/petdance/ack/tarball/1.93_02
Where the URL doesn't actually contain the filename. We need
to know what type it is so that we can appropriately name files
@ -166,19 +165,44 @@ def determine_url_file_extension(path):
return ext
def parse_version_offset(path, debug=False):
"""Try to extract a version string from a filename or URL. This is taken
largely from Homebrew's Version class."""
def parse_version_offset(path):
"""Try to extract a version string from a filename or URL.
:param str path: The filename or URL for the package
:return: A tuple containing:
version of the package,
first index of version,
length of version string,
the index of the matching regex,
the matching regex
:rtype: tuple
:raises UndetectableVersionError: If the URL does not match any regexes
"""
original_path = path
# path: The prefix of the URL, everything before the ext and suffix
# ext: The file extension
# suffix: Any kind of query string that begins with a '?'
path, ext, suffix = split_url_extension(path)
# Allow matches against the basename, to avoid including parent
# dirs in version name. Remember the offset of the stem in the path
# stem: Everything from path after the final '/'
stem = os.path.basename(path)
offset = len(path) - len(stem)
version_types = [
# List of the following format:
#
# [
# (regex, string),
# ...
# ]
#
# The first regex that matches string will be used to determine
# the version of the package. Therefore, hyperspecific regexes should
# come first while generic, catch-all regexes should come last.
version_regexes = [
# GitHub tarballs, e.g. v1.2.3
(r'github.com/.+/(?:zip|tar)ball/v?((\d+\.)+\d+)$', path),
@ -258,16 +282,13 @@ def parse_version_offset(path, debug=False):
(r'\/(\d\.\d+)\/', path),
# e.g. http://www.ijg.org/files/jpegsrc.v8d.tar.gz
(r'\.v(\d+[a-z]?)', stem)]
(r'\.v(\d+[a-z]?)', stem)
]
for i, vtype in enumerate(version_types):
regex, match_string = vtype
for i, version_regex in enumerate(version_regexes):
regex, match_string = version_regex
match = re.search(regex, match_string)
if match and match.group(1) is not None:
if debug:
tty.msg("Parsing URL: %s" % path,
" Matched regex %d: r'%s'" % (i, regex))
version = match.group(1)
start = match.start(1)
@ -275,30 +296,74 @@ def parse_version_offset(path, debug=False):
if match_string is stem:
start += offset
return version, start, len(version)
return version, start, len(version), i, regex
raise UndetectableVersionError(original_path)
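
The first-match-wins ordering used by ``parse_version_offset`` can be tried in isolation. The following is a minimal sketch, not Spack's implementation: the three-entry regex list is a shortened, hypothetical stand-in for the real ``version_regexes``, the exception class is a local stand-in, and the input is assumed to already have its extension stripped (the job ``split_url_extension`` does in the real code).

```python
import os
import re


class UndetectableVersionError(ValueError):
    """Local stand-in for Spack's exception of the same name (assumption)."""


def first_matching_version(path):
    """Return (version, start, length, index, regex) for the first match.

    ``path`` is assumed to have no file extension or query string left.
    """
    # Match against the basename where possible, and remember where the
    # basename starts so offsets can be reported relative to the full path.
    stem = os.path.basename(path)
    offset = len(path) - len(stem)

    # Hypothetical, shortened list: specific patterns first, catch-all last.
    version_regexes = [
        (r'github.com/.+/(?:zip|tar)ball/v?((\d+\.)+\d+)$', path),
        (r'[-_]v?((\d+\.)+\d+)$', stem),
        (r'(\d+)', stem),
    ]

    for i, (regex, match_string) in enumerate(version_regexes):
        match = re.search(regex, match_string)
        if match and match.group(1) is not None:
            version = match.group(1)
            start = match.start(1)
            # Offsets found in the stem must be shifted into the full path.
            if match_string is stem:
                start += offset
            return version, start, len(version), i, regex

    raise UndetectableVersionError(path)


url = 'https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.12/src/hdf-4.2.12'
version, start, length, index, regex = first_matching_version(url)
print(version, index, url[start:start + length])
```

Returning the index and the regex alongside the version makes it possible to report which pattern was responsible for a given parse, which is the kind of information a URL-testing command needs.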
def parse_version(path, debug=False):
"""Given a URL or archive name, extract a version from it and return
a version object.
def parse_version(path):
"""Try to extract a version string from a filename or URL.
:param str path: The filename or URL for the package
:return: The version of the package
:rtype: spack.version.Version
:raises UndetectableVersionError: If the URL does not match any regexes
"""
ver, start, l = parse_version_offset(path, debug=debug)
return Version(ver)
version, start, length, i, regex = parse_version_offset(path)
return Version(version)
def parse_name_offset(path, v=None, debug=False):
def parse_name_offset(path, v=None):
"""Try to determine the name of a package from its filename or URL.
:param str path: The filename or URL for the package
:param str v: The version of the package
:return: A tuple containing:
name of the package,
first index of name,
length of name,
the index of the matching regex,
the matching regex
:rtype: tuple
:raises UndetectableNameError: If the URL does not match any regexes
"""
original_path = path
# We really need to know the version of the package
# This helps us prevent collisions between the name and version
if v is None:
v = parse_version(path, debug=debug)
try:
v = parse_version(path)
except UndetectableVersionError:
# Not all URLs contain a version. We still want to be able
# to determine a name if possible.
v = ''
# path: The prefix of the URL, everything before the ext and suffix
# ext: The file extension
# suffix: Any kind of query string that begins with a '?'
path, ext, suffix = split_url_extension(path)
# Allow matching with either path or stem, as with the version.
# stem: Everything from path after the final '/'
stem = os.path.basename(path)
offset = len(path) - len(stem)
name_types = [
# List of the following format:
#
# [
# (regex, string),
# ...
# ]
#
# The first regex that matches string will be used to determine
# the name of the package. Therefore, hyperspecific regexes should
# come first while generic, catch-all regexes should come last.
name_regexes = [
(r'/sourceforge/([^/]+)/', path),
(r'github.com/[^/]+/[^/]+/releases/download/%s/(.*)-%s$' %
(v, v), path),
@ -316,10 +381,11 @@ def parse_name_offset(path, v=None, debug=False):
(r'/([^/]+)%s' % v, path),
(r'^([^/]+)[_.-]v?%s' % v, path),
(r'^([^/]+)%s' % v, path)]
(r'^([^/]+)%s' % v, path)
]
for i, name_type in enumerate(name_types):
regex, match_string = name_type
for i, name_regex in enumerate(name_regexes):
regex, match_string = name_regex
match = re.search(regex, match_string)
if match:
name = match.group(1)
@ -333,17 +399,38 @@ def parse_name_offset(path, v=None, debug=False):
name = name.lower()
name = re.sub('[_.]', '-', name)
return name, start, len(name)
return name, start, len(name), i, regex
raise UndetectableNameError(path)
raise UndetectableNameError(original_path)
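
The name normalization applied just before ``parse_name_offset`` returns (lowercasing, then replacing ``_`` and ``.`` with ``-``) is easy to check on its own. This is a small sketch; the helper name ``normalize_name`` is hypothetical and does not exist in the module.

```python
import re


def normalize_name(name):
    """Lowercase a raw name and turn underscores and dots into dashes."""
    name = name.lower()
    return re.sub('[_.]', '-', name)


print(normalize_name('PowerParser_v2.0'))
```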
def parse_name(path, ver=None):
name, start, l = parse_name_offset(path, ver)
"""Try to determine the name of a package from its filename or URL.
:param str path: The filename or URL for the package
:param str ver: The version of the package
:return: The name of the package
:rtype: str
:raises UndetectableNameError: If the URL does not match any regexes
"""
name, start, length, i, regex = parse_name_offset(path, ver)
return name
def parse_name_and_version(path):
"""Try to determine the name of a package and extract its version
from its filename or URL.
:param str path: The filename or URL for the package
:return: A tuple containing:
The name of the package
The version of the package
:rtype: tuple
"""
ver = parse_version(path)
name = parse_name(path, ver)
return (name, ver)
@ -371,12 +458,12 @@ def cumsum(elts, init=0, fn=lambda x: x):
def substitution_offsets(path):
"""This returns offsets for substituting versions and names in the
provided path. It is a helper for substitute_version().
provided path. It is a helper for :func:`substitute_version`.
"""
# Get name and version offsets
try:
ver, vs, vl = parse_version_offset(path)
name, ns, nl = parse_name_offset(path, ver)
ver, vs, vl, vi, vregex = parse_version_offset(path)
name, ns, nl, ni, nregex = parse_name_offset(path, ver)
except UndetectableNameError:
return (None, -1, -1, (), ver, vs, vl, (vs,))
except UndetectableVersionError:
@ -444,21 +531,22 @@ def wildcard_version(path):
def substitute_version(path, new_version):
"""Given a URL or archive name, find the version in the path and
substitute the new version for it. Replace all occurrences of
the version *if* they don't overlap with the package name.
substitute the new version for it. Replace all occurrences of
the version *if* they don't overlap with the package name.
Simple example::
substitute_version('http://www.mr511.de/software/libelf-0.8.13.tar.gz', '2.9.3')
->'http://www.mr511.de/software/libelf-2.9.3.tar.gz'
Simple example:
Complex examples::
substitute_version('http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.0.tar.gz', 2.1)
-> 'http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.1.tar.gz'
.. code-block:: python
# In this string, the "2" in mvapich2 is NOT replaced.
substitute_version('http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.tar.gz', 2.1)
-> 'http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.1.tar.gz'
substitute_version('http://www.mr511.de/software/libelf-0.8.13.tar.gz', '2.9.3')
>>> 'http://www.mr511.de/software/libelf-2.9.3.tar.gz'
Complex example:
.. code-block:: python
substitute_version('https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.12/src/hdf-4.2.12.tar.gz', '2.3')
>>> 'https://www.hdfgroup.org/ftp/HDF/releases/HDF2.3/src/hdf-2.3.tar.gz'
"""
(name, ns, nl, noffs,
ver, vs, vl, voffs) = substitution_offsets(path)
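
The rule stated in the docstring (replace every occurrence of the version unless it overlaps the package name) can be sketched without the offset bookkeeping. This is an illustration under assumptions, not the function's actual body: ``substitute_outside_name`` is a hypothetical helper, the caller is assumed to already know where the name sits in the string, and ``example.com`` is a made-up URL.

```python
def substitute_outside_name(path, name_start, name_len, version, new_version):
    """Replace occurrences of ``version`` that do not overlap the name span."""
    name_end = name_start + name_len
    pieces = []
    pos = 0
    while True:
        hit = path.find(version, pos)
        if hit == -1:
            break
        hit_end = hit + len(version)
        # Two half-open intervals overlap when each starts before the
        # other one ends.
        overlaps_name = hit < name_end and hit_end > name_start
        pieces.append(path[pos:hit])
        pieces.append(version if overlaps_name else new_version)
        pos = hit_end
    pieces.append(path[pos:])
    return ''.join(pieces)


# Both occurrences of the version are replaced.
hdf = 'https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.12/src/hdf-4.2.12.tar.gz'
print(substitute_outside_name(hdf, hdf.rfind('hdf'), 3, '4.2.12', '2.3'))

# The "2" that is part of the name "foo2" is left alone.
foo = 'http://example.com/foo2-2.tar.gz'
print(substitute_outside_name(foo, foo.find('foo2'), 4, '2', '2.1'))
```

Note that this sketch protects only the single name span it is given. A version digit elsewhere in the path (for example in a parent directory) would still be replaced.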
@ -477,17 +565,16 @@ def substitute_version(path, new_version):
def color_url(path, **kwargs):
"""Color the parts of the url according to Spack's parsing.
Colors are:
Cyan: The version found by parse_version_offset().
Red: The name found by parse_name_offset().
Colors are:
| Cyan: The version found by :func:`parse_version_offset`.
| Red: The name found by :func:`parse_name_offset`.
Green: Instances of version string from substitute_version().
Magenta: Instances of the name (protected from substitution).
Optional args:
errors=True Append parse errors at end of string.
subs=True Color substitutions as well as parsed name/version.
| Green: Instances of version string from :func:`substitute_version`.
| Magenta: Instances of the name (protected from substitution).
:param str path: The filename or URL for the package
:keyword bool errors: Append parse errors at end of string.
:keyword bool subs: Color substitutions as well as parsed name/version.
"""
errors = kwargs.get('errors', False)
subs = kwargs.get('subs', False)