Refactor Spack's URL parsing commands (#2938)

* Replace `spack urls` and `spack url-parse` with `spack url`
* Allow spack url list to only list incorrect parsings
* Add spack url test reporting
* Add unit tests for new URL commands
2017-01-31 10:14:52 -06:00
commit 123f057089
parent 2e81fe4fb3
7 changed files with 796 additions and 311 deletions
--- a/lib/spack/docs/developer_guide.rst
+++ b/lib/spack/docs/developer_guide.rst
@ -300,6 +300,42 @@ Stage objects
 Writing commands
 ----------------
 Adding a new command to Spack is easy. Simply add a ``<name>.py`` file to
 ``lib/spack/spack/cmd/``, where ``<name>`` is the name of the subcommand.
 At the bare minimum, two functions are required in this file:
 ^^^^^^^^^^^^^^^^^^
 ``setup_parser()``
 ^^^^^^^^^^^^^^^^^^
 Unless your command doesn't accept any arguments, a ``setup_parser()``
 function is required to define what arguments and flags your command takes.
 See the `Argparse documentation <https://docs.python.org/2.7/library/argparse.html>`_
 for more details on how to add arguments.
 Some commands have a set of subcommands, like ``spack compiler find`` or
 ``spack module refresh``. You can add subparsers to your parser to handle
 this. Check out ``spack edit --command compiler`` for an example of this.
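As a rough sketch of the subparser pattern, the following standalone script builds a parser for a hypothetical ``greet`` command with two subcommands. The command name, the subcommand names, and the helper functions are made up for illustration; only the ``argparse`` calls are real. Because it runs outside of Spack, it creates its own top-level ``ArgumentParser`` where Spack would normally hand one to ``setup_parser()``.

```python
import argparse


def setup_parser(subparser):
    """Define the subcommands of a hypothetical ``greet`` command."""
    sp = subparser.add_subparsers(metavar='SUBCOMMAND', dest='subcommand')

    hello_parser = sp.add_parser('hello', help='say hello to someone')
    hello_parser.add_argument('name', help='who to greet')

    # A subcommand that takes no arguments of its own
    sp.add_parser('bye', help='say goodbye')


def greet_hello(args):
    return 'Hello, {0}!'.format(args.name)


def greet_bye(args):
    return 'Goodbye!'


def greet(parser, args):
    """Dispatch to the helper that matches the chosen subcommand."""
    action = {
        'hello': greet_hello,
        'bye':   greet_bye,
    }
    return action[args.subcommand](args)


# Stand-in for the parser that Spack would normally provide
parser = argparse.ArgumentParser(prog='greet')
setup_parser(parser)

print(greet(parser, parser.parse_args(['hello', 'world'])))  # Hello, world!
print(greet(parser, parser.parse_args(['bye'])))             # Goodbye!
```

Storing the chosen subcommand in ``dest='subcommand'`` lets the main function pick a helper with a plain dictionary lookup instead of a chain of ``if`` statements.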
 A lot of commands take the same arguments and flags. These arguments should
 be defined in ``lib/spack/spack/cmd/common/arguments.py`` so that they don't
 need to be redefined in multiple commands.
 ^^^^^^^^^^^^
 ``<name>()``
 ^^^^^^^^^^^^
 In order to run your command, Spack searches for a function with the same
 name as your command in ``<name>.py``. This is the main method for your
 command, and can call other helper methods to handle common tasks.
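To make the lookup concrete, here is a minimal sketch of a hypothetical ``hello`` command together with a toy dispatcher. The ``run_command`` function is an illustration only and is not Spack's real dispatch code; it simply assumes that the main function has the same name as the module and accepts ``(parser, args)``.

```python
import argparse
import types


# Stand-in for the contents of a hypothetical lib/spack/spack/cmd/hello.py
def setup_parser(subparser):
    subparser.add_argument(
        '-n', '--name', default='world', help='who to greet')


def hello(parser, args):
    return 'Hello, {0}!'.format(args.name)


# Bundle the two functions into a module object named after the command
module = types.ModuleType('hello')
module.setup_parser = setup_parser
module.hello = hello


def run_command(module, argv):
    """Toy dispatcher (an assumption, not Spack's implementation)."""
    parser = argparse.ArgumentParser(prog=module.__name__)
    # setup_parser() is optional for commands without arguments
    if hasattr(module, 'setup_parser'):
        module.setup_parser(parser)
    # Look up the function that shares its name with the module
    main = getattr(module, module.__name__)
    return main(parser, parser.parse_args(argv))


print(run_command(module, ['--name', 'Spack']))  # Hello, Spack!
```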
 Before adding a new command, consider whether it is actually necessary.
 Sometimes the functionality you want can be added to an existing command.
 Also remember to add unit tests for your command. If a command isn't used
 very frequently, changes to the rest of Spack can break it unnoticed unless
 unit tests are in place to catch the breakage.
 ----------
 Unit tests
 ----------
@ -312,14 +348,80 @@ Unit testing
 Developer commands
 ------------------
 .. _cmd-spack-doc:
 ^^^^^^^^^^^^^
 ``spack doc``
 ^^^^^^^^^^^^^
 .. _cmd-spack-test:
 ^^^^^^^^^^^^^^
 ``spack test``
 ^^^^^^^^^^^^^^
 .. _cmd-spack-url:
 ^^^^^^^^^^^^^
 ``spack url``
 ^^^^^^^^^^^^^
 A package containing a single URL can be used to download several different
 versions of the package. If you've ever wondered how this works, all of the
 magic is in :mod:`spack.url`. This module contains methods for extracting
 the name and version of a package from its URL. The name is used by
 ``spack create`` to guess the name of the package. By determining the version
 from the URL, Spack can replace it with other versions to determine where to
 download them from.
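The idea of locating a version by its offset and swapping it out can be sketched in a few lines. This is a simplified illustration, not the code in :mod:`spack.url`: the single regular expression and the ``find_version`` and ``swap_version`` helpers are assumptions made for this example, and Spack's real expressions are more numerous and more permissive.

```python
import re

# Deliberately simplified pattern (an assumption for this sketch)
VERSION_REGEX = r'-(\d+(?:\.\d+)*)\.tar\.gz$'


def find_version(url):
    """Return (version, start offset, length) for the version in ``url``."""
    match = re.search(VERSION_REGEX, url)
    if match is None:
        raise ValueError('no version found in {0}'.format(url))
    version = match.group(1)
    return version, match.start(1), len(version)


def swap_version(url, new_version):
    """Replace the detected version, and only that span, with another."""
    _, start, length = find_version(url)
    return url[:start] + new_version + url[start + length:]


url = 'http://www.mr511.de/software/libelf-0.8.13.tar.gz'
print(find_version(url)[0])         # 0.8.13
print(swap_version(url, '0.8.12'))
```

Working with offsets rather than a global string replacement means that only the matched span changes, which also explains why other copies of the version elsewhere in a URL can be left behind.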
 The regular expressions in ``parse_name_offset`` and ``parse_version_offset``
 are used to extract the name and version, but they aren't perfect. In order
 to debug Spack's URL parsing support, the ``spack url`` command can be used.
 """""""""""""""""""
 ``spack url parse``
 """""""""""""""""""
 If you need to debug a single URL, you can use the following command:
 .. command-output:: spack url parse http://cache.ruby-lang.org/pub/ruby/2.2/ruby-2.2.0.tar.gz
 The name and version of this URL are correctly detected, and you can even
 see which regular expressions matched. However, when the version number is
 substituted, the ``2.2`` directory component is not replaced with ``9.9``,
 which is where we would expect version ``9.9.9b`` to live.
 This particular package may require a ``list_url`` or ``url_for_version``
 function.
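A ``url_for_version`` function for this layout might look roughly like the sketch below. In a real package it would be a method on the package class and would receive a version object; here it is a standalone function that takes a plain string so that the example runs on its own. The directory layout is inferred from the example URL above and is an assumption.

```python
def url_for_version(version):
    """Build a download URL whose directory holds the first two
    components of the version (sketch only)."""
    # Keep only the first two components, e.g. '2.2.0' -> '2.2'
    major_minor = '.'.join(version.split('.')[:2])
    base = 'http://cache.ruby-lang.org/pub/ruby'
    return '{0}/{1}/ruby-{2}.tar.gz'.format(base, major_minor, version)


print(url_for_version('2.2.0'))
print(url_for_version('9.9.9b'))
```

With this function, both the directory and the file name change together when the version changes.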
 This command also accepts a ``--spider`` flag. If provided, Spack searches
 for other versions of the package and prints the matching URLs.
 """"""""""""""""""
 ``spack url list``
 """"""""""""""""""
 This command lists every URL in every package in Spack. If given the
 ``--color`` and ``--extrapolation`` flags, it also colors the parts of
 the URL that it detected to be the name and version. The
 ``--incorrect-name`` and ``--incorrect-version`` flags restrict the
 output to URLs whose name or version was parsed incorrectly.
 """"""""""""""""""
 ``spack url test``
 """"""""""""""""""
 This command attempts to parse every URL for every package in Spack
 and prints a summary of how many of them are being correctly parsed.
 It also prints a histogram showing which regular expressions are being
 matched and how frequently:
 .. command-output:: spack url test
 This command is essential for anyone adding or changing the regular
 expressions that parse names and versions. By running this command
 before and after the change, you can make sure that your regular
 expression fixes more packages than it breaks.
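The kind of bookkeeping behind such a summary can be sketched as follows. The two regular expressions and the ``summarize`` helper are hypothetical and far simpler than what Spack uses; the sample URLs are taken from elsewhere in this documentation.

```python
from collections import defaultdict
import re

# Hypothetical patterns, tried in order (not Spack's real list)
VERSION_REGEXES = [
    r'-(\d+(?:\.\d+)+)\.tar\.gz$',  # dotted versions such as 0.8.13
    r'-(\d+)\.tar\.gz$',            # single-number versions such as dates
]


def summarize(urls_and_versions):
    """Count correct parses and how often each regex index matched."""
    correct = 0
    counts = defaultdict(int)
    for url, expected in urls_and_versions:
        for index, regex in enumerate(VERSION_REGEXES):
            match = re.search(regex, url)
            if match:
                counts[index] += 1
                if match.group(1) == expected:
                    correct += 1
                break  # only the first matching regex is recorded
    return correct, dict(counts)


samples = [
    ('http://www.mr511.de/software/libelf-0.8.13.tar.gz', '0.8.13'),
    ('http://www.prevanders.net/libdwarf-20130729.tar.gz', '20130729'),
    ('http://www.netlib.org/voronoi/triangle.zip', None),  # no version
]
correct, counts = summarize(samples)
print('{0}/{1} versions correctly parsed'.format(correct, len(samples)))
for index in sorted(counts):
    print('regex {0}: matched {1} time(s)'.format(index, counts[index]))
```

Running a summary like this before and after editing a pattern shows at a glance whether the total of correct parses went up or down.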
 ---------
 Profiling
 ---------
--- a/lib/spack/docs/packaging_guide.rst
+++ b/lib/spack/docs/packaging_guide.rst
@ -712,8 +712,8 @@ is at ``http://example.com/downloads/foo-1.0.tar.gz``, Spack will look
 in ``http://example.com/downloads/`` for links to additional versions.
 If you need to search another path for download links, you can supply
 some extra attributes that control how your package finds new
-versions. See the documentation on `attribute_list_url`_ and
+versions. See the documentation on :ref:`attribute_list_url` and
-`attribute_list_depth`_.
+:ref:`attribute_list_depth`.
 .. note::
@ -728,6 +728,102 @@ versions. See the documentation on `attribute_list_url`_ and
    syntax errors, or the ``import`` will fail.  Use this once you've
    got your package in working order.
 --------------------
 Finding new versions
 --------------------
 You've already seen the ``homepage`` and ``url`` package attributes:
 .. code-block:: python
   :linenos:
   from spack import *
   class Mpich(Package):
      """MPICH is a high performance and widely portable implementation of
         the Message Passing Interface (MPI) standard."""
      homepage = "http://www.mpich.org"
      url      = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
 These are class-level attributes used by Spack to show users
 information about the package, and to determine where to download its
 source code.
 Spack uses the tarball URL to extrapolate where to find other tarballs
 of the same package (e.g. in :ref:`cmd-spack-checksum`), but
 this does not always work.  This section covers ways you can tell
 Spack to find tarballs elsewhere.
 .. _attribute_list_url:
 ^^^^^^^^^^^^
 ``list_url``
 ^^^^^^^^^^^^
 When Spack tries to find available versions of packages (e.g. with
 :ref:`cmd-spack-checksum`), it spiders the parent directory
 of the tarball in the ``url`` attribute.  For example, for libelf, the
 url is:
 .. code-block:: python
   url = "http://www.mr511.de/software/libelf-0.8.13.tar.gz"
 Here, Spack spiders ``http://www.mr511.de/software/`` to find similar
 tarball links and ultimately to make a list of available versions of
 ``libelf``.
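Deriving the directory to spider from the tarball URL amounts to cutting the URL off after its last slash. The helper below is a minimal sketch of that idea; the function name is hypothetical and Spack may compute the directory differently.

```python
def parent_directory(url):
    """Return everything up to and including the last slash."""
    return url[:url.rfind('/') + 1]


print(parent_directory('http://www.mr511.de/software/libelf-0.8.13.tar.gz'))
# http://www.mr511.de/software/
```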
 For many packages, the tarball's parent directory may be unlistable,
 or it may not contain any links to source code archives.  In fact,
 many times additional package downloads aren't even available in the
 same directory as the download URL.
 For these, you can specify a separate ``list_url`` indicating the page
 to search for tarballs.  For example, ``libdwarf`` has the homepage as
 the ``list_url``, because that is where links to old versions are:
 .. code-block:: python
   :linenos:
   class Libdwarf(Package):
       homepage = "http://www.prevanders.net/dwarf.html"
       url      = "http://www.prevanders.net/libdwarf-20130729.tar.gz"
       list_url = homepage
 .. _attribute_list_depth:
 ^^^^^^^^^^^^^^
 ``list_depth``
 ^^^^^^^^^^^^^^
 ``libdwarf`` and many other packages have a listing of available
 versions on a single webpage, but not all do.  For example, ``mpich``
 has a tarball URL that looks like this:
 .. code-block:: python
   url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
 But its downloads are in many different subdirectories of
 ``http://www.mpich.org/static/downloads/``.  So, we need to add a
 ``list_url`` *and* a ``list_depth`` attribute:
 .. code-block:: python
   :linenos:
   class Mpich(Package):
       homepage   = "http://www.mpich.org"
       url        = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
       list_url   = "http://www.mpich.org/static/downloads/"
       list_depth = 2
 By default, Spack only looks at the top-level page available at
 ``list_url``.  ``list_depth`` tells it to follow up to 2 levels of
 links from the top-level page.  Note that here, this implies two
 levels of subdirectories, as the ``mpich`` website is structured much
 like a filesystem.  But ``list_depth`` really refers to link depth
 when spidering the page.
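Link depth can be pictured with a small breadth-first search over an in-memory "website". Everything in this sketch is an assumption made for illustration: the ``SITE`` dictionary, the ``3.1`` release, and the ``spider`` helper are hypothetical, no network access is involved, and the way levels are counted here may not match how Spack counts ``list_depth``.

```python
from collections import deque

# Hypothetical miniature website: page -> links found on that page
SITE = {
    'downloads/': ['downloads/3.0.4/', 'downloads/3.1/'],
    'downloads/3.0.4/': ['downloads/3.0.4/mpich-3.0.4.tar.gz'],
    'downloads/3.1/': ['downloads/3.1/mpich-3.1.tar.gz'],
}


def spider(root, depth):
    """Collect links, following at most ``depth`` levels of links
    away from ``root``."""
    found = set()
    visited = set([root])
    queue = deque([(root, 0)])
    while queue:
        page, level = queue.popleft()
        for link in SITE.get(page, []):
            found.add(link)
            # Only descend further while the depth limit allows it
            if level < depth and link not in visited:
                visited.add(link)
                queue.append((link, level + 1))
    return found


def tarballs(links):
    return sorted(link for link in links if link.endswith('.tar.gz'))


print(tarballs(spider('downloads/', 0)))  # top-level page only: []
print(tarballs(spider('downloads/', 1)))  # also reads the subdirectories
```

With a depth of ``0`` in this sketch only the subdirectory links are seen, so no tarballs turn up; allowing one more level reaches the pages that actually list them.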
 .. _vcs-fetch:
@ -1241,103 +1337,6 @@ RPATHs in Spack are handled in one of three ways:
  links.  You can see how this is used in the :ref:`PySide
   example <pyside-patch>` above.
 --------------------
 Finding new versions
 --------------------
 You've already seen the ``homepage`` and ``url`` package attributes:
 .. code-block:: python
   :linenos:
   from spack import *
   class Mpich(Package):
      """MPICH is a high performance and widely portable implementation of
         the Message Passing Interface (MPI) standard."""
      homepage = "http://www.mpich.org"
      url      = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
 These are class-level attributes used by Spack to show users
 information about the package, and to determine where to download its
 source code.
 Spack uses the tarball URL to extrapolate where to find other tarballs
 of the same package (e.g. in :ref:`cmd-spack-checksum`), but
 this does not always work.  This section covers ways you can tell
 Spack to find tarballs elsewhere.
 .. _attribute_list_url:
 ^^^^^^^^^^^^
 ``list_url``
 ^^^^^^^^^^^^
 When Spack tries to find available versions of packages (e.g. with
 :ref:`cmd-spack-checksum`), it spiders the parent directory
 of the tarball in the ``url`` attribute.  For example, for libelf, the
 url is:
 .. code-block:: python
   url = "http://www.mr511.de/software/libelf-0.8.13.tar.gz"
 Here, Spack spiders ``http://www.mr511.de/software/`` to find similar
 tarball links and ultimately to make a list of available versions of
 ``libelf``.
 For many packages, the tarball's parent directory may be unlistable,
 or it may not contain any links to source code archives.  In fact,
 many times additional package downloads aren't even available in the
 same directory as the download URL.
 For these, you can specify a separate ``list_url`` indicating the page
 to search for tarballs.  For example, ``libdwarf`` has the homepage as
 the ``list_url``, because that is where links to old versions are:
 .. code-block:: python
   :linenos:
   class Libdwarf(Package):
       homepage = "http://www.prevanders.net/dwarf.html"
       url      = "http://www.prevanders.net/libdwarf-20130729.tar.gz"
       list_url = homepage
 .. _attribute_list_depth:
 ^^^^^^^^^^^^^^
 ``list_depth``
 ^^^^^^^^^^^^^^
 ``libdwarf`` and many other packages have a listing of available
 versions on a single webpage, but not all do.  For example, ``mpich``
 has a tarball URL that looks like this:
 .. code-block:: python
   url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
 But its downloads are in many different subdirectories of
 ``http://www.mpich.org/static/downloads/``.  So, we need to add a
 ``list_url`` *and* a ``list_depth`` attribute:
 .. code-block:: python
   :linenos:
   class Mpich(Package):
       homepage   = "http://www.mpich.org"
       url        = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
       list_url   = "http://www.mpich.org/static/downloads/"
       list_depth = 2
 By default, Spack only looks at the top-level page available at
 ``list_url``.  ``list_depth`` tells it to follow up to 2 levels of
 links from the top-level page.  Note that here, this implies two
 levels of subdirectories, as the ``mpich`` website is structured much
 like a filesystem.  But ``list_depth`` really refers to link depth
 when spidering the page.
 .. _attribute_parallel:
 ---------------
--- a/lib/spack/spack/cmd/url.py
+++ b/lib/spack/spack/cmd/url.py
@ -0,0 +1,319 @@
 ##############################################################################
 # Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
 # Produced at the Lawrence Livermore National Laboratory.
 #
 # This file is part of Spack.
 # Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
 # LLNL-CODE-647188
 #
 # For details, see https://github.com/llnl/spack
 # Please also see the LICENSE file for our notice and the LGPL.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU Lesser General Public License (as
 # published by the Free Software Foundation) version 2.1, February 1999.
 #
 # This program is distributed in the hope that it will be useful, but
 # WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
 # conditions of the GNU Lesser General Public License for more details.
 #
 # You should have received a copy of the GNU Lesser General Public
 # License along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 ##############################################################################
 from __future__ import division, print_function
 from collections import defaultdict
 import spack
 from llnl.util import tty
 from spack.url import *
 from spack.util.web import find_versions_of_archive
 description = "debugging tool for url parsing"
 def setup_parser(subparser):
    sp = subparser.add_subparsers(metavar='SUBCOMMAND', dest='subcommand')
    # Parse
    parse_parser = sp.add_parser('parse', help='attempt to parse a url')
    parse_parser.add_argument(
        'url',
        help='url to parse')
    parse_parser.add_argument(
        '-s', '--spider', action='store_true',
        help='spider the source page for versions')
    # List
    list_parser = sp.add_parser('list', help='list urls in all packages')
    list_parser.add_argument(
        '-c', '--color', action='store_true',
        help='color the parsed version and name in the urls shown '
             '(versions will be cyan, name red)')
    list_parser.add_argument(
        '-e', '--extrapolation', action='store_true',
        help='color the versions used for extrapolation as well '
             '(additional versions will be green, names magenta)')
    excl_args = list_parser.add_mutually_exclusive_group()
    excl_args.add_argument(
        '-n', '--incorrect-name', action='store_true',
        help='only list urls for which the name was incorrectly parsed')
    excl_args.add_argument(
        '-v', '--incorrect-version', action='store_true',
        help='only list urls for which the version was incorrectly parsed')
    # Test
    sp.add_parser(
        'test', help='print a summary of how well we are parsing package urls')
 def url(parser, args):
    action = {
        'parse': url_parse,
        'list':  url_list,
        'test':  url_test
    }
    action[args.subcommand](args)
 def url_parse(args):
    url = args.url
    tty.msg('Parsing URL: {0}'.format(url))
    print()
    ver,  vs, vl, vi, vregex = parse_version_offset(url)
    tty.msg('Matched version regex {0:>2}: r{1!r}'.format(vi, vregex))
    name, ns, nl, ni, nregex = parse_name_offset(url, ver)
    tty.msg('Matched  name   regex {0:>2}: r{1!r}'.format(ni, nregex))
    print()
    tty.msg('Detected:')
    try:
        print_name_and_version(url)
    except UrlParseError as e:
        tty.error(str(e))
    print('    name:    {0}'.format(name))
    print('    version: {0}'.format(ver))
    print()
    tty.msg('Substituting version 9.9.9b:')
    newurl = substitute_version(url, '9.9.9b')
    print_name_and_version(newurl)
    if args.spider:
        print()
        tty.msg('Spidering for versions:')
        versions = find_versions_of_archive(url)
        # max() raises ValueError on an empty sequence, so fall back to 0
        max_len = max(len(str(v)) for v in versions) if versions else 0
        for v in sorted(versions):
            print('{0:{1}}  {2}'.format(v, max_len, versions[v]))
 def url_list(args):
    urls = set()
    # Gather set of URLs from all packages
    for pkg in spack.repo.all_packages():
        url = getattr(pkg.__class__, 'url', None)
        urls = url_list_parsing(args, urls, url, pkg)
        for params in pkg.versions.values():
            url = params.get('url', None)
            urls = url_list_parsing(args, urls, url, pkg)
    # Print URLs
    for url in sorted(urls):
        if args.color or args.extrapolation:
            print(color_url(url, subs=args.extrapolation, errors=True))
        else:
            print(url)
    # Return the number of URLs that were printed, only for testing purposes
    return len(urls)
 def url_test(args):
    # Collect statistics on how many URLs were correctly parsed
    total_urls       = 0
    correct_names    = 0
    correct_versions = 0
    # Collect statistics on which regexes were matched and how often
    name_regex_dict    = dict()
    name_count_dict    = defaultdict(int)
    version_regex_dict = dict()
    version_count_dict = defaultdict(int)
    tty.msg('Generating a summary of URL parsing in Spack...')
    # Loop through all packages
    for pkg in spack.repo.all_packages():
        urls = set()
        url = getattr(pkg.__class__, 'url', None)
        if url:
            urls.add(url)
        for params in pkg.versions.values():
            url = params.get('url', None)
            if url:
                urls.add(url)
        # Calculate statistics
        for url in urls:
            total_urls += 1
            # Parse versions
            version = None
            try:
                version, vs, vl, vi, vregex = parse_version_offset(url)
                version_regex_dict[vi] = vregex
                version_count_dict[vi] += 1
                if version_parsed_correctly(pkg, version):
                    correct_versions += 1
            except UndetectableVersionError:
                pass
            # Parse names
            try:
                name, ns, nl, ni, nregex = parse_name_offset(url, version)
                name_regex_dict[ni] = nregex
                name_count_dict[ni] += 1
                if name_parsed_correctly(pkg, name):
                    correct_names += 1
            except UndetectableNameError:
                pass
    print()
    print('    Total URLs found:          {0}'.format(total_urls))
    print('    Names correctly parsed:    {0:>4}/{1:>4} ({2:>6.2%})'.format(
        correct_names, total_urls, correct_names / total_urls))
    print('    Versions correctly parsed: {0:>4}/{1:>4} ({2:>6.2%})'.format(
        correct_versions, total_urls, correct_versions / total_urls))
    print()
    tty.msg('Statistics on name regular expressions:')
    print()
    print('    Index  Count  Regular Expression')
    for ni in name_regex_dict:
        print('    {0:>3}: {1:>6}   r{2!r}'.format(
            ni, name_count_dict[ni], name_regex_dict[ni]))
    print()
    tty.msg('Statistics on version regular expressions:')
    print()
    print('    Index  Count  Regular Expression')
    for vi in version_regex_dict:
        print('    {0:>3}: {1:>6}   r{2!r}'.format(
            vi, version_count_dict[vi], version_regex_dict[vi]))
    print()
    # Return statistics, only for testing purposes
    return (total_urls, correct_names, correct_versions,
            name_count_dict, version_count_dict)
 def print_name_and_version(url):
    """Prints a URL. Underlines the detected name with dashes and
    the detected version with tildes.
    :param str url: The url to parse
    """
    name, ns, nl, ntup, ver, vs, vl, vtup = substitution_offsets(url)
    underlines = [' '] * max(ns + nl, vs + vl)
    for i in range(ns, ns + nl):
        underlines[i] = '-'
    for i in range(vs, vs + vl):
        underlines[i] = '~'
    print('    {0}'.format(url))
    print('    {0}'.format(''.join(underlines)))
 def url_list_parsing(args, urls, url, pkg):
    """Helper function for :func:`url_list`.
    :param argparse.Namespace args: The arguments given to ``spack url list``
    :param set urls: Set of URLs that have already been added
    :param url: A URL to potentially add to ``urls`` depending on ``args``
    :type url: str or None
    :param spack.package.PackageBase pkg: The Spack package
    :returns: The updated ``urls`` set
    :rtype: set
    """
    if url:
        if args.incorrect_name:
            # Only add URLs whose name was incorrectly parsed
            try:
                name = parse_name(url)
                if not name_parsed_correctly(pkg, name):
                    urls.add(url)
            except UndetectableNameError:
                urls.add(url)
        elif args.incorrect_version:
            # Only add URLs whose version was incorrectly parsed
            try:
                version = parse_version(url)
                if not version_parsed_correctly(pkg, version):
                    urls.add(url)
            except UndetectableVersionError:
                urls.add(url)
        else:
            urls.add(url)
    return urls
 def name_parsed_correctly(pkg, name):
    """Determine if the name of a package was correctly parsed.
    :param spack.package.PackageBase pkg: The Spack package
    :param str name: The name that was extracted from the URL
    :returns: True if the name was correctly parsed, else False
    :rtype: bool
    """
    pkg_name = pkg.name
    # After determining a name, `spack create` determines a build system.
    # Some build systems prepend a special string to the front of the name.
    # Since this can't be guessed from the URL, it would be unfair to say
    # that these names are incorrectly parsed, so we remove them.
    if pkg_name.startswith('r-'):
        pkg_name = pkg_name[2:]
    elif pkg_name.startswith('py-'):
        pkg_name = pkg_name[3:]
    elif pkg_name.startswith('octave-'):
        pkg_name = pkg_name[7:]
    return name == pkg_name
 def version_parsed_correctly(pkg, version):
    """Determine if the version of a package was correctly parsed.
    :param spack.package.PackageBase pkg: The Spack package
    :param str version: The version that was extracted from the URL
    :returns: True if the version was correctly parsed, else False
    :rtype: bool
    """
    # If the version parsed from the URL is listed in a version()
    # directive, we assume it was correctly parsed
    for pkg_version in pkg.versions:
        if str(pkg_version) == str(version):
            return True
    return False
--- a/lib/spack/spack/cmd/url_parse.py
+++ b/lib/spack/spack/cmd/url_parse.py
@ -1,79 +0,0 @@
 ##############################################################################
 # Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
 # Produced at the Lawrence Livermore National Laboratory.
 #
 # This file is part of Spack.
 # Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
 # LLNL-CODE-647188
 #
 # For details, see https://github.com/llnl/spack
 # Please also see the LICENSE file for our notice and the LGPL.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU Lesser General Public License (as
 # published by the Free Software Foundation) version 2.1, February 1999.
 #
 # This program is distributed in the hope that it will be useful, but
 # WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
 # conditions of the GNU Lesser General Public License for more details.
 #
 # You should have received a copy of the GNU Lesser General Public
 # License along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 ##############################################################################
 import llnl.util.tty as tty
 import spack
 import spack.url
 from spack.util.web import find_versions_of_archive
 description = "show parsing of a URL, optionally spider web for versions"
 def setup_parser(subparser):
    subparser.add_argument('url', help="url of a package archive")
    subparser.add_argument(
        '-s', '--spider', action='store_true',
        help="spider the source page for versions")
 def print_name_and_version(url):
    name, ns, nl, ntup, ver, vs, vl, vtup = spack.url.substitution_offsets(url)
    underlines = [" "] * max(ns + nl, vs + vl)
    for i in range(ns, ns + nl):
        underlines[i] = '-'
    for i in range(vs, vs + vl):
        underlines[i] = '~'
    print "    %s" % url
    print "    %s" % ''.join(underlines)
 def url_parse(parser, args):
    url = args.url
    ver,  vs, vl = spack.url.parse_version_offset(url, debug=True)
    name, ns, nl = spack.url.parse_name_offset(url, ver, debug=True)
    print
    tty.msg("Detected:")
    try:
        print_name_and_version(url)
    except spack.url.UrlParseError as e:
        tty.error(str(e))
    print '    name:     %s' % name
    print '    version:  %s' % ver
    print
    tty.msg("Substituting version 9.9.9b:")
    newurl = spack.url.substitute_version(url, '9.9.9b')
    print_name_and_version(newurl)
    if args.spider:
        print
        tty.msg("Spidering for versions:")
        versions = find_versions_of_archive(url)
        for v in sorted(versions):
            print "%-20s%s" % (v, versions[v])
--- a/lib/spack/spack/cmd/urls.py
+++ b/lib/spack/spack/cmd/urls.py
@ -1,59 +0,0 @@
 ##############################################################################
 # Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
 # Produced at the Lawrence Livermore National Laboratory.
 #
 # This file is part of Spack.
 # Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
 # LLNL-CODE-647188
 #
 # For details, see https://github.com/llnl/spack
 # Please also see the LICENSE file for our notice and the LGPL.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU Lesser General Public License (as
 # published by the Free Software Foundation) version 2.1, February 1999.
 #
 # This program is distributed in the hope that it will be useful, but
 # WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
 # conditions of the GNU Lesser General Public License for more details.
 #
 # You should have received a copy of the GNU Lesser General Public
 # License along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 ##############################################################################
 import spack
 import spack.url
 description = "inspect urls used by packages in spack"
 def setup_parser(subparser):
    subparser.add_argument(
        '-c', '--color', action='store_true',
        help="color the parsed version and name in the urls shown. "
             "version will be cyan, name red")
    subparser.add_argument(
        '-e', '--extrapolation', action='store_true',
        help="color the versions used for extrapolation as well. "
             "additional versions are green, names magenta")
 def urls(parser, args):
    urls = set()
    for pkg in spack.repo.all_packages():
        url = getattr(pkg.__class__, 'url', None)
        if url:
            urls.add(url)
        for params in pkg.versions.values():
            url = params.get('url', None)
            if url:
                urls.add(url)
    for url in sorted(urls):
        if args.color or args.extrapolation:
            print spack.url.color_url(
                url, subs=args.extrapolation, errors=True)
        else:
            print url
--- a/lib/spack/spack/test/cmd/url.py
+++ b/lib/spack/spack/test/cmd/url.py
@ -0,0 +1,116 @@
 ##############################################################################
 # Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
 # Produced at the Lawrence Livermore National Laboratory.
 #
 # This file is part of Spack.
 # Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
 # LLNL-CODE-647188
 #
 # For details, see https://github.com/llnl/spack
 # Please also see the LICENSE file for our notice and the LGPL.
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU Lesser General Public License (as
 # published by the Free Software Foundation) version 2.1, February 1999.
 #
 # This program is distributed in the hope that it will be useful, but
 # WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
 # conditions of the GNU Lesser General Public License for more details.
 #
 # You should have received a copy of the GNU Lesser General Public
 # License along with this program; if not, write to the Free Software
 # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 ##############################################################################
 import argparse
 import pytest
 from spack.cmd.url import *
@pytest.fixture(scope='module')
 def parser():
    """Returns the parser for the ``url`` command"""
    parser = argparse.ArgumentParser()
    setup_parser(parser)
    return parser
 class MyPackage:
    def __init__(self, name, versions):
        self.name = name
        self.versions = versions
 def test_name_parsed_correctly():
    # Expected True
    assert name_parsed_correctly(MyPackage('netcdf',         []), 'netcdf')
    assert name_parsed_correctly(MyPackage('r-devtools',     []), 'devtools')
    assert name_parsed_correctly(MyPackage('py-numpy',       []), 'numpy')
    assert name_parsed_correctly(MyPackage('octave-splines', []), 'splines')
    # Expected False
    assert not name_parsed_correctly(MyPackage('',            []), 'hdf5')
    assert not name_parsed_correctly(MyPackage('hdf5',        []), '')
    assert not name_parsed_correctly(MyPackage('imagemagick', []), 'ImageMagick')  # noqa
    assert not name_parsed_correctly(MyPackage('yaml-cpp',    []), 'yamlcpp')
    assert not name_parsed_correctly(MyPackage('yamlcpp',     []), 'yaml-cpp')
    assert not name_parsed_correctly(MyPackage('r-py-parser', []), 'parser')
    assert not name_parsed_correctly(MyPackage('oce',         []), 'oce-0.18.0')   # noqa
 def test_version_parsed_correctly():
    # Expected True
    assert version_parsed_correctly(MyPackage('', ['1.2.3']),        '1.2.3')
    assert version_parsed_correctly(MyPackage('', ['5.4a', '5.4b']), '5.4a')
    assert version_parsed_correctly(MyPackage('', ['5.4a', '5.4b']), '5.4b')
    # Expected False
    assert not version_parsed_correctly(MyPackage('', []),         '1.2.3')
    assert not version_parsed_correctly(MyPackage('', ['1.2.3']),  '')
    assert not version_parsed_correctly(MyPackage('', ['1.2.3']),  '1.2.4')
    assert not version_parsed_correctly(MyPackage('', ['3.4a']),   '3.4')
    assert not version_parsed_correctly(MyPackage('', ['3.4']),    '3.4b')
    assert not version_parsed_correctly(MyPackage('', ['0.18.0']), 'oce-0.18.0')   # noqa
 def test_url_parse(parser):
    args = parser.parse_args(['parse', 'http://zlib.net/fossils/zlib-1.2.10.tar.gz'])
    url(parser, args)
@pytest.mark.xfail
 def test_url_parse_xfail(parser):
    # No version in URL
    args = parser.parse_args(['parse', 'http://www.netlib.org/voronoi/triangle.zip'])
    url(parser, args)
 def test_url_list(parser):
    args = parser.parse_args(['list'])
    total_urls = url_list(args)
    # The following two options should not change the number of URLs printed.
    args = parser.parse_args(['list', '--color', '--extrapolation'])
    colored_urls = url_list(args)
    assert colored_urls == total_urls
    # The following two options should print fewer URLs than the default.
    # If they print the same number of URLs, something is horribly broken.
    # If they say we missed 0 URLs, something is probably broken too.
    args = parser.parse_args(['list', '--incorrect-name'])
    incorrect_name_urls = url_list(args)
    assert 0 < incorrect_name_urls < total_urls
    args = parser.parse_args(['list', '--incorrect-version'])
    incorrect_version_urls = url_list(args)
    assert 0 < incorrect_version_urls < total_urls
 def test_url_test(parser):
    args = parser.parse_args(['test'])
    (total_urls, correct_names, correct_versions,
     name_count_dict, version_count_dict) = url_test(args)
    assert 0 < correct_names    <= sum(name_count_dict.values())    <= total_urls  # noqa
    assert 0 < correct_versions <= sum(version_count_dict.values()) <= total_urls  # noqa
--- a/lib/spack/spack/url.py
+++ b/lib/spack/spack/url.py
@ -28,17 +28,17 @@
 download location of the package, and figure out version and name information
 from there.
-Example: when spack is given the following URL:
+**Example:** when spack is given the following URL:
-    ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p243.tar.gz
+    https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.12/src/hdf-4.2.12.tar.gz
-It can figure out that the package name is ruby, and that it is at version
+It can figure out that the package name is ``hdf``, and that it is at version
-1.9.1-p243.  This is useful for making the creation of packages simple: a user
+``4.2.12``. This is useful for making the creation of packages simple: a user
 just supplies a URL and skeleton code is generated automatically.
-Spack can also figure out that it can most likely download 1.8.1 at this URL:
+Spack can also figure out that it can most likely download 4.2.6 at this URL:
-    ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.8.1.tar.gz
+    https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.6/src/hdf-4.2.6.tar.gz
 This is useful if a user asks for a package at a particular version number;
 spack doesn't need anyone to tell it where to get the tarball even though
@ -112,16 +112,15 @@ def split_url_extension(path):
    extension, but in (2) & (3), the filename is IN a single final query
    argument.
-       This strips the URL into three pieces: prefix, ext, and suffix.
+    This strips the URL into three pieces: ``prefix``, ``ext``, and ``suffix``.
    The suffix contains anything that was stripped off the URL to
-       get at the file extension.  In (1), it will be '?raw=true', but
+    get at the file extension.  In (1), it will be ``'?raw=true'``, but
    in (2), it will be empty. In (3) the suffix is a parameter that follows
    after the file extension, e.g.:
-           1. ('https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7', '.tgz', '?raw=true')
+    1. ``('https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7', '.tgz', '?raw=true')``
-           2. ('http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin',
+    2. ``('http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin', '.tar.gz', None)``
-               '.tar.gz', None)
+    3. ``('https://gitlab.kitware.com/vtk/vtk/repository/archive', '.tar.bz2', '?ref=v7.0.0')``
-           3. ('https://gitlab.kitware.com/vtk/vtk/repository/archive', '.tar.bz2', '?ref=v7.0.0')
    """
    prefix, ext, suffix = path, '', ''
@ -166,19 +165,44 @@ def determine_url_file_extension(path):
    return ext
-def parse_version_offset(path, debug=False):
+def parse_version_offset(path):
-    """Try to extract a version string from a filename or URL.  This is taken
+    """Try to extract a version string from a filename or URL.
-       largely from Homebrew's Version class."""
+
    :param str path: The filename or URL for the package
    :return: A tuple containing:
        version of the package,
        first index of version,
        length of version string,
        the index of the matching regex
        the matching regex
    :rtype: tuple
    :raises UndetectableVersionError: If the URL does not match any regexes
    """
    original_path = path
    # path:   The prefix of the URL, everything before the ext and suffix
    # ext:    The file extension
    # suffix: Any kind of query string that begins with a '?'
    path, ext, suffix = split_url_extension(path)
-    # Allow matches against the basename, to avoid including parent
+    # stem:   Everything from path after the final '/'
-    # dirs in version name Remember the offset of the stem in the path
    stem = os.path.basename(path)
    offset = len(path) - len(stem)
-    version_types = [
+    # List of the following format:
    #
    # [
    #     (regex, string),
    #     ...
    # ]
    #
    # The first regex that matches string will be used to determine
    # the version of the package. Therefore, hyperspecific regexes should
    # come first while generic, catch-all regexes should come last.
    version_regexes = [
        # GitHub tarballs, e.g. v1.2.3
        (r'github.com/.+/(?:zip|tar)ball/v?((\d+\.)+\d+)$', path),
@ -258,16 +282,13 @@ def parse_version_offset(path, debug=False):
        (r'\/(\d\.\d+)\/', path),
        # e.g. http://www.ijg.org/files/jpegsrc.v8d.tar.gz
-        (r'\.v(\d+[a-z]?)', stem)]
+        (r'\.v(\d+[a-z]?)', stem)
    ]
-    for i, vtype in enumerate(version_types):
+    for i, version_regex in enumerate(version_regexes):
-        regex, match_string = vtype
+        regex, match_string = version_regex
        match = re.search(regex, match_string)
        if match and match.group(1) is not None:
-            if debug:
-                tty.msg("Parsing URL: %s" % path,
-                        "  Matched regex %d: r'%s'" % (i, regex))
            version = match.group(1)
            start   = match.start(1)
@ -275,30 +296,74 @@ def parse_version_offset(path, debug=False):
            if match_string is stem:
                start += offset
-            return version, start, len(version)
+            return version, start, len(version), i, regex
    raise UndetectableVersionError(original_path)
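
The first-match-wins loop and the stem offset correction can be sketched in isolation. This is a simplified illustration rather than the function in the diff: it assumes the extension has already been stripped, uses only three regexes (the middle one is a hypothetical generic rule), and raises `ValueError` in place of `UndetectableVersionError`.

```python
import os
import re


def first_match_version(path):
    """Return (version, start, length, index, regex) for the first match.

    ``path`` is assumed to have had its file extension removed already.
    """
    stem = os.path.basename(path)
    offset = len(path) - len(stem)

    # Specific regexes first, generic ones last.
    regexes = [
        (r'github.com/.+/(?:zip|tar)ball/v?((\d+\.)+\d+)$', path),
        (r'[_.-]v?((\d+\.)+\d+)$', stem),  # hypothetical generic rule
        (r'\.v(\d+[a-z]?)', stem),
    ]

    for i, (regex, match_string) in enumerate(regexes):
        match = re.search(regex, match_string)
        if match and match.group(1) is not None:
            version = match.group(1)
            start = match.start(1)
            # Matches against the stem are relative to the stem, so shift
            # them to be relative to the full path.
            if match_string is stem:
                start += offset
            return version, start, len(version), i, regex

    raise ValueError('no version found in %s' % path)
```

Returning the index and the regex next to the version lets a caller report which rule fired, for example to tally matches per regex.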
-def parse_version(path, debug=False):
+def parse_version(path):
-    """Given a URL or archive name, extract a version from it and return
+    """Try to extract a version string from a filename or URL.
-       a version object.
+
    :param str path: The filename or URL for the package
    :return: The version of the package
    :rtype: spack.version.Version
    :raises UndetectableVersionError: If the URL does not match any regexes
    """
-    ver, start, l = parse_version_offset(path, debug=debug)
+    version, start, length, i, regex = parse_version_offset(path)
-    return Version(ver)
+    return Version(version)
-def parse_name_offset(path, v=None, debug=False):
+def parse_name_offset(path, v=None):
    """Try to determine the name of a package from its filename or URL.
    :param str path: The filename or URL for the package
    :param str v: The version of the package
    :return: A tuple containing:
        name of the package,
        first index of name,
        length of name,
        the index of the matching regex
        the matching regex
    :rtype: tuple
    :raises UndetectableNameError: If the URL does not match any regexes
    """
    original_path = path
    # We really need to know the version of the package
    # This helps us prevent collisions between the name and version
    if v is None:
-        v = parse_version(path, debug=debug)
+        try:
            v = parse_version(path)
        except UndetectableVersionError:
            # Not all URLs contain a version. We still want to be able
            # to determine a name if possible.
            v = ''
    # path:   The prefix of the URL, everything before the ext and suffix
    # ext:    The file extension
    # suffix: Any kind of query string that begins with a '?'
    path, ext, suffix = split_url_extension(path)
-    # Allow matching with either path or stem, as with the version.
+    # stem:   Everything from path after the final '/'
    stem = os.path.basename(path)
    offset = len(path) - len(stem)
-    name_types = [
+    # List of the following format:
    #
    # [
    #     (regex, string),
    #     ...
    # ]
    #
    # The first regex that matches string will be used to determine
    # the name of the package. Therefore, hyperspecific regexes should
    # come first while generic, catch-all regexes should come last.
    name_regexes = [
        (r'/sourceforge/([^/]+)/', path),
        (r'github.com/[^/]+/[^/]+/releases/download/%s/(.*)-%s$' %
         (v, v), path),
@ -316,10 +381,11 @@ def parse_name_offset(path, v=None, debug=False):
        (r'/([^/]+)%s' % v, path),
        (r'^([^/]+)[_.-]v?%s' % v, path),
-        (r'^([^/]+)%s' % v, path)]
+        (r'^([^/]+)%s' % v, path)
    ]
-    for i, name_type in enumerate(name_types):
+    for i, name_regex in enumerate(name_regexes):
-        regex, match_string = name_type
+        regex, match_string = name_regex
        match = re.search(regex, match_string)
        if match:
            name  = match.group(1)
@ -333,17 +399,38 @@ def parse_name_offset(path, v=None, debug=False):
            name = name.lower()
            name = re.sub('[_.]', '-', name)
-            return name, start, len(name)
+            return name, start, len(name), i, regex
-    raise UndetectableNameError(path)
+    raise UndetectableNameError(original_path)
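
One of the name regexes and the normalization step can be tried on their own as follows. The helper name `name_from_stem` is hypothetical, and `re.escape` is an addition made here so that the dots in the version match literally; the code in the diff interpolates the version unescaped.

```python
import re


def name_from_stem(stem, version):
    """Extract a normalized package name from ``stem`` given ``version``."""
    # Same shape as the r'^([^/]+)[_.-]v?%s' rule in the diff, with the
    # version escaped (an assumption of this sketch).
    regex = r'^([^/]+)[_.-]v?%s' % re.escape(version)
    match = re.search(regex, stem)
    if not match:
        raise ValueError('no name found in %s' % stem)

    # Normalize: lowercase, then turn underscores and dots into dashes.
    name = match.group(1).lower()
    return re.sub('[_.]', '-', name)
```

For instance, `name_from_stem('hdf-4.2.12', '4.2.12')` gives `'hdf'`, matching the example in the module docstring.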
 def parse_name(path, ver=None):
-    name, start, l = parse_name_offset(path, ver)
+    """Try to determine the name of a package from its filename or URL.
    :param str path: The filename or URL for the package
    :param str ver: The version of the package
    :return: The name of the package
    :rtype: str
    :raises UndetectableNameError: If the URL does not match any regexes
    """
    name, start, length, i, regex = parse_name_offset(path, ver)
    return name
 def parse_name_and_version(path):
    """Try to determine the name of a package and extract its version
    from its filename or URL.
    :param str path: The filename or URL for the package
    :return: A tuple containing:
        The name of the package
        The version of the package
    :rtype: tuple
    """
    ver = parse_version(path)
    name = parse_name(path, ver)
    return (name, ver)
@ -371,12 +458,12 @@ def cumsum(elts, init=0, fn=lambda x: x):
 def substitution_offsets(path):
    """This returns offsets for substituting versions and names in the
-       provided path.  It is a helper for substitute_version().
+       provided path.  It is a helper for :func:`substitute_version`.
    """
    # Get name and version offsets
    try:
-        ver,  vs, vl = parse_version_offset(path)
+        ver,  vs, vl, vi, vregex = parse_version_offset(path)
-        name, ns, nl = parse_name_offset(path, ver)
+        name, ns, nl, ni, nregex = parse_name_offset(path, ver)
    except UndetectableNameError:
        return (None, -1, -1, (), ver, vs, vl, (vs,))
    except UndetectableVersionError:
@ -447,18 +534,19 @@ def substitute_version(path, new_version):
    substitute the new version for it.  Replace all occurrences of
    the version *if* they don't overlap with the package name.
-       Simple example::
+    Simple example:
    .. code-block:: python
       substitute_version('http://www.mr511.de/software/libelf-0.8.13.tar.gz', '2.9.3')
-         ->'http://www.mr511.de/software/libelf-2.9.3.tar.gz'
+       >>> 'http://www.mr511.de/software/libelf-2.9.3.tar.gz'
-       Complex examples::
+    Complex example:
         substitute_version('http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.0.tar.gz', 2.1)
         -> 'http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.1.tar.gz'
-         # In this string, the "2" in mvapich2 is NOT replaced.
+    .. code-block:: python
         substitute_version('http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.tar.gz', 2.1)
         -> 'http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.1.tar.gz'
       substitute_version('https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.12/src/hdf-4.2.12.tar.gz', '2.3')
       >>> 'https://www.hdfgroup.org/ftp/HDF/releases/HDF2.3/src/hdf-2.3.tar.gz'
    """
    (name, ns, nl, noffs,
     ver,  vs, vl, voffs) = substitution_offsets(path)
@ -478,16 +566,15 @@ def color_url(path, **kwargs):
    """Color the parts of the url according to Spack's parsing.
    Colors are:
-          Cyan: The version found by parse_version_offset().
+       | Cyan: The version found by :func:`parse_version_offset`.
-          Red:  The name found by parse_name_offset().
+       | Red:  The name found by :func:`parse_name_offset`.
-          Green:   Instances of version string from substitute_version().
+       | Green:   Instances of version string from :func:`substitute_version`.
-          Magenta: Instances of the name (protected from substitution).
+       | Magenta: Instances of the name (protected from substitution).
-       Optional args:
-          errors=True    Append parse errors at end of string.
-          subs=True      Color substitutions as well as parsed name/version.
    :param str path: The filename or URL for the package
    :keyword bool errors: Append parse errors at end of string.
    :keyword bool subs: Color substitutions as well as parsed name/version.
    """
    errors = kwargs.get('errors', False)
    subs   = kwargs.get('subs', False)