Refactor Spack's URL parsing commands (#2938)

* Replace `spack urls` and `spack url-parse` with `spack url`
* Allow `spack url list` to only list incorrect parsings
* Add `spack url test` reporting
* Add unit tests for new URL commands
2017-01-31 10:14:52 -06:00 · 123f057089
commit 123f057089
parent 2e81fe4fb3
7 changed files with 796 additions and 311 deletions
--- a/lib/spack/docs/developer_guide.rst
+++ b/lib/spack/docs/developer_guide.rst
@@ -300,6 +300,42 @@ Stage objects
 Writing commands
 ----------------

+Adding a new command to Spack is easy. Simply add a ``<name>.py`` file to
+``lib/spack/spack/cmd/``, where ``<name>`` is the name of the subcommand.
+At the bare minimum, two functions are required in this file:
+
+^^^^^^^^^^^^^^^^^^
+``setup_parser()``
+^^^^^^^^^^^^^^^^^^
+
+Unless your command doesn't accept any arguments, a ``setup_parser()``
+function is required to define what arguments and flags your command takes.
+See the `Argparse documentation <https://docs.python.org/2.7/library/argparse.html>`_
+for more details on how to add arguments.
+
+Some commands have a set of subcommands, like ``spack compiler find`` or
+``spack module refresh``. You can add subparsers to your parser to handle
+this. Check out ``spack edit --command compiler`` for an example of this.
+
+A lot of commands take the same arguments and flags. These arguments should
+be defined in ``lib/spack/spack/cmd/common/arguments.py`` so that they don't
+need to be redefined in multiple commands.
+
+^^^^^^^^^^^^
+``<name>()``
+^^^^^^^^^^^^
+
+In order to run your command, Spack searches for a function with the same
+name as your command in ``<name>.py``. This is the entry point for your
+command, and it can call other helper functions to handle common tasks.
+
+Remember, before adding a new command, think to yourself whether or not this
+new command is actually necessary. Sometimes, the functionality you desire
+can be added to an existing command. Also remember to add unit tests for
+your command. If it isn't used very frequently, changes to the rest of
+Spack can cause your command to break without sufficient unit tests to
+prevent this from happening.
+
 ----------
 Unit tests
 ----------
@@ -312,14 +348,80 @@ Unit testing
 Developer commands
 ------------------

+.. _cmd-spack-doc:
+
 ^^^^^^^^^^^^^
 ``spack doc``
 ^^^^^^^^^^^^^

+.. _cmd-spack-test:
+
 ^^^^^^^^^^^^^^
 ``spack test``
 ^^^^^^^^^^^^^^

+.. _cmd-spack-url:
+
+^^^^^^^^^^^^^
+``spack url``
+^^^^^^^^^^^^^
+
+A package containing a single URL can be used to download several different
+versions of the package. If you've ever wondered how this works, all of the
+magic is in :mod:`spack.url`. This module contains methods for extracting
+the name and version of a package from its URL. The name is used by
+``spack create`` to guess the name of the package. By determining the version
+from the URL, Spack can replace it with other versions to determine where to
+download them from.
+
+The regular expressions in ``parse_name_offset`` and ``parse_version_offset``
+are used to extract the name and version, but they aren't perfect. In order
+to debug Spack's URL parsing support, the ``spack url`` command can be used.
+
+"""""""""""""""""""
+``spack url parse``
+"""""""""""""""""""
+
+If you need to debug a single URL, you can use the following command:
+
+.. command-output:: spack url parse http://cache.ruby-lang.org/pub/ruby/2.2/ruby-2.2.0.tar.gz
+
+You'll notice that the name and version of this URL are correctly detected,
+and you can even see which regular expressions it was matched to. However,
+when it substitutes the version number, it doesn't replace the ``2.2``
+directory with ``9.9``, which is where we would expect ``9.9.9b`` to live.
+This particular package may require a ``list_url`` or ``url_for_version``
+function.
+
+This command also accepts a ``--spider`` flag. If provided, Spack searches
+for other versions of the package and prints the matching URLs.
+
+""""""""""""""""""
+``spack url list``
+""""""""""""""""""
+
+This command lists every URL in every package in Spack. If given the
+``--color`` and ``--extrapolation`` flags, it also colors the part of
+the string that it detected to be the name and version. The
+``--incorrect-name`` and ``--incorrect-version`` flags can be used to
+print only the URLs that are not parsed correctly.
+
+""""""""""""""""""
+``spack url test``
+""""""""""""""""""
+
+This command attempts to parse every URL for every package in Spack
+and prints a summary of how many of them are being correctly parsed.
+It also prints a histogram showing which regular expressions are being
+matched and how frequently:
+
+.. command-output:: spack url test
+
+This command is essential for anyone adding or changing the regular
+expressions that parse names and versions. By running this command
+before and after the change, you can make sure that your regular
+expression fixes more packages than it breaks.
+
 ---------
 Profiling
 ---------
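The "Writing commands" text added to `developer_guide.rst` above describes two functions per command module: `setup_parser()` and a function named after the command. A minimal sketch of that layout follows. The `hello` command, its `greet` subcommand, the `--shout` flag, and the `run()` helper are all hypothetical and exist only for illustration; plain `argparse` stands in for Spack's own command loader, which is not reproduced here.

```python
import argparse

description = "hypothetical example command"


def setup_parser(subparser):
    """Define the arguments and subcommands that the command accepts."""
    sp = subparser.add_subparsers(metavar='SUBCOMMAND', dest='subcommand')
    greet_parser = sp.add_parser('greet', help='build a greeting')
    greet_parser.add_argument('name', help='who to greet')
    greet_parser.add_argument(
        '-s', '--shout', action='store_true', help='use upper case')


def hello_greet(args):
    """Helper for the hypothetical ``greet`` subcommand."""
    message = 'Hello, {0}!'.format(args.name)
    return message.upper() if args.shout else message


def hello(parser, args):
    """Main function; it shares its name with the command."""
    action = {'greet': hello_greet}
    return action[args.subcommand](args)


def run(argv):
    """Stand-in for Spack's dispatcher: parse ``argv`` and call ``hello``."""
    parser = argparse.ArgumentParser(prog='hello', description=description)
    setup_parser(parser)
    return hello(parser, parser.parse_args(argv))
```

Calling `run(['greet', 'spack'])` builds the parser, parses the arguments, and dispatches to the helper through a dictionary, which is the same dispatch shape that `url()` uses in `cmd/url.py` in this commit.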
--- a/lib/spack/docs/packaging_guide.rst
+++ b/lib/spack/docs/packaging_guide.rst
@@ -712,8 +712,8 @@ is at ``http://example.com/downloads/foo-1.0.tar.gz``, Spack will look
 in ``http://example.com/downloads/`` for links to additional versions.
 If you need to search another path for download links, you can supply
 some extra attributes that control how your package finds new
-versions. See the documentation on `attribute_list_url`_ and
-`attribute_list_depth`_.
+versions. See the documentation on :ref:`attribute_list_url` and
+:ref:`attribute_list_depth`.

 .. note::

@@ -728,6 +728,102 @@ versions. See the documentation on `attribute_list_url`_ and
    syntax errors, or the ``import`` will fail.  Use this once you've
    got your package in working order.

+--------------------
+Finding new versions
+--------------------
+
+You've already seen the ``homepage`` and ``url`` package attributes:
+
+.. code-block:: python
+   :linenos:
+
+   from spack import *
+
+
+   class Mpich(Package):
+      """MPICH is a high performance and widely portable implementation of
+         the Message Passing Interface (MPI) standard."""
+      homepage = "http://www.mpich.org"
+      url      = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
+
+These are class-level attributes used by Spack to show users
+information about the package, and to determine where to download its
+source code.
+
+Spack uses the tarball URL to extrapolate where to find other tarballs
+of the same package (e.g. in :ref:`cmd-spack-checksum`), but
+this does not always work.  This section covers ways you can tell
+Spack to find tarballs elsewhere.
+
+.. _attribute_list_url:
+
+^^^^^^^^^^^^
+``list_url``
+^^^^^^^^^^^^
+
+When Spack tries to find available versions of packages (e.g. with
+:ref:`cmd-spack-checksum`), it spiders the parent directory
+of the tarball in the ``url`` attribute.  For example, for libelf, the
+url is:
+
+.. code-block:: python
+
+   url = "http://www.mr511.de/software/libelf-0.8.13.tar.gz"
+
+Here, Spack spiders ``http://www.mr511.de/software/`` to find similar
+tarball links and ultimately to make a list of available versions of
+``libelf``.
+
+For many packages, the tarball's parent directory may be unlistable,
+or it may not contain any links to source code archives.  In fact,
+many times additional package downloads aren't even available in the
+same directory as the download URL.
+
+For these, you can specify a separate ``list_url`` indicating the page
+to search for tarballs.  For example, ``libdwarf`` has the homepage as
+the ``list_url``, because that is where links to old versions are:
+
+.. code-block:: python
+   :linenos:
+
+   class Libdwarf(Package):
+       homepage = "http://www.prevanders.net/dwarf.html"
+       url      = "http://www.prevanders.net/libdwarf-20130729.tar.gz"
+       list_url = homepage
+
+.. _attribute_list_depth:
+
+^^^^^^^^^^^^^^
+``list_depth``
+^^^^^^^^^^^^^^
+
+``libdwarf`` and many other packages have a listing of available
+versions on a single webpage, but not all do.  For example, ``mpich``
+has a tarball URL that looks like this:
+
+.. code-block:: python
+
+   url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
+
+But its downloads are in many different subdirectories of
+``http://www.mpich.org/static/downloads/``.  So, we need to add a
+``list_url`` *and* a ``list_depth`` attribute:
+
+.. code-block:: python
+   :linenos:
+
+   class Mpich(Package):
+       homepage   = "http://www.mpich.org"
+       url        = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
+       list_url   = "http://www.mpich.org/static/downloads/"
+       list_depth = 2
+
+By default, Spack only looks at the top-level page available at
+``list_url``.  ``list_depth`` tells it to follow up to 2 levels of
+links from the top-level page.  Note that here, this implies two
+levels of subdirectories, as the ``mpich`` website is structured much
+like a filesystem.  But ``list_depth`` really refers to link depth
+when spidering the page.

 .. _vcs-fetch:

@@ -1241,103 +1337,6 @@ RPATHs in Spack are handled in one of three ways:
   links.  You can see this how this is used in the :ref:`PySide
   example <pyside-patch>` above.

--------------------
-Finding new versions
--------------------
-
-You've already seen the ``homepage`` and ``url`` package attributes:
-
-.. code-block:: python
-   :linenos:
-
-   from spack import *
-
-
-   class Mpich(Package):
-      """MPICH is a high performance and widely portable implementation of
-         the Message Passing Interface (MPI) standard."""
-      homepage = "http://www.mpich.org"
-      url      = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
-
-These are class-level attributes used by Spack to show users
-information about the package, and to determine where to download its
-source code.
-
-Spack uses the tarball URL to extrapolate where to find other tarballs
-of the same package (e.g. in :ref:`cmd-spack-checksum`, but
-this does not always work.  This section covers ways you can tell
-Spack to find tarballs elsewhere.
-
-.. _attribute_list_url:
-
-^^^^^^^^^^^^
-``list_url``
-^^^^^^^^^^^^
-
-When spack tries to find available versions of packages (e.g. with
-:ref:`cmd-spack-checksum`), it spiders the parent directory
-of the tarball in the ``url`` attribute.  For example, for libelf, the
-url is:
-
-.. code-block:: python
-
-   url = "http://www.mr511.de/software/libelf-0.8.13.tar.gz"
-
-Here, Spack spiders ``http://www.mr511.de/software/`` to find similar
-tarball links and ultimately to make a list of available versions of
-``libelf``.
-
-For many packages, the tarball's parent directory may be unlistable,
-or it may not contain any links to source code archives.  In fact,
-many times additional package downloads aren't even available in the
-same directory as the download URL.
-
-For these, you can specify a separate ``list_url`` indicating the page
-to search for tarballs.  For example, ``libdwarf`` has the homepage as
-the ``list_url``, because that is where links to old versions are:
-
-.. code-block:: python
-   :linenos:
-
-   class Libdwarf(Package):
-       homepage = "http://www.prevanders.net/dwarf.html"
-       url      = "http://www.prevanders.net/libdwarf-20130729.tar.gz"
-       list_url = homepage
-
-.. _attribute_list_depth:
-
-^^^^^^^^^^^^^^
-``list_depth``
-^^^^^^^^^^^^^^
-
-``libdwarf`` and many other packages have a listing of available
-versions on a single webpage, but not all do.  For example, ``mpich``
-has a tarball URL that looks like this:
-
-.. code-block:: python
-
-   url = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
-
-But its downloads are in many different subdirectories of
-``http://www.mpich.org/static/downloads/``.  So, we need to add a
-``list_url`` *and* a ``list_depth`` attribute:
-
-.. code-block:: python
-   :linenos:
-
-   class Mpich(Package):
-       homepage   = "http://www.mpich.org"
-       url        = "http://www.mpich.org/static/downloads/3.0.4/mpich-3.0.4.tar.gz"
-       list_url   = "http://www.mpich.org/static/downloads/"
-       list_depth = 2
-
-By default, Spack only looks at the top-level page available at
-``list_url``.  ``list_depth`` tells it to follow up to 2 levels of
-links from the top-level page.  Note that here, this implies two
-levels of subdirectories, as the ``mpich`` website is structured much
-like a filesystem.  But ``list_depth`` really refers to link depth
-when spidering the page.
-
 .. _attribute_parallel:

 ---------------
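The `list_depth` text moved in `packaging_guide.rst` above stresses that the setting refers to link depth rather than directory depth. The sketch below illustrates that idea with a depth-limited, breadth-first walk over an in-memory link table. It is not Spack's spider: `find_archives`, the `SITE` table, and the `example.com` URLs are hypothetical, no network access is performed, and the convention that a depth of 0 means "read only the top-level page" is an assumption of this sketch.

```python
from collections import deque


def find_archives(pages, list_url, list_depth):
    """Collect archive links reachable within ``list_depth`` link hops.

    ``pages`` maps a page URL to the links found on that page, standing in
    for real HTTP requests.
    """
    archives = set()
    seen = {list_url}
    queue = deque([(list_url, 0)])

    while queue:
        page, level = queue.popleft()
        for link in pages.get(page, []):
            if link.endswith('.tar.gz'):
                archives.add(link)
            elif level < list_depth and link not in seen:
                # Follow the link only while we are within the depth limit
                seen.add(link)
                queue.append((link, level + 1))

    return archives


# Hypothetical site layout: each entry lists the links found on that page
SITE = {
    'http://example.com/downloads/': [
        'http://example.com/downloads/foo-1.0.tar.gz',
        'http://example.com/downloads/2.x/',
    ],
    'http://example.com/downloads/2.x/': [
        'http://example.com/downloads/2.x/foo-2.0.tar.gz',
        'http://example.com/downloads/2.x/old/',
    ],
    'http://example.com/downloads/2.x/old/': [
        'http://example.com/downloads/2.x/old/foo-2.0rc1.tar.gz',
    ],
}
```

With this table, each extra level of depth reveals one more tarball. The walk counts hops between pages, so a site whose links do not mirror its directory layout would behave the same way.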
--- a/lib/spack/spack/cmd/url.py
+++ b/lib/spack/spack/cmd/url.py
@@ -0,0 +1,319 @@
+##############################################################################
+# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
+# Produced at the Lawrence Livermore National Laboratory.
+#
+# This file is part of Spack.
+# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
+# LLNL-CODE-647188
+#
+# For details, see https://github.com/llnl/spack
+# Please also see the LICENSE file for our notice and the LGPL.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License (as
+# published by the Free Software Foundation) version 2.1, February 1999.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
+# conditions of the GNU Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+##############################################################################
+from __future__ import division, print_function
+
+from collections import defaultdict
+
+import spack
+
+from llnl.util import tty
+from spack.url import *
+from spack.util.web import find_versions_of_archive
+
+description = "debugging tool for url parsing"
+
+
+def setup_parser(subparser):
+    sp = subparser.add_subparsers(metavar='SUBCOMMAND', dest='subcommand')
+
+    # Parse
+    parse_parser = sp.add_parser('parse', help='attempt to parse a url')
+
+    parse_parser.add_argument(
+        'url',
+        help='url to parse')
+    parse_parser.add_argument(
+        '-s', '--spider', action='store_true',
+        help='spider the source page for versions')
+
+    # List
+    list_parser = sp.add_parser('list', help='list urls in all packages')
+
+    list_parser.add_argument(
+        '-c', '--color', action='store_true',
+        help='color the parsed version and name in the urls shown '
+             '(versions will be cyan, name red)')
+    list_parser.add_argument(
+        '-e', '--extrapolation', action='store_true',
+        help='color the versions used for extrapolation as well '
+             '(additional versions will be green, names magenta)')
+
+    excl_args = list_parser.add_mutually_exclusive_group()
+
+    excl_args.add_argument(
+        '-n', '--incorrect-name', action='store_true',
+        help='only list urls for which the name was incorrectly parsed')
+    excl_args.add_argument(
+        '-v', '--incorrect-version', action='store_true',
+        help='only list urls for which the version was incorrectly parsed')
+
+    # Test
+    sp.add_parser(
+        'test', help='print a summary of how well we are parsing package urls')
+
+
+def url(parser, args):
+    action = {
+        'parse': url_parse,
+        'list':  url_list,
+        'test':  url_test
+    }
+
+    action[args.subcommand](args)
+
+
+def url_parse(args):
+    url = args.url
+
+    tty.msg('Parsing URL: {0}'.format(url))
+    print()
+
+    ver,  vs, vl, vi, vregex = parse_version_offset(url)
+    tty.msg('Matched version regex {0:>2}: r{1!r}'.format(vi, vregex))
+
+    name, ns, nl, ni, nregex = parse_name_offset(url, ver)
+    tty.msg('Matched  name   regex {0:>2}: r{1!r}'.format(ni, nregex))
+
+    print()
+    tty.msg('Detected:')
+    try:
+        print_name_and_version(url)
+    except UrlParseError as e:
+        tty.error(str(e))
+
+    print('    name:    {0}'.format(name))
+    print('    version: {0}'.format(ver))
+    print()
+
+    tty.msg('Substituting version 9.9.9b:')
+    newurl = substitute_version(url, '9.9.9b')
+    print_name_and_version(newurl)
+
+    if args.spider:
+        print()
+        tty.msg('Spidering for versions:')
+        versions = find_versions_of_archive(url)
+
+        max_len = max(len(str(v)) for v in versions)
+
+        for v in sorted(versions):
+            print('{0:{1}}  {2}'.format(str(v), max_len, versions[v]))
+
+
+def url_list(args):
+    urls = set()
+
+    # Gather set of URLs from all packages
+    for pkg in spack.repo.all_packages():
+        url = getattr(pkg.__class__, 'url', None)
+        urls = url_list_parsing(args, urls, url, pkg)
+
+        for params in pkg.versions.values():
+            url = params.get('url', None)
+            urls = url_list_parsing(args, urls, url, pkg)
+
+    # Print URLs
+    for url in sorted(urls):
+        if args.color or args.extrapolation:
+            print(color_url(url, subs=args.extrapolation, errors=True))
+        else:
+            print(url)
+
+    # Return the number of URLs that were printed, only for testing purposes
+    return len(urls)
+
+
+def url_test(args):
+    # Collect statistics on how many URLs were correctly parsed
+    total_urls       = 0
+    correct_names    = 0
+    correct_versions = 0
+
+    # Collect statistics on which regexes were matched and how often
+    name_regex_dict    = dict()
+    name_count_dict    = defaultdict(int)
+    version_regex_dict = dict()
+    version_count_dict = defaultdict(int)
+
+    tty.msg('Generating a summary of URL parsing in Spack...')
+
+    # Loop through all packages
+    for pkg in spack.repo.all_packages():
+        urls = set()
+
+        url = getattr(pkg.__class__, 'url', None)
+        if url:
+            urls.add(url)
+
+        for params in pkg.versions.values():
+            url = params.get('url', None)
+            if url:
+                urls.add(url)
+
+        # Calculate statistics
+        for url in urls:
+            total_urls += 1
+
+            # Parse versions
+            version = None
+            try:
+                version, vs, vl, vi, vregex = parse_version_offset(url)
+                version_regex_dict[vi] = vregex
+                version_count_dict[vi] += 1
+                if version_parsed_correctly(pkg, version):
+                    correct_versions += 1
+            except UndetectableVersionError:
+                pass
+
+            # Parse names
+            try:
+                name, ns, nl, ni, nregex = parse_name_offset(url, version)
+                name_regex_dict[ni] = nregex
+                name_count_dict[ni] += 1
+                if name_parsed_correctly(pkg, name):
+                    correct_names += 1
+            except UndetectableNameError:
+                pass
+
+    print()
+    print('    Total URLs found:          {0}'.format(total_urls))
+    print('    Names correctly parsed:    {0:>4}/{1:>4} ({2:>6.2%})'.format(
+        correct_names, total_urls, correct_names / total_urls))
+    print('    Versions correctly parsed: {0:>4}/{1:>4} ({2:>6.2%})'.format(
+        correct_versions, total_urls, correct_versions / total_urls))
+    print()
+
+    tty.msg('Statistics on name regular expressions:')
+
+    print()
+    print('    Index  Count  Regular Expression')
+    for ni in name_regex_dict:
+        print('    {0:>3}: {1:>6}   r{2!r}'.format(
+            ni, name_count_dict[ni], name_regex_dict[ni]))
+    print()
+
+    tty.msg('Statistics on version regular expressions:')
+
+    print()
+    print('    Index  Count  Regular Expression')
+    for vi in version_regex_dict:
+        print('    {0:>3}: {1:>6}   r{2!r}'.format(
+            vi, version_count_dict[vi], version_regex_dict[vi]))
+    print()
+
+    # Return statistics, only for testing purposes
+    return (total_urls, correct_names, correct_versions,
+            name_count_dict, version_count_dict)
+
+
+def print_name_and_version(url):
+    """Prints a URL. Underlines the detected name with dashes and
+    the detected version with tildes.
+
+    :param str url: The url to parse
+    """
+    name, ns, nl, ntup, ver, vs, vl, vtup = substitution_offsets(url)
+    underlines = [' '] * max(ns + nl, vs + vl)
+    for i in range(ns, ns + nl):
+        underlines[i] = '-'
+    for i in range(vs, vs + vl):
+        underlines[i] = '~'
+
+    print('    {0}'.format(url))
+    print('    {0}'.format(''.join(underlines)))
+
+
+def url_list_parsing(args, urls, url, pkg):
+    """Helper function for :func:`url_list`.
+
+    :param argparse.Namespace args: The arguments given to ``spack url list``
+    :param set urls: Set of URLs that have already been added
+    :param url: A URL to potentially add to ``urls`` depending on ``args``
+    :type url: str or None
+    :param spack.package.PackageBase pkg: The Spack package
+    :returns: The updated ``urls`` set
+    :rtype: set
+    """
+    if url:
+        if args.incorrect_name:
+            # Only add URLs whose name was incorrectly parsed
+            try:
+                name = parse_name(url)
+                if not name_parsed_correctly(pkg, name):
+                    urls.add(url)
+            except UndetectableNameError:
+                urls.add(url)
+        elif args.incorrect_version:
+            # Only add URLs whose version was incorrectly parsed
+            try:
+                version = parse_version(url)
+                if not version_parsed_correctly(pkg, version):
+                    urls.add(url)
+            except UndetectableVersionError:
+                urls.add(url)
+        else:
+            urls.add(url)
+
+    return urls
+
+
+def name_parsed_correctly(pkg, name):
+    """Determine if the name of a package was correctly parsed.
+
+    :param spack.package.PackageBase pkg: The Spack package
+    :param str name: The name that was extracted from the URL
+    :returns: True if the name was correctly parsed, else False
+    :rtype: bool
+    """
+    pkg_name = pkg.name
+
+    # After determining a name, `spack create` determines a build system.
+    # Some build systems prepend a special string to the front of the name.
+    # Since this can't be guessed from the URL, it would be unfair to say
+    # that these names are incorrectly parsed, so we remove them.
+    if pkg_name.startswith('r-'):
+        pkg_name = pkg_name[2:]
+    elif pkg_name.startswith('py-'):
+        pkg_name = pkg_name[3:]
+    elif pkg_name.startswith('octave-'):
+        pkg_name = pkg_name[7:]
+
+    return name == pkg_name
+
+
+def version_parsed_correctly(pkg, version):
+    """Determine if the version of a package was correctly parsed.
+
+    :param spack.package.PackageBase pkg: The Spack package
+    :param str version: The version that was extracted from the URL
+    :returns: True if the version was correctly parsed, else False
+    :rtype: bool
+    """
+    # If the version parsed from the URL is listed in a version()
+    # directive, we assume it was correctly parsed
+    for pkg_version in pkg.versions:
+        if str(pkg_version) == str(version):
+            return True
+    return False
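`url_parse()` above reports an offset for the detected version and then substitutes `9.9.9b` for it. The toy sketch below shows why an offset-based replacement leaves a version-like directory such as `2.2/` untouched. It is a simplification, not Spack's implementation: `TOY_VERSION_RE`, `toy_parse_version_offset`, and `toy_substitute_version` are hypothetical names, and the single pattern only handles `.tar.gz` file names, whereas the real regexes live in `spack.url`.

```python
import re

# Hypothetical, simplified pattern; it is not one of Spack's real regexes
TOY_VERSION_RE = re.compile(r'-(\d+(?:\.\d+)*[a-z]?)\.tar\.gz$')


def toy_parse_version_offset(url):
    """Return (version, start, length) for the version in the file name."""
    match = TOY_VERSION_RE.search(url)
    if match is None:
        raise ValueError('no version found in {0}'.format(url))
    version = match.group(1)
    return version, match.start(1), len(version)


def toy_substitute_version(url, new_version):
    """Replace only the matched span; other path components are untouched."""
    _, start, length = toy_parse_version_offset(url)
    return url[:start] + new_version + url[start + length:]
```

For the Ruby URL used in the developer guide, the sketch detects `2.2.0` in the file name and rewrites only that span, so the `2.2` directory component stays as it was. A package laid out like this is the kind that the guide says may need a `list_url` or `url_for_version`.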
--- a/lib/spack/spack/cmd/url_parse.py
+++ b/lib/spack/spack/cmd/url_parse.py
@@ -1,79 +0,0 @@
-##############################################################################
-# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
-# Produced at the Lawrence Livermore National Laboratory.
-#
-# This file is part of Spack.
-# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
-# LLNL-CODE-647188
-#
-# For details, see https://github.com/llnl/spack
-# Please also see the LICENSE file for our notice and the LGPL.
-#
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of the GNU Lesser General Public License (as
-# published by the Free Software Foundation) version 2.1, February 1999.
-#
-# This program is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
-# conditions of the GNU Lesser General Public License for more details.
-#
-# You should have received a copy of the GNU Lesser General Public
-# License along with this program; if not, write to the Free Software
-# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-##############################################################################
-import llnl.util.tty as tty
-
-import spack
-import spack.url
-from spack.util.web import find_versions_of_archive
-
-description = "show parsing of a URL, optionally spider web for versions"
-
-
-def setup_parser(subparser):
-    subparser.add_argument('url', help="url of a package archive")
-    subparser.add_argument(
-        '-s', '--spider', action='store_true',
-        help="spider the source page for versions")
-
-
-def print_name_and_version(url):
-    name, ns, nl, ntup, ver, vs, vl, vtup = spack.url.substitution_offsets(url)
-    underlines = [" "] * max(ns + nl, vs + vl)
-    for i in range(ns, ns + nl):
-        underlines[i] = '-'
-    for i in range(vs, vs + vl):
-        underlines[i] = '~'
-
-    print "    %s" % url
-    print "    %s" % ''.join(underlines)
-
-
-def url_parse(parser, args):
-    url = args.url
-
-    ver,  vs, vl = spack.url.parse_version_offset(url, debug=True)
-    name, ns, nl = spack.url.parse_name_offset(url, ver, debug=True)
-    print
-
-    tty.msg("Detected:")
-    try:
-        print_name_and_version(url)
-    except spack.url.UrlParseError as e:
-        tty.error(str(e))
-
-    print '    name:     %s' % name
-    print '    version:  %s' % ver
-
-    print
-    tty.msg("Substituting version 9.9.9b:")
-    newurl = spack.url.substitute_version(url, '9.9.9b')
-    print_name_and_version(newurl)
-
-    if args.spider:
-        print
-        tty.msg("Spidering for versions:")
-        versions = find_versions_of_archive(url)
-        for v in sorted(versions):
-            print "%-20s%s" % (v, versions[v])
--- a/lib/spack/spack/cmd/urls.py
+++ b/lib/spack/spack/cmd/urls.py
@@ -1,59 +0,0 @@
-##############################################################################
-# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
-# Produced at the Lawrence Livermore National Laboratory.
-#
-# This file is part of Spack.
-# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
-# LLNL-CODE-647188
-#
-# For details, see https://github.com/llnl/spack
-# Please also see the LICENSE file for our notice and the LGPL.
-#
-# This program is free software; you can redistribute it and/or modify
-# it under the terms of the GNU Lesser General Public License (as
-# published by the Free Software Foundation) version 2.1, February 1999.
-#
-# This program is distributed in the hope that it will be useful, but
-# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
-# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
-# conditions of the GNU Lesser General Public License for more details.
-#
-# You should have received a copy of the GNU Lesser General Public
-# License along with this program; if not, write to the Free Software
-# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-##############################################################################
-import spack
-import spack.url
-
-description = "inspect urls used by packages in spack"
-
-
-def setup_parser(subparser):
-    subparser.add_argument(
-        '-c', '--color', action='store_true',
-        help="color the parsed version and name in the urls shown. "
-             "version will be cyan, name red")
-    subparser.add_argument(
-        '-e', '--extrapolation', action='store_true',
-        help="color the versions used for extrapolation as well. "
-             "additional versions are green, names magenta")
-
-
-def urls(parser, args):
-    urls = set()
-    for pkg in spack.repo.all_packages():
-        url = getattr(pkg.__class__, 'url', None)
-        if url:
-            urls.add(url)
-
-        for params in pkg.versions.values():
-            url = params.get('url', None)
-            if url:
-                urls.add(url)
-
-    for url in sorted(urls):
-        if args.color or args.extrapolation:
-            print spack.url.color_url(
-                url, subs=args.extrapolation, errors=True)
-        else:
-            print url
--- a/lib/spack/spack/test/cmd/url.py
+++ b/lib/spack/spack/test/cmd/url.py
@@ -0,0 +1,116 @@
+##############################################################################
+# Copyright (c) 2013-2016, Lawrence Livermore National Security, LLC.
+# Produced at the Lawrence Livermore National Laboratory.
+#
+# This file is part of Spack.
+# Created by Todd Gamblin, tgamblin@llnl.gov, All rights reserved.
+# LLNL-CODE-647188
+#
+# For details, see https://github.com/llnl/spack
+# Please also see the LICENSE file for our notice and the LGPL.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU Lesser General Public License (as
+# published by the Free Software Foundation) version 2.1, February 1999.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY; without even the IMPLIED WARRANTY OF
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the terms and
+# conditions of the GNU Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+##############################################################################
+import argparse
+import pytest
+
+from spack.cmd.url import *
+
+
+@pytest.fixture(scope='module')
+def parser():
+    """Returns the parser for the ``url`` command"""
+    parser = argparse.ArgumentParser()
+    setup_parser(parser)
+    return parser
+
+
+class MyPackage:
+    def __init__(self, name, versions):
+        self.name = name
+        self.versions = versions
+
+
+def test_name_parsed_correctly():
+    # Expected True
+    assert name_parsed_correctly(MyPackage('netcdf',         []), 'netcdf')
+    assert name_parsed_correctly(MyPackage('r-devtools',     []), 'devtools')
+    assert name_parsed_correctly(MyPackage('py-numpy',       []), 'numpy')
+    assert name_parsed_correctly(MyPackage('octave-splines', []), 'splines')
+
+    # Expected False
+    assert not name_parsed_correctly(MyPackage('',            []), 'hdf5')
+    assert not name_parsed_correctly(MyPackage('hdf5',        []), '')
+    assert not name_parsed_correctly(MyPackage('imagemagick', []), 'ImageMagick')  # noqa
+    assert not name_parsed_correctly(MyPackage('yaml-cpp',    []), 'yamlcpp')
+    assert not name_parsed_correctly(MyPackage('yamlcpp',     []), 'yaml-cpp')
+    assert not name_parsed_correctly(MyPackage('r-py-parser', []), 'parser')
+    assert not name_parsed_correctly(MyPackage('oce',         []), 'oce-0.18.0')   # noqa
+
+
+def test_version_parsed_correctly():
+    # Expected True
+    assert version_parsed_correctly(MyPackage('', ['1.2.3']),        '1.2.3')
+    assert version_parsed_correctly(MyPackage('', ['5.4a', '5.4b']), '5.4a')
+    assert version_parsed_correctly(MyPackage('', ['5.4a', '5.4b']), '5.4b')
+
+    # Expected False
+    assert not version_parsed_correctly(MyPackage('', []),         '1.2.3')
+    assert not version_parsed_correctly(MyPackage('', ['1.2.3']),  '')
+    assert not version_parsed_correctly(MyPackage('', ['1.2.3']),  '1.2.4')
+    assert not version_parsed_correctly(MyPackage('', ['3.4a']),   '3.4')
+    assert not version_parsed_correctly(MyPackage('', ['3.4']),    '3.4b')
+    assert not version_parsed_correctly(MyPackage('', ['0.18.0']), 'oce-0.18.0')   # noqa
+
+
+def test_url_parse(parser):
+    args = parser.parse_args(['parse', 'http://zlib.net/fossils/zlib-1.2.10.tar.gz'])
+    url(parser, args)
+
+
+@pytest.mark.xfail
+def test_url_parse_xfail(parser):
+    # No version in URL
+    args = parser.parse_args(['parse', 'http://www.netlib.org/voronoi/triangle.zip'])
+    url(parser, args)
+
+
+def test_url_list(parser):
+    args = parser.parse_args(['list'])
+    total_urls = url_list(args)
+
+    # The following two options should not change the number of URLs printed.
+    args = parser.parse_args(['list', '--color', '--extrapolation'])
+    colored_urls = url_list(args)
+    assert colored_urls == total_urls
+
+    # The following two options should print fewer URLs than the default.
+    # If they print the same number of URLs, something is horribly broken.
+    # If they say we missed 0 URLs, something is probably broken too.
+    args = parser.parse_args(['list', '--incorrect-name'])
+    incorrect_name_urls = url_list(args)
+    assert 0 < incorrect_name_urls < total_urls
+
+    args = parser.parse_args(['list', '--incorrect-version'])
+    incorrect_version_urls = url_list(args)
+    assert 0 < incorrect_version_urls < total_urls
+
+
+def test_url_test(parser):
+    args = parser.parse_args(['test'])
+    (total_urls, correct_names, correct_versions,
+     name_count_dict, version_count_dict) = url_test(args)
+
+    assert 0 < correct_names    <= sum(name_count_dict.values())    <= total_urls  # noqa
+    assert 0 < correct_versions <= sum(version_count_dict.values()) <= total_urls  # noqa
--- a/lib/spack/spack/url.py
+++ b/lib/spack/spack/url.py
@ -28,17 +28,17 @@
 download location of the package, and figure out version and name information
 from there.

-Example: when spack is given the following URL:
+**Example:** when spack is given the following URL:

-    ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.9.1-p243.tar.gz
+    https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.12/src/hdf-4.2.12.tar.gz

-It can figure out that the package name is ruby, and that it is at version
-1.9.1-p243.  This is useful for making the creation of packages simple: a user
+It can figure out that the package name is ``hdf``, and that it is at version
+``4.2.12``. This is useful for making the creation of packages simple: a user
 just supplies a URL and skeleton code is generated automatically.

-Spack can also figure out that it can most likely download 1.8.1 at this URL:
+Spack can also figure out that it can most likely download 4.2.6 at this URL:

-    ftp://ftp.ruby-lang.org/pub/ruby/1.9/ruby-1.8.1.tar.gz
+    https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.6/src/hdf-4.2.6.tar.gz

 This is useful if a user asks for a package at a particular version number;
 spack doesn't need anyone to tell it where to get the tarball even though
@ -104,24 +104,23 @@ def strip_query_and_fragment(path):
 def split_url_extension(path):
    """Some URLs have a query string, e.g.:

-          1. https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7.tgz?raw=true
-          2. http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin.tar.gz
-          3. https://gitlab.kitware.com/vtk/vtk/repository/archive.tar.bz2?ref=v7.0.0
+    1. https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7.tgz?raw=true
+    2. http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin.tar.gz
+    3. https://gitlab.kitware.com/vtk/vtk/repository/archive.tar.bz2?ref=v7.0.0

-       In (1), the query string needs to be stripped to get at the
-       extension, but in (2) & (3), the filename is IN a single final query
-       argument.
+    In (1), the query string needs to be stripped to get at the
+    extension, but in (2) & (3), the filename is IN a single final query
+    argument.

-       This strips the URL into three pieces: prefix, ext, and suffix.
-       The suffix contains anything that was stripped off the URL to
-       get at the file extension.  In (1), it will be '?raw=true', but
-       in (2), it will be empty. In (3) the suffix is a parameter that follows
-       after the file extension, e.g.:
+    This strips the URL into three pieces: ``prefix``, ``ext``, and ``suffix``.
+    The suffix contains anything that was stripped off the URL to
+    get at the file extension.  In (1), it will be ``'?raw=true'``, but
+    in (2), it will be empty. In (3) the suffix is a parameter that follows
+    after the file extension, e.g.:

-           1. ('https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7', '.tgz', '?raw=true')
-           2. ('http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin',
-               '.tar.gz', None)
-           3. ('https://gitlab.kitware.com/vtk/vtk/repository/archive', '.tar.bz2', '?ref=v7.0.0')
+    1. ``('https://github.com/losalamos/CLAMR/blob/packages/PowerParser_v2.0.7', '.tgz', '?raw=true')``
+    2. ``('http://www.apache.org/dyn/closer.cgi?path=/cassandra/1.2.0/apache-cassandra-1.2.0-rc2-bin', '.tar.gz', None)``
+    3. ``('https://gitlab.kitware.com/vtk/vtk/repository/archive', '.tar.bz2', '?ref=v7.0.0')``
    """
    prefix, ext, suffix = path, '', ''

@ -149,7 +148,7 @@ def determine_url_file_extension(path):
    """This returns the type of archive a URL refers to.  This is
       sometimes confusing because of URLs like:

-           (1) https://github.com/petdance/ack/tarball/1.93_02
+       (1) https://github.com/petdance/ack/tarball/1.93_02

       Where the URL doesn't actually contain the filename.  We need
       to know what type it is so that we can appropriately name files
@ -166,19 +165,44 @@ def determine_url_file_extension(path):
    return ext


-def parse_version_offset(path, debug=False):
-    """Try to extract a version string from a filename or URL.  This is taken
-       largely from Homebrew's Version class."""
+def parse_version_offset(path):
+    """Try to extract a version string from a filename or URL.
+
+    :param str path: The filename or URL for the package
+
+    :return: A tuple containing:
+        version of the package,
+        first index of version,
+        length of version string,
+        the index of the matching regex,
+        the matching regex
+
+    :rtype: tuple
+
+    :raises UndetectableVersionError: If the URL does not match any regexes
+    """
    original_path = path

+    # path:   The prefix of the URL, everything before the ext and suffix
+    # ext:    The file extension
+    # suffix: Any kind of query string that begins with a '?'
    path, ext, suffix = split_url_extension(path)

-    # Allow matches against the basename, to avoid including parent
-    # dirs in version name Remember the offset of the stem in the path
+    # stem:   Everything from path after the final '/'
    stem = os.path.basename(path)
    offset = len(path) - len(stem)

-    version_types = [
+    # List of the following format:
+    #
+    # [
+    #     (regex, string),
+    #     ...
+    # ]
+    #
+    # The first regex that matches string will be used to determine
+    # the version of the package. Therefore, hyperspecific regexes should
+    # come first while generic, catch-all regexes should come last.
+    version_regexes = [
        # GitHub tarballs, e.g. v1.2.3
        (r'github.com/.+/(?:zip|tar)ball/v?((\d+\.)+\d+)$', path),

@ -258,16 +282,13 @@ def parse_version_offset(path, debug=False):
        (r'\/(\d\.\d+)\/', path),

        # e.g. http://www.ijg.org/files/jpegsrc.v8d.tar.gz
-        (r'\.v(\d+[a-z]?)', stem)]
+        (r'\.v(\d+[a-z]?)', stem)
+    ]

-    for i, vtype in enumerate(version_types):
-        regex, match_string = vtype
+    for i, version_regex in enumerate(version_regexes):
+        regex, match_string = version_regex
        match = re.search(regex, match_string)
        if match and match.group(1) is not None:
-            if debug:
-                tty.msg("Parsing URL: %s" % path,
-                        "  Matched regex %d: r'%s'" % (i, regex))
-
            version = match.group(1)
            start   = match.start(1)

@ -275,30 +296,74 @@ def parse_version_offset(path, debug=False):
            if match_string is stem:
                start += offset

-            return version, start, len(version)
+            return version, start, len(version), i, regex

    raise UndetectableVersionError(original_path)


-def parse_version(path, debug=False):
-    """Given a URL or archive name, extract a version from it and return
-       a version object.
+def parse_version(path):
+    """Try to extract a version string from a filename or URL.
+
+    :param str path: The filename or URL for the package
+
+    :return: The version of the package
+    :rtype: spack.version.Version
+
+    :raises UndetectableVersionError: If the URL does not match any regexes
    """
-    ver, start, l = parse_version_offset(path, debug=debug)
-    return Version(ver)
+    version, start, length, i, regex = parse_version_offset(path)
+    return Version(version)


-def parse_name_offset(path, v=None, debug=False):
+def parse_name_offset(path, v=None):
+    """Try to determine the name of a package from its filename or URL.
+
+    :param str path: The filename or URL for the package
+    :param str v: The version of the package
+
+    :return: A tuple containing:
+        name of the package,
+        first index of name,
+        length of name,
+        the index of the matching regex,
+        the matching regex
+
+    :rtype: tuple
+
+    :raises UndetectableNameError: If the URL does not match any regexes
+    """
+    original_path = path
+
+    # We need to know the version of the package, because it helps
+    # prevent collisions between the name and the version.
    if v is None:
-        v = parse_version(path, debug=debug)
+        try:
+            v = parse_version(path)
+        except UndetectableVersionError:
+            # Not all URLs contain a version. We still want to be able
+            # to determine a name if possible.
+            v = ''

+    # path:   The prefix of the URL, everything before the ext and suffix
+    # ext:    The file extension
+    # suffix: Any kind of query string that begins with a '?'
    path, ext, suffix = split_url_extension(path)

-    # Allow matching with either path or stem, as with the version.
+    # stem:   Everything from path after the final '/'
    stem = os.path.basename(path)
    offset = len(path) - len(stem)

-    name_types = [
+    # List of the following format:
+    #
+    # [
+    #     (regex, string),
+    #     ...
+    # ]
+    #
+    # The first regex that matches string will be used to determine
+    # the name of the package. Therefore, hyperspecific regexes should
+    # come first while generic, catch-all regexes should come last.
+    name_regexes = [
        (r'/sourceforge/([^/]+)/', path),
        (r'github.com/[^/]+/[^/]+/releases/download/%s/(.*)-%s$' %
         (v, v), path),
@ -316,10 +381,11 @@ def parse_name_offset(path, v=None, debug=False):
        (r'/([^/]+)%s' % v, path),

        (r'^([^/]+)[_.-]v?%s' % v, path),
-        (r'^([^/]+)%s' % v, path)]
+        (r'^([^/]+)%s' % v, path)
+    ]

-    for i, name_type in enumerate(name_types):
-        regex, match_string = name_type
+    for i, name_regex in enumerate(name_regexes):
+        regex, match_string = name_regex
        match = re.search(regex, match_string)
        if match:
            name  = match.group(1)
@ -333,17 +399,38 @@ def parse_name_offset(path, v=None, debug=False):
            name = name.lower()
            name = re.sub('[_.]', '-', name)

-            return name, start, len(name)
+            return name, start, len(name), i, regex

-    raise UndetectableNameError(path)
+    raise UndetectableNameError(original_path)


 def parse_name(path, ver=None):
-    name, start, l = parse_name_offset(path, ver)
+    """Try to determine the name of a package from its filename or URL.
+
+    :param str path: The filename or URL for the package
+    :param str ver: The version of the package
+
+    :return: The name of the package
+    :rtype: str
+
+    :raises UndetectableNameError: If the URL does not match any regexes
+    """
+    name, start, length, i, regex = parse_name_offset(path, ver)
    return name


 def parse_name_and_version(path):
+    """Try to determine the name of a package and extract its version
+    from its filename or URL.
+
+    :param str path: The filename or URL for the package
+
+    :return: A tuple containing:
+        The name of the package
+        The version of the package
+
+    :rtype: tuple
+    """
    ver = parse_version(path)
    name = parse_name(path, ver)
    return (name, ver)
@ -371,12 +458,12 @@ def cumsum(elts, init=0, fn=lambda x: x):

 def substitution_offsets(path):
    """This returns offsets for substituting versions and names in the
-       provided path.  It is a helper for substitute_version().
+       provided path.  It is a helper for :func:`substitute_version`.
    """
    # Get name and version offsets
    try:
-        ver,  vs, vl = parse_version_offset(path)
-        name, ns, nl = parse_name_offset(path, ver)
+        ver,  vs, vl, vi, vregex = parse_version_offset(path)
+        name, ns, nl, ni, nregex = parse_name_offset(path, ver)
    except UndetectableNameError:
        return (None, -1, -1, (), ver, vs, vl, (vs,))
    except UndetectableVersionError:
@ -444,21 +531,22 @@ def wildcard_version(path):

 def substitute_version(path, new_version):
    """Given a URL or archive name, find the version in the path and
-       substitute the new version for it.  Replace all occurrences of
-       the version *if* they don't overlap with the package name.
+    substitute the new version for it.  Replace all occurrences of
+    the version *if* they don't overlap with the package name.

-       Simple example::
-         substitute_version('http://www.mr511.de/software/libelf-0.8.13.tar.gz', '2.9.3')
-         ->'http://www.mr511.de/software/libelf-2.9.3.tar.gz'
+    Simple example:

-       Complex examples::
-         substitute_version('http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.0.tar.gz', 2.1)
-         -> 'http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.1.tar.gz'
+    .. code-block:: python

-         # In this string, the "2" in mvapich2 is NOT replaced.
-         substitute_version('http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.tar.gz', 2.1)
-         -> 'http://mvapich.cse.ohio-state.edu/download/mvapich/mv2/mvapich2-2.1.tar.gz'
+       >>> substitute_version('http://www.mr511.de/software/libelf-0.8.13.tar.gz', '2.9.3')
+       'http://www.mr511.de/software/libelf-2.9.3.tar.gz'

+    Complex example:
+
+    .. code-block:: python
+
+       >>> substitute_version('https://www.hdfgroup.org/ftp/HDF/releases/HDF4.2.12/src/hdf-4.2.12.tar.gz', '2.3')
+       'https://www.hdfgroup.org/ftp/HDF/releases/HDF2.3/src/hdf-2.3.tar.gz'
    """
    (name, ns, nl, noffs,
     ver,  vs, vl, voffs) = substitution_offsets(path)
@ -477,17 +565,16 @@ def substitute_version(path, new_version):
 def color_url(path, **kwargs):
    """Color the parts of the url according to Spack's parsing.

-       Colors are:
-          Cyan: The version found by parse_version_offset().
-          Red:  The name found by parse_name_offset().
+    Colors are:
+       | Cyan: The version found by :func:`parse_version_offset`.
+       | Red:  The name found by :func:`parse_name_offset`.

-          Green:   Instances of version string from substitute_version().
-          Magenta: Instances of the name (protected from substitution).
-
-       Optional args:
-          errors=True    Append parse errors at end of string.
-          subs=True      Color substitutions as well as parsed name/version.
+       | Green:   Instances of version string from :func:`substitute_version`.
+       | Magenta: Instances of the name (protected from substitution).

+    :param str path: The filename or URL for the package
+    :keyword bool errors: Append parse errors at end of string.
+    :keyword bool subs: Color substitutions as well as parsed name/version.
    """
    errors = kwargs.get('errors', False)
    subs   = kwargs.get('subs', False)