......@@ -34,6 +34,33 @@ tests:
- python3 setup.py sdist
- pip3 install $(ls dist/*)
- rm -rf d3m d3m.*
- pip3 freeze
- pip3 check
- python3 ./run_tests.py
tests_oldest_dependencies:
stage: build
image: registry.gitlab.com/datadrivendiscovery/images/testing:ubuntu-bionic-python36
services:
- docker:dind
before_script:
- docker info
variables:
GIT_SUBMODULE_STRATEGY: recursive
DOCKER_HOST: tcp://docker:2375
script:
- sed -i "s/__version__ = 'devel'/__version__ = '1970.1.1'/" d3m/__init__.py
- python3 setup.py sdist
- pip3 install $(ls dist/*)
- rm -rf d3m d3m.*
- ./oldest_dependencies.py | pip3 install --upgrade --upgrade-strategy only-if-needed --exists-action w --requirement /dev/stdin
- pip3 freeze
- pip3 check
- python3 ./run_tests.py
benchmarks:
......@@ -68,6 +95,7 @@ test_datasets:
before_script:
- docker info
- git -C /data/datasets show -s
- git -C /data/datasets_public show -s
variables:
GIT_SUBMODULE_STRATEGY: recursive
......@@ -78,7 +106,10 @@ test_datasets:
- python3 setup.py sdist
- pip3 install $(ls dist/*)
- rm -rf d3m
- find /data/datasets -name datasetDoc.json -print0 | xargs -0 python3 -m d3m dataset describe --continue --list --time
- find /data/datasets -name datasetDoc.json -print0 | xargs -r -0 python3 -m d3m dataset describe --continue --list --time
- find /data/datasets_public -name datasetDoc.json -print0 | xargs -r -0 python3 -m d3m dataset describe --continue --list --time
- find /data/datasets -name problemDoc.json -print0 | xargs -r -0 python3 -m d3m problem describe --continue --list --no-print
- find /data/datasets_public -name problemDoc.json -print0 | xargs -r -0 python3 -m d3m problem describe --continue --list --no-print
run_check:
stage: build
......
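The `oldest_dependencies.py` script invoked by the `tests_oldest_dependencies` job above is not part of this diff. As a hedged illustration only, here is a minimal sketch of what such a script might do, assuming it pins every `>=` lower bound of the installed `d3m` distribution to an exact version so that `pip3 install --requirement /dev/stdin` installs the oldest supported dependencies; the real script may work differently.

```python
#!/usr/bin/env python3
# Hypothetical sketch only: emit a requirements list that pins each dependency
# of the installed "d3m" package to its lowest allowed version.

import pkg_resources

for requirement in pkg_resources.get_distribution('d3m').requires():
    for operator, version in requirement.specs:
        if operator == '>=':
            # Pin the dependency to its declared lower bound, e.g. "numpy==1.15.4".
            print('{name}=={version}'.format(name=requirement.project_name, version=version))
```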
## v2020.1.9
### Enhancements
* Support for D3M datasets with minimal metadata.
[#429](https://gitlab.com/datadrivendiscovery/d3m/issues/429)
[!327](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/327)
* Pipeline runs (and in fact many other input documents) can now be passed gzipped
to all CLI commands. The filename has to end with `.gz` for decompression to happen
automatically (see the sketch after this list).
[#420](https://gitlab.com/datadrivendiscovery/d3m/issues/420)
[!317](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/317)
* Made problem descriptions more readable again when converted to JSON.
[!316](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/316)
* Improved YAML handling to use the faster C implementation (libyaml) when it is available.
[#416](https://gitlab.com/datadrivendiscovery/d3m/issues/416)
[!313](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/313)
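A minimal sketch of the gzip support described above (see the bullet on gzipped input documents), using the `utils.open` helper and `Pipeline.from_yaml` that appear later in this diff; the pipeline path below is hypothetical.

```python
from d3m import utils
from d3m.metadata import pipeline as pipeline_module

# Because the filename ends with ".gz", "utils.open" decompresses it transparently
# (see the "FileType"/"open" changes to "d3m/utils.py" further down in this diff).
with utils.open('pipelines/example-pipeline.yml.gz', 'r', encoding='utf8') as pipeline_file:
    pipeline = pipeline_module.Pipeline.from_yaml(pipeline_file)

print(pipeline.id)
```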
### Bugfixes
* Fixed the error message shown when not all required CLI arguments are passed to the runtime.
[#411](https://gitlab.com/datadrivendiscovery/d3m/issues/411)
[!319](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/319)
* Removed the assumption that all successful pipeline run steps have method calls.
[#422](https://gitlab.com/datadrivendiscovery/d3m/issues/422)
[!320](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/320)
* Fixed "Duplicate problem ID" warnings when multiple problem descriptions
have the same problem ID, but in fact they are the same problem description.
No warning is made in this case anymore.
[#417](https://gitlab.com/datadrivendiscovery/d3m/issues/417)
[!321](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/321)
* Fixed the use of D3M container types in recent versions of Keras and TensorFlow.
[#426](https://gitlab.com/datadrivendiscovery/d3m/issues/426)
[!322](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/322)
* Fixed `validate` CLI commands to work on YAML files.
### Other
* Updated upper bounds of core dependencies to latest available versions.
[#427](https://gitlab.com/datadrivendiscovery/d3m/issues/427)
[!325](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/325)
* Refactored default pipeline run parser implementation to make it
easier to provide alternative dataset and problem resolvers.
[!314](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/314)
* Moved local test primitives out into the [`tests/data` git submodule](https://gitlab.com/datadrivendiscovery/tests-data).
Now all test primitives are in one place.
[#254](https://gitlab.com/datadrivendiscovery/d3m/issues/254)
[!312](https://gitlab.com/datadrivendiscovery/d3m/merge_requests/312)
## v2019.11.10
* Support for version 4.0.0 of the D3M dataset schema has been added.
......
......@@ -27,7 +27,7 @@
* Commit with message `Version bump for development.`
* `git push`
* After a release:
* Create a new [`core` Docker image](https://gitlab.com/datadrivendiscovery/images) for the release.
* Create new [`core` and `primitives` Docker images](https://gitlab.com/datadrivendiscovery/images) for the release.
* Add new release to the [primitives index repository](https://gitlab.com/datadrivendiscovery/primitives/blob/master/HOW_TO_MANAGE.md).
If there is a need for a patch version to fix a released version on the same day,
......
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
include README.md
include LICENSE.txt
......@@ -9,6 +9,7 @@ This package works with Python 3.6+ and pip 19+. You need to have the following
* `libssl-dev`
* `libcurl4-openssl-dev`
* `libyaml-dev`
You can install the latest stable version from [PyPI](https://pypi.org/):
......
__version__ = '2019.11.10'
__version__ = '2020.1.9'
__description__ = 'Common code for D3M project'
__author__ = 'DARPA D3M Program'
......
......@@ -212,6 +212,11 @@ def problem_configure_parser(parser: argparse.ArgumentParser, *, skip_arguments:
'-o', '--output', type=utils.FileType('w', encoding='utf8'), default='-', action='store',
help="save output to a file, default stdout",
)
if 'no_print' not in skip_arguments:
describe_parser.add_argument(
'--no-print', default=False, action='store_true',
help="do not print JSON",
)
if 'problems' not in skip_arguments:
describe_parser.add_argument(
'problems', metavar='PROBLEM', nargs='+',
......@@ -477,7 +482,7 @@ def runtime_handler(
'{command} requires either -u/--input-run or the following arguments: {manual_arguments}'.format(
command=arguments.runtime_command,
manual_arguments=', '.join(
name for (name, dest) in manual_config if getattr(arguments, dest, None) is not None
name for (name, dest) in manual_config
),
)
)
......
......@@ -1335,13 +1335,6 @@ class D3MDatasetLoader(Loader):
column_names = None
data_path = os.path.join(dataset_path, data_resource['resPath'])
expected_names = None
if data_resource.get('columns', None):
expected_names = []
for i, column in enumerate(data_resource['columns']):
assert i == column['colIndex'], (i, column['colIndex'])
expected_names.append(column['colName'])
if utils.is_sequence(data_resource['resFormat']) and len(data_resource['resFormat']) == 1:
resource_format = data_resource['resFormat'][0]
elif isinstance(data_resource['resFormat'], typing.Mapping) and len(data_resource['resFormat']) == 1:
......@@ -1352,7 +1345,6 @@ class D3MDatasetLoader(Loader):
if resource_format in ['text/csv', 'text/csv+gzip']:
data = pandas.read_csv(
data_path,
usecols=expected_names,
# We do not want to do any conversion of values at this point.
# This should be done by primitives later on.
dtype=str,
......@@ -1368,12 +1360,6 @@ class D3MDatasetLoader(Loader):
column_names = list(data.columns)
if expected_names is not None and expected_names != column_names:
raise ValueError("Mismatch between column names in data {column_names} and expected names {expected_names}.".format(
column_names=column_names,
expected_names=expected_names,
))
if hash is not None:
# We include both the filename and the content.
# TODO: Currently we read the file twice, once for reading and once to compute digest. Could we do it in one pass? Would it make it faster?
......@@ -1424,60 +1410,76 @@ class D3MDatasetLoader(Loader):
'structural_type': str,
})
if expected_names is not None:
for i, column in enumerate(data_resource['columns']):
column_metadata = self._get_column_metadata(column)
if 'https://metadata.datadrivendiscovery.org/types/Boundary' in column_metadata['semantic_types'] and 'boundary_for' not in column_metadata:
# Let's reconstruct for which column this is a boundary: currently
# this seems to be the first non-boundary column before this one.
for column_index in range(i - 1, 0, -1):
column_semantic_types = metadata.query((data_resource['resID'], metadata_base.ALL_ELEMENTS, column_index)).get('semantic_types', ())
if 'https://metadata.datadrivendiscovery.org/types/Boundary' not in column_semantic_types:
column_metadata['boundary_for'] = {
'resource_id': data_resource['resID'],
'column_index': column_index,
}
break
metadata_columns = {}
for column in data_resource.get('columns', []):
metadata_columns[column['colIndex']] = column
for i in range(len(column_names)):
if i in metadata_columns:
if column_names[i] != metadata_columns[i]['colName']:
raise ValueError("Mismatch between column name in data '{data_name}' and column name in metadata '{metadata_name}'.".format(
data_name=column_names[i],
metadata_name=metadata_columns[i]['colName'],
))
column_metadata = self._get_column_metadata(metadata_columns[i])
else:
column_metadata = {
'semantic_types': [
D3M_COLUMN_TYPE_CONSTANTS_TO_SEMANTIC_TYPES['unknown'],
],
}
metadata = metadata.update((data_resource['resID'], metadata_base.ALL_ELEMENTS, i), column_metadata)
current_boundary_start = None
current_boundary_list: typing.Tuple[str, ...] = None
column_index = 0
while column_index < len(data_resource['columns']):
column_semantic_types = metadata.query((data_resource['resID'], metadata_base.ALL_ELEMENTS, column_index)).get('semantic_types', ())
if is_simple_boundary(column_semantic_types):
# Let's reconstruct which type of a boundary this is. Heuristic is simple.
# If there are two boundary columns next to each other, it is an interval.
if current_boundary_start is None:
assert current_boundary_list is None
count = 1
for next_column_index in range(column_index + 1, len(data_resource['columns'])):
if is_simple_boundary(metadata.query((data_resource['resID'], metadata_base.ALL_ELEMENTS, next_column_index)).get('semantic_types', ())):
count += 1
else:
break
if count == 2:
current_boundary_start = column_index
current_boundary_list = INTERVAL_SEMANTIC_TYPES
if 'https://metadata.datadrivendiscovery.org/types/Boundary' in column_metadata['semantic_types'] and 'boundary_for' not in column_metadata:
# Let's reconstruct for which column this is a boundary: currently
# this seems to be the first non-boundary column before this one.
for column_index in range(i - 1, 0, -1):
column_semantic_types = metadata.query((data_resource['resID'], metadata_base.ALL_ELEMENTS, column_index)).get('semantic_types', ())
if 'https://metadata.datadrivendiscovery.org/types/Boundary' not in column_semantic_types:
column_metadata['boundary_for'] = {
'resource_id': data_resource['resID'],
'column_index': column_index,
}
break
metadata = metadata.update((data_resource['resID'], metadata_base.ALL_ELEMENTS, i), column_metadata)
current_boundary_start = None
current_boundary_list: typing.Tuple[str, ...] = None
column_index = 0
while column_index < len(column_names):
column_semantic_types = metadata.query((data_resource['resID'], metadata_base.ALL_ELEMENTS, column_index)).get('semantic_types', ())
if is_simple_boundary(column_semantic_types):
# Let's reconstruct which type of a boundary this is. Heuristic is simple.
# If there are two boundary columns next to each other, it is an interval.
if current_boundary_start is None:
assert current_boundary_list is None
count = 1
for next_column_index in range(column_index + 1, len(column_names)):
if is_simple_boundary(metadata.query((data_resource['resID'], metadata_base.ALL_ELEMENTS, next_column_index)).get('semantic_types', ())):
count += 1
else:
# Unsupported group of boundary columns, let's skip them all.
column_index += count
continue
break
column_semantic_types = column_semantic_types + (current_boundary_list[column_index - current_boundary_start],)
metadata = metadata.update((data_resource['resID'], metadata_base.ALL_ELEMENTS, column_index), {
'semantic_types': column_semantic_types,
})
if count == 2:
current_boundary_start = column_index
current_boundary_list = INTERVAL_SEMANTIC_TYPES
else:
# Unsupported group of boundary columns, let's skip them all.
column_index += count
continue
column_semantic_types = column_semantic_types + (current_boundary_list[column_index - current_boundary_start],)
metadata = metadata.update((data_resource['resID'], metadata_base.ALL_ELEMENTS, column_index), {
'semantic_types': column_semantic_types,
})
if column_index - current_boundary_start + 1 == len(current_boundary_list):
current_boundary_start = None
current_boundary_list = None
if column_index - current_boundary_start + 1 == len(current_boundary_list):
current_boundary_start = None
current_boundary_list = None
column_index += 1
column_index += 1
return data, metadata
......@@ -3142,7 +3144,17 @@ if pyarrow_lib is not None:
)
def get_dataset(dataset_uri: str, *, compute_digest: ComputeDigest = ComputeDigest.ONLY_IF_MISSING, strict_digest: bool = False, lazy: bool = False) -> Dataset:
def get_dataset(
dataset_uri: str, *, compute_digest: ComputeDigest = ComputeDigest.ONLY_IF_MISSING,
strict_digest: bool = False, lazy: bool = False,
datasets_dir: str = None, handle_score_split: bool = True,
) -> Dataset:
if datasets_dir is not None:
datasets, problem_descriptions = utils.get_datasets_and_problems(datasets_dir, handle_score_split)
if dataset_uri in datasets:
dataset_uri = datasets[dataset_uri]
dataset_uri = utils.fix_uri(dataset_uri)
return Dataset.load(dataset_uri, compute_digest=compute_digest, strict_digest=strict_digest, lazy=lazy)
......
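A hedged usage sketch for the extended `get_dataset` above: when `datasets_dir` is given, a plain dataset ID can be passed instead of a URI and is resolved by walking that directory for `datasetDoc.json` files. The directory path and dataset ID below are hypothetical.

```python
from d3m.container.dataset import get_dataset

# Resolve a dataset by ID against a local datasets directory
# (hypothetical path and ID; any tree containing "datasetDoc.json" files works).
dataset = get_dataset('iris_dataset_TRAIN', datasets_dir='/data/datasets')

print(dataset.metadata.query(()).get('id'))
```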
......@@ -151,9 +151,6 @@ class matrix(numpy.matrix, ndarray):
ndarray.__array_finalize__(self, obj)
typing.Sequence.register(numpy.ndarray) # type: ignore
def ndarray_serializer(obj: ndarray) -> dict:
data = {
'metadata': obj.metadata,
......
......@@ -161,7 +161,7 @@ class DataFrame(pandas.DataFrame):
self.metadata = state['metadata']
def to_csv(self, path_or_buf: typing.Union[typing.TextIO, str] = None, sep: str = ',', na_rep: str = '',
def to_csv(self, path_or_buf: typing.Union[typing.IO[typing.Any], str] = None, sep: str = ',', na_rep: str = '',
float_format: str = None, columns: typing.Sequence = None, header: typing.Union[bool, typing.Sequence[str]] = True,
index: bool = False, **kwargs: typing.Any) -> typing.Optional[str]:
"""
......@@ -458,9 +458,6 @@ class DataFrame(pandas.DataFrame):
return self.append_columns(right, use_right_metadata=use_right_metadata)
typing.Sequence.register(pandas.DataFrame) # type: ignore
def dataframe_serializer(obj: DataFrame) -> dict:
data = {
'metadata': obj.metadata,
......
......@@ -1441,7 +1441,7 @@ class Metadata:
return output
def pretty_print(self, selector: Selector = None, handle: typing.TextIO = None, _level: int = 0) -> None:
def pretty_print(self, selector: Selector = None, handle: typing.IO[typing.Any] = None, _level: int = 0) -> None:
"""
Pretty-prints metadata to ``handle``, or `sys.stdout` if not specified.
......
......@@ -135,17 +135,18 @@ class Resolver:
def _from_file(self, pipeline_description: typing.Dict) -> 'typing.Optional[Pipeline]':
for path in self.pipeline_search_paths:
pipeline_path = os.path.join(path, '{pipeline_id}.json'.format(pipeline_id=pipeline_description['id']))
try:
with open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
return self.get_pipeline_class().from_json(pipeline_file, resolver=self, strict_digest=self.strict_digest)
except FileNotFoundError:
pass
for extension in ['json', 'json.gz']:
pipeline_path = os.path.join(path, '{pipeline_id}.{extension}'.format(pipeline_id=pipeline_description['id'], extension=extension))
try:
with utils.open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
return self.get_pipeline_class().from_json(pipeline_file, resolver=self, strict_digest=self.strict_digest)
except FileNotFoundError:
pass
for extension in ['yml', 'yaml']:
for extension in ['yml', 'yaml', 'yml.gz', 'yaml.gz']:
pipeline_path = os.path.join(path, '{pipeline_id}.{extension}'.format(pipeline_id=pipeline_description['id'], extension=extension))
try:
with open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
with utils.open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
return self.get_pipeline_class().from_yaml(pipeline_file, resolver=self, strict_digest=self.strict_digest)
except FileNotFoundError:
pass
......@@ -1770,14 +1771,14 @@ class Pipeline:
return outputs_types
@classmethod
def from_yaml(cls: typing.Type[P], string_or_file: typing.Union[str, typing.TextIO], *, resolver: typing.Optional[Resolver] = None,
def from_yaml(cls: typing.Type[P], string_or_file: typing.Union[str, typing.IO[typing.Any]], *, resolver: typing.Optional[Resolver] = None,
strict_digest: bool = False) -> P:
description = yaml.safe_load(string_or_file)
description = utils.yaml_load(string_or_file)
return cls.from_json_structure(description, resolver=resolver, strict_digest=strict_digest)
@classmethod
def from_json(cls: typing.Type[P], string_or_file: typing.Union[str, typing.TextIO], *, resolver: typing.Optional[Resolver] = None,
def from_json(cls: typing.Type[P], string_or_file: typing.Union[str, typing.IO[typing.Any]], *, resolver: typing.Optional[Resolver] = None,
strict_digest: bool = False) -> P:
if isinstance(string_or_file, str):
description = json.loads(string_or_file)
......@@ -1947,7 +1948,7 @@ class Pipeline:
return pipeline_description
def to_json(self, file: typing.TextIO = None, *, nest_subpipelines: bool = False, canonical: bool = False, **kwargs: typing.Any) -> typing.Optional[str]:
def to_json(self, file: typing.IO[typing.Any] = None, *, nest_subpipelines: bool = False, canonical: bool = False, **kwargs: typing.Any) -> typing.Optional[str]:
obj = self.to_json_structure(nest_subpipelines=nest_subpipelines, canonical=canonical)
if 'allow_nan' not in kwargs:
......@@ -1959,10 +1960,10 @@ class Pipeline:
json.dump(obj, file, **kwargs)
return None
def to_yaml(self, file: typing.TextIO = None, *, nest_subpipelines: bool = False, canonical: bool = False, **kwargs: typing.Any) -> typing.Optional[str]:
def to_yaml(self, file: typing.IO[typing.Any] = None, *, nest_subpipelines: bool = False, canonical: bool = False, **kwargs: typing.Any) -> typing.Optional[str]:
obj = self.to_json_structure(nest_subpipelines=nest_subpipelines, canonical=canonical)
return yaml.safe_dump(obj, stream=file, **kwargs)
return utils.yaml_dump(obj, stream=file, **kwargs)
def equals(self, pipeline: P, *, strict_order: bool = False, only_control_hyperparams: bool = False) -> bool:
"""
......@@ -2793,7 +2794,7 @@ def get_pipeline(
)
if os.path.exists(pipeline_path):
with open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
with utils.open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
if pipeline_path.endswith('.yml') or pipeline_path.endswith('.yaml'):
return pipeline_class.from_yaml(pipeline_file, resolver=resolver, strict_digest=strict_digest)
elif pipeline_path.endswith('.json'):
......@@ -2833,14 +2834,14 @@ def describe_handler(
print(pipeline_path, file=output_stream)
try:
with open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
if pipeline_path.endswith('.yml') or pipeline_path.endswith('.yaml'):
with utils.open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
if pipeline_path.endswith('.yml') or pipeline_path.endswith('.yaml') or pipeline_path.endswith('.yml.gz') or pipeline_path.endswith('.yaml.gz'):
pipeline = pipeline_class.from_yaml(
pipeline_file,
resolver=resolver,
strict_digest=getattr(arguments, 'strict_digest', False),
)
elif pipeline_path.endswith('.json'):
elif pipeline_path.endswith('.json') or pipeline_path.endswith('.json.gz'):
pipeline = pipeline_class.from_json(
pipeline_file,
resolver=resolver,
......
......@@ -68,8 +68,7 @@ class User(dict):
return dumper.represent_dict(data)
yaml.Dumper.add_representer(User, User._yaml_representer)
yaml.SafeDumper.add_representer(User, User._yaml_representer)
utils.yaml_add_representer(User, User._yaml_representer)
class PipelineRunStep:
......@@ -412,13 +411,13 @@ class PipelineRun:
return json_structure
def to_yaml(self, file: typing.TextIO, *, appending: bool = False, **kwargs: typing.Any) -> typing.Optional[str]:
def to_yaml(self, file: typing.IO[typing.Any], *, appending: bool = False, **kwargs: typing.Any) -> typing.Optional[str]:
obj = self.to_json_structure()
if appending and 'explicit_start' not in kwargs:
kwargs['explicit_start'] = True
return yaml.safe_dump(obj, stream=file, **kwargs)
return utils.yaml_dump(obj, stream=file, **kwargs)
def add_input_dataset(self, dataset: container.Dataset) -> None:
metadata = dataset.metadata.query(())
......@@ -1160,8 +1159,7 @@ class RuntimeEnvironment(dict):
return dumper.represent_dict(data)
yaml.Dumper.add_representer(RuntimeEnvironment, RuntimeEnvironment._yaml_representer)
yaml.SafeDumper.add_representer(RuntimeEnvironment, RuntimeEnvironment._yaml_representer)
utils.yaml_add_representer(RuntimeEnvironment, RuntimeEnvironment._yaml_representer)
def _validate_pipeline_run_random_seeds(pipeline_run: typing.Dict) -> None:
......@@ -1215,10 +1213,7 @@ def _validate_pipeline_run_timestamps(pipeline_run: typing.Dict, parent_start: d
def _validate_success_step(step: typing.Dict) -> None:
if step['type'] == metadata_base.PipelineStepType.PRIMITIVE:
if 'method_calls' not in step:
raise exceptions.InvalidPipelineRunError("Successful primitive step with missing method calls.")
for method_call in step['method_calls']:
for method_call in step.get('method_calls', []):
if method_call['status']['state'] != metadata_base.PipelineRunStatusState.SUCCESS:
raise exceptions.InvalidPipelineRunError(
"Step with '{expected_status}' status has a method call with '{status}' status".format(
......@@ -1523,14 +1518,16 @@ def pipeline_run_handler(arguments: argparse.Namespace) -> None:
print(pipeline_run_path)
try:
with open(pipeline_run_path, 'r', encoding='utf8') as pipeline_run_file:
if pipeline_run_path.endswith('.yml') or pipeline_run_path.endswith('.yaml'):
pipeline_runs: typing.Iterator[typing.Dict] = yaml.safe_load_all(pipeline_run_file)
with utils.open(pipeline_run_path, 'r', encoding='utf8') as pipeline_run_file:
if pipeline_run_path.endswith('.yml') or pipeline_run_path.endswith('.yaml') or pipeline_run_path.endswith('.yml.gz') or pipeline_run_path.endswith('.yaml.gz'):
pipeline_runs: typing.Iterator[typing.Dict] = utils.yaml_load_all(pipeline_run_file)
else:
pipeline_runs = typing.cast(typing.Iterator[typing.Dict], [json.load(pipeline_run_file)])
for pipeline_run in pipeline_runs:
validate_pipeline_run(pipeline_run)
# It has to be inside context manager because YAML loader returns a lazy iterator
# which requires an open file while iterating.
for pipeline_run in pipeline_runs:
validate_pipeline_run(pipeline_run)
except Exception as error:
if getattr(arguments, 'continue', False):
traceback.print_exc(file=sys.stdout)
......@@ -1555,14 +1552,14 @@ def pipeline_handler(
print(pipeline_path)
try:
with open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
if pipeline_path.endswith('.yml') or pipeline_path.endswith('.yaml'):
pipelines: typing.Iterator[typing.Dict] = yaml.safe_load_all(pipeline_file)
with utils.open(pipeline_path, 'r', encoding='utf8') as pipeline_file:
if pipeline_path.endswith('.yml') or pipeline_path.endswith('.yaml') or pipeline_path.endswith('.yml.gz') or pipeline_path.endswith('.yaml.gz'):
pipelines: typing.Iterator[typing.Dict] = utils.yaml_load_all(pipeline_file)
else:
pipelines = typing.cast(typing.Iterator[typing.Dict], [json.load(pipeline_file)])
for pipeline in pipelines:
validate_pipeline(pipeline)
for pipeline in pipelines:
validate_pipeline(pipeline)
except Exception as error:
if getattr(arguments, 'continue', False):
traceback.print_exc(file=sys.stdout)
......@@ -1584,14 +1581,14 @@ def problem_handler(arguments: argparse.Namespace, *, problem_resolver: typing.C
print(problem_path)
try:
with open(problem_path, 'r', encoding='utf8') as problem_file:
if problem_path.endswith('.yml') or problem_path.endswith('.yaml'):
problems: typing.Iterator[typing.Dict] = yaml.safe_load_all(problem_file)
with utils.open(problem_path, 'r', encoding='utf8') as problem_file:
if problem_path.endswith('.yml') or problem_path.endswith('.yaml') or problem_path.endswith('.yml.gz') or problem_path.endswith('.yaml.gz'):
problems: typing.Iterator[typing.Dict] = utils.yaml_load_all(problem_file)
else:
problems = typing.cast(typing.Iterator[typing.Dict], [json.load(problem_file)])
for problem in problems:
validate_problem(problem)
for problem in problems:
validate_problem(problem)
except Exception as error:
if getattr(arguments, 'continue', False):
traceback.print_exc(file=sys.stdout)
......@@ -1613,14 +1610,14 @@ def dataset_handler(arguments: argparse.Namespace, *, dataset_resolver: typing.C
print(dataset_path)
try:
with open(dataset_path, 'r', encoding='utf8') as dataset_file:
with utils.open(dataset_path, 'r', encoding='utf8') as dataset_file:
if dataset_path.endswith('.yml') or dataset_path.endswith('.yaml'):
datasets: typing.Iterator[typing.Dict] = yaml.safe_load_all(dataset_file)
datasets: typing.Iterator[typing.Dict] = utils.yaml_load_all(dataset_file)
else:
datasets = typing.cast(typing.Iterator[typing.Dict], [json.load(dataset_file)])
for dataset in datasets:
validate_dataset(dataset)
for dataset in datasets:
validate_dataset(dataset)
except Exception as error:
if getattr(arguments, 'continue', False):
traceback.print_exc(file=sys.stdout)
......
......@@ -51,6 +51,7 @@ PRIMITIVE_NAMES = [
'cluster_curve_fitting_kmeans',
'collaborative_filtering_link_prediction',
'collaborative_filtering_parser',
'collaborative_filtering_recommender_system',
'column_fold',
'column_map',
'column_parser',
......@@ -61,10 +62,10 @@ PRIMITIVE_NAMES = [
'concat',
'conditioner',
'construct_predictions',
'convolutional_neural_net',
'convolution_1d',
'convolution_2d',
'convolution_3d',
'convolutional_neural_net',
'corex_continuous',
'corex_supervised',
'corex_text',
......@@ -96,18 +97,20 @@ PRIMITIVE_NAMES = [
'diagonal_mvn',
'dict_vectorizer',
'dimension_selection',
'doc_2_vec',
'discriminative_structured_classifier',
'do_nothing',
'do_nothing_for_dataset',
'doc_2_vec',
'dropout',
'dummy',
'discriminative_structured_classifier',
'echo_ib',
'echo_linear',
'edge_list_to_graph',
'ekss',
'elastic_net',
'encoder',
'enrich_dates',
'ensemble_forest',
'ensemble_voting',
'extra_trees',
'extract_columns',
......@@ -127,12 +130,14 @@ PRIMITIVE_NAMES = [
'gaussian_random_projection',
'gcn_mixhop',
'general_relational_dataset',
'generative_structured_classifier',
'generic_univariate_select',
'geocoding',
'glda',
'global_average_pooling_1d',
'global_average_pooling_2d',
'global_average_pooling_3d',
'global_structure_imputer',
'gmm',
'go_dec',
'goturn',
......@@ -146,8 +151,6 @@ PRIMITIVE_NAMES = [
'grasta_masked',
'greedy_imputation',
'grouse',
'global_structure_imputer',
'generative_structured_classifier',
'hdbscan',
'hdp',
'hinge',
......@@ -157,6 +160,7 @@ PRIMITIVE_NAMES = [
'ibex',
'identity_parentchildren_markov_blanket',
'image_reader',
'image_transfer',
'image_transfer_learning_transformer',
'imputer',
'inceptionV3_image_feature',
......@@ -165,14 +169,14 @@ PRIMITIVE_NAMES = [
'iterative_labeling',
'iterative_regression_imputation',
'joint_mutual_information',
'k_means',
'k_neighbors',
'kernel_pca',
'kernel_ridge',
'kfold_dataset_split',
'kfold_timeseries_split',
'kullback_leibler_divergence',
'k_means',
'kss',
'kullback_leibler_divergence',
'l1_low_rank',
'label_decoder',
'label_encoder',
......@@ -193,23 +197,26 @@ PRIMITIVE_NAMES = [
'link_prediction',
'list_to_dataframe',
'list_to_ndarray',
'logcosh',
'load_edgelist',
'load_graphs',
'load_single_graph',
'local_structure_imputer',
'log_mel_spectrogram',
'logcosh',
'logistic_regression',
'loss',
'lstm',
'lupi_svm',
'local_structure_imputer',
'max_abs_scaler',
'max_pooling_1d',
'max_pooling_2d',
'max_pooling_3d',
'max_abs_scaler',
'mean_absolute_error',
'mean_absolute_percentage_error',
'mean_squared_error',
'mean_squared_logarithmic_error',
'mean_baseline',
'mean_imputation',
'mean_squared_error',
'mean_squared_logarithmic_error',
'metafeature_extractor',
'mice_imputation',
'min_max_scaler',
......@@ -239,6 +246,7 @@ PRIMITIVE_NAMES = [
'ordinal_encoder',
'out_of_sample_adjacency_spectral_embedding',
'out_of_sample_laplacian_spectral_embedding',
'output_dataframe',
'owl_regression',
'pass_to_ranks',
'passive_aggressive',
......@@ -271,6 +279,7 @@ PRIMITIVE_NAMES = [
'remove_semantic_types',
'rename_duplicate_name',
'replace_semantic_types',
'replace_singletons',
'resnet50_image_feature',
'resnext101_kinetics_video_features',
'retina_net',
......@@ -327,17 +336,19 @@ PRIMITIVE_NAMES = [
'tensor_machines_binary_classification',
'tensor_machines_regularized_least_squares',
'term_filter',
'text_classifier',
'text_encoder',
'text_reader',
'text_summarization',
'text_to_vocabulary',
'text_tokenizer',
'tfidf_vectorizer',
'time_series_to_list',
'time_series_formatter',
'time_series_neighbours',
'time_series_reshaper',
'topic_vectorizer',
'time_series_to_list',
'to_numeric',
'topic_vectorizer',
'train_score_dataset_split',
'trecs',
'tree_augmented_naive_bayes',
......@@ -356,8 +367,8 @@ PRIMITIVE_NAMES = [
'vertical_concatenate',
'vgg16',
'vgg16_image_feature',
'voter',
'video_reader',
'voter',
'voting',
'wikifier',
'word_2_vec',
......
......@@ -416,6 +416,29 @@ TASK_TYPE_TO_KEYWORDS_MAP: typing.Dict[typing.Optional[str], typing.List] = {
'overlapping': [TaskKeyword.OVERLAPPING], # type: ignore
'nonOverlapping': [TaskKeyword.NONOVERLAPPING], # type: ignore
}
JSON_TASK_TYPE_TO_KEYWORDS_MAP: typing.Dict[typing.Optional[str], typing.List] = {
None: [],
'CLASSIFICATION': [TaskKeyword.CLASSIFICATION], # type: ignore
'REGRESSION': [TaskKeyword.REGRESSION], # type: ignore
'CLUSTERING': [TaskKeyword.CLUSTERING], # type: ignore
'LINK_PREDICTION': [TaskKeyword.LINK_PREDICTION], # type: ignore
'VERTEX_CLASSIFICATION': [TaskKeyword.VERTEX_CLASSIFICATION], # type: ignore
'VERTEX_NOMINATION': [TaskKeyword.VERTEX_NOMINATION], # type: ignore
'COMMUNITY_DETECTION': [TaskKeyword.COMMUNITY_DETECTION], # type: ignore
'GRAPH_MATCHING': [TaskKeyword.GRAPH_MATCHING], # type: ignore
'TIME_SERIES_FORECASTING': [TaskKeyword.TIME_SERIES, TaskKeyword.FORECASTING], # type: ignore
'COLLABORATIVE_FILTERING': [TaskKeyword.COLLABORATIVE_FILTERING], # type: ignore
'OBJECT_DETECTION': [TaskKeyword.OBJECT_DETECTION], # type: ignore
'SEMISUPERVISED_CLASSIFICATION': [TaskKeyword.SEMISUPERVISED, TaskKeyword.CLASSIFICATION], # type: ignore
'SEMISUPERVISED_REGRESSION': [TaskKeyword.SEMISUPERVISED, TaskKeyword.REGRESSION], # type: ignore
'BINARY': [TaskKeyword.BINARY], # type: ignore
'MULTICLASS': [TaskKeyword.MULTICLASS], # type: ignore
'MULTILABEL': [TaskKeyword.MULTILABEL], # type: ignore
'UNIVARIATE': [TaskKeyword.UNIVARIATE], # type: ignore
'MULTIVARIATE': [TaskKeyword.MULTIVARIATE], # type: ignore
'OVERLAPPING': [TaskKeyword.OVERLAPPING], # type: ignore
'nonoverlapping': [TaskKeyword.NONOVERLAPPING], # type: ignore
}
class Loader(metaclass=utils.AbstractMetaclass):
......@@ -812,13 +835,55 @@ class Problem(dict):
return cls(structure, strict_digest=strict_digest)
def to_json_structure(self, *, canonical: bool = False) -> typing.Dict:
"""
For standard enumerations we map them to strings. For non-standard problem
description fields we use a reversible conversion.
"""
PROBLEM_SCHEMA_VALIDATOR.validate(self)
return utils.to_reversible_json_structure(self.to_simple_structure(canonical=canonical))
simple_structure = copy.deepcopy(self.to_simple_structure(canonical=canonical))
if simple_structure.get('problem', {}).get('task_keywords', []):
simple_structure['problem']['task_keywords'] = [task_keyword.name for task_keyword in simple_structure['problem']['task_keywords']]
if simple_structure.get('problem', {}).get('performance_metrics', []):
for metric in simple_structure['problem']['performance_metrics']:
metric['metric'] = metric['metric'].name
return utils.to_reversible_json_structure(simple_structure)
@classmethod
def from_json_structure(cls: typing.Type[P], structure: typing.Dict, *, strict_digest: bool = False) -> P:
return cls.from_simple_structure(utils.from_reversible_json_structure(structure), strict_digest=strict_digest)
"""
For standard enumerations we map them from strings. For non-standard problem
description fields we use a reversible conversion.
"""
simple_structure = utils.from_reversible_json_structure(structure)
# Legacy (before v4.0.0).
legacy_task_keywords: typing.List[TaskKeyword] = [] # type: ignore
legacy_task_keywords += JSON_TASK_TYPE_TO_KEYWORDS_MAP[simple_structure.get('problem', {}).get('task_type', None)]
legacy_task_keywords += JSON_TASK_TYPE_TO_KEYWORDS_MAP[simple_structure.get('problem', {}).get('task_subtype', None)]
if legacy_task_keywords:
# We know "problem" field exists.
simple_structure['problem']['task_keywords'] = simple_structure['problem'].get('task_keywords', []) + legacy_task_keywords
if simple_structure.get('problem', {}).get('task_keywords', []):
mapped_task_keywords = []
for task_keyword in simple_structure['problem']['task_keywords']:
if isinstance(task_keyword, str):
mapped_task_keywords.append(TaskKeyword[task_keyword])
else:
mapped_task_keywords.append(task_keyword)
simple_structure['problem']['task_keywords'] = mapped_task_keywords
if simple_structure.get('problem', {}).get('performance_metrics', []):
for metric in simple_structure['problem']['performance_metrics']:
if isinstance(metric['metric'], str):
metric['metric'] = PerformanceMetric[metric['metric']]
return cls.from_simple_structure(simple_structure, strict_digest=strict_digest)
@deprecate.function(message="use Problem.load class method instead")
......@@ -866,7 +931,13 @@ if pyarrow_lib is not None:
)
def get_problem(problem_uri: str, *, strict_digest: bool = False) -> Problem:
def get_problem(problem_uri: str, *, strict_digest: bool = False, datasets_dir: str = None, handle_score_split: bool = True) -> Problem:
if datasets_dir is not None:
datasets, problem_descriptions = utils.get_datasets_and_problems(datasets_dir, handle_score_split)
if problem_uri in problem_descriptions:
problem_uri = problem_descriptions[problem_uri]
problem_uri = utils.fix_uri(problem_uri)
return Problem.load(problem_uri, strict_digest=strict_digest)
......@@ -902,7 +973,7 @@ def describe_handler(
if getattr(arguments, 'print', False):
pprint.pprint(problem_description, stream=output_stream)
else:
elif not getattr(arguments, 'no_print', False):
json.dump(
problem_description,
output_stream,
......
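An illustrative sketch of the legacy handling added to `Problem.from_json_structure` above: pre-v4.0.0 JSON structures carry `task_type`/`task_subtype` fields, which are translated into task keywords through `JSON_TASK_TYPE_TO_KEYWORDS_MAP`. It assumes the map and `TaskKeyword` are importable from `d3m.metadata.problem` as added in this diff; the legacy values are hypothetical.

```python
from d3m.metadata.problem import JSON_TASK_TYPE_TO_KEYWORDS_MAP, TaskKeyword

# Hypothetical legacy (pre-v4.0.0) problem fields.
legacy_problem = {'task_type': 'TIME_SERIES_FORECASTING', 'task_subtype': 'UNIVARIATE'}

# The same concatenation as in "from_json_structure" above.
task_keywords = (
    JSON_TASK_TYPE_TO_KEYWORDS_MAP[legacy_problem.get('task_type', None)]
    + JSON_TASK_TYPE_TO_KEYWORDS_MAP[legacy_problem.get('task_subtype', None)]
)

assert task_keywords == [TaskKeyword.TIME_SERIES, TaskKeyword.FORECASTING, TaskKeyword.UNIVARIATE]
```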
import argparse
import inspect
import json
import functools
import logging
import os
import os.path
......@@ -15,7 +16,6 @@ import uuid
import jsonschema # type: ignore
import frozendict # type: ignore
import pandas # type: ignore
import yaml # type: ignore
from d3m import container, deprecate, exceptions, types, utils
from d3m.container import dataset as dataset_module
......@@ -1525,106 +1525,24 @@ get_problem = deprecate.function(message="use d3m.metadata.problem.get_problem i
get_pipeline = deprecate.function(message="use d3m.metadata.pipeline.get_pipeline instead")(pipeline_module.get_pipeline)
# TODO: Do not traverse the datasets directory every time.
@deprecate.function(message="use d3m.utils.get_datasets_and_problems instead")
def _get_datasets_and_problems(
datasets_dir: str, handle_score_split: bool = True,
) -> typing.Tuple[typing.Dict[str, str], typing.Dict[str, str]]:
datasets: typing.Dict[str, str] = {}
problem_descriptions: typing.Dict[str, str] = {}
for dirpath, dirnames, filenames in os.walk(datasets_dir, followlinks=True):
if 'datasetDoc.json' in filenames:
# Do not traverse further (to not parse "datasetDoc.json" or "problemDoc.json" if they
# exists in raw data filename).
dirnames[:] = []
dataset_path = os.path.join(os.path.abspath(dirpath), 'datasetDoc.json')
try:
with open(dataset_path, 'r', encoding='utf8') as dataset_file:
dataset_doc = json.load(dataset_file)
dataset_id = dataset_doc['about']['datasetID']
# Handle a special case for SCORE dataset splits (those which have "targets.csv" file).
# They are the same as TEST dataset splits, but we present them differently, so that
# SCORE dataset splits have targets as part of data. Because of this we also update
# corresponding dataset ID.
# See: https://gitlab.com/datadrivendiscovery/d3m/issues/176
if handle_score_split and os.path.exists(os.path.join(dirpath, '..', 'targets.csv')) and dataset_id.endswith('_TEST'):
dataset_id = dataset_id[:-5] + '_SCORE'
if dataset_id in datasets:
logger.warning(
"Duplicate dataset ID '%(dataset_id)s': '%(old_dataset)s' and '%(dataset)s'", {
'dataset_id': dataset_id,
'dataset': dataset_path,
'old_dataset': datasets[dataset_id],
},
)
else:
datasets[dataset_id] = dataset_path
except (ValueError, KeyError):
logger.exception(
"Unable to read dataset '%(dataset)s'.", {
'dataset': dataset_path,
},
)
if 'problemDoc.json' in filenames:
# We continue traversing further in this case.
problem_path = os.path.join(os.path.abspath(dirpath), 'problemDoc.json')
try:
with open(problem_path, 'r', encoding='utf8') as problem_file:
problem_doc = json.load(problem_file)
problem_id = problem_doc['about']['problemID']
# Handle a special case for SCORE dataset splits (those which have "targets.csv" file).
# They are the same as TEST dataset splits, but we present them differently, so that
# SCORE dataset splits have targets as part of data. Because of this we also update
# corresponding problem ID.
# See: https://gitlab.com/datadrivendiscovery/d3m/issues/176
if handle_score_split and os.path.exists(os.path.join(dirpath, '..', 'targets.csv')) and problem_id.endswith('_TEST'):
problem_id = problem_id[:-5] + '_SCORE'
# Also update dataset references.
for data in problem_doc.get('inputs', {}).get('data', []):
if data['datasetID'].endswith('_TEST'):
data['datasetID'] = data['datasetID'][:-5] + '_SCORE'
if problem_id in problem_descriptions:
logger.warning(
"Duplicate problem ID '%(problem_id)s': '%(old_problem)s' and '%(problem)s'", {
'problem_id': problem_id,
'problem': problem_path,
'old_problem': problem_descriptions[problem_id],
},
)
else:
problem_descriptions[problem_id] = problem_path
except (ValueError, KeyError):
logger.exception(
"Unable to read problem description '%(problem)s'.", {
'problem': problem_path,
},
)
return datasets, problem_descriptions
return utils.get_datasets_and_problems(datasets_dir, handle_score_split)
def _resolve_pipeline_run_datasets(
datasets: typing.Dict[str, str], pipeline_run_datasets: typing.Sequence[typing.Dict[str, str]], *,
pipeline_run_datasets: typing.Sequence[typing.Dict[str, str]], *,
dataset_resolver: typing.Callable, compute_digest: dataset_module.ComputeDigest, strict_digest: bool,
strict_resolving: bool = False,
strict_resolving: bool, datasets_dir: typing.Optional[str], handle_score_split: bool,
) -> typing.Sequence[container.Dataset]:
resolved_datasets = []
for dataset_reference in pipeline_run_datasets:
resolved_dataset = dataset_resolver(
datasets[dataset_reference['id']], compute_digest=compute_digest, strict_digest=strict_digest,
dataset_reference['id'], compute_digest=compute_digest, strict_digest=strict_digest,
datasets_dir=datasets_dir, handle_score_split=handle_score_split,
)
resolved_dataset_digest = resolved_dataset.metadata.query(()).get('digest', None)
......@@ -1656,15 +1574,12 @@ def _resolve_pipeline_run_datasets(
def parse_pipeline_run(
pipeline_run_file: typing.TextIO, pipeline_search_paths: typing.Sequence[str], datasets_dir: str, *,
pipeline_run_file: typing.IO[typing.Any], pipeline_search_paths: typing.Sequence[str], datasets_dir: typing.Optional[str], *,
pipeline_resolver: typing.Callable = None, dataset_resolver: typing.Callable = None,
problem_resolver: typing.Callable = None, strict_resolving: bool = False,
compute_digest: dataset_module.ComputeDigest = dataset_module.ComputeDigest.ONLY_IF_MISSING,
strict_digest: bool = False, handle_score_split: bool = True,
) -> typing.Sequence[typing.Dict[str, typing.Any]]:
if datasets_dir is None:
raise exceptions.InvalidArgumentValueError("Datasets directory has to be provided to resolve pipeline run files.")
if pipeline_resolver is None:
pipeline_resolver = pipeline_module.get_pipeline
if dataset_resolver is None:
......@@ -1672,13 +1587,11 @@ def parse_pipeline_run(
if problem_resolver is None:
problem_resolver = problem.get_problem
pipeline_runs = list(yaml.safe_load_all(pipeline_run_file))
pipeline_runs = list(utils.yaml_load_all(pipeline_run_file))
if not pipeline_runs:
raise exceptions.InvalidArgumentValueError("Pipeline run file must contain at least one pipeline run document.")
datasets, problem_descriptions = _get_datasets_and_problems(datasets_dir, handle_score_split)
for pipeline_run in pipeline_runs:
try:
pipeline_run_module.validate_pipeline_run(pipeline_run)
......@@ -1686,12 +1599,19 @@ def parse_pipeline_run(
raise exceptions.InvalidArgumentValueError("Provided pipeline run document is not valid.") from error
pipeline_run['datasets'] = _resolve_pipeline_run_datasets(
datasets, pipeline_run['datasets'], dataset_resolver=dataset_resolver,
compute_digest=compute_digest, strict_digest=strict_digest, strict_resolving=strict_resolving,
pipeline_run['datasets'], dataset_resolver=dataset_resolver,
compute_digest=compute_digest, strict_digest=strict_digest,
strict_resolving=strict_resolving, datasets_dir=datasets_dir,
handle_score_split=handle_score_split,
)
if 'problem' in pipeline_run:
pipeline_run['problem'] = problem_resolver(problem_descriptions[pipeline_run['problem']['id']], strict_digest=strict_digest)
pipeline_run['problem'] = problem_resolver(
pipeline_run['problem']['id'],
strict_digest=strict_digest,
datasets_dir=datasets_dir,
handle_score_split=handle_score_split,
)
pipeline_run['pipeline'] = pipeline_resolver(
pipeline_run['pipeline']['id'],
......@@ -1712,8 +1632,9 @@ def parse_pipeline_run(
if 'datasets' in pipeline_run['run']['scoring']:
assert 'data_preparation' not in pipeline_run['run']
pipeline_run['run']['scoring']['datasets'] = _resolve_pipeline_run_datasets(
datasets, pipeline_run['run']['scoring']['datasets'], dataset_resolver=dataset_resolver,
pipeline_run['run']['scoring']['datasets'], dataset_resolver=dataset_resolver,
compute_digest=compute_digest, strict_digest=strict_digest, strict_resolving=strict_resolving,
datasets_dir=datasets_dir, handle_score_split=handle_score_split,
)
if pipeline_run['run']['scoring']['pipeline']['id'] == DEFAULT_SCORING_PIPELINE_ID:
......@@ -1884,7 +1805,7 @@ def combine_pipeline_runs(
@deprecate.function(message="use extended DataFrame.to_csv method instead")
def export_dataframe(dataframe: container.DataFrame, output_file: typing.TextIO = None) -> typing.Optional[str]:
def export_dataframe(dataframe: container.DataFrame, output_file: typing.IO[typing.Any] = None) -> typing.Optional[str]:
return dataframe.to_csv(output_file)
......
......@@ -7,6 +7,8 @@ import copy
import datetime
import decimal
import enum
import functools
import gzip
import hashlib
import inspect
import json
......@@ -38,6 +40,13 @@ from pytypes import type_util # type: ignore
import d3m
from d3m import deprecate, exceptions
if yaml.__with_libyaml__:
from yaml import CSafeLoader as SafeLoader, CSafeDumper as SafeDumper # type: ignore
else:
from yaml import SafeLoader, SafeDumper
logger = logging.getLogger(__name__)
NONE_TYPE: typing.Type = type(None)
# Only types without elements can be listed here. If they are elements, we have to
......@@ -233,6 +242,58 @@ def type_to_str(obj: type) -> str:
return type_util.type_str(obj, assumed_globals={}, update_assumed_globals=False)
yaml_warning_issued = False
def yaml_dump_all(documents: typing.Sequence[typing.Any], stream: typing.IO[typing.Any] = None, **kwds: typing.Any) -> typing.Any:
global yaml_warning_issued
if not yaml.__with_libyaml__ and not yaml_warning_issued:
yaml_warning_issued = True
logger.warning("cyaml not found, using a slower pure Python YAML implementation.")
return yaml.dump_all(documents, stream, Dumper=SafeDumper, **kwds)
def yaml_dump(data: typing.Any, stream: typing.IO[typing.Any] = None, **kwds: typing.Any) -> typing.Any:
global yaml_warning_issued
if not yaml.__with_libyaml__ and not yaml_warning_issued:
yaml_warning_issued = True
logger.warning("cyaml not found, using a slower pure Python YAML implementation.")
return yaml.dump_all([data], stream, Dumper=SafeDumper, **kwds)
def yaml_load_all(stream: typing.Union[str, typing.IO[typing.Any]]) -> typing.Any:
global yaml_warning_issued
if not yaml.__with_libyaml__ and not yaml_warning_issued:
yaml_warning_issued = True
logger.warning("cyaml not found, using a slower pure Python YAML implementation.")
return yaml.load_all(stream, SafeLoader)
def yaml_load(stream: typing.Union[str, typing.IO[typing.Any]]) -> typing.Any:
global yaml_warning_issued
if not yaml.__with_libyaml__ and not yaml_warning_issued:
yaml_warning_issued = True
logger.warning("cyaml not found, using a slower pure Python YAML implementation.")
return yaml.load(stream, SafeLoader)
def yaml_add_representer(value_type: typing.Type, represented: typing.Callable) -> None:
yaml.Dumper.add_representer(value_type, represented)
yaml.SafeDumper.add_representer(value_type, represented)
if yaml.__with_libyaml__:
yaml.CDumper.add_representer(value_type, represented) # type: ignore
yaml.CSafeDumper.add_representer(value_type, represented) # type: ignore
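A small usage sketch of the YAML helpers above; they use the libyaml-backed `CSafeLoader`/`CSafeDumper` when available and fall back to the pure Python implementation with a one-time warning. The document contents are arbitrary.

```python
from d3m import utils

document = {'id': 'example', 'values': [1, 2, 3]}

# Round-trip a single document through the safe (C-backed, when available) implementation.
serialized = utils.yaml_dump(document)
assert utils.yaml_load(serialized) == document

# Multiple documents in one stream, as used for pipeline run files.
serialized_all = utils.yaml_dump_all([document, document])
assert list(utils.yaml_load_all(serialized_all)) == [document, document]
```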
class EnumMeta(enum.EnumMeta):
def __new__(mcls, class_name, bases, namespace, **kwargs): # type: ignore
def __reduce_ex__(self: typing.Any, proto: int) -> typing.Any:
......@@ -246,7 +307,7 @@ class EnumMeta(enum.EnumMeta):
def yaml_representer(dumper, data): # type: ignore
return yaml.ScalarNode('tag:yaml.org,2002:str', data.name)
yaml.add_representer(cls, yaml_representer)
yaml_add_representer(cls, yaml_representer)
return cls
......@@ -346,7 +407,7 @@ def make_immutable_copy(obj: typing.Any) -> typing.Any:
return type(obj)(make_immutable_copy(o) for o in obj)
if isinstance(obj, pandas.DataFrame):
return tuple(make_immutable_copy(o) for o in obj.itertuples(index=False, name=None))
if isinstance(obj, typing.Sequence):
if isinstance(obj, (typing.Sequence, numpy.ndarray)):
return tuple(make_immutable_copy(o) for o in obj)
raise TypeError("{obj} is not known to be immutable.".format(obj=obj))
......@@ -560,7 +621,7 @@ class JsonEncoder(json.JSONEncoder):
return sorted(o, key=str)
if isinstance(o, pandas.DataFrame):
return list(o.itertuples(index=False, name=None))
if isinstance(o, typing.Sequence):
if isinstance(o, (typing.Sequence, numpy.ndarray)):
return list(o)
if isinstance(o, decimal.Decimal):
return float(o)
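# A hedged usage sketch ("json" and "decimal" are already imported by this module):
# passing this class as the encoder converts values such as "decimal.Decimal" as shown above.
json.dumps({'score': decimal.Decimal('0.75')}, cls=JsonEncoder)  # '{"score": 0.75}'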
......@@ -1095,8 +1156,7 @@ def register_yaml_representers() -> None:
]
for representer in representers:
yaml.Dumper.add_representer(representer['type'], representer['representer'])
yaml.SafeDumper.add_representer(representer['type'], representer['representer'])
yaml_add_representer(representer['type'], representer['representer'])
def matches_structural_type(source_structural_type: type, target_structural_type: typing.Union[str, type]) -> bool:
......@@ -1262,8 +1322,23 @@ def log_once(logger: logging.Logger, level: int, msg: str, *args: typing.Any, **
# A workaround to also handle binary stdin/stdout.
# See: https://gitlab.com/datadrivendiscovery/d3m/issues/353
# See: https://bugs.python.org/issue14156
# Moreover, if the filename ends in ".gz", the file is decompressed as well.
class FileType(argparse.FileType):
def __call__(self, string: str) -> typing.IO[typing.Any]:
if string.endswith('.gz'):
# "gzip.open" has as a default binary mode,
# but we want text mode as a default.
if 't' not in self._mode and 'b' not in self._mode: # type: ignore
mode = self._mode + 't' # type: ignore
else:
mode = self._mode # type: ignore
try:
return gzip.open(string, mode=mode, encoding=self._encoding, errors=self._errors) # type: ignore
except OSError as error:
message = argparse._("can't open '%s': %s") # type: ignore
raise argparse.ArgumentTypeError(message % (string, error))
handle = super().__call__(string)
if string == '-' and 'b' in self._mode: # type: ignore
......@@ -1272,6 +1347,16 @@ class FileType(argparse.FileType):
return handle
def open(file: str, mode: str = 'r', buffering: int = -1, encoding: str = None, errors: str = None) -> typing.IO[typing.Any]:
try:
return FileType(mode=mode, bufsize=buffering, encoding=encoding, errors=errors)(file)
except argparse.ArgumentTypeError as error:
original_error = error.__context__
# Raised outside of the except clause so that the ArgumentTypeError is not
# attached as context to the original error.
raise original_error
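# A minimal usage sketch (the path below is hypothetical): "open" transparently
# decompresses files ending in ".gz" and otherwise behaves like the built-in open,
# so the same code can read plain and gzipped documents.
with open('pipeline_run.yml.gz', 'r', encoding='utf8') as run_file:
    for document in yaml_load_all(run_file):
        print(document.get('id'))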
def filter_local_location_uris(doc: typing.Dict, *, empty_value: typing.Any = None) -> None:
if 'location_uris' in doc:
location_uris = []
......@@ -1365,3 +1450,101 @@ def json_structure_equals(
else:
return obj1 == obj2
@functools.lru_cache()
def get_datasets_and_problems(
datasets_dir: str, handle_score_split: bool = True,
) -> typing.Tuple[typing.Dict[str, str], typing.Dict[str, str]]:
if datasets_dir is None:
raise exceptions.InvalidArgumentValueError("Datasets directory has to be provided.")
datasets: typing.Dict[str, str] = {}
problem_descriptions: typing.Dict[str, str] = {}
problem_description_contents: typing.Dict[str, typing.Dict] = {}
for dirpath, dirnames, filenames in os.walk(datasets_dir, followlinks=True):
if 'datasetDoc.json' in filenames:
# Do not traverse further (so that we do not parse "datasetDoc.json" or "problemDoc.json"
# if they exist among raw data files).
dirnames[:] = []
dataset_path = os.path.join(os.path.abspath(dirpath), 'datasetDoc.json')
try:
with open(dataset_path, 'r', encoding='utf8') as dataset_file:
dataset_doc = json.load(dataset_file)
dataset_id = dataset_doc['about']['datasetID']
# Handle a special case for SCORE dataset splits (those which have a "targets.csv" file).
# They are the same as TEST dataset splits, but we present them differently, so that
# SCORE dataset splits have targets as part of the data. Because of this we also update
# the corresponding dataset ID.
# See: https://gitlab.com/datadrivendiscovery/d3m/issues/176
if handle_score_split and os.path.exists(os.path.join(dirpath, '..', 'targets.csv')) and dataset_id.endswith('_TEST'):
dataset_id = dataset_id[:-5] + '_SCORE'
if dataset_id in datasets:
logger.warning(
"Duplicate dataset ID '%(dataset_id)s': '%(old_dataset)s' and '%(dataset)s'", {
'dataset_id': dataset_id,
'dataset': dataset_path,
'old_dataset': datasets[dataset_id],
},
)
else:
datasets[dataset_id] = dataset_path
except (ValueError, KeyError):
logger.exception(
"Unable to read dataset '%(dataset)s'.", {
'dataset': dataset_path,
},
)
if 'problemDoc.json' in filenames:
# We continue traversing further in this case.
problem_path = os.path.join(os.path.abspath(dirpath), 'problemDoc.json')
try:
with open(problem_path, 'r', encoding='utf8') as problem_file:
problem_doc = json.load(problem_file)
problem_id = problem_doc['about']['problemID']
# Handle a special case for SCORE dataset splits (those which have a "targets.csv" file).
# They are the same as TEST dataset splits, but we present them differently, so that
# SCORE dataset splits have targets as part of the data. Because of this we also update
# the corresponding problem ID.
# See: https://gitlab.com/datadrivendiscovery/d3m/issues/176
if handle_score_split and os.path.exists(os.path.join(dirpath, '..', 'targets.csv')) and problem_id.endswith('_TEST'):
problem_id = problem_id[:-5] + '_SCORE'
# Also update dataset references.
for data in problem_doc.get('inputs', {}).get('data', []):
if data['datasetID'].endswith('_TEST'):
data['datasetID'] = data['datasetID'][:-5] + '_SCORE'
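# Load the problem description again, unmodified by the SCORE renaming above,
# so that duplicate problem IDs can be compared by their original contents.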
with open(problem_path, 'r', encoding='utf8') as problem_file:
problem_description = json.load(problem_file)
if problem_id in problem_descriptions and problem_description != problem_description_contents[problem_id]:
logger.warning(
"Duplicate problem ID '%(problem_id)s': '%(old_problem)s' and '%(problem)s'", {
'problem_id': problem_id,
'problem': problem_path,
'old_problem': problem_descriptions[problem_id],
},
)
else:
problem_descriptions[problem_id] = problem_path
problem_description_contents[problem_id] = problem_description
except (ValueError, KeyError):
logger.exception(
"Unable to read problem description '%(problem)s'.", {
'problem': problem_path,
},
)
return datasets, problem_descriptions
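# A minimal usage sketch (the directory and dataset ID below are hypothetical):
# the returned dicts map dataset and problem IDs to paths of their
# "datasetDoc.json" and "problemDoc.json" files, respectively.
datasets, problem_descriptions = get_datasets_and_problems('/data/datasets')
print(len(datasets), len(problem_descriptions))
print(datasets.get('some_dataset_TRAIN'))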
......@@ -181,7 +181,15 @@ texinfo_documents = [
# -- Options for intersphinx extension ---------------------------------------
# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {'https://docs.python.org/': None}
intersphinx_mapping = {
'https://docs.python.org/': None,
'pandas': ('https://pandas.pydata.org/pandas-docs/stable/', None),
'numpy': ('https://docs.scipy.org/doc/numpy/', None),
#'numpy': ('https://numpydoc.readthedocs.io/en/latest/', None),
'scikit-learn': ('https://scikit-learn.org/stable/', None),
'mypy': ('https://mypy.readthedocs.io/en/stable/', None),
'setuptools': ('https://setuptools.readthedocs.io/en/latest/', None),
}
# -- Options for todo extension ----------------------------------------------
......
......@@ -4,11 +4,10 @@ Primitives discovery
Primitives D3M namespace
------------------------
The ``d3m.primitives`` module exposes all primitives under the same
The :mod:`d3m.primitives` module exposes all primitives under the same
``d3m.primitives`` namespace.
This is achieved using `Python entry
points <https://setuptools.readthedocs.io/en/latest/setuptools.html#dynamic-discovery-of-services-and-plugins>`__.
This is achieved using :ref:`Python entry points <setuptools:Entry Points>`.
Python packages containing primitives should register them and expose
them under the common namespace by adding an entry like the following to
the package's ``setup.py``:
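A minimal sketch of such an entry (``my_package`` and the
``primitive_namespace.PrimitiveName`` path are placeholders;
``PrimitiveClassName`` in ``my_module`` is the primitive class being exposed)::

    entry_points = {
        'd3m.primitives': [
            'primitive_namespace.PrimitiveName = my_package.my_module:PrimitiveClassName',
        ],
    },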
......@@ -31,8 +30,8 @@ your primitives on the system. Then your package with primitives just
has to be installed on the system and can be automatically discovered
and used by any other Python code.
**Note:** Only primitive classes are available thorough the
``d3m.primitives`` namespace, not other symbols from a source
**Note:** Only primitive classes are available through the
``d3m.primitives`` namespace, no other symbols from a source
module. In the example above, only ``PrimitiveClassName`` is
available, not other symbols inside ``my_module`` (unless they
are other classes also added to entry points).
......@@ -65,7 +64,7 @@ compatible Python Package Index), publish a package with a keyword
d3m.index API
--------------------------
The ``d3m.index`` module exposes the following Python utility functions.
The :mod:`d3m.index` module exposes the following Python utility functions.
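For example, a minimal sketch (assuming ``search`` returns primitive paths under
the ``d3m.primitives`` namespace, as described below)::

    import d3m.index

    for primitive_path in d3m.index.search():
        print(primitive_path)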
``search``
~~~~~~~~~~
......@@ -116,7 +115,7 @@ keywords.
Command line
------------
The ``d3m.index`` module also provides a command line interface by
The :mod:`d3m.index` module also provides a command line interface by
running ``python3 -m d3m index``. The following commands are currently
available.
......