Changelog

0.8.6

Breaking Changes

  • The dagster-celery module has been broken apart to manage dependencies more coherently. There are now three modules: dagster-celery, dagster-celery-k8s, and dagster-celery-docker.
  • Relatedly, the dagster-celery worker start command now takes a required -A parameter, which must point to the app.py file within the appropriate module. For example, if you are using the celery_k8s_job_executor, you must pass -A dagster_celery_k8s.app when using the celery or dagster-celery CLI tools. Likewise, -A dagster_celery_docker.app must be used for the celery_docker_executor.
  • Renamed the input_hydration_config and output_materialization_config decorators to dagster_type_loader and dagster_type_materializer respectively. Renamed DagsterType's input_hydration_config and output_materialization_config arguments to loader and materializer respectively.
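
The -A requirement above can be captured in a small lookup. The following is a minimal sketch, not part of dagster-celery: `APP_MODULE_BY_EXECUTOR` and `worker_start_command` are hypothetical names, and the `celery_executor` entry is an assumption made by analogy, since the notes only spell out the k8s and docker app modules.

```python
# Map each executor to the Celery app module its workers must be started with.
# The first entry is an assumption; the release notes only name the other two.
APP_MODULE_BY_EXECUTOR = {
    "celery_executor": "dagster_celery.app",  # assumed by analogy
    "celery_k8s_job_executor": "dagster_celery_k8s.app",
    "celery_docker_executor": "dagster_celery_docker.app",
}


def worker_start_command(executor_name):
    """Return the argv list that starts a worker for the given executor."""
    try:
        app_module = APP_MODULE_BY_EXECUTOR[executor_name]
    except KeyError:
        raise ValueError("unknown executor: {!r}".format(executor_name))
    # -A is now required and selects the app.py of the matching module.
    return ["dagster-celery", "worker", "start", "-A", app_module]


print(" ".join(worker_start_command("celery_k8s_job_executor")))
```

Keeping the mapping in one place makes it harder to start a worker against the wrong module after the package split.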

New

  • New pipeline scoped runs tab in Dagit

  • Add the following Dask Job Queue clusters: moab, sge, lsf, slurm, oar (thanks @DavidKatz-il!)

  • K8s resource requirements for run coordinator pods can be specified using the dagster-k8s/resource_requirements tag on pipeline definitions:

    @pipeline(
        tags={
            'dagster-k8s/resource_requirements': {
                'requests': {'cpu': '250m', 'memory': '64Mi'},
                'limits': {'cpu': '500m', 'memory': '2560Mi'},
            }
        },
    )
    def foo_bar_pipeline():
        ...  # solid invocations go here
    
  • Added better error messaging in dagit for partition set and schedule configuration errors

  • An initial version of the CeleryDockerExecutor was added (thanks @mrdrprofuroboros!). The celery workers will launch tasks in docker containers.

  • Experimental: Great Expectations integration is currently under development in the new library dagster-ge. Example usage can be found here
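
When several pipelines need the dagster-k8s/resource_requirements tag shown above, the tags dict can be built in one place. This is a minimal sketch in plain Python; `k8s_resource_tags` is a hypothetical helper, not a dagster-k8s API, and it only assembles the dictionary shape used in the example above.

```python
RESOURCE_REQUIREMENTS_TAG = "dagster-k8s/resource_requirements"


def k8s_resource_tags(requests=None, limits=None):
    """Build a pipeline `tags` dict carrying Kubernetes resource requirements."""
    requirements = {}
    if requests:
        requirements["requests"] = dict(requests)
    if limits:
        requirements["limits"] = dict(limits)
    if not requirements:
        raise ValueError("provide at least one of requests or limits")
    return {RESOURCE_REQUIREMENTS_TAG: requirements}


# Same values as the pipeline example above.
tags = k8s_resource_tags(
    requests={"cpu": "250m", "memory": "64Mi"},
    limits={"cpu": "500m", "memory": "2560Mi"},
)
print(tags)
```

The resulting dict could then be passed as `@pipeline(tags=tags)`.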

0.8.5

Breaking Changes

  • Python 3.5 is no longer under test.
  • Engine and ExecutorConfig have been deleted in favor of Executor. The @executor decorator should now decorate a function that returns an Executor rather than an ExecutorConfig.

New

  • The python built-in dict can be used as an alias for Permissive() within a config schema declaration.
  • Use StringSource in the S3ComputeLogManager configuration schema to support using environment variables in the configuration (Thanks @mrdrprofuroboros!)
  • Improve Backfill CLI help text
  • Add options to spark_df_output_schema (Thanks @DavidKatz-il!)
  • Helm: Added support for overriding the PostgreSQL image/version used in the init container checks.
  • Update celery k8s helm chart to include liveness checks for celery workers and flower
  • Added support for step-level retries in the celery k8s executor
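
A StringSource field, as used above for the S3ComputeLogManager, accepts either a literal string or a mapping of the form {'env': 'VARIABLE_NAME'}. The sketch below illustrates those semantics in plain Python. It is not Dagster's implementation, and `resolve_string_source` and `EXAMPLE_LOG_BUCKET` are hypothetical names.

```python
import os


def resolve_string_source(value):
    """Resolve a config value that is either a literal string or {'env': NAME}."""
    if isinstance(value, str):
        return value
    if isinstance(value, dict) and set(value) == {"env"}:
        name = value["env"]
        if name not in os.environ:
            raise KeyError("environment variable {!r} is not set".format(name))
        return os.environ[name]
    raise TypeError("expected a string or {'env': <variable name>}")


# Hypothetical variable name, set here only so the example is self-contained.
os.environ["EXAMPLE_LOG_BUCKET"] = "my-compute-logs"
print(resolve_string_source("literal-bucket"))
print(resolve_string_source({"env": "EXAMPLE_LOG_BUCKET"}))
```

Reading the value from the environment keeps deployment-specific settings out of the checked-in instance configuration.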

Bugfixes

  • Improve error message shown when a RepositoryDefinition returns objects that are not one of the allowed definition types (Thanks @sd2k!)
  • Show error message when $DAGSTER_HOME environment variable is not an absolute path (Thanks @AndersonReyes!)
  • Update default value for staging_prefix in the DatabricksPySparkStepLauncher configuration to be an absolute path (Thanks @sd2k!)
  • Improve error message shown when Databricks logs can't be retrieved (Thanks @sd2k!)
  • Fix errors in documentation for input_hydration_config (Thanks @joeyfreund!)
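
The $DAGSTER_HOME check mentioned above can be reproduced before starting any Dagster process. This is a minimal sketch of such a pre-flight check; `check_dagster_home` is a hypothetical helper and does not reflect how Dagster performs the validation internally.

```python
import os


def check_dagster_home(environ=None):
    """Return $DAGSTER_HOME if it is set to an absolute path, else raise ValueError."""
    environ = os.environ if environ is None else environ
    home = environ.get("DAGSTER_HOME")
    if not home:
        raise ValueError("DAGSTER_HOME is not set")
    if not os.path.isabs(home):
        raise ValueError(
            "DAGSTER_HOME must be an absolute path, got {!r}".format(home)
        )
    return home


# Passing a dict instead of os.environ keeps the check easy to exercise.
absolute_home = os.path.abspath("dagster_home")
print(check_dagster_home({"DAGSTER_HOME": absolute_home}))
```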

0.8.4

Bugfix

  • Reverted a change in 0.8.3 that caused an error during run launch in certain circumstances
  • Updated partition graphs on schedule page to select most recent run
  • Forced a reload of partitions for partition sets to avoid serving stale data

New

  • Added reload button to dagit to reload current repository
  • Added option to wipe a single asset key by using dagster asset wipe <asset_key>
  • Simplified schedule page, removing ticks table, adding tags for last tick attempt
  • Better debugging tools for launch errors

0.8.3

Breaking Changes

  • Previously, the gcs_resource returned a GCSResource wrapper which had a single client property that returned a google.cloud.storage.client.Client. Now, the gcs_resource returns the client directly.

    To update solids that use the gcs_resource, change:

    context.resources.gcs.client
    

    To:

    context.resources.gcs
    

New

  • Introduced a new Python API reexecute_pipeline to reexecute an existing pipeline run.
  • Performance improvements in Pipeline Overview and other pages.
  • Long metadata entries in the asset details view are now scrollable.
  • Added a project field to the gcs_resource in dagster_gcp.
  • Added new CLI command dagster asset wipe to remove all existing asset keys.

Bugfix

  • Several Dagit bugfixes and performance improvements
  • Fixes pipeline execution issue with custom run launchers that call executeRunInProcess.
  • Updates dagster schedule up output to be repository location scoped

0.8.2

Bugfix

  • Fixes issues with dagster instance migrate.
  • Fixes bug in launch_scheduled_execution that would mask configuration errors.
  • Fixes bug in dagit where schedule related errors were not shown.
  • Fixes JSON-serialization error in dagster-k8s when specifying per-step resources.

New

  • Makes label an optional parameter for materializations that specify an asset_key.
  • Changes Assets page to have a typeahead selector and hierarchical views based on asset_key path.
  • dagster-ssh
    • adds SFTP get and put functions to SSHResource, replacing sftp_solid.
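
The hierarchical Assets view mentioned above groups assets by the segments of their asset_key path. The idea can be sketched as a nested dictionary built from a list of paths. This is an illustration only: `build_asset_tree` and the sample paths are hypothetical, and the code is not how Dagit implements the view.

```python
def build_asset_tree(asset_key_paths):
    """Group asset key paths (lists of strings) into a nested dict keyed by segment."""
    tree = {}
    for path in asset_key_paths:
        node = tree
        for segment in path:
            # Descend one level, creating the child node on first sight.
            node = node.setdefault(segment, {})
    return tree


tree = build_asset_tree([
    ["warehouse", "users"],
    ["warehouse", "orders"],
    ["reports", "daily"],
])
print(tree)
```

Assets that share a path prefix end up under the same parent node, which is what allows a view to collapse or expand them together.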

Docs

  • Various docs corrections