- [dagster-dbt] dbt rpc resources now surface dbt log messages in the Dagster event log.
- [dagster-databricks] The databricks_pyspark_step_launcher now streams Dagster logs back from Databricks rather than waiting for the step to finish completely before exporting all events. This also fixes an issue where all events from the external step would share the same timestamp. Immediately after execution, stdout and stderr logs captured from the Databricks worker are automatically surfaced to the event log, removing the need to set the wait_for_logs option in most scenarios.
- [dagster-databricks] The databricks_pyspark_step_launcher now supports dynamically mapped steps.
- If the scheduler is unable to reach a code server when executing a schedule tick, it will now wait until the code server is reachable again before continuing, instead of marking the schedule tick as failed.
- The scheduler will now check every 5 seconds for new schedules to run, instead of every 30 seconds.
- The run viewer and workspace pages of Dagit are significantly more performant.
- Dagit loads large (100+ node) asset graphs faster and retrieves information only about the assets being rendered.
- When viewing an asset graph in Dagit, you can now rematerialize the entire graph by clicking a single “Refresh” button, or select assets to rematerialize them individually. You can also launch a job to rebuild an asset directly from the asset details page.
- When viewing a software-defined asset, Dagit displays its upstream and downstream assets in two lists instead of a mini-graph for easier scrolling and navigation. The statuses of these assets are updated in real-time. This new UI also resolves a bug where only one downstream asset would appear.
- Fixed bug where execute_in_process would not work for graphs with nothing inputs.
- In the Launchpad in Dagit, the Ctrl+A command did not correctly allow select-all behavior in the editor for non-Mac users. This has now been fixed.
- When viewing a DAG in Dagit and hovering on a specific input or output for an op, the connections between the highlighted inputs and outputs were too subtle to see. These are now a bright blue color.
- In Dagit, when viewing an in-progress run, a caching bug prevented the page from updating in real time in some cases. For instance, runs might appear to be stuck in a queued state long after being dequeued. This has been fixed.
- Fixed a bug in the k8s_job_executor where the same step could start twice in rare cases.
- Enabled faster queries for the asset catalog by migrating asset database entries to store extra materialization data.
- [dagster-aws] Viewing the compute logs for in-progress ops for instances configured with the S3ComputeLogManager would cause errors in Dagit. This is now fixed.
- [dagster-pandas] Fixed bug where the Pandas categorical dtype did not work by default with the dagster-pandas categorical_column constraint.
- Fixed an issue where schedules that yielded a SkipReason from the schedule function did not display the skip reason in the tick timeline in Dagit, or output the skip message in the dagster-daemon log output.
- Fixed an issue where the snapshot link of a finished run in Dagit would sometimes fail to load with a GraphQL error.
- Dagit now supports software-defined assets that are defined in multiple jobs within a repo, and displays a warning when assets in two repos share the same name.
- We previously allowed schedules to be defined with cron strings like @daily rather than 0 0 * * *. However, these schedules would fail to run successfully in the daemon and would also cause errors when viewing certain pages in Dagit. We now raise a DagsterInvalidDefinitionError for schedules that do not have a cron expression consisting of 5 space-separated fields.
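
The five-field rule above can be illustrated with a small standalone check. This is only a sketch of the stated rule, not Dagster's actual validation code: the function name has_five_cron_fields is hypothetical, and the check only counts fields without verifying that each field is a valid cron value.

```python
def has_five_cron_fields(cron_schedule: str) -> bool:
    """Return True if the string has exactly five whitespace-separated fields.

    Hypothetical helper for illustration only; it does not check that the
    individual fields are valid cron values.
    """
    return len(cron_schedule.split()) == 5


# A standard five-field expression passes; a shorthand like "@daily" does not.
five_field_ok = has_five_cron_fields("0 0 * * *")
shorthand_ok = has_five_cron_fields("@daily")
```

Under this rule, a schedule that used @daily would need to be rewritten with the equivalent five-field expression, 0 0 * * *.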
- In dagster-dask, a schema can now be conditionally specified for ops materializing outputs to parquet files, thank you @kudryk!
- Dagster-gcp change from @AndreaGiardini that replaces get_bucket() calls with bucket(), to avoid unnecessary bucket metadata fetches, thanks!
- Typo fix from @sebastianbertoli, thank you!
- [dagster-k8s] Kubernetes jobs and pods created by Dagster now have labels identifying the name of the Dagster job or op they are running. Thanks @skirino!
- [dagit] Made performance improvements for loading the asset graph.
- [dagit] The debug console logging output now tracks calls to fetch data from the database, to help track inefficient queries.
- The dagit and dagster-daemon processes now use a structured Python logger for command-line output.
- Dagster command-line logs now include the system timezone in the logging timestamp.
- When running your own Dagster gRPC code server, the server process will now log a message to stdout when it starts up and when it shuts down.
- [dagit] The sensor details page and sensor list page now display links to the assets tracked by @asset_sensors.
- [dagit] Improved the instance warning in Dagit. Previously, Dagit warned that the daemon was not running even when no repos had schedules or sensors.
- [dagster-celery-k8s] You can now specify volumes and volume mounts to runs using the CeleryK8sRunLauncher that will be included in all launched jobs.
- [dagster-databricks] You are no longer required to specify storage configuration when using the databricks_pyspark_step_launcher.
- [dagster-databricks] The databricks_pyspark_step_launcher can now be used with dynamic mapping and collect steps.
- [dagster-mlflow] The end_mlflow_on_run_finished hook is now a top-level export of the dagster mlflow library. The API reference also now includes an entry for it.
- Better backwards-compatibility for fetching asset keys materialized from older versions of dagster.
- Fixed an issue where jobs running with op subsets required some resource configuration as part of the run config, even when they weren’t required by the selected ops.
- RetryPolicy is now respected when execution is interrupted.
- [dagit] Fixed "Open in Playground" link on the scheduled ticks.
- [dagit] Fixed the run ID links on the Asset list view.
- [dagit] When viewing an in-progress run, the run status sometimes failed to update as new logs arrived, resulting in a Gantt chart that either never updated from a “queued” state or did so only after a long delay. The run status and Gantt chart now accurately match incoming logs.
- [dagster-k8s] Fixed an issue where specifying job_metadata in tags did not correctly propagate to Kubernetes jobs created by Dagster. Thanks @ibelikov!
- [dagit] Made performance improvements for loading the asset graph.
- The Versioning and Memoization guide has been updated to reflect a new set of core memoization APIs.
- [dagster-dbt] Updated the dagster-dbt integration guide to mention the new dbt Cloud integration.
- [dagster-dbt] Added documentation for the default_flags property of DbtCliResource.
- [dagit] Made performance improvements to the Run page.
- [dagit] Highlighting a specific sensor or schedule tick is now reflected in a shareable URL.
- [dagit] On the Runs page, when filtering runs with a tag containing a comma, the filter input would incorrectly break the tag apart. This has been fixed.
- [dagit] For sensors that do not target a specific job (e.g. run_status_sensor), we now hide potentially confusing Job details.
- [dagit] Fixed an issue where some graph explorer views generated multiple scrollbars.
- [dagit] Fixed an issue with the Run view where the Gantt view incorrectly showed in-progress steps when the run had exited.
- [dagster-celery-k8s] Fixed an issue where setting a custom Celery broker URL but not a custom Celery backend URL in the helm chart would produce an incorrect Celery configuration.
- [dagster-k8s] Fixed an issue where Kubernetes volumes using list or dict types could not be set in the Helm chart.
- [dagster-k8s] Added the ability to set a custom location name when configuring a workspace in the Helm chart. Thanks @pcherednichenko!
- [dagit] Asset jobs now display with spinners on assets that are currently in progress.
- [dagit] Asset jobs that are in progress will now display a dot icon on all assets that are not yet running but will be re-materialized in the run.
- [dagit] Fixed broken links to the asset catalog entries from the explorer view of asset jobs.
- The AssetIn input object now accepts an asset key so upstream assets can be explicitly specified (e.g. AssetIn(asset_key=AssetKey("asset1"))).
- The @asset decorator now has an optional non_argument_deps parameter that accepts AssetKeys of assets that do not pass data but are upstream dependencies.
- ForeignAsset objects now have an optional description attribute.
- “Validating Data with Dagster Type Factories” guide added.
- run_id, job_name, and op_exception have been added as parameters to build_hook_context.
- You can now define inputs on the top-level job / graph. Those inputs can be configured as an inputs key on the top level of your run config. For example, consider the following job:

    from dagster import job, op

    @op
    def add_one(x):
        return x + 1

    @job
    def my_job(x):
        add_one(x)

You can now add config for x at the top level of your run_config like so:

    run_config = {
        "inputs": {
            "x": {
                "value": 2
            }
        }
    }

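
If several top-level inputs are involved, the same structure could be built programmatically. The sketch below is plain Python with no Dagster dependency. The helper name top_level_inputs_config is hypothetical, and it assumes every input uses the {"value": ...} shape shown above, which may not hold for all input types.

```python
from typing import Any, Dict


def top_level_inputs_config(**inputs: Any) -> Dict[str, Any]:
    """Build a run config dict with an "inputs" key at the top level.

    Hypothetical helper: assumes each input is wrapped as {"value": ...},
    mirroring the run_config example in these notes.
    """
    return {"inputs": {name: {"value": value} for name, value in inputs.items()}}


# Reproduces the run_config from the example for the job input x.
run_config = top_level_inputs_config(x=2)
```

The resulting dictionary matches the hand-written run_config for my_job.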
- You can now create partitioned jobs and reference a run’s partition from inside an op body or IOManager load_input or handle_output method, without threading partition values through config. For example, where previously you might have written:

    from dagster import job, op, static_partitioned_config

    @op(config_schema={"partition_key": str})
    def my_op(context):
        print("partition_key: " + context.op_config["partition_key"])

    @static_partitioned_config(partition_keys=["a", "b"])
    def my_static_partitioned_config(partition_key: str):
        return {"ops": {"my_op": {"config": {"partition_key": partition_key}}}}

    @job(config=my_static_partitioned_config)
    def my_partitioned_job():
        my_op()

You can now write:

    from dagster import StaticPartitionsDefinition, job, op

    @op
    def my_op(context):
        print("partition_key: " + context.partition_key)

    @job(partitions_def=StaticPartitionsDefinition(["a", "b"]))
    def my_partitioned_job():
        my_op()

- Added op_retry_policy to @job. You can also specify op_retry_policy when invoking to_job on graphs.
- [dagster-fivetran] The fivetran_sync_op will now be rendered with a fivetran tag in Dagit.
- [dagster-fivetran] The fivetran_sync_op now supports producing AssetMaterializations for each table updated during the sync. To this end, it now outputs a structured FivetranOutput containing this schema information, instead of an unstructured dictionary.
- [dagster-dbt] AssetMaterializations produced from the dbt_cloud_run_op now include a link to the dbt Cloud docs for each asset (if docs were generated for that run).
- You can now use the @schedule decorator with RunRequest-based evaluation functions. For example, you can now write:

    @schedule(cron_schedule="* * * * *", job=my_job)
    def my_schedule(context):
        yield RunRequest(run_key="a", ...)
        yield RunRequest(run_key="b", ...)

- [dagster-k8s] You may now configure instance-level python_logs settings using the Dagster Helm chart.
- [dagster-k8s] You can now manage the secret that contains the Celery broker and backend URLs yourself, rather than having the Helm chart manage it.
- [dagster-slack] Improved the default messages in make_slack_on_run_failure_sensor to use Slack layout blocks and include a clickable link to Dagit. Previously, it sent a plain text message.
- Made performance improvements to the Run page.
- The Run page now has a pane control that splits the Gantt view and log table evenly on the screen.
- The Run page now includes a list of succeeded steps in the status panel next to the Gantt chart.
- In the Schedules list, execution timezone is now shown alongside tick timestamps.
- If no repositories are successfully loaded when viewing Dagit, we now redirect to /workspace to quickly surface errors to the user.
- Increased the size of the reload repository button.
- Repositories that had been hidden from the left nav became inaccessible when loaded in a workspace containing only that repository. Now, when loading a workspace containing a single repository, jobs for that repository will always appear in the left nav.
- In the Launchpad, selected ops were incorrectly hidden in the lower right panel.
- Repaired asset search input keyboard interaction.
- In the Run page, the list of previous runs was incorrectly ordered based on run ID, and is now ordered by start time.
- Using keyboard commands with the / key (e.g. toggling commented code) in the config editor incorrectly triggered global search. This has been fixed.
- Previously, if an asset in a software-defined assets job depended on a ForeignAsset, the repository containing that job would fail to load.
- Fix type on tags of EMR cluster config (thanks Chris)!
- Fixes to the tests in dagster new-project, which were previously using an outdated result API (thanks Vašek)!
- You can now mount AWS Secrets Manager secrets as environment variables in runs launched by the EcsRunLauncher.
- You can now specify the CPU and Memory for runs launched by the EcsRunLauncher.
- The EcsRunLauncher now dynamically chooses between assigning a public IP address or not based on whether it’s running in a public or private subnet.
- The @asset and @multi_asset decorators now return AssetsDefinition objects instead of OpDefinitions.
- The tutorial now uses get_dagster_logger instead of context.log.
- In the API docs, most configurable objects (such as ops and resources) now have their configuration schema documented in-line.
- Removed typo from CLI readme (thanks Kan (https://github.com/zkan))!
- Memoization can now be used with the multiprocess, k8s, celery-k8s, and dask executors.