Show HN: Hamilton's UI – observability, lineage, and catalog for data pipelines

https://github.com/DAGWorks-Inc/hamilton/tree/main/ui

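This directory contains the code for the new Hamilton UI. For an overview of getting started and features, see the documentation (https://hamilton.dagworks.io/en/latest/concepts/ui). For a lengthier introduction, see our blog post (https://blog.dagworks.io/p/hamilton-ui-streamlining-metadata).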

One operational UI for all your dataflows

The Hamilton UI is a system that provides the following capabilities:

  1. Execution tracking with associated metadata
    • Provides a persistent database to store/manage these
    • Provides a server that allows reading/writing/authentication
  2. Data/artifact observability: provides telemetry/observability of Hamilton executions + specific function results/code through a web interface
  3. Lineage & provenance: allows you to quickly inspect how code and data are connected.
  4. Catalog: everything is observed and cataloged, so you can quickly search and find what exists and when it was run.

The UI is meant to monitor/debug Hamilton dataflows both in development and production. The aim is to enable dataflow authors to move faster during all phases of the software development lifecycle.

For an overview of some of these features, you can watch this quick video (https://youtu.be/0VIVSeN7Ij8).

Execution Tracking

See what's slow (left), pinpoint errors (middle), compare execution performance (right)

Data/Artifact Observability

Visualize data for a run (left), track code the run used (middle), compare data across executions (right)

Lineage & Provenance

See how things connect: what's upstream/downstream (left), walk through code visually (right)

Catalog

Understand artifacts produced (left), find features and when they were used (right)

Getting started

You can watch this video walkthrough on getting set up: https://youtu.be/DPfxlTwaNsM

Make sure Docker is running, then:

# clone the repository if you haven't
git clone https://github.com/dagworks-inc/hamilton
# change into the UI directory
cd hamilton/ui
# run docker
./run.sh

Once the containers are running, navigate to http://localhost:8242, register with an email, and create a project; then follow the instructions on integrating with Hamilton.

A fuller guide can be found in the documentation: https://hamilton.dagworks.io/en/latest/concepts/ui
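Before opening the browser, it can help to confirm that something is actually listening on the UI port. The following is a minimal sketch using only the Python standard library; the function name `ui_is_reachable` is made up for illustration, and the only value taken from this README is the port 8242. It checks TCP reachability only, not that the UI itself is healthy.

```python
import socket


def ui_is_reachable(host: str = "localhost", port: int = 8242, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only shows that some process accepts connections on the port;
    it does not verify that the Hamilton UI finished starting up.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```

Calling `ui_is_reachable()` right after `./run.sh` may return False for a short while, since the containers can take some time to start.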

Architecture

The architecture is simple.

[Figure: architecture diagram (hamilton-ui-architecture.png)]

The tracking server stores data in Postgres and any blobs in S3. In local mode, all of this is stored in a Docker volume. The frontend is a simple React application. There are a few authentication/ACL capabilities, but the default is local/unauthenticated (open) mode. Please talk to us if you need more custom authentication.

Development

The structure involves a bit of cleverness to ensure the UI can easily be deployed and served from the CLI.

We have a symlink from backend/hamilton_ui to backend/server, which lets us work with Django's structure while still allowing the code to be imported as hamilton_ui. (This should probably be changed at some point, but it is not worth it now.)
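To see why a symlink is enough to make one directory importable under a second name, here is a small standalone sketch. It does not touch the Hamilton repository: it builds a throwaway package in a temporary directory, links it under an alias, and imports it through the alias. The names `server_demo` and `hamilton_ui_demo` are placeholders, and the sketch assumes the alias is the link and the real package is the target. It needs a platform where unprivileged symlinks work (Linux, macOS).

```python
import importlib
import os
import sys
import tempfile


def import_via_symlink(real_name: str, alias: str) -> str:
    """Create a package, symlink it under `alias`, and import it by the alias."""
    root = tempfile.mkdtemp()
    package_dir = os.path.join(root, real_name)
    os.mkdir(package_dir)
    with open(os.path.join(package_dir, "__init__.py"), "w") as f:
        f.write("WHO = 'real package'\n")

    # The alias is a directory symlink pointing at the real package.
    os.symlink(package_dir, os.path.join(root, alias), target_is_directory=True)

    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(alias)
        return module.WHO
    finally:
        sys.path.remove(root)
```

The import system only sees a directory with an `__init__.py` under the alias name, so the same files serve both the framework's expected layout and the public import name.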

To deploy, use the admin.py script in the UI directory.

This:

  1. Builds the frontend
  2. Copies it into the build/ directory
  3. Publishes to the sf-hamilton-ui package on pypi

You can then run it with hamilton ui after installing sf-hamilton[ui]. Note that to talk to it you'll need the hamilton_sdk package, which can be installed with pip install sf-hamilton[sdk].
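As a rough sketch of what "talking to" the UI looks like from a dataflow, the snippet below attaches the SDK's tracking adapter to a Hamilton driver. It assumes the `HamiltonTracker` adapter and the `driver.Builder` API as described in the Hamilton documentation; `build_tracked_driver` and its arguments are illustrative names, and the project ID and username are whatever you created in the UI. The imports are done lazily so the file can be loaded even where the optional extras are not installed.

```python
from types import ModuleType
from typing import Any, Dict, Optional


def build_tracked_driver(
    project_id: int,
    username: str,
    dag_name: str,
    *modules: ModuleType,
    tags: Optional[Dict[str, str]] = None,
) -> Any:
    """Build a Hamilton driver whose executions are reported to the UI."""
    # Lazy imports: require `pip install "sf-hamilton[sdk]"` at call time.
    from hamilton import driver
    from hamilton_sdk import adapters

    tracker = adapters.HamiltonTracker(
        project_id=project_id,  # project created in the UI
        username=username,      # email registered in the UI
        dag_name=dag_name,      # free-form label for this dataflow
        tags=tags or {},
    )
    return (
        driver.Builder()
        .with_modules(*modules)
        .with_adapters(tracker)  # the one line added to an untracked driver
        .build()
    )
```

A call such as `build_tracked_driver(1, "me@example.com", "my_dag", my_module)` would need a running tracking server; the values shown are placeholders.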

Building docker

Dev mode

For development, you'll want to run:

cd hamilton/ui
./dev.sh --build # to build it all
./dev.sh # to pull docker images but use local code

You need 9GB assigned to Docker or more to build the frontend

Building the frontend requires around 8GB of memory assigned to Docker. If the build fails for lack of memory, bump your Docker memory allocation up to 9GB or more.
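One way to check the allocation before starting a long build is to ask the Docker daemon how much memory it sees. The sketch below assumes the `docker` CLI is on the PATH and that `docker info --format '{{.MemTotal}}'` reports total memory in bytes; the helper names and the reading of "9GB" as 9 GiB are assumptions. The reported total can be slightly below the configured value, so treat the result as a rough guide.

```python
import subprocess

GIB = 1024 ** 3


def enough_memory(mem_total_bytes: int, required_gib: float = 9.0) -> bool:
    """Return True if the reported memory meets the required amount in GiB."""
    return mem_total_bytes >= required_gib * GIB


def docker_mem_total_bytes() -> int:
    """Ask the Docker daemon for the total memory available to it, in bytes."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{.MemTotal}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())
```

With Docker running, `enough_memory(docker_mem_total_bytes())` gives a quick yes/no before invoking `./dev.sh --build`.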

Prod mode

For a production build, you'll want to run:

cd hamilton/ui
./run.sh # to pull from docker and run
./run.sh --build # to rebuild images for prod

Caveats:

You'll want to clean the backend/dist/ directory first, to avoid adding unnecessary files to the Docker image.

Pushing

How to push to docker hub:

# retag if needed
docker tag local-image:tagname dagworks/ui-backend:VERSION
# push built image
docker push dagworks/ui-backend:VERSION
# retag as latest
docker tag dagworks/ui-backend:VERSION dagworks/ui-backend:latest
# push latest
docker push dagworks/ui-backend:latest
# retag if needed
docker tag local-image:tagname dagworks/ui-frontend:VERSION
# push built image
docker push dagworks/ui-frontend:VERSION
# retag as latest
docker tag dagworks/ui-frontend:VERSION dagworks/ui-frontend:latest
# push latest
docker push dagworks/ui-frontend:latest
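Because the same four commands are repeated for each image, it is easy to mix up repository names when copying them by hand. A small sketch that generates the sequence from one template is shown below; `push_commands` is a hypothetical helper, and `local-image:tagname` and `VERSION` are the same placeholders used above. It only builds and prints the commands and does not run Docker.

```python
from typing import List


def push_commands(local_image: str, repo: str, version: str) -> List[List[str]]:
    """Return the tag/push sequence for one image as argument lists."""
    versioned = f"{repo}:{version}"
    latest = f"{repo}:latest"
    return [
        ["docker", "tag", local_image, versioned],  # retag if needed
        ["docker", "push", versioned],              # push built image
        ["docker", "tag", versioned, latest],       # retag as latest
        ["docker", "push", latest],                 # push latest
    ]


# Print the commands for both images named in this README.
for repo in ("dagworks/ui-backend", "dagworks/ui-frontend"):
    for command in push_commands("local-image:tagname", repo, "VERSION"):
        print(" ".join(command))
```

Each inner list can be passed to `subprocess.run(command, check=True)` if you prefer to execute the commands rather than print them.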
{
"by": "elijahbenizzy",
"descendants": 10,
"id": 40235944,
"kids": [
40237169,
40244558,
40237525,
40244753,
40249414
],
"score": 40,
"text": "Hey HN – Stefan and Elijah here from DAGWorks (http://dagworks.io/, YC W23).\n\nIf you don’t remember us from our previous HN launch (https://news.ycombinator.com/item?id=35056903), we’re the authors of Hamilton (https://github.com/dagworks-inc/hamilton), an open-source library for building self-documenting, modular dataflows in python that works for data, ML, LLM pipelines, & even web-workflows.\n\nWe’ve been developing this UI for a while and we’re excited to say we open-sourced it! It comes out of the box with the following capabilities, and only requires a single line code change to get:\n\n1. Execution + metadata capture, e.g. automatic code profiling\n\n2. Data/artifact observability, e.g. summary statistics over dataframes, pydantic objects, etc...\n\n3. Lineage & provenance of data, e.g. quickly see what is upstream & downstream of code/data.\n\n4. Asset/transform catalog, e.g. search & find if feature transforms/metrics/datasets/models exist and where they’re used.\n\nWhile the UI currently only self-populates for Hamilton dataflows, we’re looking to expand to other frameworks (we’d love your feedback!).\n\nCheck out the following video for an overview: https://www.youtube.com/watch?v=0VIVSeN7Ij8, as well as the documentation: https://hamilton.dagworks.io/en/latest/concepts/ui/.\n\nWe’re looking for feedback/adopters – feel free to reach out if you have any questions!",
"time": 1714656159,
"title": "Show HN: Hamilton's UI – observability, lineage, and catalog for data pipelines",
"type": "story",
"url": "https://github.com/DAGWorks-Inc/hamilton/tree/main/ui"
}
{
"author": "DAGWorks-Inc",
"date": null,
"description": "Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows, that encode lineage/tracing and metadata. Runs and scales everywhere python does. - DAGWorks-Inc/h…",
"image": "https://opengraph.githubassets.com/5f442e7cf5d4e52ef0320e83bd1e6395eeaa57421dc9ba12f94fe34505781939/DAGWorks-Inc/hamilton",
"logo": "https://logo.clearbit.com/github.com",
"publisher": "GitHub",
"title": "hamilton/ui at main · DAGWorks-Inc/hamilton",
"url": "https://github.com/DAGWorks-Inc/hamilton/tree/main/ui"
}