With Data Fusion, pipeline developers can create custom compute profiles to “right size” the Dataproc instance running the pipeline. These profiles are independent of the pipeline itself, which lets developers swap profiles at runtime depending on how much data is being processed.

Select from a list of available compute profiles at runtime

In the image above, a user can…

TL;DR Google provides pre-built Dataflow templates to accelerate deployment of common data integration patterns in Google Cloud. This lets developers get started quickly without having to build pipelines from scratch. This article examines building a streaming pipeline with Dataflow templates to feed downstream systems.

What is Apache Beam?

Apache Beam is…

TL;DR You should be using BigQuery Table Clustering

BigQuery has recently introduced the ability to independently cluster tables [BETA as of 2020–06]. Up until now, the clustering feature was only usable when configured with table partitioning. …

TL;DR: BigQuery materialized views and streaming data can be used together for building cost-effective near real-time dashboards.

In the previous post, we explored the basics behind using BigQuery materialized views. In this post we’ll explore how you can leverage materialized views with streaming data.

Streaming Data into BigQuery

Batch loading into BigQuery is sufficient…

Justin Taras

I’m a Google Customer Engineer interested in all things data. I love helping customers leverage their data to build new and powerful data-driven applications!
