Cloud Data Fusion: Update deployed pipelines through REST API

Justin Taras
4 min read · Mar 14, 2022

TL;DR Just about anything you can do in the Data Fusion UI, you can also do through its extensive REST interface. If you need to update a pipeline, whether to pick up the latest plugin release or to change a configuration parameter, you can do it easily through the application update API.

Application Update API

I recently had a customer ask how they can modify their pipelines without having to redeploy them each time. Their situation was unique: they needed to update their plugins to get additional functionality, but doing that by hand across 1,000+ pipelines is problematic. I pointed them to the Application Update API, which enables developers to re-submit their pipeline configuration and update the deployed pipeline in place.

POST /v3/namespaces/<namespace-id>/apps/<app-id>/update 

All you need is the namespace name, the pipeline name (the app-id), and the pipeline JSON.

Updating Deployed Pipelines

In the following example, we’ll walk through updating a deployed pipeline. The pipeline below is using an older release of the BigQuery plugin: the current release is 0.19.0, while we’re running 0.18.1 here. To upgrade the plugin, we can either redeploy the pipeline with the newer plugin version OR use the REST API to deliver the update without redeployment.

Simple pipeline reading from and writing to BQ

To use the API, we first need the pipeline JSON. To get it, click the gear icon in the deployed pipeline and click EXPORT.

Export the pipeline JSON
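If you’d rather skip the UI (handy when you’re dealing with hundreds of pipelines), you can also pull a deployed pipeline’s details over REST using the standard CDAP app detail endpoint. Here’s a sketch that reuses the AUTH_TOKEN and CDAP_ENDPOINT variables set up in the bash example further down; note that this response wraps the configuration as an escaped string, so the UI export remains the easiest starting point for the update call.

##fetch the deployed pipeline's details, including its configuration
curl -s -H "Authorization: Bearer ${AUTH_TOKEN}" \
  "${CDAP_ENDPOINT}/v3/namespaces/development/apps/audit_pipeline_v2" > app_detail.json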

Once you have the JSON, go through it and make the changes you’d like to apply. In this example, we’re updating the plugin versions in use.

Old Plugin Configuration

In the example above, we can see the version of the “google-cloud” plugin used by the pipeline sink. In the example below, we change the version to 0.19.0.

If the plugin version comes from the Hub, make sure you change the “scope” to USER.
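For reference, the part of the exported JSON you’re editing is the plugin’s artifact block on each stage. It looks roughly like this (the plugin name and surrounding fields are trimmed and illustrative; the exact layout can vary by release):

"plugin": {
  "name": "BigQueryTable",
  "type": "batchsink",
  "artifact": {
    "name": "google-cloud",
    "version": "0.19.0",
    "scope": "SYSTEM"
  }
}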

With the pipeline JSON modified, the pipeline can now be updated via the API. The code below is a bash example of updating the pipeline from this walkthrough. I saved the modified pipeline JSON to a file and referenced that file in the API call, which kept the request clean because I didn’t have to include a huge JSON body inline.
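If you’re scripting the edit itself rather than making it by hand, a tool like jq can bump the version for you. This is just a sketch: it assumes the export was saved as pipeline_export.json and follows the usual config.stages layout.

##bump every google-cloud plugin artifact in the exported JSON to 0.19.0
jq '(.config.stages[].plugin.artifact | select(.name == "google-cloud") | .version) = "0.19.0"' \
  pipeline_export.json > pipeline.json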

##get auth token
export AUTH_TOKEN=$(gcloud auth print-access-token)
##set instance name
export INSTANCE_ID=my-instance
export REGION=us-central1
##get CDAP_ENDPOINT
export CDAP_ENDPOINT=$(gcloud beta data-fusion instances describe \
  --location=$REGION \
  --format="value(apiEndpoint)" \
  ${INSTANCE_ID})
##update the deployed pipeline with the modified JSON
curl -X POST \
  -H "Authorization: Bearer ${AUTH_TOKEN}" \
  -H "Content-Type: application/json" \
  -d @pipeline.json \
  "${CDAP_ENDPOINT}/v3/namespaces/development/apps/audit_pipeline_v2/update"

Once the API call has been made, it should take around 5–10 seconds for the process to complete. If successful, an “update complete” message is returned. Back on the deployed pipeline page, a simple refresh will show the newly updated configuration.

Updated Pipeline

Concluding Thoughts

Using the API to update deployed pipelines can save a lot of development time. Instead of duplicating the pipeline to development, making the appropriate changes, and then re-deploying, it can all be done with a single command. This really comes in handy when you’re working with hundreds of pipelines that all need plugin updates to take advantage of a new feature or enhancement. Since this mechanism is available over REST, you can write simple Python applications to programmatically update pipelines in batch without having to update each one manually.
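As a taste of what that could look like, here is a minimal bash sketch that loops over a few pipelines and pushes each one’s modified JSON. The pipeline names other than audit_pipeline_v2 are made up for illustration, and each pipeline’s edited JSON is assumed to be saved as <pipeline-name>.json in the current directory.

##push the modified JSON for each pipeline in the list
for APP in audit_pipeline_v2 sales_pipeline_v1 inventory_pipeline_v3; do
  curl -X POST \
    -H "Authorization: Bearer ${AUTH_TOKEN}" \
    -H "Content-Type: application/json" \
    -d @"${APP}.json" \
    "${CDAP_ENDPOINT}/v3/namespaces/development/apps/${APP}/update"
done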

In a future article, we’ll look at building a more complete script to automate these updates.

For more information on this API and the many others available, please visit the Cloud Data Fusion documentation.


Justin Taras

I’m a Google Customer Engineer interested in all things data. I love helping customers leverage their data to build new and powerful data-driven applications!