Cloud Data Fusion: Upload UDDs through the REST API

Justin Taras
Mar 21, 2022

TL;DR Use the REST API to upload user-defined directives (UDDs) across namespaces programmatically. The Data Fusion UI can upload UDDs too, but doing it by hand gets tedious when you work across many independent namespaces. With the API, a developer can quickly push new UDDs to all of them.

Artifact API

I recently had a customer ask how they could upload their UDDs without doing so manually through the UI. Their situation was unique in that they run many namespaces, and a newly developed UDD could potentially be needed by all pipelines in all namespaces. How can they upload a UDD across many namespaces without clicking a million times in the UI? The answer is the Artifact API!

POST /v3/namespaces/<namespace-id>/artifacts/<artifact-name>

To walk through this API, we’re going to use the sample UDD that is included in the CDAP Git repo.

Build your UDD

Build the UDD that can be found here: https://github.com/data-integrations/example-directive

This is a super simple UDD that reverses the text in a string.

Use Java 8 and Maven 3.6.3, and run the following command to build the UDD.

mvn package -DskipTests

The artifacts you need will be under the target directory.

Build artifacts: the JAR and the JSON Configuration

Upload Artifacts

Before we upload, we need to set the environment variables required to interact with the API. The following commands retrieve the auth token needed for authentication and the endpoint we’ll send requests to. The endpoint is derived from the instance name and the region Data Fusion is deployed in.

# Sign in with the Cloud SDK
gcloud auth login
# Get an auth token
export AUTH_TOKEN=$(gcloud auth print-access-token)
# Set the instance name and region
export INSTANCE_ID=instance_name
export REGION=us-central1
# Get the CDAP endpoint
export CDAP_ENDPOINT=$(gcloud beta data-fusion instances describe \
--location=$REGION \
--format="value(apiEndpoint)" \
${INSTANCE_ID})

With these details set up, we can build the API call.

curl -w"\n" -X POST -H "Authorization: Bearer ${AUTH_TOKEN}" "${CDAP_ENDPOINT}/v3/namespaces/[namespace name]/artifacts/[UDD Name]" \
-H "Artifact-Extends: [artifacts and versions]" \
--data-binary @[jar file full path]

The example above includes an interesting header. The value of the “Artifact-Extends” header comes from your JSON configuration file. It tells Data Fusion which other artifacts this artifact can be used with; the relationship is listed under the “parents” key in the JSON file. Here, it means that the wrangler transform and wrangler service can use this UDD artifact. Without these details the UDD won’t upload. Also, the opening square bracket and closing parenthesis in the values of the parents key are not a typo. They are version-range notation: the square bracket makes the lower bound inclusive and the parenthesis makes the upper bound exclusive, so [4.0.0,5.0.0) covers versions from 4.0.0 up to, but not including, 5.0.0.

{
  "properties": {},
  "parents": [
    "system:wrangler-transform[4.0.0,5.0.0)",
    "system:wrangler-service[4.0.0,5.0.0)"
  ]
}

My example UDD curl command looks like this. I took the values from the parents key and placed them in the Artifact-Extends header, separated by a “/”.

curl -w"\n" -X POST -H "Authorization: Bearer ${AUTH_TOKEN}" "${CDAP_ENDPOINT}/v3/namespaces/default/artifacts/simple-udds-1.2.0" \
-H "Artifact-Extends: system:wrangler-transform[4.0.0,5.0.0)/system:wrangler-service[4.0.0,5.0.0)" \
--data-binary @/home/admin_/simple-udds-1.2.0-SNAPSHOT.jar
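
The Artifact-Extends value used here is just the entries of the parents list joined with a “/”. Here is a minimal sketch of assembling it in the shell, assuming you copy the ranges from the JSON file by hand. The names join_parents and ARTIFACT_EXTENDS are hypothetical and only used for illustration.

```shell
# join_parents (hypothetical helper): join parent artifact ranges with "/"
# to form the value of the Artifact-Extends header.
join_parents() {
  joined=""
  for parent in "$@"; do
    if [ -z "$joined" ]; then
      joined="$parent"
    else
      joined="${joined}/${parent}"
    fi
  done
  printf '%s\n' "$joined"
}

# Values copied from the "parents" key of the JSON configuration file.
ARTIFACT_EXTENDS=$(join_parents \
  "system:wrangler-transform[4.0.0,5.0.0)" \
  "system:wrangler-service[4.0.0,5.0.0)")

echo "Artifact-Extends: ${ARTIFACT_EXTENDS}"
```

Keeping the value in a variable means the same header can be reused for every namespace you upload to.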

If this command executes successfully, the response should say “Artifact added successfully”. To double check, navigate to the Control Center and do a search for your newly added UDD. It should come up as a “Just Added” artifact.
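
The same check can probably be done from the command line as well. This is a sketch that assumes the CDAP endpoint for listing the artifacts in a namespace (GET /v3/namespaces/<namespace-id>/artifacts) and the AUTH_TOKEN and CDAP_ENDPOINT variables exported earlier. The helper name artifacts_url is hypothetical.

```shell
# artifacts_url (hypothetical helper): URL that lists the artifacts in a namespace.
artifacts_url() {
  # $1 = CDAP endpoint, $2 = namespace
  printf '%s/v3/namespaces/%s/artifacts\n' "$1" "$2"
}

# Only call the API when the endpoint and token have been exported.
if [ -n "${CDAP_ENDPOINT:-}" ] && [ -n "${AUTH_TOKEN:-}" ]; then
  curl -s -w"\n" -H "Authorization: Bearer ${AUTH_TOKEN}" \
    "$(artifacts_url "${CDAP_ENDPOINT}" default)"
else
  echo "CDAP_ENDPOINT or AUTH_TOKEN is not set; skipping the request." >&2
fi
```

The response should be a JSON list of artifacts, and the newly uploaded UDD should appear in it by name.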

Newly added UDD in my default namespace

Conclusion

While this example is in bash, it could easily be replicated in Python, where you could iterate through a list of namespaces to upload the UDD. As new UDDs are developed, they can easily be deployed without having to manually install them one by one in the UI.
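
The loop over namespaces can also stay in bash. Here is a minimal sketch, assuming AUTH_TOKEN and CDAP_ENDPOINT are exported as shown earlier. The namespace list (“dev” and “qa” are made up), the DRY_RUN switch, and the helper names upload_url and upload_udd are hypothetical and only there for illustration.

```shell
# Hypothetical values: replace with your own namespaces, artifact name, and JAR path.
NAMESPACES="default dev qa"
ARTIFACT_NAME="simple-udds-1.2.0"
JAR_PATH="/home/admin_/simple-udds-1.2.0-SNAPSHOT.jar"
ARTIFACT_EXTENDS="system:wrangler-transform[4.0.0,5.0.0)/system:wrangler-service[4.0.0,5.0.0)"

# Set DRY_RUN=0 to actually send the requests; by default the loop only prints them.
DRY_RUN="${DRY_RUN:-1}"

upload_url() {
  # $1 = namespace
  printf '%s/v3/namespaces/%s/artifacts/%s\n' "${CDAP_ENDPOINT:-}" "$1" "${ARTIFACT_NAME}"
}

upload_udd() {
  # $1 = namespace
  if [ "${DRY_RUN}" = "1" ]; then
    echo "Would POST ${JAR_PATH} to $(upload_url "$1")"
    return 0
  fi
  # --fail makes curl exit non-zero on HTTP errors so the loop can report them.
  curl --fail -w"\n" -X POST -H "Authorization: Bearer ${AUTH_TOKEN}" "$(upload_url "$1")" \
    -H "Artifact-Extends: ${ARTIFACT_EXTENDS}" \
    --data-binary "@${JAR_PATH}"
}

for ns in ${NAMESPACES}; do
  upload_udd "${ns}" || echo "Upload failed for namespace ${ns}" >&2
done
```

Running it once in dry-run mode is a cheap way to confirm the URLs look right before pushing the JAR to every namespace.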

For more information on uploading UDDs and other artifacts, visit the CDAP documentation.

https://cdap.atlassian.net/wiki/spaces/DOCS/pages/480412045/Artifacts

Justin Taras

I’m a Google Customer Engineer interested in all things data. I love helping customers leverage their data to build new and powerful data driven applications!