SDK export from cloud data sources

📘 Prefer Import Jobs for integrating cloud data sources

We recommend using Import Jobs to import data from cloud data sources such as Snowflake and BigQuery, as they provide a simpler configuration interface and automatic scheduling of regular imports.

However, the Python SDK also provides a set of utility scripts for exporting data from a cloud data source to CSV files. The resulting files can then be imported into the Exabel platform as described in Importing via Exabel SDK.

The following cloud data sources are supported:

  • Snowflake
  • Google BigQuery
  • Amazon Athena

To use the export functionality with a given data source, the SDK must be installed with the corresponding optional dependencies. See how to install the optional dependencies here.
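
As an illustration, the optional dependencies can typically be installed as a pip extra. The extra names below (snowflake, bigquery, athena) are assumptions; check the installation instructions linked above for the exact names.

# Install the SDK together with the optional dependencies for a data source.
# The extra names here are assumptions; see the installation instructions for the exact names.
pip install "exabel-data-sdk[snowflake]"
pip install "exabel-data-sdk[bigquery]"
pip install "exabel-data-sdk[athena]"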

Exporting data from Snowflake

Authentication to Snowflake is currently only supported with a username and password. Here is an example of how to query Snowflake and store the results in a file my_entities.csv:

python -m exabel_data_sdk.scripts.sql.read_snowflake \
	--query="SELECT entity, display_name FROM entities" \
	--output-file="my_entities.csv" \
	--account="my_account_identifier" \
	--username="my_username" \
	--password="my_password"

Run python -m exabel_data_sdk.scripts.sql.read_snowflake --help to get a full printout of possible arguments.

Exporting data from Google BigQuery

If you have configured application default credentials for Google Cloud Platform using the Google Cloud SDK, you can query BigQuery without specifying any authentication method; the BigQuery client library locates and uses these credentials automatically. If you need to authenticate with a specific service account, use the argument --credentials-path with the path to a service account JSON key file. Here is an example of how to query BigQuery and store the results in a file my_entities.csv:

python -m exabel_data_sdk.scripts.sql.read_bigquery \
	--query="SELECT entity, display_name FROM entities" \
	--output-file="my_entities.csv"
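
For example, to authenticate with a specific service account instead of the application default credentials, pass the --credentials-path argument (the key file path below is a placeholder):

# Authenticate with a service account key file instead of application default credentials.
# The path to the JSON key file is a placeholder.
python -m exabel_data_sdk.scripts.sql.read_bigquery \
	--query="SELECT entity, display_name FROM entities" \
	--output-file="my_entities.csv" \
	--credentials-path="/path/to/service_account.json"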

Run python -m exabel_data_sdk.scripts.sql.read_bigquery --help to get a full printout of possible arguments.

Exporting data from Amazon Athena

When exporting data with Athena, the query result is unloaded to an S3 staging bucket before being downloaded, which makes it possible to export large amounts of data more efficiently. See Athena's documentation on the UNLOAD statement for considerations and limitations when querying data this way. To prevent output data from accumulating in the staging bucket, you can configure an S3 Lifecycle configuration that automatically deletes old query results.
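
As a sketch, such a lifecycle rule can be created with the AWS CLI. The bucket name, prefix and retention period below are placeholders matching the example further down; adapt them to your own staging location.

# Sketch: expire staging objects after 7 days (bucket, prefix and retention are placeholders).
aws s3api put-bucket-lifecycle-configuration \
	--bucket "my_bucket" \
	--lifecycle-configuration '{
	  "Rules": [{
	    "ID": "expire-athena-staging",
	    "Filter": {"Prefix": "path/to/staging_area/"},
	    "Status": "Enabled",
	    "Expiration": {"Days": 7}
	  }]
	}'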

If you have configured an AWS shared credentials file, you can query Athena without specifying any authentication method; the AWS client library locates the credentials file and uses the default profile automatically. You can also select a different profile from the credentials file with the parameter --profile. If you need to authenticate with a specific access key, you can use the arguments --aws-access-key-id and --aws-secret-access-key. Here is an example of how to query Athena and store the results in a file my_entities.csv:

python -m exabel_data_sdk.scripts.sql.read_athena \
	--query="SELECT entity, display_name FROM entities" \
	--output-file="my_entities.csv" \
	--region="my_aws_region" \
	--s3-staging-dir="s3://my_bucket/path/to/staging_area/"
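
For example, to authenticate with a named profile from the shared credentials file instead of the default profile (the profile name below is a placeholder):

# Use a named profile from the AWS shared credentials file ("my_profile" is a placeholder).
python -m exabel_data_sdk.scripts.sql.read_athena \
	--query="SELECT entity, display_name FROM entities" \
	--output-file="my_entities.csv" \
	--region="my_aws_region" \
	--s3-staging-dir="s3://my_bucket/path/to/staging_area/" \
	--profile="my_profile"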

Run python -m exabel_data_sdk.scripts.sql.read_athena --help to get a full printout of possible arguments.