
AWS Redshift API Essential Guide

Aug 8, 2024 · 6 minute read

What type of API does AWS Redshift provide?

AWS Redshift does not have a native REST, GraphQL, or SOAP API. However, there are a few options for accessing Redshift data via APIs:

  1. Amazon Redshift Data API

The Redshift Data API provides a secure HTTP endpoint to run SQL statements without managing database connections. Key points about the Redshift Data API:

  • It uses credentials stored in AWS Secrets Manager or temporary database credentials.
  • It can be used with applications like AWS Lambda, Amazon SageMaker notebooks, and AWS Cloud9.

However, this is not a true REST API - it's a custom API for running SQL queries against Redshift.
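
As a sketch of how a Data API call flows (the cluster identifier, database, and user below are placeholders), the following submits a statement with boto3 and polls until it completes:

```python
import time

# Statuses after which a Data API statement will not change again.
TERMINAL_STATES = {"FINISHED", "FAILED", "ABORTED"}

def is_terminal(status):
    """Return True once DescribeStatement reports a final status."""
    return status in TERMINAL_STATES

def run_query(sql, cluster_id="my-redshift-cluster", database="dev", db_user="awsuser"):
    """Submit a SQL statement via the Data API and poll until it finishes."""
    import boto3  # imported lazily so the helper above works without AWS configured
    client = boto3.client("redshift-data")
    statement_id = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )["Id"]
    while True:
        desc = client.describe_statement(Id=statement_id)
        if is_terminal(desc["Status"]):
            break
        time.sleep(1)
    if desc["Status"] != "FINISHED":
        raise RuntimeError(desc.get("Error", "statement did not finish"))
    return client.get_statement_result(Id=statement_id)
```

Note the asynchronous shape: ExecuteStatement returns an ID immediately, and results are fetched in a separate call, which is why the loop polls DescribeStatement.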

  2. Custom API Implementation

Several sources suggest building a custom API layer on top of Redshift:

  • Using API Gateway and Lambda to create a REST API that queries Redshift.
  • Building your own REST API using Java and JDBC to connect to Redshift.
  • Using a third-party package designed for Redshift or Postgres to create an API.

  3. PostgreSQL API

Redshift uses a PostgreSQL-compatible API for direct database access, but this is not a REST or web service API.
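
For the API Gateway + Lambda option above, a minimal handler could look like the sketch below. The cluster identifier, database, and table are hypothetical, and the response helper follows the Lambda proxy-integration format:

```python
import json

def make_response(status_code, body):
    """Shape a Lambda proxy-integration response for API Gateway."""
    return {
        "statusCode": status_code,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }

def lambda_handler(event, context):
    """Hypothetical handler: accept a request, hand the query to Redshift."""
    import boto3  # lazy import so make_response stays testable without AWS
    client = boto3.client("redshift-data")
    resp = client.execute_statement(
        ClusterIdentifier="my-redshift-cluster",  # placeholder
        Database="dev",
        DbUser="awsuser",
        Sql="SELECT * FROM my_table LIMIT 10",    # placeholder query
    )
    # The Data API is asynchronous, so return the statement ID for later retrieval.
    return make_response(202, {"statementId": resp["Id"]})
```

A real service would add authentication, input validation, and a second endpoint to fetch results by statement ID.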

In summary, AWS Redshift does not have a native REST, GraphQL, or SOAP API. The closest option is the Redshift Data API, which provides programmatic access but is not a standard REST API. For a true REST API, you would need to build a custom solution using services like API Gateway and Lambda, or implement your own API layer.

Does the AWS Redshift API have webhooks?

Event Subscriptions

  1. AWS Redshift does not have webhooks, but it does offer event notifications through Amazon Simple Notification Service (Amazon SNS).

  2. You can create event notification subscriptions to be notified when events occur for a given cluster, snapshot, security group, or parameter group.

Types of Events

You can subscribe to various types of events, including:

  1. Source type events:

    • Cluster
    • Snapshot
    • Parameter group
    • Security group
  2. Event categories:

    • Configuration
    • Management
    • Monitoring
    • Security
    • Pending
  3. Event severities:

    • INFO
    • ERROR

Creating Event Subscriptions

To create an event subscription:

  1. Use the AWS Management Console, AWS CLI, or Amazon Redshift API.
  2. Specify event criteria such as source type, source ID, event category, and event severity.
  3. Create or specify an Amazon SNS topic to receive the notifications.
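
Sketched with boto3 (the subscription name and SNS topic ARN below are placeholders), the steps above map onto the CreateEventSubscription operation:

```python
def subscription_params(name, sns_topic_arn,
                        source_type="cluster",
                        categories=("monitoring", "management"),
                        severity="ERROR"):
    """Build a CreateEventSubscription request; defaults here are illustrative."""
    return {
        "SubscriptionName": name,
        "SnsTopicArn": sns_topic_arn,
        "SourceType": source_type,          # cluster, snapshot, parameter group, ...
        "EventCategories": list(categories),
        "Severity": severity,               # INFO or ERROR
        "Enabled": True,
    }

def create_subscription(params):
    """Send the request to the Amazon Redshift API."""
    import boto3  # lazy import: AWS credentials are only needed at call time
    return boto3.client("redshift").create_event_subscription(**params)
```

The SNS topic must already exist (and be a standard topic); the subscription then fans events out to whatever endpoints are subscribed to that topic.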

Key Considerations

  1. You can only create event subscriptions to Amazon SNS standard topics, not FIFO topics.

  2. Billing for Amazon Redshift event notifications is through Amazon SNS.

  3. You can view events that have occurred using the AWS Management Console, Amazon Redshift API, or AWS SDKs.

  4. Event notifications can be sent in various forms supported by Amazon SNS, such as email, text message, or HTTP endpoint calls.

API Operations

The Amazon Redshift API provides several operations for managing event subscriptions:

  1. CreateEventSubscription
  2. DeleteEventSubscription
  3. DescribeEventCategories
  4. DescribeEventSubscriptions
  5. DescribeEvents
  6. ModifyEventSubscription
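
For example, DescribeEvents can be used to review recent cluster events; this sketch builds a request for a trailing time window (the window length is arbitrary):

```python
from datetime import datetime, timedelta, timezone

def recent_events_request(hours=24, source_type="cluster"):
    """Build a DescribeEvents request covering the last `hours` hours."""
    end = datetime.now(timezone.utc)
    return {
        "SourceType": source_type,
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "MaxRecords": 100,
    }

def fetch_events(request):
    """Call DescribeEvents with the prepared request."""
    import boto3  # lazy import: only needed when actually calling AWS
    return boto3.client("redshift").describe_events(**request)["Events"]
```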

In summary, while AWS Redshift doesn't offer webhooks, it provides a robust event notification system through Amazon SNS. This system allows you to subscribe to various types of events and receive notifications through different channels.

Rate Limits and other limitations

Here are the key points regarding API rate limits for the AWS Redshift Data API:

Transactions per Second (TPS) Limits

The AWS Redshift Data API has specific TPS limits for different API operations [2]:

  • BatchExecuteStatement API: 20 TPS
  • CancelStatement API: 3 TPS
  • DescribeStatement API: 100 TPS
  • DescribeTable API: 3 TPS
  • ExecuteStatement API: 30 TPS
  • GetStatementResult API: 20 TPS
  • ListDatabases API: 3 TPS
  • ListSchemas API: 3 TPS
  • ListStatements API: 3 TPS
  • ListTables API: 3 TPS

These limits are not adjustable and represent the maximum number of operation requests you can make per second without being throttled [2].

Throttling and Error Handling

If the rate of requests exceeds the quota for any API, a ThrottlingException with HTTP Status Code 400 is returned [1]. To handle throttling:

  1. Implement a retry strategy as described in the AWS SDKs and Tools Reference Guide.
  2. Some AWS SDKs automatically implement this strategy for throttling errors [1].
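
One way to apply such a strategy in Python is to enable the SDK's built-in retry modes, with a jittered backoff helper as a fallback for manual retry loops; the parameter values here are illustrative:

```python
import random

def backoff_delay(attempt, base=0.5, cap=30.0):
    """Full-jitter exponential backoff: random delay in [0, min(cap, base * 2^attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def throttling_aware_client():
    """Data API client with SDK-level retries; adaptive mode also throttles client-side."""
    import boto3  # imported lazily so backoff_delay stays usable without AWS
    from botocore.config import Config
    return boto3.client(
        "redshift-data",
        config=Config(retries={"max_attempts": 10, "mode": "adaptive"}),
    )
```

With `mode="adaptive"`, the SDK retries ThrottlingException responses automatically and slows its own request rate, which covers most throttling scenarios without custom code.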

Other Relevant Limits

While not strictly API rate limits, there are other important limitations to consider when using the Redshift Data API:

  1. Maximum query duration: 24 hours [1]
  2. Maximum number of active queries per cluster: 200 [1]
  3. Maximum query result size: 100 MB (after gzip compression) [1]
  4. Maximum query statement size: 100 KB [1]
  5. Maximum retention time for query results: 24 hours [1]
  6. Maximum retention time for client tokens: 8 hours [1]

Key Considerations

  • These limits apply specifically to the Redshift Data API, not necessarily to JDBC connections [3].
  • The Data API is available for specific node types and must be used with clusters in a VPC [1].
  • For AWS Step Functions, include the ClientToken idempotency parameter in Redshift Data API calls to handle retries [1].
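
A sketch of that idempotency pattern (the cluster and database names are placeholders): pass the same ClientToken on a retry and the service treats the calls as one statement instead of running the SQL twice.

```python
import uuid

def idempotent_statement(sql, cluster_id="my-redshift-cluster", database="dev", token=None):
    """Build an ExecuteStatement request with a ClientToken so retries are safe."""
    return {
        "ClusterIdentifier": cluster_id,
        "Database": database,
        "Sql": sql,
        # Reuse the same token across retries of the same logical request.
        "ClientToken": token or str(uuid.uuid4()),
    }
```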

Best Practices

  1. Implement proper error handling and retry logic in your applications.
  2. Monitor your API usage to ensure you stay within the limits.
  3. Consider using client-side throttling to prevent exceeding the API limits.
  4. For high-throughput scenarios, you may need to implement request queuing or rate limiting on your side.
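
A minimal client-side limiter for points 3 and 4 could be a token bucket sized to one of the quotas above, e.g. 30 TPS for ExecuteStatement; this is a sketch, not a production-grade limiter:

```python
import time

class TokenBucket:
    """Minimal client-side rate limiter: at most `rate` calls per second."""

    def __init__(self, rate, capacity=None):
        self.rate = float(rate)
        self.capacity = float(capacity if capacity is not None else rate)
        self.tokens = self.capacity
        self.last = time.monotonic()

    def acquire(self):
        """Block until a token is available, then consume it."""
        while True:
            now = time.monotonic()
            # Refill tokens in proportion to elapsed time, up to the capacity.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return
            time.sleep((1 - self.tokens) / self.rate)

# Example sizing against the ExecuteStatement quota:
#   limiter = TokenBucket(rate=30)
#   limiter.acquire()  # call before each execute_statement request
```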

It's important to note that these limits are specific to the Redshift Data API; different limitations may apply to JDBC connections [3].

Latest API Version

At the time of writing, the most recent version of the AWS Redshift cluster software is:

1.0.69451 - Released on June 18, 2024

This is listed as the "Current track version" for Amazon Redshift patch 181, and it is the most recent version documented.

Key points to consider:

  1. Amazon Redshift releases new cluster versions periodically to update clusters with new features and improvements [2].

  2. There are separate versions for Amazon Redshift provisioned clusters and Amazon Redshift Serverless [2].

  3. The version numbers follow a format of 1.0.XXXXX, where XXXXX is an incrementing number [2].

  4. The most recent versions are typically listed at the top of the cluster version history in the documentation [2].

  5. The cluster software version is separate from the Amazon Redshift service API version (2012-12-01), which changes rarely.

Best practices:

  1. Always check the official AWS documentation for the most up-to-date version information, as it can change frequently.

  2. When developing applications, use the latest stable API version to ensure access to the newest features and improvements.

  3. Be aware of any breaking changes or deprecations when upgrading to a new API version.

  4. Test your applications thoroughly when moving to a new API version to ensure compatibility.

How to get an AWS Redshift developer account and API keys?

To get a developer account for AWS Redshift and create an API integration, you'll need to follow these steps:

Set up an AWS account

  1. If you don't already have an AWS account, sign up for one at aws.amazon.com.

  2. Once you have an AWS account, you'll have access to AWS Redshift and other AWS services.

Set up IAM permissions

  1. Create an IAM user or role with the necessary permissions to access Redshift and the Data API.

  2. Attach the AmazonRedshiftDataFullAccess managed policy to the IAM user/role to grant full access to the Redshift Data API.

  3. If using AWS Secrets Manager for authentication, ensure the policy allows the secretsmanager:GetSecretValue action.

Create a Redshift cluster

  1. Set up a Redshift cluster in your AWS account if you haven't already.

  2. Make note of the cluster identifier, database name, and user credentials.

Set up the Data API

  1. The Redshift Data API doesn't require any additional setup - it's automatically available once you have a Redshift cluster.

  2. You can start using the Data API through the AWS SDK in your preferred programming language.

Implement API integration

  1. Choose your preferred programming language and install the AWS SDK for that language.

  2. Use the SDK to make calls to the Redshift Data API. For example, in Python using boto3:

```python
import boto3

client = boto3.client('redshift-data')
response = client.execute_statement(
    ClusterIdentifier='my-redshift-cluster',
    Database='dev',
    DbUser='awsuser',
    Sql='SELECT * FROM my_table'
)
```
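
Because the Data API is asynchronous, `execute_statement` only returns a statement ID; you poll `describe_statement` and then fetch rows with `get_statement_result`. A small helper like the sketch below can flatten the result payload (the field names follow the GetStatementResult response shape):

```python
def rows_to_dicts(result):
    """Convert a GetStatementResult payload into a list of {column: value} dicts."""
    columns = [col["name"] for col in result["ColumnMetadata"]]
    rows = []
    for record in result["Records"]:
        # Each field is a one-key dict such as {"stringValue": "x"} or {"longValue": 1};
        # NULLs arrive as {"isNull": true}.
        values = [
            None if field.get("isNull") else next(iter(field.values()), None)
            for field in record
        ]
        rows.append(dict(zip(columns, values)))
    return rows
```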

What can you do with the AWS Redshift API?

Here are the key data models and capabilities you can interact with using the AWS Redshift API:

SQL Data Model

  • Execute SQL statements using the Data API operations [1]
  • Run queries against data stored in Amazon S3 using Redshift Spectrum, without loading the data [3]
  • Support for standard SQL accessed via JDBC and ODBC drivers [3]
  • Ability to run thousands of concurrent queries on large datasets [4]

Machine Learning Model

  • Create, train and deploy machine learning models using SQL commands with Redshift ML [2][3]
  • Supports both supervised learning (Autopilot, XGBoost, MLP algorithms) and unsupervised learning (K-Means) [3]
  • Integrates with Amazon SageMaker for ML capabilities [3]
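
As an illustration (the model name, table, columns, IAM role ARN, and S3 bucket below are all hypothetical), a Redshift ML model is created with plain SQL, which could itself be submitted through the Data API:

```python
def create_model_sql(model, source_query, target, fn, iam_role, s3_bucket):
    """Assemble a Redshift ML CREATE MODEL statement from placeholder parts."""
    return (
        f"CREATE MODEL {model} "
        f"FROM ({source_query}) "
        f"TARGET {target} "            # column the model learns to predict
        f"FUNCTION {fn} "              # SQL function name used for inference
        f"IAM_ROLE '{iam_role}' "
        f"SETTINGS (S3_BUCKET '{s3_bucket}')"
    )

# Example (all identifiers are made up):
sql = create_model_sql(
    model="churn_model",
    source_query="SELECT age, plan, churned FROM customers",
    target="churned",
    fn="predict_churn",
    iam_role="arn:aws:iam::123456789012:role/RedshiftML",
    s3_bucket="my-redshift-ml-bucket",
)
```

Once trained, predictions are ordinary SQL calls such as `SELECT predict_churn(age, plan) FROM customers`.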

API Data Model

  • Interact with Redshift data using the Data API, without managing database connections [1][2]
  • Run asynchronous queries and retrieve results later [2]
  • Access data from programming languages supported by AWS SDK (Python, Java, Node.js, etc.) [2]
  • Use the API from services like AWS Lambda, Cloud9, AppSync and EventBridge [3]
  • Execute queries via AWS CLI using the aws redshift-data command [3]

Data Lake Model

  • Query data stored in S3 data lakes using Redshift Spectrum [3]
  • Support for open file formats like Apache Parquet and ORC [4]

Data Integration Model

  • Federated querying to access data from RDS and S3 [3]
  • Automated data pipelines to ingest streaming data or S3 files [3]
  • Integration with AWS Data Exchange to query third-party datasets [3]

Security Model

  • Use credentials stored in AWS Secrets Manager or temporary database credentials [1]
  • Access control, data encryption and VPC security features [4]

Key capabilities for each model include running SQL queries, training ML models, executing asynchronous API calls, querying data lakes, integrating diverse data sources, and managing security - all through Redshift's interfaces and APIs.