90 changes: 32 additions & 58 deletions docs/docs/developers/build/connectors/data-source/azure.md
@@ -45,8 +45,6 @@ When you add an Azure Blob Storage data model through the Rill UI, you'll see fo
Azure CLI authentication is only available through manual configuration. See [Method 5: Azure CLI Authentication](#method-5-azure-cli-authentication-local-development-only) for setup instructions.
:::

---

## Method 1: Storage Account Key (Recommended)

Storage Account Key credentials provide reliable authentication for Azure Blob Storage. This method works for both local development and Rill Cloud deployments.
@@ -89,13 +87,13 @@ Create `models/my_azure_data.yaml`:

```yaml
type: model
+connector: duckdb
+create_secrets_from_connectors: my_azure
materialize: true
-sql: SELECT * FROM read_parquet('azure://rilltest.blob.core.windows.net/my-container/path/to/data/*.parquet')
-connector: duckdb
refresh:
  cron: "0 */6 * * *"
-create_secrets_from_connectors: my_azure
+sql: |
+  select * from read_parquet('azure://rilltest.blob.core.windows.net/my-container/path/to/data/*.parquet')
```

**Step 3: Add credentials to `.env`**
@@ -106,8 +104,6 @@ connector.azure.azure_storage_key=your_storage_account_key

Follow the [Azure Documentation](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal) to retrieve your storage account keys.
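Rill reads these credentials from `.env` using a `connector.<connector_name>.<property>` key convention. As a rough illustration of how such a line decomposes (a sketch for intuition, not Rill's actual loader):

```python
# Sketch of how a line like `connector.azure.azure_storage_key=...` breaks
# down under the connector.<name>.<property> convention. Illustrative only --
# this is NOT Rill's implementation.

def parse_env_line(line: str) -> dict:
    key, _, value = line.partition("=")
    prefix, connector_name, prop = key.split(".", 2)
    assert prefix == "connector", f"unexpected prefix: {prefix}"
    return {"connector": connector_name, "property": prop, "value": value}

entry = parse_env_line("connector.azure.azure_storage_key=your_storage_account_key")
print(entry["connector"], entry["property"])  # azure azure_storage_key
```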

---

## Method 2: Connection String

A connection string provides an alternative way to authenticate to Azure Blob Storage, bundling the storage account name and key into a single value.
@@ -144,13 +140,13 @@ Create `models/my_azure_data.yaml`:

```yaml
type: model
+connector: duckdb
+create_secrets_from_connectors: my_azure_conn
materialize: true
-sql: SELECT * FROM read_parquet('azure://rilltest.blob.core.windows.net/my-container/path/to/data/*.parquet')
-connector: duckdb
refresh:
  cron: "0 */6 * * *"
-create_secrets_from_connectors: my_azure_conn
+sql: |
+  select * from read_parquet('azure://rilltest.blob.core.windows.net/my-container/path/to/data/*.parquet')
```

**Step 3: Add credentials to `.env`**
@@ -161,8 +157,6 @@ connector.azure.azure_storage_connection_string=your_connection_string

Follow the [Azure Documentation](https://learn.microsoft.com/en-us/azure/storage/common/storage-account-keys-manage?tabs=azure-portal) to retrieve your connection string.
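An Azure storage connection string is a set of semicolon-separated `key=value` pairs (fields like `AccountName` and `AccountKey`). A minimal sketch of how one decomposes, using placeholder values rather than real credentials:

```python
# Illustration of the Azure connection string format: semicolon-separated
# key=value pairs. The values below are placeholders, not real credentials.

def parse_connection_string(conn_str: str) -> dict:
    parts = {}
    for segment in conn_str.split(";"):
        if segment:
            # partition on the FIRST '=' only, since base64 keys can end in '='
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

conn = parse_connection_string(
    "DefaultEndpointsProtocol=https;AccountName=rilltest;"
    "AccountKey=abc123==;EndpointSuffix=core.windows.net"
)
print(conn["AccountName"])  # rilltest
```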

---

## Method 3: Shared Access Signature (SAS) Token

SAS tokens provide fine-grained access control with specific permissions and expiration times for secure access to your storage resources.
@@ -201,13 +195,13 @@ Create `models/my_azure_data.yaml`:

```yaml
type: model
+connector: duckdb
+create_secrets_from_connectors: my_azure_sas
materialize: true
-sql: SELECT * FROM read_parquet('azure://rilltest.blob.core.windows.net/my-container/path/to/data/*.parquet')
-connector: duckdb
refresh:
  cron: "0 */6 * * *"
-create_secrets_from_connectors: my_azure_sas
+sql: |
+  select * from read_parquet('azure://rilltest.blob.core.windows.net/my-container/path/to/data/*.parquet')
```

**Step 3: Add credentials to `.env`**
@@ -218,8 +212,6 @@ connector.azure.azure_storage_sas_token=your_sas_token

Follow the [Azure Documentation](https://learn.microsoft.com/en-us/azure/ai-services/translator/document-translation/how-to-guides/create-sas-tokens?tabs=Containers) to create your Azure SAS token.
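At request time, a SAS token is appended to the blob URL as a query string. A small sketch of that composition, using a made-up token value (real tokens carry signed parameters such as `sv`, `sp`, and `sig`):

```python
# Sketch: composing a blob URL with a SAS token. The token below is a
# hypothetical placeholder, not a valid signature.

def with_sas(url: str, sas_token: str) -> str:
    token = sas_token.lstrip("?")        # tokens are often copied with a leading '?'
    sep = "&" if "?" in url else "?"
    return f"{url}{sep}{token}"

print(with_sas(
    "https://rilltest.blob.core.windows.net/my-container/data.parquet",
    "?sv=2024-01-01&sp=rl&sig=EXAMPLE",
))
```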

---

## Method 4: Public Containers

For publicly accessible Azure Blob Storage containers, you don't need to create a connector. Simply use the Azure URI directly in your model configuration.
@@ -246,16 +238,14 @@ Create `models/my_azure_data.yaml`:

```yaml
type: model
+connector: duckdb
materialize: true
-sql: SELECT * FROM read_parquet('azure://publicaccount.blob.core.windows.net/my-public-container/path/to/data/*.parquet')
-connector: duckdb
refresh:
  cron: "0 */6 * * *"
+sql: |
+  select * from read_parquet('azure://publicaccount.blob.core.windows.net/my-public-container/path/to/data/*.parquet')
```
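For reference, an `azure://` URI like the one above breaks down into a storage account, a container, and an object path. A quick illustrative parse (not a Rill or DuckDB API, purely for intuition):

```python
from urllib.parse import urlparse

# Sketch: decomposing an azure:// URI into account, container, and object
# path. Illustrative only -- Rill and DuckDB handle this internally.

def parse_azure_uri(uri: str):
    parsed = urlparse(uri)                    # netloc = '<account>.blob.core.windows.net'
    account = parsed.netloc.split(".")[0]
    container, _, blob_path = parsed.path.lstrip("/").partition("/")
    return account, container, blob_path

print(parse_azure_uri(
    "azure://publicaccount.blob.core.windows.net/my-public-container/path/to/data/file.parquet"
))
```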

---

## Method 5: Azure CLI Authentication (Local Development Only)

For local development, you can use credentials from the Azure CLI. This method is **not suitable for production** or Rill Cloud deployments; it is only available through manual configuration, and no connector file is needed.
@@ -279,12 +269,12 @@ Create `models/my_azure_data.yaml`:

```yaml
type: model
+connector: duckdb
materialize: true
-sql: SELECT * FROM read_parquet('azure://rilltest.blob.core.windows.net/my-container/path/to/data/*.parquet')
-connector: duckdb
refresh:
  cron: "0 */6 * * *"
+sql: |
+  select * from read_parquet('azure://rilltest.blob.core.windows.net/my-container/path/to/data/*.parquet')
```

Rill will automatically detect and use your local Azure CLI credentials when no connector is specified.
@@ -293,34 +283,19 @@ Rill will automatically detect and use your local Azure CLI credentials when no
This method only works for local development. Deploying to Rill Cloud with this configuration will fail because the cloud environment doesn't have access to your local credentials. Always use Storage Account Key, Connection String, or SAS tokens for production deployments.
:::

-## Using Azure Blob Storage Data in Models

-Once your connector is configured (or for public containers, no connector needed), you can reference Azure Blob Storage paths in your model SQL queries using DuckDB's Azure functions.

-### Basic Example

-**With a connector (authenticated):**

```yaml
-type: model
-connector: duckdb

-sql: SELECT * FROM read_parquet('azure://rilltest.blob.core.windows.net/my-container/data/*.parquet')

-refresh:
-  cron: "0 */6 * * *"
```

-**Public container (no connector needed):**

```yaml
-type: model
-connector: duckdb

-sql: SELECT * FROM read_parquet('azure://publicaccount.blob.core.windows.net/my-public-container/data/*.parquet')

-refresh:
-  cron: "0 */6 * * *"
```

+## Reading Different File Types

+DuckDB supports reading various file formats directly from Azure Blob Storage:

```sql
+-- Read Parquet files
+select * from read_parquet('azure://account.blob.core.windows.net/container/data/*.parquet')

+-- Read CSV files
+select * from read_csv('azure://account.blob.core.windows.net/container/data/*.csv', auto_detect=true, ignore_errors=1, header=true)

+-- Read JSON files
+select * from read_json('azure://account.blob.core.windows.net/container/data/*.json', auto_detect=true, ignore_errors=1)
```

### Path Patterns
@@ -341,7 +316,6 @@ SELECT * FROM read_parquet('azure://account.blob.core.windows.net/container/data
SELECT * FROM read_parquet('azure://account.blob.core.windows.net/container/data/2024-*.parquet')
```
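The distinction in the patterns above is that `*` matches within one path segment, while `**` crosses segment boundaries. A toy matcher to build intuition (DuckDB's actual globbing is internal; this is only a sketch of the common object-store semantics):

```python
import re

# Toy glob matcher illustrating '*' (single path segment) vs '**' (any
# depth). This mirrors common object-store glob semantics for intuition
# only -- it is NOT DuckDB's implementation.

def glob_to_regex(pattern: str) -> str:
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")      # '**' may cross '/' boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")   # '*' stays within one segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return "^" + "".join(out) + "$"

def matches(pattern: str, path: str) -> bool:
    return re.match(glob_to_regex(pattern), path) is not None

print(matches("data/*.parquet", "data/file.parquet"))          # True
print(matches("data/*.parquet", "data/2024/file.parquet"))     # False: '*' won't cross '/'
print(matches("data/**/*.parquet", "data/2024/file.parquet"))  # True
```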

---

## Deploy to Rill Cloud

104 changes: 27 additions & 77 deletions docs/docs/developers/build/connectors/data-source/gcs.md
@@ -36,8 +36,6 @@ When you add a GCS data model through the Rill UI, you'll see three authenticati
1. **Configure Data Model** - Define which bucket and objects to ingest
The UI will only create the model file (no connector file is needed).

---

## Method 1: Service Account JSON (Recommended)

Service Account JSON credentials provide the most secure and reliable authentication for GCS. This method works for both local development and Rill Cloud deployments.
@@ -78,14 +76,13 @@ Create `models/my_gcs_data.yaml`:

```yaml
type: model
+connector: duckdb
+create_secrets_from_connectors: my_gcs
materialize: true
-sql: SELECT * FROM read_parquet('gs://my-bucket/path/to/data/*.parquet')
-connector: duckdb
# Add a refresh schedule
refresh:
  cron: "0 */6 * * *"
-create_secrets_from_connectors: my_gcs
+sql: |
+  select * from read_parquet('gs://my-bucket/path/to/data/*.parquet')
```
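The `cron: "0 */6 * * *"` schedule above fires at minute 0 of every sixth hour. A tiny sketch of how a `*/6` step expression expands over the hour field (standard cron semantics, not Rill's scheduler):

```python
# Expand a cron field like '*/6' over a range such as hours 0-23.
# Illustrative sketch of standard cron step semantics, not Rill's scheduler.

def expand_step(field: str, upper: int) -> list[int]:
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(0, upper, step))
    if field == "*":
        return list(range(upper))
    return [int(field)]

hours = expand_step("*/6", 24)
print(hours)  # [0, 6, 12, 18] -> refresh at 00:00, 06:00, 12:00, 18:00
```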

**Step 3: Add credentials to `.env`**
@@ -94,8 +91,6 @@ refresh:
connector.gcs.google_application_credentials=<json_credentials>
```
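Before pasting the JSON blob into `.env`, it can help to sanity-check that it has the standard Google service account shape (`type`, `project_id`, `private_key`, and `client_email` are the usual fields). An illustrative check with placeholder values:

```python
import json

# Sanity-check that a credentials blob looks like a Google service account
# JSON file. Field names are the standard service account format; the
# values below are placeholders, not real credentials.

def looks_like_service_account(blob: str) -> bool:
    try:
        data = json.loads(blob)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict):
        return False
    required = {"type", "project_id", "private_key", "client_email"}
    return data.get("type") == "service_account" and required <= data.keys()

example = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nplaceholder\n-----END PRIVATE KEY-----\n",
    "client_email": "rill@my-project.iam.gserviceaccount.com",
})
print(looks_like_service_account(example))  # True
```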

---

## Method 2: HMAC Keys

HMAC keys provide S3-compatible authentication for GCS, which is useful when you need compatibility with S3-style access patterns or tooling.
@@ -134,14 +129,13 @@ Create `models/my_gcs_data.yaml`:

```yaml
type: model
+connector: duckdb
+create_secrets_from_connectors: my_gcs_hmac
materialize: true
-sql: SELECT * FROM read_parquet('gs://my-bucket/path/to/data/*.parquet')
-connector: duckdb
# Add a refresh schedule
refresh:
  cron: "0 */6 * * *"
-create_secrets_from_connectors: my_gcs_hmac
+sql: |
+  select * from read_parquet('gs://my-bucket/path/to/data/*.parquet')
```

**Step 3: Add credentials to `.env`**
@@ -155,8 +149,6 @@ connector.gcs.secret=your-secret-access-key
Notice that the connector uses `key_id` and `secret`. HMAC keys use S3-compatible authentication with GCS.
:::
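Under the hood, HMAC access goes through GCS's S3-compatible XML API at `storage.googleapis.com`. A sketch of how a `gs://` URI maps onto that endpoint (illustration only; DuckDB and Rill handle this internally):

```python
from urllib.parse import urlparse

# Sketch: mapping a gs:// URI onto the S3-compatible GCS XML API endpoint
# at storage.googleapis.com. Illustrative only -- requests still need to be
# signed with the HMAC key_id/secret pair, which is not shown here.

def gs_to_xml_api_url(gs_uri: str) -> str:
    parsed = urlparse(gs_uri)
    bucket, key = parsed.netloc, parsed.path.lstrip("/")
    return f"https://storage.googleapis.com/{bucket}/{key}"

print(gs_to_xml_api_url("gs://my-bucket/path/to/data/file.parquet"))
```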

---

## Method 3: Public Buckets

For publicly accessible GCS buckets, you don't need to create a connector. Simply use the GCS URI directly in your model configuration.
@@ -183,17 +175,14 @@ Create `models/my_gcs_data.yaml`:

```yaml
type: model
+connector: duckdb
materialize: true
-sql: SELECT * FROM read_parquet('gs://my-public-bucket/path/to/data/*.parquet')
-connector: duckdb
# Add a refresh schedule
refresh:
  cron: "0 */6 * * *"
+sql: |
+  select * from read_parquet('gs://my-public-bucket/path/to/data/*.parquet')
```

---

## Method 4: Local Google Cloud CLI Credentials

For local development, you can use credentials from the Google Cloud CLI. This method is **not suitable for production** or Rill Cloud deployments; it is only available through manual configuration, and no connector file is needed.
@@ -213,13 +202,12 @@ Create `models/my_gcs_data.yaml`:

```yaml
type: model
+connector: duckdb
materialize: true
-sql: SELECT * FROM read_parquet('gs://my-bucket/path/to/data/*.parquet')
-connector: duckdb
# Add a refresh schedule
refresh:
  cron: "0 */6 * * *"
+sql: |
+  select * from read_parquet('gs://my-bucket/path/to/data/*.parquet')
```

Rill will automatically detect and use your local Google Cloud CLI credentials when no connector is specified.
@@ -228,55 +216,20 @@ Rill will automatically detect and use your local Google Cloud CLI credentials w
This method only works for local development. Deploying to Rill Cloud with this configuration will fail because the cloud environment doesn't have access to your local credentials. Always use Service Account JSON or HMAC keys for production deployments.
:::

---

-## Using GCS Data in Models

-Once your connector is configured (or for public buckets, no connector needed), you can reference GCS paths in your model SQL queries using DuckDB's GCS functions.

-### Basic Example

-**With a connector (authenticated):**

```yaml
-type: model
-connector: duckdb

-sql: SELECT * FROM read_parquet('gs://my-bucket/data/*.parquet')

-refresh:
-  cron: "0 */6 * * *"
```

-**Public bucket (no connector needed):**

```yaml
-type: model
-connector: duckdb

-sql: SELECT * FROM read_parquet('gs://my-public-bucket/data/*.parquet')

-refresh:
-  cron: "0 */6 * * *"
```

-### Reading Multiple File Types

```yaml
-type: model
-connector: duckdb

-sql: |
-  -- Read Parquet files
-  SELECT * FROM read_parquet('gs://my-bucket/parquet-data/*.parquet')

-  UNION ALL

-  -- Read CSV files
-  SELECT * FROM read_csv('gs://my-bucket/csv-data/*.csv', AUTO_DETECT=TRUE)

-refresh:
-  cron: "0 */6 * * *"
```

+## Reading Different File Types

+DuckDB supports reading various file formats directly from GCS:

```sql
+-- Read Parquet files
+select * from read_parquet('gs://my-bucket/data/*.parquet')

+-- Read CSV files
+select * from read_csv('gs://my-bucket/data/*.csv', auto_detect=true, ignore_errors=1, header=true)

+-- Read JSON files
+select * from read_json('gs://my-bucket/data/*.json', auto_detect=true, ignore_errors=1)
```

### Path Patterns
@@ -297,8 +250,6 @@ SELECT * FROM read_parquet('gs://my-bucket/data/**/*.parquet')
SELECT * FROM read_parquet('gs://my-bucket/data/2024-*.parquet')
```

---

## Deploy to Rill Cloud

When deploying a project to Rill Cloud, Rill requires you to explicitly provide Service Account JSON or HMAC keys for any Google Cloud Storage buckets used in your project. Please refer to our [connector YAML reference docs](/reference/project-files/connectors#gcs) for more information.
@@ -308,7 +259,6 @@ If you subsequently add sources that require new credentials (or if you simply e
rill env push
```

---

## Appendix
