Description
Is this related to an existing feature request or issue?
No response
Which Powertools for AWS Lambda (Python) utility does this relate to?
Other
Summary
The data_masking utility in powertools encrypts/decrypts the entire data body, so you get a single encrypted data blob. There are times when you want to apply encryption/decryption to the actual fields within a payload rather than encrypt the whole thing.
I intend to raise an RFC for a TypeScript implementation and am also happy to take this on.
Use case
A real use case for this is ensuring the security of PII data stored during checkpointing of Durable function outputs. The existing mechanism there also serialises/deserialises the entire payload rather than specific fields, and the encrypted data blob is visible in the console (with no structure or other field pointers to assist debugging or investigation of issues).
I would like to have encryption/decryption at the field level, while preserving the payload structure. This can assist with debugging durable function executions, where the payload remains visible (including its structure), which is helpful. This also enables the selective encryption of data fields in the payload.
Proposal
The approach I am putting forward is not to encrypt each field individually, which would be computationally expensive, but instead to replace each field with an encryption placeholder, "secret_field": { "__encrypted": "my.secret_field" }, and then to include ALL the encrypted fields as a single blob at the end of the data payload, which is encrypted using a single encryption function call. The following example shows the original, encrypted and decrypted payloads:
Original Payload:
{
  "customer": {
    "name": "John",
    "ssn": "123-45-6789",
    "creditCard": "4111-1111-1111-1111"
  }
}
After encrypt(data, ['customer.ssn', 'customer.creditCard']):
{
  "customer": {
    "name": "John",
    "ssn": { "__encrypted": "customer.ssn" },
    "creditCard": { "__encrypted": "customer.creditCard" }
  },
  "__powertools_encrypted_data": "AQICAHh8s0D5ZXJzaW9uIjogIjEuMCIsICJhbGdvcml0aG0iOiAiQUVTL0dDTS9Ob1BhZGRpbmciLCAiY2lwaGVydGV4dCI6ICJ...",
  "__powertools_encryption_context": {
    "purpose": "field-encryption"
  }
}
Once decrypted, the __powertools_encrypted_data field contains a JSON object that maps each encrypted field path to its original value:
{
  "customer.ssn": "123-45-6789",
  "customer.creditCard": "4111-1111-1111-1111"
}
Usage Examples
Basic Field Encryption
import os

from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider

# Assumption: the key ARN is supplied through an environment variable
KMS_KEY_ARN = os.environ["KMS_KEY_ARN"]

provider = AWSEncryptionSDKProvider(keys=[KMS_KEY_ARN])
data_masker = DataMasking(provider=provider)

def lambda_handler(event, context):
    order_data = {
        "orderId": "12345",
        "customer": {
            "name": "John Doe",
            "email": "[email protected]",
            "ssn": "123-45-6789"
        },
        "payment": {
            "creditCard": "4111-1111-1111-1111",
            "amount": 99.99
        }
    }
    # Encrypt sensitive fields only (the fields argument is part of this proposal)
    encrypted = data_masker.encrypt(
        order_data,
        fields=["customer.ssn", "payment.creditCard"],
        tenant_id="acme-corp"
    )
    # Store encrypted payload (orderId and amount visible for queries)
    # dynamodb is assumed to be a DynamoDB table handle created elsewhere
    dynamodb.put_item(Item=encrypted)
Processing with Partial Visibility
def process_order(encrypted_order):
    # Can query/filter by non-encrypted fields without decryption
    if encrypted_order["payment"]["amount"] > 100:
        send_fraud_alert(encrypted_order["orderId"])
    # Only decrypt when actually needed
    decrypted = data_masker.decrypt(encrypted_order)
    charge_credit_card(decrypted["payment"]["creditCard"])
Multi-Tenant Data Isolation
def store_customer_data(tenant_id, customer_data):
    # Encryption context binds data to tenant
    encrypted = data_masker.encrypt(
        customer_data,
        fields=["ssn", "dob", "medicalRecords"],
        tenant_id=tenant_id,
        data_classification="pii"
    )
    return encrypted

def retrieve_customer_data(encrypted_data):
    # AWS Encryption SDK validates tenant_id automatically
    # Decrypt fails if context doesn't match
    decrypted = data_masker.decrypt(encrypted_data)
    return decrypted
Durable Function Checkpointing
Durable Function checkpointing needs an integration class, and that class should ship as part of the Powertools library. For example:
from aws_lambda_powertools.utilities.data_masking import DataMasking
from aws_lambda_powertools.utilities.data_masking.provider.kms.aws_encryption_sdk import AWSEncryptionSDKProvider
from aws_durable_execution_sdk_python import DurableContext, durable_execution
from aws_durable_execution_sdk_python.config import StepConfig
import os
import json

# Initialize once at module level - cached across invocations
KMS_KEY_ARN = os.getenv("KMS_KEY_ARN")

class EncryptedFieldsSerDes:
    def __init__(self, encrypted_fields: list[str], kms_key_arn: str):
        provider = AWSEncryptionSDKProvider(keys=[kms_key_arn])
        self.data_masker = DataMasking(provider=provider)
        self.encrypted_fields = encrypted_fields

    def serialize(self, value: dict, context) -> str:
        encrypted = self.data_masker.encrypt(
            value,
            fields=self.encrypted_fields,
            workflow_id=context.operation_id
        )
        return json.dumps(encrypted)

    def deserialize(self, data: str, context) -> dict:
        encrypted = json.loads(data)
        return self.data_masker.decrypt(encrypted)

# Create SerDes instance at module level - reused across invocations
payment_serdes = EncryptedFieldsSerDes(
    encrypted_fields=["customer.creditCard", "customer.ssn"],
    kms_key_arn=KMS_KEY_ARN
)

@durable_execution
def handler(event: dict, context: DurableContext):
    # Reuse cached SerDes instance
    result = context.step(
        lambda _: process_payment(event),
        name="process_payment",
        config=StepConfig(serdes=payment_serdes)
    )
    return result
Debugging and Observability
# Console/logs show structure without exposing PII
logger.info("Processing order", extra={
    "encrypted_order": encrypted_order
})
# Output in logs:
# {
#   "orderId": "12345",
#   "customer": {
#     "name": "John Doe",
#     "ssn": {"__encrypted": "customer.ssn"}
#   },
#   "payment": {"amount": 99.99},
#   "__powertools_encrypted_data": "AQICAHh8s0D5...",
#   "__powertools_encryption_context": {"tenant_id": "acme-corp"}
# }
# Structure visible for debugging
# Non-sensitive fields queryable
# Encrypted blob present but not readable without KMS access
Out of scope
Nothing considered
Potential challenges
The outcome of encryption will change the payload structure - but that is also what encryption is all about.
It is a potential challenge for customers, but is transparently handled by the decrypt function.
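To illustrate how the structural change can stay transparent to callers, here is a minimal sketch of the placeholder round trip described in the Proposal. It is not the Powertools implementation: the helper names extract_fields and restore_fields are hypothetical, and the actual encryption of the collected values (a single KMS / AWS Encryption SDK call in the proposal) is deliberately left out and replaced by a plain JSON serialisation.

```python
import copy
import json

PLACEHOLDER_KEY = "__encrypted"


def extract_fields(payload: dict, paths: list[str]) -> tuple[dict, dict]:
    """Replace each dotted path with a placeholder and collect the original values."""
    masked = copy.deepcopy(payload)  # leave the caller's payload untouched
    secrets = {}
    for path in paths:
        *parents, leaf = path.split(".")
        node = masked
        for key in parents:
            node = node[key]  # raises KeyError if the path does not exist
        secrets[path] = node[leaf]
        node[leaf] = {PLACEHOLDER_KEY: path}
    return masked, secrets


def restore_fields(masked: dict, secrets: dict) -> dict:
    """Write every collected value back to the location its path points to."""
    restored = copy.deepcopy(masked)
    for path, value in secrets.items():
        *parents, leaf = path.split(".")
        node = restored
        for key in parents:
            node = node[key]
        node[leaf] = value
    return restored


original = {
    "customer": {
        "name": "John",
        "ssn": "123-45-6789",
        "creditCard": "4111-1111-1111-1111",
    }
}

masked, secrets = extract_fields(original, ["customer.ssn", "customer.creditCard"])

# Stand-in for the single encrypt call: in the proposal this serialised blob
# would be encrypted and stored under "__powertools_encrypted_data".
blob = json.dumps(secrets)

# Stand-in for decrypt: parse the blob and restore the original structure.
round_tripped = restore_fields(masked, json.loads(blob))
```

Because the placeholders carry the field path, the restore step needs only the decrypted blob to rebuild the payload, so callers of decrypt get back the structure they originally passed in. A real implementation would also have to decide how to handle missing paths, list indices, and keys that themselves contain dots, none of which this sketch covers.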
Dependencies and Integrations
No response
Alternative solutions
Acknowledgment
- This feature request meets Powertools for AWS Lambda (Python) Tenets
- Should this be considered in other Powertools for AWS Lambda languages? i.e. Java, TypeScript, and .NET