On June 13, 2023, Deque’s cloud hosting provider (AWS) encountered a problem with its Lambda service, which AWS described as increased error rates and latency. The problem affected many other AWS services that depend on Lambda. As a result, two of Deque’s services, Axe Mobile and Axe Developer Hub, experienced intermittent periods of latency and unavailability. During this time, customers may have seen error messages while interacting with these services. No intervention was required from Deque; performance and availability began to improve as AWS resolved the problem.
The following incident timeline and description are taken from the AWS service status page (https://health.aws.amazon.com/):
Service
Start time: June 13, 2023 at 3:08:00 PM UTC-4
Severity: Resolved
Status: Closed
End time: June 13, 2023 at 6:42:39 PM UTC-4
Region / Availability Zone: us-east-1
Category: Issue
Account specific: No
[RESOLVED] Increased Error Rates and Latencies
[03:42 PM PDT] Between 11:49 AM PDT and 3:37 PM PDT, we experienced increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. Our engineering teams were immediately engaged and began investigating. We quickly narrowed down the root cause to be an issue with a subsystem responsible for capacity management for AWS Lambda, which caused errors directly for customers (including through API Gateway) and indirectly through the use of other AWS services. Additionally, customers may have experienced authentication or sign-in errors when using the AWS Management Console, or authenticating through Cognito or IAM STS. Customers may also have experienced issues when attempting to initiate a Call or Chat to AWS Support. As of 2:47 PM PDT, the issue initiating calls and chats to AWS Support was resolved. By 1:41 PM PDT, the underlying issue with the subsystem responsible for AWS Lambda was resolved. At that time, we began processing the backlog of asynchronous Lambda invocations that accumulated during the event, including invocations from other AWS services. As of 3:37 PM PDT, the backlog was fully processed. The issue has been resolved and all AWS Services are operating normally.
[02:49 PM PDT] We are working to accelerate the rate at which Lambda asynchronous invocations are processed, and now estimate that the queue will be fully processed over the next hour. We expect that all queued invocations will be executed.
[02:29 PM PDT] Lambda synchronous invocation APIs have recovered. We are still working on processing the backlog of asynchronous Lambda invocations that accumulated during the event, including invocations from other AWS services (such as SQS and EventBridge). Lambda is working to process these messages during the next few hours and during this time, we expect to see continued delays in the execution of asynchronous invocations.
[02:00 PM PDT] Many AWS services are now fully recovered and marked Resolved on this event. We are continuing to work to fully recover all services.
[01:48 PM PDT] Beginning at 11:49 AM PDT, customers began experiencing errors and latencies with multiple AWS services in the US-EAST-1 Region. Our engineering teams were immediately engaged and began investigating. We quickly narrowed down the root cause to be an issue with a subsystem responsible for capacity management for AWS Lambda, which caused errors directly for customers (including through API Gateway) and indirectly through the use by other AWS services. We have associated other services that are impacted by this issue to this post on the Health Dashboard. Additionally, customers may experience authentication or sign-in errors when using the AWS Management Console, or authenticating through Cognito or IAM STS. Customers may also experience intermittent issues when attempting to call or initiate a chat to AWS Support. We are now observing sustained recovery of the Lambda invoke error rates, and recovery of other affected AWS services. We are continuing to monitor closely as we work towards full recovery across all services.
[01:38 PM PDT] We are beginning to see an improvement in the Lambda function error rates. We are continuing to work towards full recovery.
[01:14 PM PDT] We are continuing to work to resolve the error rates invoking Lambda functions. We're also observing elevated errors obtaining temporary credentials from the AWS Security Token Service, and are working in parallel to resolve these errors.
[12:36 PM PDT] We are continuing to experience increased error rates and latencies for multiple AWS Services in the US-EAST-1 Region. We have identified the root cause as an issue with AWS Lambda, and are actively working toward resolution. For customers attempting to access the AWS Management Console, we recommend using a region-specific endpoint (such as: https://us-west-2.console.aws.amazon.com). We are actively working on full mitigation and will continue to provide regular updates.
[12:26 PM PDT] We have identified the root cause of the elevated errors invoking AWS Lambda functions, and are actively working to resolve this issue.
[12:19 PM PDT] AWS Lambda function invocation is experiencing elevated error rates. We are working to identify the root cause of this issue.
[12:08 PM PDT] We are investigating increased error rates and latencies in the US-EAST-1 Region.
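Note that the event's start and end times above are given in UTC-4, while the individual updates are stamped in PDT. The following minimal Python sketch, which is not part of the AWS report, converts the two event times to PDT and computes the length of the posted event. It assumes PDT is a fixed UTC-7 offset; the helper name `parse_event_time` is hypothetical and used only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets; UTC-4 is stated in the report, UTC-7 for PDT is an assumption.
UTC_MINUS_4 = timezone(timedelta(hours=-4), "UTC-4")
PDT = timezone(timedelta(hours=-7), "PDT")


def parse_event_time(text: str) -> datetime:
    """Parse an event time such as 'June 13, 2023 at 3:08:00 PM' as UTC-4."""
    naive = datetime.strptime(text, "%B %d, %Y at %I:%M:%S %p")
    return naive.replace(tzinfo=UTC_MINUS_4)


start = parse_event_time("June 13, 2023 at 3:08:00 PM")
end = parse_event_time("June 13, 2023 at 6:42:39 PM")

# Express both times in PDT so they can be compared with the update stamps.
start_pdt = start.astimezone(PDT)
end_pdt = end.astimezone(PDT)
duration = end - start

print(start_pdt.strftime("%I:%M %p %Z"))  # 12:08 PM PDT
print(end_pdt.strftime("%I:%M %p %Z"))    # 03:42 PM PDT
print(duration)                           # 3:34:39
```

Under these assumptions, the converted times line up with the first update (12:08 PM PDT) and the final one (03:42 PM PDT). The posted event window is distinct from the impact window AWS reports in its summary (11:49 AM PDT to 3:37 PM PDT).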
The following AWS services have been affected by this issue.
Resolved (104 services)
AWS Account Management
AWS Amplify
AWS Amplify Admin
AWS AppSync
AWS Batch
AWS Certificate Manager
AWS Cloud9
AWS CloudFormation
AWS CodeCommit
AWS CodePipeline
AWS CodeStar
AWS Config
AWS Control Tower
AWS Data Exchange
AWS DataSync
AWS Directory Service
AWS Elemental
AWS Fargate
AWS Fault Injection Simulator
AWS Global Accelerator
AWS Glue
AWS Ground Station
AWS Identity and Access Management
AWS IoT Device Management
AWS IoT FleetWise
AWS IoT Greengrass
AWS IoT SiteWise
AWS Lake Formation
AWS Lambda
AWS License Manager
AWS Management Console
AWS Marketplace
AWS Migration Hub Strategy Recommendations
AWS Organizations
AWS Outposts
AWS Private Certificate Authority
AWS QuickSight
AWS Resource Explorer
AWS Resource Groups
AWS Secrets Manager
AWS Service Catalog
AWS Single Sign-On
AWS Support Center
AWS Transfer Family
AWS VPCE PrivateLink
AWS Well-Architected Tool
Amazon API Gateway
Amazon AppStream 2.0
Amazon Athena
Amazon Augmented AI
Amazon Braket
Amazon Chime
Amazon CloudFront
Amazon CloudWatch
Amazon CloudWatch Synthetics
Amazon CodeCatalyst
Amazon CodeGuru Profiler
Amazon CodeGuru Reviewer
Amazon Cognito
Amazon Comprehend
Amazon Connect
Amazon DevOps Guru
Amazon DocumentDB
Amazon EMR Serverless
Amazon ElastiCache
Amazon Elastic Container Registry
Amazon Elastic Container Service
Amazon Elastic File System
Amazon Elastic Kubernetes Service
Amazon Elastic Load Balancing
Amazon Elastic MapReduce
Amazon EventBridge
Amazon FSx
Amazon FreeRTOS
Amazon GameLift
Amazon GuardDuty
Amazon Inspector
Amazon Interactive Video Service
Amazon Kendra
Amazon Kinesis Firehose
Amazon Kinesis Video Streams
Amazon Lightsail
Amazon Location Service
Amazon MQ
Amazon Managed Grafana
Amazon Managed Service for Prometheus
Amazon Managed Streaming for Apache Kafka
Amazon Managed Workflows for Apache Airflow
Amazon MemoryDB for Redis
Amazon OpenSearch Service
Amazon Pinpoint
Amazon Quantum Ledger Database
Amazon Redshift
Amazon Relational Database Service
Amazon Route 53
Amazon SageMaker
Amazon Simple Email Service
Amazon Simple Queue Service
Amazon Transcribe
Amazon Translate
Amazon VPC Lattice
Amazon WorkMail
Amazon WorkSpaces
EC2 Image Builder