Amazon Practice Questions, Discussions & Exam Topics by our Authors
A data engineer needs to build an enterprise data catalog based on the company's Amazon S3 buckets and Amazon RDS databases. The data catalog must include storage format metadata for the data...
To meet the requirement of building an enterprise data catalog that includes storage format metadata for data in Amazon S3 buckets and Amazon RDS databases, let's evaluate each option based on effort, scalability, and automation.
A) Use an AWS Glue crawler to scan the S3 buckets and RDS databases and build a data catalog. Use data stewards to inspect the data and update the data catalog with the data format.
- Explanation: This solution involves using an AWS Glue crawler to scan data in S3 and RDS and create a data catalog. However, it relies on manual inspection and updates by data stewards to determine the data format. This introduces unnecessary operational overhead, as it requires human intervention to inspect and update the catalog after it’s created.
- Why rejected: This solution adds more manual effort by requiring data stewards to update the data format, which is inefficient and not the least effort approach. It also doesn't fully automate the process of identifying and storing the data format in the catalog.
B) Use an AWS Glue crawler to build a data catalog. Use AWS Glue crawler classifiers to recognize the format of data and store the format in the catalog.
- Explanation: AWS Glue crawlers are designed to automatically discover and catalog data stored in S3 and RDS. The crawlers use built-in classifiers to automatically detect the storage format (e.g., Parquet, CSV, JSON, etc.) and include it in the catalog. This solution offers automation and efficiency, as the crawler identifies data formats without requiring manual intervention.
- Why selected: This solution automates the process of discovering both the data and its format with minimal effort. AWS Glue is well-integrated with both S3 and RDS, and the classifiers are designed to recognize...
Author: Lucas Carter · Last updated May 21, 2026
A company analyzes data in a data lake every quarter to perform inventory assessments. A data engineer uses AWS Glue DataBrew to detect any personally identifiable information (PII) about customers within the data. The company's privacy policy considers some custom categories of information to be PII. However, the categories are not included in standard DataBrew data quality rules. The data engineer needs to modify the ...
Let's break down each option based on the key factors:
A) Manually review the data for custom PII categories:
- Key Factors: High operational overhead, prone to human error, and not scalable.
- Why rejected: This option requires manual intervention and would be time-consuming, inefficient, and error-prone. It is also unsuitable for the many large datasets in a data lake, so it does not meet the requirement of reducing operational overhead.
B) Implement custom data quality rules in DataBrew. Apply the custom rules across datasets:
- Key Factors: Leverages AWS Glue DataBrew’s data quality rules, which can be automated and reusable, reducing operational overhead.
- Why selected: AWS Glue DataBrew allows the implementation of custom data quality rules, and applying them across datasets can be automated, minimizing manual effort. This solution integrates directly into the existing workflow with minimal added complexity. It provides a low-overhead solution to scan for the custom PII categories across datasets in the data lake and fits well with the tool already being used.
C) Develop custom Python scripts to detect the custom PII categories. Call the scripts from DataBrew:
- Key Factors: Requires coding, introduces custom development, and adds overhead for maintenance and scali...
Author: MoonlitPantherX · Last updated May 21, 2026
A company receives a data file from a partner each day in an Amazon S3 bucket. The company uses a daily AWS Glue extract, transform, and load (ETL) pipeline to clean and transform each data file. The output of the ETL pipeline is written to a CSV file named Daily.csv in a second S3 bucket. Occasionally, the daily data file is empty or is missing values for required fields. When the file is missing data, the company can use the previous day's CSV fil...
Let’s break down each option based on key factors like effort, scalability, and effectiveness in addressing the scenario:
A) Invoke an AWS Lambda function to check the file for missing data and to fill in missing values in required fields:
- Key Factors: This introduces additional complexity with Lambda function management, error handling, and integrating it into the pipeline. Lambda would be used for checking missing data, but there’s additional overhead in managing it and handling error cases.
- Why rejected: While Lambda could be effective for checking and fixing missing data, it introduces extra components that increase complexity and maintenance. It also does not address the core issue (whether the file is complete) as directly as other options do.
B) Configure the AWS Glue ETL pipeline to use AWS Glue Data Quality rules. Develop rules in Data Quality Definition Language (DQDL) to check for missing values in required fields and empty files:
- Key Factors: AWS Glue Data Quality rules can automate the validation of data completeness, checking for missing or empty fields, and rejecting invalid files. This could ensure that the ETL pipeline works only with valid data before it overwrites the previous day's CSV.
- Why selected: This is a direct and automated solution. By configuring Glue’s built-in data quality rules, the company can ensure that the data is valid before proceeding with overwriting the file. Data Quality rules also provide an easy-to-use, scalable solution with low maintenance overhead, as it’s a native feature in Glue.
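As a rough illustration of option B, the sketch below builds a DQDL ruleset string for a set of required fields. The column names (`order_id`, `amount`) and the helper `build_ruleset` are hypothetical; `RowCount` and `IsComplete` are DQDL rule types, but the ruleset a real pipeline needs would depend on the partner file's schema.

```python
def build_ruleset(required_fields):
    """Return a DQDL ruleset that flags empty files and missing required values."""
    # RowCount > 0 fails when the daily file contains no rows.
    rules = ["RowCount > 0"]
    # IsComplete fails when any row has a null value in the named column.
    rules += [f'IsComplete "{name}"' for name in required_fields]
    return "Rules = [\n    " + ",\n    ".join(rules) + "\n]"


# Hypothetical required columns, for illustration only.
ruleset = build_ruleset(["order_id", "amount"])
print(ruleset)
```

The pipeline would then write Daily.csv only when the ruleset passes, which leaves the previous day's file in place otherwise.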
C) Use AWS Glue Studio to change the code in the ETL pipeline to fill in any missing values in the required fields...
Author: RadiantPhoenixX · Last updated May 21, 2026
A marketing company uses Amazon S3 to store marketing data. The company uses versioning in some buckets. The company runs several jobs to read and load data into the buckets. To help cost-optimize its storage, the company wants to gather information about incomplete multipart uploads and outda...
Let's evaluate the options based on the requirement to gather information about incomplete multipart uploads and outdated versions in S3 buckets while minimizing operational effort:
Option A: Use AWS CLI to gather the information.
- Why rejected: Using the AWS CLI to manually gather information about incomplete multipart uploads or outdated versions involves writing custom scripts and running them periodically. While possible, this option requires manual effort and does not scale well as the company needs to automate the process. It also doesn't provide a centralized view or easy integration for tracking over time.
- Not ideal because: This option involves more operational effort, including script management, scheduling, and monitoring. It is error-prone and less efficient than automated services.
Option B: Use Amazon S3 Inventory configurations reports to gather the information.
- Why selected: S3 Inventory can be configured to provide detailed reports about the objects in a bucket, including versions and incomplete multipart uploads. This solution is easy to set up and provides comprehensive, regular reports with minimal operational overhead. The reports can be scheduled to run periodically and are automatically delivered to a designated S3 bucket.
- Why other options are rejected:
- S3 Inventory is designed specifically for gathering information about the objects in an S3 bucket, including outdated versions and incomplete multipart uploads.
- It provides a low-maintenance, automated approach for the company to receive regular reports with the ...
Author: Suresh · Last updated May 21, 2026
A company needs a solution to manage costs for an existing Amazon DynamoDB table. The company also needs to control the size of the table. The solution must not disrupt any ongoing read or write operations. The company wants to use a solution that automatically delete...
Let’s evaluate the options based on factors like ease of implementation, automation, and ongoing maintenance:
A) Use the DynamoDB TTL feature to automatically expire data based on timestamps:
- Key Factors: DynamoDB’s Time to Live (TTL) feature is designed to automatically delete expired items based on a timestamp attribute. When TTL is enabled, DynamoDB automatically removes expired data without requiring custom logic or manual intervention. TTL runs in the background and does not disrupt ongoing read or write operations.
- Why selected: This solution perfectly aligns with the company’s requirement of automatically deleting data after a certain period (1 month) with the least operational maintenance. Once TTL is configured, it automatically manages the deletion of expired items without any manual intervention, reducing complexity and maintenance effort. It’s efficient and integrates seamlessly into the existing DynamoDB table.
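A minimal sketch of what TTL expects, assuming a hypothetical table named `Orders` and a hypothetical attribute named `expires_at`: each item carries a Number attribute holding the expiry time in epoch seconds, and the table's TTL settings name that attribute. The dictionary below mirrors the shape of the DynamoDB `UpdateTimeToLive` request; nothing here calls AWS.

```python
from datetime import datetime, timedelta, timezone

TTL_ATTRIBUTE = "expires_at"      # hypothetical attribute name
RETENTION = timedelta(days=30)    # "1 month" approximated as 30 days


def with_expiry(item, created_at):
    """Return a copy of the item with a TTL attribute in epoch seconds."""
    expires = created_at + RETENTION
    return {**item, TTL_ATTRIBUTE: int(expires.timestamp())}


# Shape of the settings that enable TTL on a table (table name is hypothetical).
ttl_settings = {
    "TableName": "Orders",
    "TimeToLiveSpecification": {"Enabled": True, "AttributeName": TTL_ATTRIBUTE},
}

created = datetime(2026, 1, 1, tzinfo=timezone.utc)
item = with_expiry({"order_id": "o-1"}, created)
print(item)
```

Because expiry is driven by a per-item timestamp, no scheduled job has to scan the table for old data.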
B) Configure a scheduled Amazon EventBridge rule to invoke an AWS Lambda function to check for data that is older than 1 month. Configure the Lambda function to delete old data:
- Key Factors: While this solution would also automate the deletion of data, it requires creating and managing an EventBridge rule, a Lambda function, and ensuring that the function efficiently scans and deletes data. This solution requires more maintenance, monitoring, and scaling as the table size grows.
- Why rejected: This solution introduces extra components (EventBridge rule and Lambda function) that require ongoing monitoring and maintenance. It also needs periodic invocation to check and delete data, which increases operational overhead compared to the simpler TTL solution.
C) Configure a stream on the DynamoDB table to invoke an AWS Lambda function. Configure the Lambda function to delete data...
Author: GlowingTiger · Last updated May 21, 2026
A company uses Amazon S3 to store data and Amazon QuickSight to create visualizations. The company has an S3 bucket in an AWS account named Hub-Account. The S3 bucket is encrypted by an AWS Key Management Service (AWS KMS) key. The company's QuickSight instance is in a separate account named BI-Account. The company updates the S3 bucket policy to grant access to the QuickSight service role. ...
Let's analyze each option based on the requirements:
A) Use the existing AWS KMS key to encrypt connections from QuickSight to the S3 bucket.
- Rejected: AWS KMS keys are used for encryption/decryption of data stored in S3, but they don't directly handle the encryption of connections. This option is irrelevant in this context because it doesn't pertain to granting cross-account access or permission management for accessing encrypted data in the S3 bucket.
B) Add the S3 bucket as a resource that the QuickSight service role can access.
- Selected: This is a necessary step. Since the S3 bucket is in a different AWS account (Hub-Account) and QuickSight operates in the BI-Account, adding the S3 bucket as a resource the QuickSight service role can access ensures that QuickSight has permission to interact with the S3 bucket. This is configured on the QuickSight side in BI-Account, in the permissions of the QuickSight service role, and complements the bucket policy that was already updated in Hub-Account.
C) Use AWS Resource Access Manager (AWS RAM) to share the S3 bucket with the BI-Account account.
- Rejected: AWS RAM is typically used for sharing resources such as VPC subnets or transit gateways, not S3 buckets. S3 buckets are shared across accounts through bucket policies or IAM roles, so AWS RAM does not fulfill the need for cross-account access to S3.
D) A...
Author: Sophia · Last updated May 21, 2026
A car sales company maintains data about cars that are listed for sale in an area. The company receives data about new car listings from vendors who upload the data daily as compressed files into Amazon S3. The compressed files are up to 5 KB in size. The company wants to see the most up-to-date listings as soon as the data is uploaded to Amazon S3. A data engineer must automate and orchestrate the data processing workflow of the listings to feed a dashboard. The...
Let's break down each solution to understand which one will meet the requirements most cost-effectively, focusing on scalability, automation, orchestration, and cost.
Option Analysis:
A) Use an Amazon EMR cluster to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Apache Hive for one-time queries and analytical reporting. Use Amazon OpenSearch Service to bulk ingest the data into compute-optimized instances. Use OpenSearch Dashboards in OpenSearch Service for the dashboard.
- Rejected: While EMR and OpenSearch Service can handle the job, EMR clusters (especially provisioned ones) can be expensive to run continuously. Moreover, OpenSearch Service might not be the best fit for managing large datasets with frequent querying and analysis. The need for compute-optimized instances adds complexity and unnecessary cost. The use of Hive for analytical reporting is more suited for large-scale processing, which might be overkill for a small dataset (5 KB files). This option is not cost-efficient for the described needs.
B) Use a provisioned Amazon EMR cluster to process incoming data. Use AWS Step Functions to orchestrate workflows. Use Amazon Athena for one-time queries and analytical reporting. Use Amazon QuickSight for the dashboard.
- Rejected: While this option includes Amazon Athena for querying and Amazon QuickSight for the dashboard, the use of a provisioned EMR cluster is not ideal for cost efficiency. Provisioned clusters in EMR are expensive to run continuously, especially given the small size of the files (5 KB). Athena is well-suited for querying, but the EMR cluster incurs unnecessary costs due to its provisioning and compute power needs for such small data files.
C) Use AWS Glue to process incoming data. Use AWS Step Functions to orchestrate workflows....
Author: NightmareDragon2025 · Last updated May 21, 2026
A company has AWS resources in multiple AWS Regions. The company has an Amazon EFS file system in each Region where the company operates. The company's data science team operates within only a single Region. The data that the data science team works with must remain within the team's Region. A data engineer needs to create a single dataset by processing files that are in each of the company's Regional EFS file systems. The...
Let's analyze the options based on the requirement to create a single dataset from files in multiple AWS Regions while ensuring the data remains within the data science team's Region. The goal is to process the data using AWS Step Functions orchestrating Lambda functions with the least effort.
Option Analysis:
A) Peer the VPCs that host the EFS file systems in each Region with the VPC that is in the data science team's Region. Enable EFS file locking. Configure the Lambda functions in the data science team's Region to mount each of the Region-specific file systems. Use the Lambda functions to process the data.
- Rejected: This solution requires complex VPC peering between Regions, which can be difficult to manage and maintain, especially when it comes to networking between different Regions. Mounting EFS file systems across Regions is not ideal due to potential latency issues and the fact that EFS is designed to work within a single region. Also, enabling EFS file locking for cross-region access can add complexity and potential performance overhead. This solution is more complex and less efficient.
B) Configure each of the Regional EFS file systems to replicate data to the data science team's Region. In the data science team's Region, configure the Lambda functions to mount the replica file systems. Use the Lambda functions to process the data.
- Rejected: Amazon EFS replication can natively replicate a file system to another Region, so this option is technically feasible. However, maintaining a full replica of every Regional file system adds storage and data transfer cost, and the replicas must be in sync before processing starts. This approach introduces more operational overhead than processing the data where it already resides and would be difficult to manage at scale.
C) Deploy the Lambda functions to ...
Author: Ethan · Last updated May 21, 2026
A company hosts its applications on Amazon EC2 instances. The company must use SSL/TLS connections that encrypt data in transit to communicate securely with AWS infrastructure that is managed by a customer. A data engineer needs to implement a solution to simplify the generation, distribution, and rotation of digital certificates. The s...
Let's break down each option and evaluate which will meet the requirements with the least operational overhead for simplifying the generation, distribution, and rotation of digital certificates while ensuring automatic renewal and deployment of SSL/TLS certificates.
Option Analysis:
A) Store self-managed certificates on the EC2 instances.
- Rejected: While it is possible to store self-managed certificates directly on EC2 instances, this approach introduces significant operational overhead. The company would need to manually manage certificate generation, distribution, and rotation, which could become cumbersome and error-prone. There is no automated process for renewing or deploying certificates, making this option less desirable. Additionally, manually handling certificates on EC2 instances can lead to security risks if not properly managed.
B) Use AWS Certificate Manager (ACM).
- Selected: AWS Certificate Manager (ACM) is the ideal solution here. ACM simplifies the generation, distribution, and rotation of SSL/TLS certificates. It supports automated certificate renewal and deploys certificates to integrated AWS services such as Elastic Load Balancing and Amazon CloudFront that sit in front of the EC2 instances. ACM automates the certificate lifecycle management, reducing operational overhead and increasing security. It integrates seamlessly with other AWS services, making it the most efficient and low-maintenance option for the described requirements.
C) Implement custom automation scripts in AWS Secrets Manager. ...
Author: Harper · Last updated May 21, 2026
A company saves customer data to an Amazon S3 bucket. The company uses server-side encryption with AWS KMS keys (SSE-KMS) to encrypt the bucket. The dataset includes personally identifiable information (PII) such as social security numbers and account details. Data that is tagged as PII must be masked before the company uses customer data for analysis. Some users must have secure access to the PII data during the pre-processing phase. The ...
Let's evaluate each option based on the requirements:
- Masking PII data before analysis
- Securing PII data during processing
- Low-maintenance solution for managing the entire engineering pipeline
Option Analysis:
A) Use AWS Glue DataBrew to perform extract, transform, and load (ETL) tasks that mask the PII data before analysis.
- Selected: AWS Glue DataBrew is a visual, no-code data preparation tool that allows you to clean and transform data efficiently, including masking PII data. It can be integrated with Amazon S3, so you can automatically perform data masking during the ETL process before the data is used for analysis. This solution is low maintenance because it automates the transformation process and handles the data masking securely.
B) Use Amazon GuardDuty to monitor access patterns for the PII data that is used in the engineering pipeline.
- Rejected: Amazon GuardDuty is a security monitoring service that detects unusual or unauthorized access patterns or malicious activity within your AWS environment. While GuardDuty is important for security, it does not directly help with masking PII data or securing it during the processing phase. Its role is more about detecting anomalies and security threats rather than ensuring compliance with data masking requirements.
C) Configure an Amazon Macie discovery job for the S3 bucket.
- Rejected: Amazon Macie is a service designed to discover and classify sensitive data (such as PII) in S3 buckets. It helps ide...
Author: Maya · Last updated May 21, 2026
A data engineer is launching an Amazon EMR cluster. The data that the data engineer needs to load into the new cluster is currently in an Amazon S3 bucket. The data engineer needs to ensure that data is encrypted both at rest and in transit. The data that is in the S3 bucket is encrypted by an AWS Key Management Service (AW...
Let's analyze each of the options to determine the best solution based on the requirements of ensuring encryption both at rest and in transit for data loaded into an Amazon EMR cluster from an Amazon S3 bucket.
Key Requirements:
- At-rest encryption: The data in the S3 bucket is encrypted using AWS Key Management Service (AWS KMS) keys.
- In-transit encryption: The data must be encrypted during transfer from Amazon S3 to the EMR cluster. This can be managed with a PEM file for Secure Sockets Layer (SSL) encryption.
Evaluation of Options:
Option A:
- Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for at-rest encryption for the S3 bucket.
- Create a second security configuration specifying the Amazon S3 path of the PEM file for in-transit encryption.
- Create the EMR cluster and attach both security configurations to the cluster.
Analysis:
- An EMR cluster accepts only one security configuration, so two separate configurations cannot both be attached to the same cluster.
- A single security configuration is designed to hold both the at-rest and the in-transit encryption settings, which also keeps the setup simpler and less error-prone.
Rejection Reason: The cluster cannot use two security configurations at once, so this option is not viable.
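To make the consolidation concrete, here is a sketch of one security configuration document that carries both the at-rest setting for S3 and the in-transit PEM location. The key ARN and the S3 path are placeholder assumptions, and the field names follow the EMR security configuration JSON format as I understand it; check them against the current EMR documentation before use.

```python
import json

# Hypothetical placeholders: substitute a real key ARN and certificate archive path.
KMS_KEY_ARN = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID"
PEM_ZIP_S3_PATH = "s3://example-bucket/certs/my-certs.zip"

security_configuration = {
    "EncryptionConfiguration": {
        "EnableAtRestEncryption": True,
        "EnableInTransitEncryption": True,
        # At-rest encryption for data that EMR reads from and writes to S3.
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {
                "EncryptionMode": "SSE-KMS",
                "AwsKmsKey": KMS_KEY_ARN,
            }
        },
        # In-transit encryption using PEM certificates stored in S3.
        "InTransitEncryptionConfiguration": {
            "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": PEM_ZIP_S3_PATH,
            }
        },
    }
}

config_json = json.dumps(security_configuration, indent=2)
print(config_json)
```

The resulting JSON would be registered once as a named security configuration and then referenced by name when the cluster is created.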
Option B:
- Create an Amazon EMR security configuration. Specify the appropriate AWS KMS key for local disk encryption for the S3 bucket.
- Specify the Amazon S3 path of the PEM file for in-transit encryption.
- Use the security configuration during EMR cluster creation.
Analysis:
- This option suggests local disk encryption for S3 data, which is incorrect because S3 data is already encrypted at rest using AWS KMS. Local disk encryption refers to encrypting the local storage of the EMR nodes, not the data in S3.
Rejection Reason: Misinterpretation of encryption scopes (local disk vs. S3 encryption). It doesn't fulfill the at-rest encrypti...
Author: Benjamin · Last updated May 21, 2026
A retail company is using an Amazon Redshift cluster to support real-time inventory management. The company has deployed an ML model on a real-time endpoint in Amazon SageMaker. The company wants to make real-time inventory recommendations. The company also wa...
To evaluate the options, we need to focus on the core requirements: real-time inventory recommendations and future inventory predictions. The company already has an ML model deployed on a real-time endpoint in Amazon SageMaker, and it's using Amazon Redshift for inventory management.
Key Requirements:
1. Real-time recommendations: This implies that the system needs to make predictions or provide recommendations instantly based on incoming data (e.g., inventory changes).
2. Future inventory predictions: This requires forecasting future inventory needs, likely using historical data and machine learning models.
Now, let's evaluate each option.
Option A: Use Amazon Redshift ML to generate inventory recommendations.
- Analysis:
- Amazon Redshift ML allows you to use machine learning models directly within Redshift using SQL.
- It enables users to build, train, and deploy models for predictions directly in Redshift.
- Using Redshift ML to generate real-time inventory recommendations is a direct and effective solution for the use case, especially since it's already integrated with the Redshift cluster, where the inventory data is stored.
Selected Reasoning: This is a strong option because Redshift ML can handle both real-time recommendations and predictions without needing external calls or complex integration.
Option B: Use SQL to invoke a remote SageMaker endpoint for prediction.
- Analysis:
- This approach involves using SQL within Redshift to call a remote SageMaker endpoint for real-time predictions.
- Redshift can invoke external services via SQL, which means predictions could be fetched from the SageMaker real-time endpoint.
- This is a valid solution because it allows the Redshift cluster to retrieve predictions from SageMaker, leveraging the real-time ML model deployed there. However, calling a remote endpoint can introduce latency and potential reliability concerns, especially in high-volume use cases, as it's dependent on network calls and external systems.
Selected Reasoning: This option can work but comes with additional complexity and potential latency due to invoking an external service.
Option C: Use Amazon Redshift ML to schedule regular data exports for offline model training.
- Analysis:
- Amazon Red...
Author: Samuel · Last updated May 21, 2026
A company stores CSV files in an Amazon S3 bucket. A data engineer needs to process the data in the CSV files and store the processed data in a new S3 bucket. The process needs to rename a column, remove specific columns, ignore the second row of each file, create a new column based on the values of the first row of the ...
Let's analyze the problem and each potential solution based on the requirements:
Key Requirements:
1. Rename a column.
2. Remove specific columns.
3. Ignore the second row of each file.
4. Create a new column based on values in the first row.
5. Filter by a numeric value of a column.
The goal is to process CSV files in an S3 bucket and output the processed data to a new S3 bucket. We want to meet the requirements with the least development effort.
Evaluation of Options:
Option A: Use AWS Glue Python jobs to read and transform the CSV files.
- Analysis:
- AWS Glue Python jobs allow you to write custom code in Python to transform the data.
- While this provides full flexibility to perform any transformation (e.g., renaming columns, removing columns, ignoring rows, creating new columns, and filtering), it requires significant development effort. The data engineer would need to manually code and test all transformations, especially for handling the second row and numeric filtering.
Rejection Reason: While this option is flexible, it requires more development effort than the other options, making it less suitable if the goal is to minimize development time.
Option B: Use an AWS Glue custom crawler to read and transform the CSV files.
- Analysis:
- AWS Glue crawlers are designed to infer the schema of data and create metadata for it, but they are not intended for data transformation. A crawler can detect the structure of the CSV files and catalog them, but it does not support complex transformations such as renaming columns, filtering rows, or creating new columns.
Rejection Reason: Crawlers are meant for cataloging and inferring metadata, not for performing data transformations. Thus, this solution will not meet the processing needs described in the requirements.
Option C: Use an AWS Glue workflow to build a set of jobs to crawl and transform the CSV files.
- Analysis:
- AWS Glue workflows allow you to orchestrate multiple Glue jobs...
Author: Liam123 · Last updated May 21, 2026
A company uses Amazon Redshift as its data warehouse. Data encoding is applied to the existing tables of the data warehouse. A data engineer discovers that the compression encoding applied to some of the tables is not the best fit for the data. The data engineer needs t...
Key Requirement:
The data engineer needs to improve data encoding for tables in an Amazon Redshift data warehouse where the compression encoding is sub-optimal. The goal is to identify better encoding methods and apply them to optimize storage and performance.
Evaluation of Options:
Option A: Run the ANALYZE command against the identified tables. Manually update the compression encoding of columns based on the output of the command.
- Analysis:
- The ANALYZE command in Redshift is primarily used to update statistics about the data to help the query planner make better decisions. It does not help with compression or encoding selection.
- This option would require the manual identification and update of compression encoding, which is time-consuming and not automated.
Rejection Reason: While the ANALYZE command helps with query performance, it does not optimize or recommend changes for compression encoding. This option is not the most efficient for improving compression encoding.
Option B: Run the ANALYZE COMPRESSION command against the identified tables. Manually update the compression encoding of columns based on the output of the command.
- Analysis:
- ANALYZE COMPRESSION is a specific Redshift command that analyzes the current compression encoding of columns and suggests more optimal encoding based on the data distribution and type.
- The output of this command provides recommendations for improving compression encoding, which can then be applied to optimize storage and query performance.
- Manually updating the encoding based on the output can be done, but this is still a manual process.
Selected Reasoning: This is the most appropriate solution because ANALYZE COMPRESSION provides targeted insights into the best compression encodin...
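The two-step flow in option B can be sketched as SQL text generated from Python. The table name `sales`, the column names, and the recommended encodings are hypothetical stand-ins for whatever `ANALYZE COMPRESSION` actually reports; the statements follow the Redshift `ANALYZE COMPRESSION` and `ALTER TABLE ... ALTER COLUMN ... ENCODE` forms.

```python
def analyze_compression_sql(table):
    """Statement that asks Redshift to suggest encodings for a table."""
    return f"ANALYZE COMPRESSION {table};"


def alter_encoding_sql(table, recommendations):
    """One ALTER statement per column whose encoding should change."""
    return [
        f"ALTER TABLE {table} ALTER COLUMN {column} ENCODE {encoding};"
        for column, encoding in recommendations.items()
    ]


# Hypothetical recommendations copied by hand from the command's output.
recommended = {"customer_id": "az64", "notes": "zstd"}

statements = [analyze_compression_sql("sales")] + alter_encoding_sql("sales", recommended)
for statement in statements:
    print(statement)
```

Note that `ANALYZE COMPRESSION` takes an exclusive lock on the table while it runs, so it is usually scheduled outside busy periods.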
Author: Emma · Last updated May 21, 2026
The company stores a large volume of customer records in Amazon S3. To comply with regulations, the company must be able to access new customer records immediately for the first 30 days after the records are created. The company accesses records that are older than 30 days infrequently....
Key Requirements:
- Immediate access to new customer records for the first 30 days.
- Infrequent access to records older than 30 days.
- The goal is to cost-optimize S3 storage.
Evaluation of Options:
Option A: Apply a lifecycle policy to transition records to S3 Standard-Infrequent Access (S3 Standard-IA) storage after 30 days.
- Analysis:
- S3 Standard-IA is designed for infrequent access to data, offering lower cost compared to S3 Standard for data that is accessed less often.
- By applying a lifecycle policy, the company can automatically move customer records to S3 Standard-IA after 30 days, which would reduce storage costs for older records while still ensuring immediate access for the first 30 days.
- This approach balances cost-efficiency and access needs perfectly.
Selected Reasoning: This is the most cost-effective solution because it automatically optimizes storage costs based on access patterns while meeting the requirement for quick access within the first 30 days.
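A small sketch of the lifecycle rule described in option A, assuming a hypothetical `customer-records/` prefix and rule ID. The dictionary mirrors the shape of an S3 lifecycle configuration, and `storage_class_for_age` is only a local model of the rule for illustration, not an AWS call; real transitions run asynchronously, so timing is approximate.

```python
# Lifecycle configuration with a single transition rule (prefix and ID are hypothetical).
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "records-to-standard-ia-after-30-days",
            "Status": "Enabled",
            "Filter": {"Prefix": "customer-records/"},
            "Transitions": [{"Days": 30, "StorageClass": "STANDARD_IA"}],
        }
    ]
}


def storage_class_for_age(age_days, config=lifecycle_configuration):
    """Return the storage class an object of this age would reach under the rules."""
    current = "STANDARD"
    for rule in config["Rules"]:
        if rule["Status"] != "Enabled":
            continue
        # Apply transitions in order of age so the latest applicable one wins.
        for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
            if age_days >= transition["Days"]:
                current = transition["StorageClass"]
    return current


for age in (10, 30, 45):
    print(age, storage_class_for_age(age))
```

Objects in S3 Standard-IA remain immediately retrievable, which is why the rule still satisfies the access requirement for older records.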
Option B: Use S3 Intelligent-Tiering storage.
- Analysis:
- S3 Intelligent-Tiering automatically moves data between access tiers (Frequent Access, Infrequent Access, and Archive Instant Access) based on access patterns, without the need for a lifecycle policy.
- It is useful for datasets with unpredictable access patterns, but it incurs additional charges for monitoring and automation, which may not be necessary in this scenario where the data access is predictable (accessed immediately for 30 days, infrequent after that).
Rejection Reason: While S3 Intelligent-Tiering is useful for unpredictable access, it could be less cost-efficient for this use case, as the data access pattern is relatively predictable, and lifecycle policies with S3 Standard-IA can offer a more cost-effective solution.
Option C: Transition records to S3 Glacier Deep Archive storage after 30 days.
- Analysis:
- S3 Gl...
Author: Amira · Last updated May 21, 2026
A data engineer is using Amazon QuickSight to build a dashboard to report a company's revenue in multiple AWS Regions. The data engineer wants the dashboard to display the total revenue for a Region, regardl...
To meet the requirement of displaying the total revenue for a region, regardless of the drill-down levels shown in the visual, the solution needs to focus on performing calculations that allow control over how the data is aggregated across different levels. Let's break down the options:
A) Create a table calculation:
- Table calculations are typically used to compute values based on the current view in the dashboard, such as applying calculations on the visible rows or columns. However, table calculations depend on the current drill-down context, and they are not ideal when you want to show a constant total that ignores drill-downs.
- Why rejected: This option cannot be guaranteed to provide the total revenue across regions without being affected by the drill-down level. It does not offer control over how the data should be aggregated independent of the view level.
B) Create a simple calculated field:
- Simple calculated fields are used for basic mathematical operations or logic based on the data. However, they will be affected by the current drill-down level in the dashboard, which is not what is needed here.
- Why rejected: This option only calculates based on the current data view (drill-down level), so it doesn't work for aggregating totals across regions independent of the drill-down context.
C) Create a level-aware calculation - aggregate (LAC-A) function:
- LAC-A functions allow you to perform calculations on a hig...
Author: RadiantPhoenixX · Last updated May 21, 2026
A retail company stores customer data in an Amazon S3 bucket. Some of the customer data contains personally identifiable information (PII) about customers. The company must not share PII data with business partners. A data engineer must determine whether a dataset contains PII before making obje...
To meet the requirement of determining whether a dataset contains PII before sharing it, the solution should automate the process of identifying and flagging sensitive data (PII) with minimal manual intervention. Let's analyze each option:
A) Configure the S3 bucket and S3 objects to allow access to Amazon Macie. Use automated sensitive data discovery in Macie:
- Amazon Macie is a fully managed service that helps identify, classify, and protect sensitive data, such as PII, in S3. Macie uses machine learning to detect PII across S3 objects and can automatically scan and flag content as sensitive.
- Why selected: Macie is designed specifically for this use case, providing an automated solution to identify PII with minimal manual intervention. Once configured, Macie will continuously monitor and classify data in the S3 bucket, making it the most efficient and automated option for detecting PII data in S3.
B) Configure AWS CloudTrail to monitor S3 PUT operations. Inspect the CloudTrail trails to identify operations that save PII:
- AWS CloudTrail records API calls made to AWS services, including S3 operations like PUT. However, CloudTrail only records metadata about the request (who, what, when, and where) but does not inspect the content of the objects being uploaded to S3.
- Why rejected: CloudTrail cannot help identify PII in the actual content of the files; it only logs the event data. Therefore, it does not address the core requirement of identifying sensitive data within the S3 objects themselves.
C) Create an AWS Lambda f...
Author: Oliver · Last updated May 21, 2026
A data engineer needs to create an empty copy of an existing table in Amazon Athena to perform data processing tasks. The existing table in Athe...
Let’s analyze each query option to determine the one that creates an empty copy of the existing table:
A) CREATE TABLE new_table LIKE old_table;
- Rejected: In engines that support it, `CREATE TABLE ... LIKE` creates a new table with the same schema as `old_table` (same columns and data types) and copies no data. However, `CREATE TABLE LIKE` is not among the DDL statements that Athena supports, so this query fails in Athena instead of producing the empty copy.
B) CREATE TABLE new_table AS SELECT * FROM old_table WITH NO DATA;
- Selected: This query creates an empty table (`new_table`) with the same structure as `old_table` and no data. The `WITH NO DATA` clause ensures that the new table will have the same schema as `old_table`, but no data is copied over. This is exactly what is required in the scenario: an empty copy of the existing table.
C) CREATE TABLE new_table AS SELECT * FROM old_table;
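The CTAS form with `WITH NO DATA` can be generated programmatically when several empty copies are needed. The sketch below is an assumption-laden illustration: the helper name `empty_copy_sql` and the simple identifier check are hypothetical, and the resulting string would still have to be submitted to Athena separately.

```python
import re

# Conservative identifier pattern used only for this illustration.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def empty_copy_sql(new_table: str, old_table: str) -> str:
    """Return a CTAS statement that copies the schema but no rows."""
    for name in (new_table, old_table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
    return f"CREATE TABLE {new_table} AS SELECT * FROM {old_table} WITH NO DATA;"


print(empty_copy_sql("new_table", "old_table"))
```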
- Rejected: This query will create a new table `new_table` and copy all data from `old_ta...
Author: Rahul · Last updated May 21, 2026
A company has a data lake in Amazon S3. The company collects AWS CloudTrail logs for multiple applications. The company stores the logs in the data lake, catalogs the logs in AWS Glue, and partitions the logs based on the year. The company uses Amazon Athena to analyze the logs. Recently, customers reported that a query on one of the Athena tables d...
To troubleshoot the issue where a query in Athena is not returning data, we need to focus on possible issues related to how Athena is interacting with the data in Amazon S3, particularly with respect to partitioning and table metadata. Let's review the options:
A) Confirm that Athena is pointing to the correct Amazon S3 location.
- Why selected: The first step is to ensure that Athena is correctly configured to point to the correct S3 location where the logs are stored. If Athena is querying the wrong location (e.g., the logs have been moved or there is a misconfiguration), the query would not return any data.
- This is a straightforward check to confirm that the table's location matches the S3 path where the logs are actually stored. If there is a mismatch, no data will be available for querying.
B) Increase the query timeout duration.
- Why rejected: The issue described is that no data is being returned, not that the query is taking too long. Increasing the timeout would be useful if the query is timing out, but in this case, the problem is likely related to data availability or metadata issues, not performance. Thus, this step is not relevant for resolving the issue where no data is returned.
C) Use the MSCK REPAIR TABLE command.
- Why selected: The issue could be related to Athena not recognizing the correct partitions, especially if the logs are partitioned by year. The `MSCK REPAIR ...
Author: Deepak · Last updated May 21, 2026
A data engineer wants to orchestrate a set of extract, transform, and load (ETL) jobs that run on AWS. The ETL jobs contain tasks that must run Apache Spark jobs on Amazon EMR, make API calls to Salesforce, and load data into Amazon Redshift. The ETL jobs need to handle failures and retr...
Let's analyze each service option based on the requirements:
A) Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
- Rejected: Amazon MWAA is a managed service that runs Apache Airflow workflows. Airflow handles dependencies, retries, and failure management well and can orchestrate Spark jobs on EMR, but an MWAA environment requires more setup and ongoing management than the alternatives. It is a good option for more complex workflows, but the other options may be more directly suited to the given task.
B) AWS Step Functions
- Selected: AWS Step Functions is ideal for orchestrating tasks that span several AWS services. It defines workflows as state machines with built-in retry and error handling, and it integrates directly with Amazon EMR, Amazon Redshift, and other AWS services. It can also invoke Lambda functions, which can make API calls to external services such as Salesforce, and those functions can be written in Python. Because Step Functions is designed for workflows with retries, error handling, and integration across AWS services, it is the best option for this scenario.
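To make the retry and failure handling concrete, here is a minimal sketch of a state machine definition in Amazon States Language, built as a Python dict. The state names, the helper `retrying_task`, and the retry numbers are assumptions for illustration; the `Parameters` that each service integration needs are left out for brevity, so this is not a deployable definition.

```python
import json


def retrying_task(resource, next_state=None, max_attempts=3):
    """Build a Task state that retries on any error, then falls back to a Fail state."""
    state = {
        "Type": "Task",
        "Resource": resource,
        "Retry": [
            {
                "ErrorEquals": ["States.ALL"],
                "IntervalSeconds": 10,
                "MaxAttempts": max_attempts,
                "BackoffRate": 2.0,
            }
        ],
        "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


definition = {
    "StartAt": "RunSparkStep",
    "States": {
        # Hypothetical state names; Parameters omitted for brevity.
        "RunSparkStep": retrying_task(
            "arn:aws:states:::elasticmapreduce:addStep.sync", "CallSalesforce"
        ),
        "CallSalesforce": retrying_task("arn:aws:states:::lambda:invoke", "LoadRedshift"),
        "LoadRedshift": retrying_task("arn:aws:states:::lambda:invoke"),
        "NotifyFailure": {"Type": "Fail", "Error": "EtlFailed"},
    },
}

print(json.dumps(definition, indent=2))
```

The `Retry` block covers transient failures with exponential backoff, while `Catch` routes anything that still fails to a single failure state.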
C) AWS Glue
- Rejected: AWS Glue is a fully managed ETL service, and while it is designed for ETL jobs, it is p...
Author: Zain · Last updated May 21, 2026
A data engineer maintains custom Python scripts that perform a data formatting process that many AWS Lambda functions use. When the data engineer needs to modify the Python scripts, the data engineer must manually update all the Lambda functions. The data...
To determine which solution best meets the requirement of updating multiple AWS Lambda functions with minimal manual intervention, let's analyze each option based on factors like ease of maintenance, scalability, and automation.
Option A: Store the custom Python scripts in a shared Amazon S3 bucket. Store a pointer to the custom scripts in the execution context object.
- Pros: Storing scripts in Amazon S3 is cost-effective and centralized. It allows updating the scripts in one location and referencing them dynamically within the Lambda functions.
- Cons: Using the execution context object to store the pointer is not ideal, as the context object is typically tied to individual invocations of the function and not meant for persistent script storage. The pointer will be limited to the invocation lifecycle, and updating it requires frequent code changes.
- Use case: This approach may be useful if the Lambda functions fetch scripts dynamically from S3. However, relying on the execution context for this pointer doesn’t offer an efficient, scalable solution for maintenance.
Option B: Package the custom Python scripts into Lambda layers. Apply the Lambda layers to the Lambda functions.
- Pros: Lambda layers are designed for sharing code across multiple Lambda functions, so the custom scripts can be managed in one place. Each change is published as a new layer version, and functions are then pointed at that version through a configuration update (which can be scripted) rather than by editing the code of every function.
- Cons: Requires versioning of Lambda layers, but the overhead is minimal compared to manually updating each Lambda function.
- Use case: This is the most efficient and scalable solution for reusing code across multiple Lambda functions, especially for shared custom scripts like those written in Python. It is ideal for this scenario, because the scripts are modified centrally and rolled out to all functions as a new layer version instead of being edited in each function by hand.
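A rollout of a new layer version might be scripted along the following lines. This is a sketch that assumes functions reference the layer by a versioned ARN; the helper `repoint_layers` and the ARNs are hypothetical, and the actual API calls (for example boto3's `publish_layer_version` and `update_function_configuration`) are left out so the logic stays self-contained.

```python
from typing import List


def repoint_layers(current_layers: List[str], new_layer_arn: str) -> List[str]:
    """Return a function's layer list with the shared layer set to new_layer_arn."""
    base = new_layer_arn.rsplit(":", 1)[0]  # layer ARN without the version suffix
    updated = [
        new_layer_arn if arn.rsplit(":", 1)[0] == base else arn
        for arn in current_layers
    ]
    if new_layer_arn not in updated:
        updated.append(new_layer_arn)  # function did not use the layer yet
    return updated


# Hypothetical ARNs for illustration only.
current = [
    "arn:aws:lambda:us-east-1:111122223333:layer:formatting:3",
    "arn:aws:lambda:us-east-1:111122223333:layer:other:1",
]
new_version = "arn:aws:lambda:us-east-1:111122223333:layer:formatting:4"
print(repoint_layers(current, new_version))
```

The returned list is what would be supplied as the `Layers` value when updating each function's configuration.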
Opt...
Author: Olivia · Last updated May 21, 2026
A company stores customer data in an Amazon S3 bucket. Multiple teams in the company want to use the customer data for downstream analysis. The company needs to ensure that the teams do not have access to personally identifiable informati...
Let's evaluate each option based on the requirement to ensure that teams do not have access to personally identifiable information (PII) while minimizing operational overhead:
A) Use Amazon Macie to create and run a sensitive data discovery job to detect and remove PII.
- Rejected: Amazon Macie is a great service for discovering and classifying PII in Amazon S3. However, Macie primarily focuses on detection rather than active removal or filtering of PII during data access. While it can be used to discover and alert on PII, the operational overhead of implementing it alongside other services for data access and modification is higher than other options. It doesn't provide an easy way to automatically modify or prevent access to PII once detected.
B) Use S3 Object Lambda to access the data, and use Amazon Comprehend to detect and remove PII.
- Selected: This option allows you to use S3 Object Lambda, which can dynamically modify the data as it's accessed. You can use Amazon Comprehend to detect and remove PII from the data on-the-fly as it is retrieved by downstream teams. This solution ensures that no PII is exposed to the teams, and it is highly scalable with minimal operational overhead, as it automatically processes data without requiring manual intervention. This solution meets the requirement for handling sensitive data with low overhead by processing data only when accessed.
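The redaction step inside such an Object Lambda function could look roughly like this. The sketch assumes entity records shaped like the output of Amazon Comprehend's PII detection (`Type`, `BeginOffset`, `EndOffset`); the sample text and entities are made up, and the calls to Comprehend and to S3 for returning the transformed object are omitted.

```python
def redact_pii(text, entities):
    """Replace each detected PII span with a [TYPE] placeholder."""
    redacted = text
    # Work from the end of the string so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        start, end = entity["BeginOffset"], entity["EndOffset"]
        redacted = redacted[:start] + f"[{entity['Type']}]" + redacted[end:]
    return redacted


# Hand-written sample input; in practice the entities would come from the detection call.
sample = "Contact Jane Doe at jane@example.com"
entities = [
    {"Type": "NAME", "BeginOffset": 8, "EndOffset": 16},
    {"Type": "EMAIL", "BeginOffset": 20, "EndOffset": 36},
]
print(redact_pii(sample, entities))  # Contact [NAME] at [EMAIL]
```

Because the redaction happens when an object is read, the stored data stays unchanged and no second, sanitized copy has to be maintained.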
C) Use Amazon Data Firehose and Amazon Comprehend to detect and remove PII.
- Rejected: Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) is primarily used for real-time streaming data processing and delivery to destinations such as ...
Author: Suresh · Last updated May 21, 2026
A company stores its processed data in an S3 bucket. The company has a strict data access policy. The company uses IAM roles to grant teams within the company different levels of access to the S3 bucket. The company wants to receive notifications when a user violates the data access policy....
Let's analyze each option to determine the best solution for tracking data access violations and including the username of the violator in the notifications:
A) Use AWS Config rules to detect violations of the data access policy. Set up compliance alarms.
- Rejected: AWS Config is typically used for assessing, auditing, and evaluating configurations of AWS resources to ensure compliance with best practices or internal policies. It doesn't directly track individual actions or events like S3 access attempts. AWS Config would not capture specific access violations to an S3 bucket or provide detailed information about the user violating the policy. Thus, it is not well-suited for this use case, where specific user actions are required.
B) Use Amazon CloudWatch metrics to gather object-level metrics. Set up CloudWatch alarms.
- Rejected: Amazon CloudWatch metrics generally track aggregated data at the service or resource level, such as CPU usage, request count, or error rates. It doesn't natively support tracking object-level access violations or user-specific actions on S3. Additionally, it cannot capture the username of a specific user violating the policy, which is a critical requirement for this use case.
C) Use AWS CloudTrail to track object-level events for the S3 bucket. Forward events to Amazon CloudWatch to set up CloudWatch alarms.
- Selected: AWS CloudTrail is the most appropriate solution. CloudTrail records all API requests made to AWS services, including S3. By enabling object-level logging for S3, CloudTrail will log events such as `GetObject`, `PutObject`, and other S3 actions, along with the associated IAM...
Author: RadiantJaguar56 · Last updated May 21, 2026
A company needs to load customer data that comes from a third party into an Amazon Redshift data warehouse. The company stores order data and product data in the same data warehouse. The company wants to use the combined dataset to identify potential new customers. A data engineer notices that one of the fields in the source...
Let's evaluate each option to determine the best approach for loading JSON data into Amazon Redshift with the least effort:
A) Use the SUPER data type to store the data in the Amazon Redshift table.
- Selected: The `SUPER` data type in Amazon Redshift is specifically designed for semi-structured data, such as JSON. Using this data type, you can load JSON data directly into Redshift tables without flattening it beforehand. Redshift stores the data in its native nested form, and nested fields can be queried later with PartiQL syntax in Redshift SQL. This option minimizes the need for data transformations or complex ETL processes, reducing operational overhead while maintaining query flexibility, and it allows the JSON field to be joined with the order and product data already in the warehouse.
B) Use AWS Glue to flatten the JSON data and ingest it into the Amazon Redshift table.
- Rejected: While AWS Glue is a powerful ETL tool that can flatten JSON data and load it into Redshift, this approach involves more operational overhead compared to using the `SUPER` data type. Flattening JSON data with Glue requires configuring a Glue job, and the data must be transformed into a tabular format, which may not be necessary if the goal is to simply store and query the JSON data in its original structure. This approach would be useful if complex transformations were required, but it adds unnecessary complexity for this scenario.
C) Use Amazon S3 to store the JSON data. Use Amazon Athena to query the data.
- Rejected: This option suggests storing the JSON data in S3 and querying it with Athena, which ...
Author: Zara1234 · Last updated May 21, 2026
A company wants to analyze sales records that the company stores in a MySQL database. The company wants to correlate the records with sales opportunities identified by Salesforce. The company receives 2 GB of sales records every day. The company has 100 GB of identified sales opportunities. A data engineer needs to develop a process that will analyze and correlate...
To meet the requirements of analyzing and correlating sales records from a MySQL database and Salesforce, we need to consider the scale of the data, the frequency of the process (once per night), and the level of operational overhead that each solution introduces. Let's evaluate each option.
Option A: Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to fetch both datasets. Use AWS Lambda functions to correlate the datasets. Use AWS Step Functions to orchestrate the process.
- Pros: Apache Airflow is a powerful workflow orchestration service that can be used to automate ETL tasks. AWS Lambda functions are lightweight and flexible, and AWS Step Functions can handle orchestration and error management.
- Cons: Using Lambda functions for data correlation can be cumbersome, especially given the amount of data (2 GB of sales records per day). AWS Lambda has memory and time limits, which may require additional configurations or splitting of the data into smaller chunks, increasing the complexity. Also, Apache Airflow introduces management overhead.
- Use case: This option might be viable for smaller-scale, less complex workflows. However, for a recurring nightly process dealing with data from two sources, it introduces too much complexity and may not scale well with large datasets, especially with Lambda limitations.
Option B: Use Amazon AppFlow to fetch sales opportunities from Salesforce. Use AWS Glue to fetch sales records from the MySQL database. Correlate the sales records with the sales opportunities. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the process.
- Pros: Amazon AppFlow simplifies integration with Salesforce, allowing easy data transfer to AWS. AWS Glue can fetch and process data from MySQL, and Apache Airflow provides workflow orchestration. This setup would allow seamless integration of data and correlation.
- Cons: Similar to Option A, using Apache Airflow for orchestration and adding AWS Glue for the extraction process introduces overhead in terms of setup, configuration, and ongoing management. While this solution is scalable, it might be over-engineered for the task and could introduce unnecessary complexity.
- Use case: While it addresses integration and correlation effectively, the overhead of using Apache Airflow and AWS Glue might not be justified for this specific use case.
Option C: Use Amazon AppFlow to fetch sales opportunities from Sales...
Author: Max · Last updated May 21, 2026
A company stores server logs in an Amazon S3 bucket. The company needs to keep the logs for 1 year. The logs are not required after 1 year. A data engineer needs a solution to automatically delete logs that are older...
To determine the best solution with the least operational overhead, let's evaluate each option based on key factors like automation, simplicity, and operational efficiency:
A) Define an S3 Lifecycle configuration to delete the logs after 1 year
- Why it's a good option: Amazon S3 Lifecycle configurations provide an automated, fully managed solution to expire objects after a set period (e.g., 1 year). You don’t need to manage servers or code, making it a highly efficient option in terms of both operational overhead and automation.
- Key factors: Simple to implement, fully automated, and managed by AWS with no additional infrastructure required.
- Why selected: This solution is very straightforward and requires no manual intervention after the setup. It scales well without requiring additional resources.
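For reference, an expiration rule of the kind described for option A is only a few lines of configuration. This is a minimal sketch; the prefix `logs/` and the helper name `expiration_rule` are assumptions for the example.

```python
def expiration_rule(prefix: str, days: int = 365) -> dict:
    """Build an S3 Lifecycle configuration that deletes objects after `days` days."""
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # hypothetical log prefix
                "Expiration": {"Days": days},
            }
        ]
    }


log_retention = expiration_rule("logs/")
print(log_retention)
```

Compared with a scheduled function or a cron job, there is nothing here to deploy, schedule, or monitor beyond the rule itself.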
B) Create an AWS Lambda function to delete the logs after 1 year
- Why it's a bad option: While Lambda can be used to delete objects, it requires additional setup and ongoing management. The function would need to run on a schedule, for example through an Amazon EventBridge (formerly CloudWatch Events) rule. This adds more complexity compared to the S3 Lifecycle approach.
- Key factors: Involves writing and maintaining custom code, configuring the schedule, and monitoring Lambda executions. It introduces more complexity than necessary, which increases operational overhead.
- Why rejected: It’s more complex than the S3 Lifecycle option and can incur costs depending on the number of invocations.
C) Schedule a cron job on an Amazon EC2 instance to delete the...
Author: Sara · Last updated May 21, 2026
A company is designing a serverless data processing workflow in AWS Step Functions that involves multiple steps. The processing workflow ingests data from an external API, transforms the data by using multiple AWS Lambda functions, and loads the transformed data into Amazon DynamoDB. The company needs the workflow to perfor...
In this case, the company needs to design a workflow that involves conditional steps based on the content of the incoming data. Let's evaluate the options to determine which is the best fit:
A) Parallel
- Advantages:
- Parallel execution: The Parallel state can run multiple branches of steps simultaneously, which is useful when you want to perform independent operations concurrently.
- Disadvantages:
- Not conditional: The Parallel state does not allow for decision-making based on data. It's designed for parallelism and not for branching based on content.
- Use case: Useful for running independent tasks concurrently but does not fit when you need branching logic based on input data.
- Why this is not ideal: Since the company needs the workflow to perform specific steps based on incoming data (conditional logic), the Parallel state does not meet the requirement of conditional branching.
B) Choice
- Advantages:
- Conditional branching: The Choice state allows for branching based on the content of the data. It’s ideal for making decisions in the workflow and directing the execution path based on conditions.
- Flexible: You can define different conditions based on the data (e.g., values, expressions, or paths), making it a perfect fit when you need to perform specific steps depending on the incoming data.
- Why this is ideal: The Choice state is specifically designed for the scenario where the workflow must take different paths depending on conditions within the incoming data. This matches the requirement for conditional logic based on data content.
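A Choice state for this kind of branching might be sketched as follows, again as a Python dict in Amazon States Language form. The field `$.recordType`, the branch names, and the small `next_state` helper (a local stand-in that only understands `StringEquals`) are all assumptions for illustration.

```python
choice_state = {
    "Type": "Choice",
    "Choices": [
        # Hypothetical input field and target states.
        {"Variable": "$.recordType", "StringEquals": "order", "Next": "TransformOrder"},
        {"Variable": "$.recordType", "StringEquals": "refund", "Next": "TransformRefund"},
    ],
    "Default": "HandleUnknown",
}


def next_state(state: dict, data: dict) -> str:
    """Mimic how the Choice state above would pick a branch for top-level fields."""
    for rule in state["Choices"]:
        field = rule["Variable"][2:]  # strip the leading "$."
        if data.get(field) == rule["StringEquals"]:
            return rule["Next"]
    return state["Default"]


print(next_state(choice_state, {"recordType": "refund"}))  # TransformRefund
```

The first matching rule wins, and `Default` catches input that matches none of the rules.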
C) Task
- Advantages:
- Execution of tasks: The Task state is used to invoke a service or function, such as calling AW...
Author: Rahul · Last updated May 21, 2026
A company has an Amazon Redshift data warehouse that users access by using a variety of IAM roles. More than 100 users access the data warehouse every day. The company wants to control user access to the objects based on each user'...
To control user access based on job roles, permissions, and the sensitivity of the data, let’s evaluate each of the options available for Amazon Redshift:
A) Use the role-based access control (RBAC) feature of Amazon Redshift
- Advantages:
- Role-based management: RBAC allows you to create database roles in Amazon Redshift, grant permissions to those roles, and assign the roles to users. It’s a common method for managing access in a multi-user environment.
- Granular control: You can define different permissions at the database level, schema level, and even the object level (tables, views) based on the user's job role.
- Scalability: With more than 100 users, managing access through RBAC is scalable because it’s based on assigning roles rather than managing individual user permissions.
- Why this is ideal: RBAC is the most straightforward solution for managing user access based on job roles and responsibilities. It allows you to control access efficiently by grouping users into roles and assigning permissions to those roles. This will work well for the company's needs of controlling access based on job roles and permissions.
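The grouping of users into roles can be sketched as a small generator of Redshift SQL statements. The role, table, and user names below are hypothetical, and the helper `rbac_statements` only builds the statements; running them against a cluster is outside the sketch.

```python
from typing import List


def rbac_statements(role: str, tables: List[str], users: List[str]) -> List[str]:
    """Build SQL that creates a role, grants it read access, and assigns it to users."""
    statements = [f"CREATE ROLE {role};"]
    statements += [f"GRANT SELECT ON TABLE {table} TO ROLE {role};" for table in tables]
    statements += [f"GRANT ROLE {role} TO {user};" for user in users]
    return statements


# Hypothetical names for illustration only.
for statement in rbac_statements("sales_analyst", ["sales.orders"], ["alice", "bob"]):
    print(statement)
```

Adding a new analyst then takes one `GRANT ROLE` statement instead of a separate grant for every table.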
B) Use the row-level security (RLS) feature of Amazon Redshift
- Advantages:
- Data filtering: RLS allows you to filter the data returned to users based on their specific attributes, such as department, region, or any other field that determines what data they should see. This helps secure sensitive data by ensuring that users only see rows relevant to them.
- Disadvantages:
- Not the primary solution for role management: While RLS is great for filtering data based on specific user attributes, it does not help with managing access to objects themselves, like tables or schemas. It focuses on controlling access to the data within a table.
- Limited role management: It’s not designed to handle permissions based on roles or job responsibilities, which is a broader aspect of managing user access.
- Why this is not ideal: While RLS can be useful for filtering data based on user attributes, it doesn't address the broader concern of controlling access to objects in Redshift (e.g., tables, schemas) based on user roles. RBAC provides more comprehensive control for this use case.
C) Use the column-level security (CLS) feature of Amazon Redshift
- Advantages:
- Fine-grained ...
Author: Andrew · Last updated May 21, 2026
A company uses Amazon DataZone as a data governance and business catalog solution. The company stores data in an Amazon S3 data lake. The company uses AWS Glue with an AWS Glue Data Catalog. A data engineer needs to publish AWS...
To determine the best option for publishing AWS Glue Data Quality scores to the Amazon DataZone portal, let's analyze each option carefully.
Option A:
- Create a data quality ruleset with Data Quality Definition Language (DQDL) rules that apply to a specific AWS Glue table.
- Schedule the ruleset to run daily.
- Configure the Amazon DataZone project to have an Amazon Redshift data source.
- Enable the data quality configuration for the data source.
Reasoning:
- The solution proposes creating DQDL rules for AWS Glue tables, which is a valid approach to defining data quality rules.
- However, the data source in the Amazon DataZone project is specified as Amazon Redshift, not AWS Glue. This would mean that the data quality scores from AWS Glue would not be directly available to the DataZone portal through the Redshift configuration.
Why rejected:
- The mismatch between the data source (Amazon Redshift) and AWS Glue makes this solution unsuitable.
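For readers unfamiliar with DQDL, a ruleset is a short text document listing rules. The sketch below assembles one in Python; the column names and the helper `dqdl_ruleset` are hypothetical, and attaching the ruleset to a Glue table is not shown.

```python
from typing import List


def dqdl_ruleset(rules: List[str]) -> str:
    """Join individual DQDL rules into a ruleset document."""
    body = ",\n    ".join(rules)
    return f"Rules = [\n    {body}\n]"


# Hypothetical columns for illustration only.
ruleset = dqdl_ruleset(
    [
        'IsComplete "order_id"',
        'IsUnique "order_id"',
        'ColumnValues "amount" >= 0',
    ]
)
print(ruleset)
```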
Option B:
- Configure AWS Glue ETL jobs to use an Evaluate Data Quality transform.
- Define a data quality ruleset inside the jobs.
- Configure the Amazon DataZone project to have an AWS Glue data source.
- Enable the data quality configuration for the data source.
Reasoning:
- Using the Evaluate Data Quality transform within AWS Glue ETL jobs allows data quality checks to be performed within the ETL process.
- Defining the data quality ruleset within the Glue job and configuring Amazon DataZone with an AWS Glue data source aligns well with the goal of integrating data quality metrics into DataZone.
Why rejected:
- This option is plausible and practical because AWS Glue integrates directly with DataZone through the AWS Glue data source. However, using ETL jobs to apply the data quality rules might not be the most efficient method compared to having a more centralized data quality process with DQDL rules.
Option C:
- Create a data quality ruleset with Data Quality Definition language (DQ...
Author: Sofia2021 · Last updated May 21, 2026
A company has a data warehouse in Amazon Redshift. To comply with security regulations, the company needs to log and store all user activities and connection activit...
To determine the best solution for logging and storing all user and connection activities for an Amazon Redshift data warehouse, let's analyze each option:
Option A:
- Create an Amazon S3 bucket. Enable logging for the Amazon Redshift cluster. Specify the S3 bucket in the logging configuration to store the logs.
Reasoning:
- Amazon S3 is a highly scalable and cost-effective storage solution. Amazon Redshift supports storing audit logs in an S3 bucket, which makes this a natural and efficient option.
- Redshift can be configured to store logs (including user and connection activity logs) in an S3 bucket by enabling the logging feature.
- Logs stored in S3 are easy to manage, can be retained for compliance purposes, and can be queried or analyzed later.
- S3 is a suitable option because it is durable and designed for large-scale data storage.
Why selected:
- This solution is ideal as it is fully supported by Redshift and provides a cost-effective, scalable, and reliable method for storing logs.
- It also complies with security regulations by allowing the logs to be securely stored in S3, where they can be retained for as long as necessary.
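The two settings involved in Option A can be sketched as request payloads. This is an illustration under assumptions: the cluster, bucket, and parameter group names are hypothetical, and the dicts are shaped for boto3's Redshift `enable_logging` and `modify_cluster_parameter_group` calls without actually invoking them.

```python
def audit_logging_requests(cluster_id: str, bucket: str, parameter_group: str) -> dict:
    """Build the request arguments for S3 audit logging and user activity logging."""
    return {
        # Connection and user logs are delivered to the bucket once logging is enabled.
        "enable_logging": {
            "ClusterIdentifier": cluster_id,
            "BucketName": bucket,
            "S3KeyPrefix": "redshift-audit/",
        },
        # The user activity log additionally depends on this cluster parameter.
        "modify_cluster_parameter_group": {
            "ParameterGroupName": parameter_group,
            "Parameters": [
                {
                    "ParameterName": "enable_user_activity_logging",
                    "ParameterValue": "true",
                }
            ],
        },
    }


requests = audit_logging_requests("analytics-cluster", "example-audit-logs", "custom-params")
print(requests["enable_logging"])
```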
Option B:
- Create an Amazon Elastic File System (Amazon EFS) file system. Enable logging for the Amazon Redshift cluster. Write logs to the EFS file system.
Reasoning:
- Amazon EFS is a scalable file storage solution, but it is not a native or optimal choice for Redshift logs.
- EFS provides shared file storage that can be mounted by multiple EC2 instances, but it is generally used for different types of workloads that require file system access rather than logging.
- Storing logs in EFS could lead to unnecessary complexity and higher cost compared to Amazon S3, as EFS is more suitable for file systems accessed by EC2 instances and not for simple log storage.
Why rejected:
- Amazon S3 is a more appropriate and cost-efficient storage solution f...
Author: Zain · Last updated May 21, 2026
A company wants to migrate a data warehouse from Teradata to Amazon Redshift. Which solution will meet this ...
To determine the best solution for migrating a data warehouse from Teradata to Amazon Redshift with the least operational effort, let's evaluate each option.
Option A:
- Use AWS Database Migration Service (AWS DMS) Schema Conversion to migrate the schema. Use AWS DMS to migrate the data.
Reasoning:
- AWS DMS (Database Migration Service) supports migrating data between different databases, including Teradata to Amazon Redshift. However, AWS DMS does not directly handle schema conversion for a complex source database like Teradata.
- Schema conversion, especially for complex databases like Teradata, requires more specialized tools for mapping the schema, transforming objects, and ensuring compatibility between the source and target databases.
- While AWS DMS can migrate the data, it cannot fully convert the schema, which is a significant part of the migration process.
Why rejected:
- AWS DMS alone will not be sufficient to handle the schema conversion, particularly for complex data warehouses like Teradata. Additional steps or tools would still be necessary.
Option B:
- Use the AWS Schema Conversion Tool (AWS SCT) to migrate the schema. Use AWS Database Migration Service (AWS DMS) to migrate the data.
Reasoning:
- AWS SCT is a powerful tool designed specifically for schema conversion. It can analyze the Teradata schema and convert it into a format compatible with Amazon Redshift, making it ideal for handling the complex schema migration from Teradata.
- After using SCT for schema conversion, AWS DMS can be used to efficiently migrate the actual data from Teradata to Redshift. DMS supports incremental data replication, ensuring minimal downtime during the migration.
- This approach minimizes the operational effort by automating the schema conversion and data migration processes with specialized tools.
Why selected:
- This solution is the most efficient because it uses AWS SCT for schema conversion, which is specifically built for this purpose, and AWS DMS for data migration, which is designed to move large datasets efficiently.
- This solution p...
Author: Noah Williams · Last updated May 21, 2026
A company uses a variety of AWS and third-party data stores. The company wants to consolidate all the data into a central data warehouse to perform analytics. Users need fast response times for analytics queries. The company uses Amazon QuickSight in direct query mode to visualize the data. Users normally run querie...
To determine the best solution for consolidating data into a central data warehouse with fast response times for analytics queries and minimal operational overhead, let's evaluate each option:
Option A:
- Use Amazon Redshift Serverless to load all the data into Amazon Redshift managed storage (RMS).
Reasoning:
- Amazon Redshift Serverless is designed to automatically scale based on demand, which is perfect for handling unpredictable spikes in query activity.
- It simplifies the operational overhead by automatically managing resources like compute and storage, so you don’t need to worry about provisioning or managing clusters.
- Redshift Serverless can be used in direct query mode with Amazon QuickSight, making it easy for users to run analytics queries on the consolidated data.
- Managed storage (RMS) in Redshift is designed to provide fast and efficient querying, so response times are typically good for analytics workloads.
Why selected:
- Redshift Serverless is the most suitable option for this scenario because it provides elasticity to handle unpredictable spikes in query volume without needing manual intervention. It also minimizes the operational overhead by automatically scaling and managing resources.
- This option offers fast response times and integrates smoothly with Amazon QuickSight.
Option B:
- Use Amazon Athena to load all the data into Amazon S3 in Apache Parquet format.
Reasoning:
- Amazon Athena allows users to query data directly in Amazon S3, typically using serverless querying. However, it is not designed to function as a central data warehouse, especially when dealing with large-scale, complex analytics workloads.
- While Athena is good for querying large datasets in S3, it may not provide the same query performance as a fully managed data warehouse like Redshift for high-performance analytics.
- Athena is also more focused on ad hoc querying, so it may not be the best option when fast response times for complex analytics queries are required, particularly during periods of heavy usage.
Why rejected:
- Athena is not designed to be used as a centralized data warehouse for sustained high-performance analytics, and it may not meet the requirements for fast response times during unpredictable spikes in query load. Using ...
Author: Ava · Last updated May 21, 2026
A company is planning to create a service that requires encryption in transit. The traffic must not be decrypted between the client and the backend of the service. The company will implement the service by using the gRPC protocol over TCP port 443. The service will scale up to thousands of simultaneous connections. The backend of the service will be hosted on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with the Kubernetes Cluster Autosc...
To determine the appropriate solution, let's break down the requirements and evaluate each option based on key factors such as encryption in transit, mutual TLS (mTLS), backend scalability, and Kubernetes service management.
Key Requirements:
1. Encryption in transit: Traffic must remain encrypted end-to-end, with no decryption between the client and the backend.
2. Mutual TLS (mTLS) authentication: Both the client and backend need to authenticate each other using mTLS.
3. Scalability: The backend service needs to scale automatically with the Kubernetes Cluster Autoscaler and Horizontal Pod Autoscaler.
4. TCP traffic on port 443: The service is gRPC-based, which typically uses HTTP/2 and requires TCP-based communication over port 443.
Evaluation of the options:
A) Install the AWS Load Balancer Controller for Kubernetes. Using that controller, configure a Network Load Balancer with a TCP listener on port 443 to forward traffic to the IP addresses of the backend service Pods.
- Why it works: The Network Load Balancer (NLB) operates at the transport layer (Layer 4), so a TCP listener forwards the byte stream without terminating TLS. The TLS session, including the mutual TLS (mTLS) handshake, is established directly between the client and the backend Pods, so traffic is never decrypted in between and the encryption in transit requirement is met. gRPC runs over HTTP/2 inside that TLS session and passes through the NLB unchanged.
- Scalability: NLB automatically scales and integrates well with Kubernetes and the Cluster Autoscaler.
- Why it is preferred: The key advantage here is that the NLB does not decrypt traffic, making it ideal for scenarios where encryption in transit is paramount. The TCP listener ensures that the gRPC protocol is supported.
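As a rough illustration of Option A, the sketch below builds a Kubernetes Service manifest that asks the AWS Load Balancer Controller for an NLB with a plain TCP listener on port 443 and Pod IP targets. The Service name, the `app` label, and the Pod port `8443` are hypothetical placeholders, and the internet-facing scheme is an assumption about how clients reach the service. Kubernetes accepts manifests as JSON, so the output could be applied as-is.

```python
import json


def nlb_service_manifest(name: str, app_label: str, target_port: int) -> dict:
    """Build a Service manifest that requests an NLB with IP targets.

    The annotations are read by the AWS Load Balancer Controller.
    name, app_label and target_port are hypothetical example values.
    """
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # Hand the Service to the AWS Load Balancer Controller.
                "service.beta.kubernetes.io/aws-load-balancer-type": "external",
                # Register Pod IP addresses directly as NLB targets.
                "service.beta.kubernetes.io/aws-load-balancer-nlb-target-type": "ip",
                # Assumption: clients connect from outside the VPC.
                "service.beta.kubernetes.io/aws-load-balancer-scheme": "internet-facing",
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app_label},
            "ports": [
                {
                    "name": "grpc-tls",
                    # TCP listener: the NLB passes TLS through untouched.
                    "protocol": "TCP",
                    "port": 443,
                    "targetPort": target_port,
                }
            ],
        },
    }


manifest = nlb_service_manifest("grpc-backend", "grpc-backend", 8443)
print(json.dumps(manifest, indent=2))
```

Because the listener is TCP rather than TLS, no certificate is attached to the load balancer; the Pods present their own certificates and validate the client's.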
B) Install the AWS Load Balancer Controller for Kubernetes. Using that controller, configure an Application Load Balancer with an HTTPS listener on port 443 to forward traffic to the IP addresses of the backend service Pods.
- Why it doesn’t work: The Application Load Balancer (ALB) operates at the HTTP/HTTPS layer (Layer 7), and although it supports HTTPS, it would decrypt th...
Author: Elijah · Last updated May 16, 2026
A company is deploying a new application in the AWS Cloud. The company wants a highly available web server that will sit behind an Elastic Load Balancer. The load balancer will route requests to multiple target groups based on the URL in the request. All traffic must use HTTPS. TLS processing must be offloaded to the load balancer. The web se...
Key Requirements:
1. Highly available web server: The application needs to be highly available, meaning it should be able to handle failures in one or more components without downtime.
2. Elastic Load Balancer: Traffic must be routed to multiple target groups based on the request.
3. HTTPS traffic: All traffic must use HTTPS, and TLS processing must be offloaded to the load balancer.
4. Preserving the user's IP address: The web server must be able to read the client's IP address in order to keep accurate security logs.
Evaluation of the options:
A) Deploy an Application Load Balancer with an HTTPS listener. Use path-based routing rules to forward the traffic to the correct target group. Include the X-Forwarded-For request header with traffic to the targets.
- Why it works:
- HTTPS traffic: An Application Load Balancer (ALB) with an HTTPS listener is a suitable choice for handling HTTPS traffic, as TLS offloading can be done at the ALB.
- Routing: Path-based routing can be used effectively with the ALB to forward traffic to different target groups based on the URL path.
- X-Forwarded-For header: The ALB automatically includes the `X-Forwarded-For` header, which contains the original client IP address. This allows the backend web server to retrieve the user's IP address, even though the ALB terminates the TLS connection.
- Why it is preferred:
- TLS offloading: Since the TLS connection is terminated at the ALB, the backend does not need to handle encryption.
- Preserving IP address: The `X-Forwarded-For` header includes the user's IP address, which meets the requirement of logging accurate client information.
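On the web server side, reading the client address from `X-Forwarded-For` could look like the sketch below. It assumes the ALB uses its default behavior of appending the connecting client's address to any value already present in the header, so the rightmost entry is the one the ALB added. The function name and the `trusted_proxies` parameter are illustrative, not part of any AWS API.

```python
import ipaddress
from typing import Optional


def client_ip_from_xff(header_value: str, trusted_proxies: int = 1) -> Optional[str]:
    """Return the client IP recorded by the nearest trusted proxy.

    With a single ALB in front of the server (trusted_proxies=1), the
    rightmost entry is the address the ALB saw. Entries further left were
    supplied by the client and can be forged, so they are ignored.
    """
    if trusted_proxies < 1:
        return None
    parts = [p.strip() for p in header_value.split(",") if p.strip()]
    if len(parts) < trusted_proxies:
        return None
    candidate = parts[-trusted_proxies]
    try:
        # Validate and normalize; rejects garbage such as "unknown".
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        return None


# Example values use documentation address ranges.
print(client_ip_from_xff("203.0.113.7"))
print(client_ip_from_xff("10.0.0.1, 198.51.100.20"))
```

Taking the rightmost entry rather than the leftmost matters for the security logs in this scenario: a client can send its own `X-Forwarded-For` header, and only the value appended by the load balancer is trustworthy.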
B) Deploy an Application Load Balancer with an HTTPS listener for each domain. Use host-based routing rules to forward the traffic to the correct target group for each domain. Include the X-Forwarded-For request header with traffic to the targets.
- Why it works:
- HTTPS traffic: The ALB supports TLS offloading, so HTTPS traffic is supported.
- Routing: Host-based routing can be used to forward traffic to the correct target group for each domain.
- X-Forwarded-For header: Like Option A, the ALB will forward the original client IP address in the `X-Forwarded-For` header.
- Why it might be overcomplicated:
- While host-based routing is a valid option, it may be more complex than path-based routing if there are fewer domains involved. For a single application with multiple paths, path-based routing would be simpler to implement than creating separate listeners for each domain.
- Why rejected:
- Extra complexity: This option is only necessary if multiple domains need to be handled. For a simpler setup, Option A...
Author: Deepak · Last updated May 16, 2026
A company has developed an application on AWS that will track inventory levels of vending machines and initiate the restocking process automatically. The company plans to integrate this application with vending machines and deploy the vending machines in several markets around the world. The application resides in a VPC in the us-east-1 Region. The application consists of an Amazon Elastic Container Service (Amazon ECS) cluster behind an Application Load Balancer (ALB). The communication from the vending machines to the application happens over HTTPS.
The company is planning to use an AWS Global Accelerator accelerator and configure sta...
Key Requirements:
1. Application accessibility: The application must only be accessible via the AWS Global Accelerator, not directly through the internet or the ALB's public endpoint.
2. Security: The ALB must only accept traffic from the Global Accelerator, not from the internet at large.
3. Global access: The application is deployed in the us-east-1 region, but it should be accessible globally using static IP addresses from the AWS Global Accelerator.
4. VPC configuration: The ALB needs to be properly configured within the VPC and secured using security groups.
Evaluation of the options:
A) Configure the ALB in a private subnet of the VPC. Attach an internet gateway without adding routes in the subnet route tables to point to the internet gateway. Configure the accelerator with endpoint groups that include the ALB endpoint. Configure the ALB's security group to only allow inbound traffic from the internet on the ALB listener port.
- Why this doesn’t work:
- Internet gateway: The presence of an internet gateway but without routes to it is contradictory. The internet gateway allows internet access, but since there are no routes to it, no traffic would actually flow through the internet gateway, making the internet gateway configuration useless.
- Security group misconfiguration: The ALB security group is configured to allow inbound traffic from the internet, which would open the ALB to unwanted access. This violates the requirement to restrict access only to traffic from the Global Accelerator.
- Why rejected: The internet gateway setup is incorrect, and the ALB's security group is too permissive.
B) Configure the ALB in a private subnet of the VPC. Configure the accelerator with endpoint groups that include the ALB endpoint. Configure the ALB's security group to only allow inbound traffic from the internet on the ALB listener port.
- Why this doesn’t work:
- Security group misconfiguration: The ALB's security group is still allowing inbound traffic from the internet, which contradicts the requirement that the application should only be accessible via the Global Accelerator.
- Why rejected: The ALB security group is misconfigured by allowing unrestricted access from the internet.
C) Configure the ALB in a public subnet of the VPC. Attach an internet gateway. Add routes in the subnet route tables to point to the internet gateway. Configure the acc...
Author: Emma · Last updated May 16, 2026
A global delivery company is modernizing its fleet management system. The company has several business units. Each business unit designs and maintains applications that are hosted in its own AWS account in separate application VPCs in the same AWS Region. Each business unit's applications are designed to get data from a central shared services VPC.
The company wants the network connectivity architecture to provide granular security controls. The arc...
Key Requirements:
1. Granular security controls: The company needs a network architecture that enables fine-grained security controls over the connections between VPCs.
2. Scalability: As more business units are added, the solution must scale without major rework.
3. Central shared services VPC: All applications across different business units need to access data from a central shared services VPC.
4. Isolation between VPCs: Each business unit's application VPC needs to be isolated, with the ability to control which VPCs can communicate with each other and with the central shared services VPC.
Evaluation of Options:
A) Create a central transit gateway. Create a VPC attachment to each application VPC. Provide full mesh connectivity between all the VPCs by using the transit gateway.
- Why it works:
- Centralized management: A transit gateway is designed to connect multiple VPCs in a scalable manner. It allows VPCs to communicate with each other via a central hub.
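- Granular security: Connectivity can be controlled through transit gateway route tables, which determine which attachments can route to each other. Transit gateways have no access control lists of their own, so finer-grained filtering relies on network ACLs and security groups in the attached VPCs.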
- Scalability: This solution scales well as more business units are added. New VPCs can be connected to the transit gateway without complex reconfiguration.
- Why it is preferred:
- The transit gateway simplifies the network architecture by centralizing routing between VPCs.
- Granular security controls can be implemented through the transit gateway’s route tables, allowing for very specific control over which VPCs can access the shared services VPC and which others can connect.
- This solution also minimizes the number of connections compared to VPC peering, as you only need to manage a few attachments (one for each VPC) rather than a potentially large number of peerings.
- Best use case: This solution is ideal for managing communication between many VPCs, offering scalability, and centralized control over the network.
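The connection-count comparison above can be made concrete with a small calculation. This is only arithmetic on the topology, with hypothetical function names; it says nothing about cost or quotas.

```python
def full_mesh_peerings(vpc_count: int) -> int:
    """Peering connections needed so that every VPC pair is directly peered."""
    return vpc_count * (vpc_count - 1) // 2


def hub_and_spoke_peerings(application_vpcs: int) -> int:
    """Peerings needed when each application VPC peers only with the hub."""
    return application_vpcs


def transit_gateway_attachments(vpc_count: int) -> int:
    """A transit gateway needs one VPC attachment per VPC."""
    return vpc_count


# Example: 10 application VPCs plus 1 shared services VPC.
total_vpcs = 11
print("full mesh peerings:", full_mesh_peerings(total_vpcs))
print("hub-and-spoke peerings:", hub_and_spoke_peerings(total_vpcs - 1))
print("transit gateway attachments:", transit_gateway_attachments(total_vpcs))
```

Full-mesh peering grows quadratically with the number of VPCs, while transit gateway attachments and hub-only peerings grow linearly.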
B) Create VPC peering connections between the central shared services VPC and each application VPC in each business unit's AWS account.
- Why it doesn’t work:
- Scaling issues: VPC peering becomes difficult to manage as the number of VPCs increases. Each application VPC needs its own peering connection to the central shared services VPC, along with matching route table entries on both sides. Because peering is not transitive, any additional VPC-to-VPC connectivity would need further peerings, which approaches a full mesh as the number of VPCs grows.
- Limited security controls: With VPC peering, you have limited control over routing, and fine-grained security between VPCs is harder to enforce. It’s more difficult to isolate traffic between different business units while allowing communication with the shared services VPC.
- Why rejected:
- Limited scalability: As more business units are added, the number of VPC peer...
Author: Daniel · Last updated May 16, 2026
A company uses a 4 Gbps AWS Direct Connect dedicated connection with a link aggregation group (LAG) bundle to connect to five VPCs that are deployed in the us-east-1 Region. Each VPC serves a different business unit and uses its own private VIF for connectivity to the on-premises environment. Users are reporting slowness when they access resources that are hosted on AWS.
A network engineer finds that there are sudden increases in throughput and that the Direct Connect connection becomes saturated at the same time for about an hour each business day....
Key Requirements:
1. Identifying the business unit causing slowness: The company needs to determine which business unit (via VPC) is causing the sudden increase in throughput and saturation of the Direct Connect link.
2. Resolve the slowness: Once the culprit VPC is identified, the company wants to implement a solution that will resolve the throughput saturation and improve performance.
Evaluation of the Options:
A) Review the Amazon CloudWatch metrics for VirtualInterfaceBpsEgress and VirtualInterfaceBpsIngress to determine which VIF is sending the highest throughput during the period in which slowness is observed. Create a new 10 Gbps dedicated connection. Shift traffic from the existing dedicated connection to the new dedicated connection.
- Why it works:
- CloudWatch metrics: Monitoring the `VirtualInterfaceBpsEgress` and `VirtualInterfaceBpsIngress` metrics for each private virtual interface (VIF) will help identify which VIF is generating the highest throughput and causing saturation during the identified time periods.
- Solution: If one business unit (via VPC) is determined to be the cause of the issue, adding a new 10 Gbps dedicated connection would allow the company to offload some traffic, helping to alleviate the congestion.
- Why it’s rejected:
- Overkill: Adding a new 10 Gbps dedicated connection may not be necessary unless there is a significant and persistent increase in traffic. This option would be more beneficial if the bandwidth of the current connection is insufficient for the overall throughput needs, which isn’t directly proven yet.
- Cost and Complexity: Setting up a new 10 Gbps dedicated connection may be expensive and might introduce unnecessary complexity if a simpler solution could resolve the issue.
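The per-VIF comparison that the CloudWatch metrics enable can be sketched as follows. The sample values and VIF identifiers are made up for illustration; in practice the datapoints would come from the `VirtualInterfaceBpsEgress` and `VirtualInterfaceBpsIngress` metrics for each VIF. Ingress and egress are compared separately because the link is full duplex.

```python
from typing import Dict, List, Tuple

# 4 Gbps connection from the scenario, in bits per second.
LINK_CAPACITY_BPS = 4_000_000_000


def busiest_vif(samples: Dict[str, List[Tuple[int, int]]]) -> Tuple[str, int]:
    """Return (vif_id, peak_bps) for the VIF with the highest peak.

    samples maps a VIF id to (egress_bps, ingress_bps) datapoints taken
    during the slow period. The peak is the larger of the two directions.
    """
    peaks = {
        vif: max((max(egress, ingress) for egress, ingress in points), default=0)
        for vif, points in samples.items()
    }
    vif = max(peaks, key=peaks.get)
    return vif, peaks[vif]


# Hypothetical datapoints for three of the five VIFs.
samples = {
    "vif-finance": [(300_000_000, 200_000_000), (400_000_000, 300_000_000)],
    "vif-analytics": [(500_000_000, 3_100_000_000), (450_000_000, 2_900_000_000)],
    "vif-hr": [(100_000_000, 150_000_000)],
}

vif, peak = busiest_vif(samples)
share = peak / LINK_CAPACITY_BPS
print(f"{vif} peaks at {peak / 1e9:.1f} Gbps ({share:.0%} of the link)")
```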
B) Review the Amazon CloudWatch metrics for VirtualInterfaceBpsEgress and VirtualInterfaceBpsIngress to determine which VIF is sending the highest throughput during the period in which slowness is observed. Upgrade the bandwidth of the existing dedicated connection to 10 Gbps.
- Why it works:
- CloudWatch metrics: Again, monitoring the `VirtualInterfaceBpsEgress` and `VirtualInterfaceBpsIngress` metrics can help identify which VIF is causing the traffic spikes.
- Upgrade solution: Upgrading the bandwidth of the existing dedicated connection to 10 Gbps would potentially resolve the congestion if the throughput requirements are high.
- Why it’s a good option:
- Targeted upgrade: Upgrading the existing connection to 10 Gbps would be a more efficient solution than creating a new connection, assuming the current connection is close to its throughput limit and the need for more bandwidth is confirmed.
- Cost-effective: It avoids the need for a completely new dedicated connection while addressing the performance issue by increasing the capacity of the existing connection.
- Why it might still be overkill:
- Bandwidth upgrade may not be necessary: If the root cause is occasional spikes rather than sustained high throughput, the problem might be better handled by identifying the traffic pa...
Author: Siddharth · Last updated May 16, 2026
A software-as-a-service (SaaS) provider hosts its solution on Amazon EC2 instances within a VPC in the AWS Cloud. All of the provider's customers also have their environments in the AWS Cloud.
A recent design meeting revealed that the customers have IP address overlap with the provider's AWS deployment. The customers have stated that they will not share their internal IP addresses and that th...
To address the problem of IP address overlap between the provider’s and customers’ VPCs in AWS while ensuring that the customers do not connect over the internet, the solution must focus on private network connectivity and avoiding conflicts with internal IPs. The steps must also be secure and scalable for a Software-as-a-Service (SaaS) provider hosting on AWS. Let's break down each option:
A) Deploy the SaaS service endpoint behind a Network Load Balancer.
- Explanation: A Network Load Balancer (NLB) operates at the network layer (Layer 4) and supports handling TCP traffic efficiently. However, using an NLB doesn’t directly address the requirement to handle IP address overlap and private connections between the SaaS provider and customers. It could facilitate load balancing, but it doesn't solve the issue of overlapping IPs or private connectivity.
- Rejection: The NLB is useful for traffic distribution but does not solve the main issue of private, secure connectivity without IP conflicts.
B) Configure an endpoint service, and grant the customers permission to create a connection to the endpoint service.
- Explanation: AWS PrivateLink allows you to create a private connection between VPCs without traversing the public internet. By configuring an endpoint service, the SaaS provider can expose their service through a private link, and customers can connect to it through an endpoint in their own VPC. This solution allows secure, private access without requiring IP address sharing or conflicts.
- Selected Option: This is a viable solution since it provides private connectivity between VPCs without any need for routing over the internet and without exposing internal IPs.
C) Deploy the SaaS service endpoint behind an Application Load Balancer.
- Explanation: An Application Load Balancer (ALB) operates at Layer 7 and is designed to handle HTTP/HTTPS traffic with more complex routing capabilities. While it could be used to distribute traffic to the SaaS service, it doesn’t directl...
Author: Olivia · Last updated May 16, 2026
A network engineer is designing the architecture for a healthcare company's workload that is moving to the AWS Cloud. All data to and from the on-premises environment must be encrypted in transit. All traffic also must be inspected in the cloud before the traffic is allowed to leave the cloud and travel to the on-premises environment or to the internet.
The company will expose components of the workload to the internet so that patients can reserve appointments. The architecture must secure these components and protect them against DDoS attacks. The arch...
To meet the security and performance requirements for a healthcare company's workload migrating to the AWS Cloud, the network engineer must design a solution that ensures encryption in transit, traffic inspection, protection from DDoS attacks, and secure exposure of workload components to the internet. Let’s analyze each option:
A) Use Traffic Mirroring to copy all traffic to a fleet of traffic capture appliances.
- Explanation: Traffic Mirroring allows you to capture and inspect network traffic in real-time. While this could be useful for monitoring traffic and detecting security threats, it doesn't directly address the requirement of inspecting traffic before it leaves the cloud or protect the infrastructure from DDoS attacks.
- Rejection: Traffic Mirroring is more suited for monitoring and analysis rather than for inline traffic inspection and protection. It does not provide security controls to enforce rules on the traffic itself or mitigate DDoS attacks.
B) Set up AWS WAF on all network components.
- Explanation: AWS Web Application Firewall (WAF) helps protect web applications by filtering and monitoring HTTP(S) requests. It can block common attack patterns such as SQL injection, cross-site scripting (XSS), and also provide rate limiting for DDoS mitigation. This would be especially useful for securing the exposed components of the workload that patients will interact with, such as reservation systems.
- Selected Option: AWS WAF is a critical component for protecting web-facing components and ensuring that malicious traffic is blocked before it reaches the application layer. This aligns with the need to secure the public-facing endpoints.
C) Configure an AWS Lambda function to create Deny rules in security groups to block malicious IP addresses.
- Explanation: Security groups support only allow rules, so Deny rules cannot be created in them. Even if a Lambda function automated blocking through another control, the approach would be reactive and less effective at preventing attacks such as DDoS in real time. AWS security services like WAF and Shield are more appropriate for automatically mitigating threats at scale.
- Rejection: This is an overcomplicated and indirect method for blocking malicious IPs. AWS offers more specialize...
Author: Liam · Last updated May 16, 2026
A retail company is running its service on AWS. The company's architecture includes Application Load Balancers (ALBs) in public subnets. The ALB target groups are configured to send traffic to backend Amazon EC2 instances in private subnets. These backend EC2 instances can call externally hosted services over the internet by using a NAT gateway.
The company has noticed in its billing that NAT gateway usage has increased significan...
The task is to investigate the source of increased NAT gateway usage in the retail company's AWS environment. The network engineer needs to identify which sources are using the NAT gateway and potentially causing increased usage. Let's go over each option:
A) Enable VPC flow logs on the NAT gateway's elastic network interface. Publish the logs to a log group in Amazon CloudWatch Logs. Use CloudWatch Logs Insights to query and analyze the logs.
- Explanation: VPC flow logs capture information about IP traffic going to and from network interfaces in your VPC, including the NAT gateway's elastic network interface. By enabling flow logs for the NAT gateway's interface and publishing them to CloudWatch Logs, you can track which EC2 instances are initiating traffic that goes through the NAT gateway. Using CloudWatch Logs Insights, you can query and analyze the logs to identify patterns or sources of increased usage.
- Selected Option: This is a strong approach because VPC flow logs provide detailed information about the traffic patterns and can be analyzed for trends and anomalies that contribute to the increased NAT gateway usage.
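To show the kind of analysis the flow logs make possible, the sketch below totals bytes per source address from records in the default (version 2) flow log format. The sample records, account ID, and interface ID are invented for illustration; in the scenario the same aggregation would normally be expressed as a CloudWatch Logs Insights query rather than run locally.

```python
from collections import Counter
from typing import Iterable

# Field order of the default (version 2) VPC flow log format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()


def bytes_by_source(lines: Iterable[str]) -> Counter:
    """Sum the bytes field per source address, skipping non-OK records."""
    totals: Counter = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != len(FIELDS):
            continue  # not a default-format record
        record = dict(zip(FIELDS, parts))
        if record["log_status"] != "OK":
            continue  # NODATA / SKIPDATA records carry "-" placeholders
        totals[record["srcaddr"]] += int(record["bytes"])
    return totals


# Hypothetical records captured on the NAT gateway's network interface.
sample = [
    "2 123456789012 eni-0abc 10.0.1.15 203.0.113.10 49152 443 6 10 8400 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 10.0.2.20 198.51.100.5 49153 443 6 4 1200 1700000000 1700000060 ACCEPT OK",
    "2 123456789012 eni-0abc 10.0.1.15 203.0.113.10 49154 443 6 90 91600 1700000060 1700000120 ACCEPT OK",
    "2 123456789012 eni-0abc - - - - - - - 1700000120 1700000180 - NODATA",
]

totals = bytes_by_source(sample)
for address, total in totals.most_common():
    print(address, total)
```

Sorting by total bytes puts the heaviest talker first, which is the instance to investigate for the increased NAT gateway usage.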
B) Enable NAT gateway access logs. Publish the logs to a log group in Amazon CloudWatch Logs. Use CloudWatch Logs Insights to query and analyze the logs.
- Explanation: NAT gateways do not offer an access log feature, so there is nothing to enable or publish to CloudWatch Logs. Visibility into traffic that passes through a NAT gateway comes from VPC flow logs on its elastic network interface and from the NAT gateway's CloudWatch metrics.
- Rejection: Because the feature does not exist, this option cannot identify which EC2 instances are generating the traffic. VPC flow logs remain the way to get that per-source visibility in this scenario.
C) Configure Traffic Mirroring on the NAT gateway's elastic network interface. Send the traffic to an additional EC2 instance. Use tools such as tcpdump and Wireshark to query and analyze the mirrored traffic.
- Explanation: Traffic Mirroring allows capturing and analyzing network traffic at ...
Author: Ryan · Last updated May 16, 2026
A banking company is successfully operating its public mobile banking stack on AWS. The mobile banking stack is deployed in a VPC that includes private subnets and public subnets. The company is using IPv4 networking and has not deployed or supported IPv6 in the environment. The company has decided to adopt a third-party service provider's API and must integrate the API with the existing environment. The service provider's API requires the use of IPv6.
A network engineer must turn on IPv6 connectivity for the existing workload that is deployed in a private subnet. The company does ...
The banking company needs to enable IPv6 connectivity for its existing workload deployed in private subnets. The requirements state that:
1. IPv6 connectivity must be supported for integration with the third-party API.
2. The company does not want to permit IPv6 traffic from the public internet.
3. All IPv6 connectivity must be initiated by the company’s servers, meaning it should be outbound-only for IPv6.
Given these requirements, let's analyze the options:
A) Create an internet gateway and a NAT gateway in the VPC. Add a route to the existing subnet route tables to point IPv6 traffic to the NAT gateway.
- Explanation: This solution introduces an internet gateway, which is designed for internet-bound traffic. However, the company’s requirement specifies no IPv6 traffic from the public internet, so using an internet gateway would conflict with that requirement. Additionally, the NAT gateway is typically used to handle IPv4 traffic.
- Rejection: This solution would allow IPv6 traffic to the public internet, which violates the company's constraint of not permitting public IPv6 traffic.
B) Create an internet gateway and a NAT instance in the VPC. Add a route to the existing subnet route tables to point IPv6 traffic to the NAT instance.
- Explanation: Like option A, this option includes an internet gateway, which would allow traffic to/from the public internet, conflicting with the company’s requirement. The use of a NAT instance is less scalable and harder to manage compared to a NAT gateway, and still doesn’t meet the requirement to block public IPv6 traffic.
- Rejection: Again, the internet gateway would allow unwanted IPv6 traffic from...
Author: Samuel · Last updated May 16, 2026
A company has deployed an AWS Network Firewall firewall into a VPC. A network engineer needs to implement a solution to deliver Network Firewall flow logs to the company's Amazon OpenSearch Service (Amazon Elasticsearch...
The requirement is to deliver AWS Network Firewall flow logs to an Amazon OpenSearch Service (formerly Amazon Elasticsearch Service) cluster with minimal delay. Let's analyze each option and explain the reasoning behind the selected choice:
A) Create an Amazon S3 bucket. Create an AWS Lambda function to load logs into the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. Enable Amazon Simple Notification Service (Amazon SNS) notifications on the S3 bucket to invoke the Lambda function. Configure flow logs for the firewall. Set the S3 bucket as the destination.
- Explanation: This solution involves using an S3 bucket as an intermediate storage location for the flow logs, with a Lambda function triggered by SNS notifications to load the logs into OpenSearch. Although this solution works, there is inherent latency involved because the logs are first written to S3, and the Lambda function must process and load them into OpenSearch. This adds delay compared to a direct delivery mechanism.
- Rejection: While it is a valid approach, it introduces additional latency (storing logs in S3, invoking Lambda, and loading into OpenSearch) which is not ideal for this use case where low latency is required.
B) Create an Amazon Kinesis Data Firehose delivery stream that includes the Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster as the destination. Configure flow logs for the firewall. Set the Kinesis Data Firehose delivery stream as the destination for the Network Firewall flow logs.
- Explanation: This solution involves setting up Kinesis Data Firehose as a delivery stream, which can directly stream the logs into Amazon OpenSearch Service. Kinesis Data Firehose provides a managed and low-latency solution for streaming logs to OpenSearch, ensuring that logs are delivered in real time or near real time with minimal delay.
- Selected Opt...
Author: RadiantJaguar56 · Last updated May 16, 2026
A company is using custom DNS servers that run BIND for name resolution in its VPCs. The VPCs are deployed across multiple AWS accounts that are part of the same organization in AWS Organizations. All the VPCs are connected to a transit gateway. The BIND servers are running in a central VPC and are configured to forward all queries for an on-premises DNS domain to DNS servers that are hosted in an on-premises data center. To ensure that all the VPCs use the custom DNS servers, a network engineer has configured a VPC DHCP options set in all the VPCs that specifies the custom DNS servers to be used as domain name servers.
Multiple development teams in the company want to use Amazon Elastic File System (Amazon EFS). A development team has created a new EFS file system but cannot mount the file system to one of its Amazon EC2 inst...
Let's break down each option in the context of the problem.
Context and Goal:
The issue is that the EC2 instance cannot resolve the domain name for the EFS mount point. The VPCs are using custom DNS servers (via BIND) for name resolution, but the EC2 instance cannot resolve the DNS name for Amazon EFS.
Option A: Configure the BIND DNS servers in the central VPC to forward queries for efs.us-east-1.amazonaws.com to the Amazon provided DNS server (169.254.169.253).
- Explanation: This option suggests configuring BIND servers to forward queries for Amazon EFS to the Amazon DNS server. Since EFS requires Amazon DNS resolution for its service endpoints, configuring the DNS servers in the central VPC to forward queries to the Amazon DNS server can help the instances resolve EFS-specific names.
- Why it's selected: The Amazon DNS server (169.254.169.253) is responsible for resolving service names like `efs.us-east-1.amazonaws.com`, which is needed to mount the EFS file system. This ensures that the EC2 instances can resolve the necessary Amazon EFS DNS names.
- Reason for selection: This solution doesn't disrupt the use of the custom DNS servers and leverages the existing BIND forwarding configuration.
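The forwarding described in Option A can be pictured as a BIND forward zone. The sketch below only renders the stanza as text so it can be reviewed before being added to a server configuration; the helper name `forward_zone` is hypothetical, and the `forward only;` choice is an assumption rather than something the question specifies.

```python
def forward_zone(zone: str, forwarders: list[str]) -> str:
    """Render a BIND forward-zone stanza for the given zone and forwarder IPs."""
    fwd = " ".join(f"{ip};" for ip in forwarders)
    return (
        f'zone "{zone}" {{\n'
        f"    type forward;\n"
        f"    forward only;\n"  # assumption: do not fall back to other resolution
        f"    forwarders {{ {fwd} }};\n"
        f"}};"
    )


# Zone and Amazon-provided DNS address are taken from Option A.
stanza = forward_zone("efs.us-east-1.amazonaws.com", ["169.254.169.253"])
print(stanza)
```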
Option B: Create an Amazon Route 53 Resolver outbound endpoint in the central VPC. Update all the VPC DHCP options sets to use AmazonProvidedDNS for name resolution.
- Explanation: This option switches every VPC to AmazonProvidedDNS, so AWS service names such as EFS mount targets resolve natively inside each VPC. The Route 53 Resolver outbound endpoint forwards queries from the VPCs to DNS servers outside AWS, such as the on-premises servers; it does not forward queries to the Amazon DNS server.
- Why it's considered: This option would also solve DNS resolution for EFS and other AWS services, because instances would query AmazonProvidedDNS directly. Forwarding for the on-premises domain would additionally require a Resolver rule associated with the VPCs.
- Why other options are rejected: While this approach works, Option A directly addresses the specific issue of resolving EFS endpoints by forwarding queries to the Amazon DNS server. Option B could be an alternative but adds complexity by introducing a Resolver outbound endpoint when forwarding queries directly from BIND is simpler.
...
Author: Lucas · Last updated May 16, 2026
An ecommerce company is hosting a web application on Amazon EC2 instances to handle continuously changing customer demand. The EC2 instances are part of an Auto Scaling group. The company wants to implement a solution to distribute traffic from customers to the EC2 instances. The company must encrypt all traffic at all stages bet...
Let's break down each option in the context of the given requirements: distribute traffic, encrypt all traffic at all stages, and no decryption at intermediate points.
Requirements Recap:
- Distribute traffic: A solution is needed to distribute traffic to EC2 instances, possibly in an Auto Scaling group.
- Encrypt all traffic: Traffic must be encrypted from the customer to the application servers and remain encrypted during transit, with no decryption at intermediate points.
Option A: Create an Application Load Balancer (ALB). Add an HTTPS listener to the ALB. Configure the Auto Scaling group to register instances with the ALB's target group.
- Explanation: ALB supports HTTP and HTTPS listeners. When configured with an HTTPS listener, ALB terminates SSL/TLS at the load balancer, decrypting traffic before forwarding it to the backend instances.
- Why it's rejected: An HTTPS listener terminates TLS at the ALB, so traffic is decrypted at the load balancer before it is forwarded to the EC2 instances. That violates the requirement for no decryption at intermediate points.
Option B: Create an Amazon CloudFront distribution. Configure the distribution with a custom SSL/TLS certificate. Set the Auto Scaling group as the distribution's origin.
- Explanation: CloudFront is a content delivery network (CDN) that supports HTTPS. By configuring CloudFront with a custom SSL/TLS certificate, the entire traffic can be encrypted from the client to CloudFront and from CloudFront to the EC2 instances.
- Why it’s selected: CloudFront can encrypt traffic between the client and CloudFront, and if configured correctly, the traffic can remain encrypted as it is passed to the Auto Scaling group. No intermediate dec...
Author: Aria · Last updated May 16, 2026
A company has two on-premises data center locations. There is a company-managed router at each data center. Each data center has a dedicated AWS Direct Connect connection to a Direct Connect gateway through a private virtual interface. The router for the first location is advertising 110 routes to the Direct Connect gateway by using BGP, and the router for the second location is advertising 60 routes to the Direct Connect gateway by using BGP. The Direct Connect gateway is attached to a company VPC through a virtual private gateway.
A network engineer receives reports that resources in the VPC are not reachable from various locations in either data center. The network eng...
Scenario Overview:
The network engineer is experiencing an issue where resources in the VPC are not reachable from the two on-premises data centers. The VPC route table is not populated with routes from the first data center. The Direct Connect gateway is used to connect the on-premises routers to the VPC, but the routes from one of the data centers (first data center) are not appearing in the VPC route table.
Key Details:
- Two on-premises data centers: Each connected to AWS via Direct Connect.
- BGP Advertising: The first data center is advertising 110 routes, and the second is advertising 60.
- VPC Route Table: Routes from the first data center are not appearing in the route table, suggesting a possible issue with route propagation or limits on the number of routes.
Let's analyze each option:
Option A: Remove the Direct Connect gateway, and create a new private virtual interface from each company router to the virtual private gateway of the VPC.
- Explanation: This option suggests removing the Direct Connect gateway and directly creating a private virtual interface between each router and the VPC's virtual private gateway.
- Why it’s rejected: This approach would be disruptive because it involves significant reconfiguration, including changes to the existing Direct Connect connections, and it does not directly address the missing routes in the VPC route table. The problem likely lies in BGP route propagation, which can be resolved without replacing the Direct Connect gateway or the private virtual interfaces.
Option B: Change the router configurations to summarize the advertised routes.
- Explanation: This option suggests configuring the routers to summarize the routes being advertised via BGP. By summarizing routes, fewer routes would be advertised to AWS, potentially preventing exceeding any AWS BGP route limits.
- Why it’s selected: Route limits are a common issue when many routes are advertised via BGP. AWS imposes a limit on the number of r...
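The effect of summarization in Option B can be illustrated with Python's standard `ipaddress` module. This is a sketch under stated assumptions: the 110 prefixes are hypothetical contiguous /24 networks (the question does not list the real ones), and the limit of 100 routes per BGP session is an assumed quota that should be checked against the current Direct Connect quotas.

```python
import ipaddress

# Assumption: per-BGP-session prefix quota on a private virtual interface.
ASSUMED_ROUTE_LIMIT = 100

# Hypothetical advertisement: 110 contiguous /24 prefixes,
# 10.1.0.0/24 through 10.1.109.0/24.
advertised = [ipaddress.ip_network(f"10.1.{i}.0/24") for i in range(110)]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
summarized = list(ipaddress.collapse_addresses(advertised))

print(f"before: {len(advertised)} routes, over limit: {len(advertised) > ASSUMED_ROUTE_LIMIT}")
print(f"after:  {len(summarized)} routes, over limit: {len(summarized) > ASSUMED_ROUTE_LIMIT}")
for net in summarized:
    print(net)
```

Real route tables are rarely this contiguous, so the reduction in practice depends on how the on-premises address space is laid out.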
Author: GlowingTiger · Last updated May 16, 2026
A company has expanded its network to the AWS Cloud by using a hybrid architecture with multiple AWS accounts. The company has set up a shared AWS account for the connection to its on-premises data centers and the company offices. The workloads consist of private web-based services for internal use. These services run in different AWS accounts. Office-based employees consume these services by using a DNS name in an on-premises DNS zone that is named example.internal.
The process to register a new service that runs on AWS requires a manual and complicated change request to the internal DNS. The process involves many teams.
The company wants to update the DNS registration process by giving the service creators access that will allow them ...
To meet the company's goal of allowing service creators to register their DNS records with minimal configuration changes and maximize cost-effectiveness, the network engineer should implement a solution that simplifies the DNS management process across multiple AWS accounts and integrates it smoothly with the on-premises DNS infrastructure. The goal is to provide a self-service DNS registration mechanism without requiring manual intervention for each new service.
Key Factors:
- The solution must automate the DNS registration process for AWS-hosted services.
- It must be cost-effective with the least possible configuration complexity.
- It should enable service creators to manage DNS records without needing to go through the internal DNS change request process.
- Integration with on-premises DNS is required because the employees access services using DNS names in the `example.internal` zone.
Let's analyze each option:
Option A: Create a record for each service in its local private hosted zone (serviceA.account1.aws.example.internal). Provide this DNS record to the employees who need access.
- Explanation: This would involve creating a separate DNS record for each service in its own private hosted zone. The records would then be provided to employees.
- Why it's rejected: This option would work if employees were directly accessing the private hosted zones, but it would still require coordination to make the DNS records accessible across multiple AWS accounts and to integrate with the on-premises DNS. It doesn't automate the registration process and is not scalable for the company’s goal.
Option B: Create an Amazon Route 53 Resolver inbound endpoint in the shared account VPC. Create a conditional forwarder for a domain named aws.example.internal on the on-premises DNS servers. Set the forwarding IP addresses to the inbound endpoint's IP addresses that were created.
- Explanation: This option sets up a Route 53 Resolver inbound endpoint in the shared account’s VPC and configures a conditional forwarder on the on-premises DNS servers, so that queries for `aws.example.internal` are forwarded to the inbound endpoint's IP addresses.
- Why it's selected: This solution centralizes DNS management in the shared AWS account and allows DNS queries for `aws.example.internal` to be forwarded to the shared account’s Route 53 Resolver, which can then resolve names for services running in different AWS accounts. This solution is efficient, provides seamless integration, and centralizes DNS resolution for internal services.
- Reason for selection: This approach addresses both the integration with the on-premises DNS and the need to manage AWS DNS records without manual intervention. It automates DNS resolution for internal services and reduces complexity.
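The conditional forwarder in Option B amounts to a suffix match on the query name. The sketch below models that decision in plain Python; the function name `pick_forwarders` and the endpoint IP addresses are hypothetical, and a real DNS server implements this internally rather than through code like this.

```python
def pick_forwarders(qname: str, rules: dict[str, list[str]]) -> list[str]:
    """Return the forwarders for the longest matching domain suffix, or [] if none match."""
    name = qname.rstrip(".").lower()
    best = ""
    for domain in rules:
        matches = name == domain or name.endswith("." + domain)
        if matches and len(domain) > len(best):
            best = domain
    return rules.get(best, [])


# Hypothetical inbound endpoint IP addresses in the shared account VPC.
rules = {"aws.example.internal": ["10.0.1.10", "10.0.2.10"]}

print(pick_forwarders("serviceA.account1.aws.example.internal", rules))
print(pick_forwarders("intranet.example.internal", rules))
```

Names under `aws.example.internal` are sent to the inbound endpoint, while every other name in `example.internal` continues to be answered by the on-premises zone.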
Option C: Create an Amazon Route 53 Resolver rule to forward any queries made to onprem.example.internal to the on-premises DNS servers.
- Explanation: This option sets up a Route 53 Resolver rule to forward queries for `onprem.example.internal` to on-premises DNS servers.
- Why it's rejected: While this option could help in resolving DNS for on-premises services, it does not directly address the issue of allow...
Author: Ava · Last updated May 16, 2026
A company has multiple AWS accounts. Each account contains one or more VPCs. A new security guideline requires the inspection of all traffic between VPCs.
The company has deployed a transit gateway that provides connectivity between all VPCs. The company also has deployed a shared services VPC with Amazon EC2 instances that include IDS services for stateful inspection. The EC2 instances are deployed across three Availability Zones. The company has set up VPC associations and routing on the transit gateway. The company has migrated a few test VPCs to th...
In this scenario, the company has deployed a transit gateway to provide connectivity between multiple VPCs, and the traffic must be inspected by EC2 instances running IDS services in a shared services VPC. The issue reported is intermittent connections for traffic crossing Availability Zones. The goal is to determine how to resolve the intermittent connectivity issues.
Let's analyze each option:
Option A: Modify the transit gateway VPC attachment on the shared services VPC by enabling cross-Availability Zone load balancing.
- Explanation: Cross-AZ load balancing allows traffic to be distributed across multiple Availability Zones when traffic is being forwarded to services like EC2 instances. This option helps with ensuring that traffic is efficiently balanced across the EC2 instances deployed in multiple Availability Zones.
- Why it's selected: Since the EC2 instances are deployed across three Availability Zones, enabling cross-AZ load balancing ensures that traffic is properly distributed across all EC2 instances, potentially improving the reliability of traffic flow. It addresses the intermittent connectivity issue when traffic crosses Availability Zones by enabling proper load distribution.
Option B: Modify the transit gateway VPC attachment on the shared services VPC by enabling appliance mode support.
- Explanation: Appliance mode is a transit gateway VPC attachment setting designed for use cases where stateful network appliances, such as firewalls or IDS/IPS systems, inspect traffic in a shared VPC. With appliance mode enabled, the transit gateway selects one Availability Zone in the appliance VPC for the lifetime of a flow, so both directions of a connection reach the same appliance.
- Why it's rejected: Appliance mode is not necessary for this case, as the EC2 instances performing the IDS inspection are already deployed in the shared services VPC. This configuration would bypass the transit gateway’s r...
Author: Ethan · Last updated May 16, 2026
A company is using a NAT gateway to allow internet connectivity for private subnets in a VPC in the us-west-2 Region. After a security audit, the company needs to remove the NAT gateway.
In the private subnets, the company has resources that use the unified Amazon CloudWatch agent. A network engineer must create a solution to ensure that the unified CloudWatch agent c...
To ensure that the unified Amazon CloudWatch agent continues to work after the NAT gateway is removed, we need to focus on solutions that allow the private subnets to access CloudWatch services without the need for internet connectivity through a NAT gateway. This can be achieved by leveraging VPC endpoints for Amazon CloudWatch services, along with proper network configurations. Let's break down the options and determine which steps should be taken.
Option Analysis:
A) Validate that private DNS is enabled on the VPC by setting the enableDnsHostnames VPC attribute and the enableDnsSupport VPC attribute to true.
- Why selected: Enabling DNS hostnames and support is essential for private DNS resolution within the VPC. This is necessary when using VPC endpoints because it ensures that services like CloudWatch can be resolved correctly without requiring the NAT gateway.
- Caveat: While important, this option alone doesn't keep CloudWatch access working once the NAT gateway is removed. It is a prerequisite that supports the other configurations.
B) Create a new security group with an entry to allow outbound traffic that uses the TCP protocol on port 443 to destination 0.0.0.0/0.
- Why rejected: This rule permits outbound HTTPS to any destination, which is broader than needed when VPC endpoints are used for private communication. Once the NAT gateway is removed there is no internet path from the private subnets anyway, so allowing 0.0.0.0/0 does not by itself restore access to CloudWatch.
- Reason for rejection: It's unnecessary if we implement VPC endpoints for CloudWatch services and restrict traffic to those endpoints.
C) Create a new security group with entries to allow inbound traffic that uses the TCP protocol on port 443 from the IP prefixes of the private subnets.
- Why rejected: This option is aimed at controlling inbound traffic, but CloudWatch is an outbound service. You need to focus on allowing outbound traffic to the CloudWatch endpoints, not inbound traffic from the private subnets.
- Reason for rejection: Doesn't align with the need to access CloudWatch services through...
Author: Ethan · Last updated May 16, 2026
An international company provides early warning about tsunamis. The company plans to use IoT devices to monitor sea waves around the world. The data that is collected by the IoT devices must reach the company's infrastructure on AWS as quickly as possible. The company is using three operation centers around the world. Each operation center is connected to AWS through its own AWS Direct Connect connection. Each operation center is connected to the internet through at least two upstream internet service providers.
The company has its own provider-independent (PI) address space. The IoT devices use TCP protocols for reliable transmission of the data they collect. The IoT devices have both landline and mobile internet connectivity. The i...
To meet the company's requirements for the highest availability and optimal connectivity between IoT devices and AWS services, let's analyze each solution and the corresponding factors.
Option Analysis:
A) Set up an Amazon CloudFront distribution with origin failover. Create an origin group for each Region where the solution is deployed.
- Why rejected: CloudFront is primarily used for content delivery and caching, typically to distribute static content (e.g., websites, media files). While CloudFront does offer origin failover, it's not designed for real-time IoT data transmission. The IoT devices are sending TCP-based data, and CloudFront does not provide the low-latency, high-availability characteristics required for IoT traffic.
- Reason for rejection: CloudFront handles HTTP and HTTPS traffic rather than arbitrary TCP connections, which the IoT devices rely on to transmit data reliably.
B) Set up Route 53 latency-based routing. Add latency alias records. For the latency alias records, set the value of Evaluate Target Health to Yes.
- Why selected: Amazon Route 53 latency-based routing enables the DNS service to direct IoT traffic to the AWS Region with the lowest latency. By evaluating the health of the endpoints, Route 53 can route traffic to healthy resources, enhancing availability. This approach also allows for multi-Region deployment, ensuring high availability for the IoT devices by directing traffic to the best-performing AWS Region.
- Reason for selection: This solution directly addresses the requirement for high availability and low-latency data routing to the appropriate AWS Region based on real-time conditions. It is a natural fit for the use case of reliable data transmission from IoT devices, leveraging AWS’s built-in DNS and health checks for high availability.
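The selection logic described for Option B (lowest latency among healthy targets) can be modeled in a few lines. This is only an illustration of the idea: the function name `pick_region`, the latency figures, and the health flags are hypothetical, and Route 53 bases its decision on its own latency measurements rather than values supplied like this.

```python
from typing import Optional


def pick_region(latency_ms: dict[str, float], healthy: dict[str, bool]) -> Optional[str]:
    """Return the healthy Region with the lowest latency, or None if none is healthy."""
    candidates = {r: ms for r, ms in latency_ms.items() if healthy.get(r, False)}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical measurements from one device's point of view.
latency = {"us-east-1": 40.0, "eu-west-1": 25.0, "ap-southeast-2": 90.0}

all_healthy = {"us-east-1": True, "eu-west-1": True, "ap-southeast-2": True}
print(pick_region(latency, all_healthy))

# With the closest Region marked unhealthy, the next-best Region is chosen.
one_down = {"us-east-1": True, "eu-west-1": False, "ap-southeast-2": True}
print(pick_region(latency, one_down))
```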
C) Set up an accelerator in AWS Global Accelerator. Configure Regional endpoint groups and health checks.
- Why selected: AWS Global Accelerator provides global traffic management and uses anycast IP addresses...
Author: MysticJaguar44 · Last updated May 16, 2026
A company is planning a migration of its critical workloads from an on-premises data center to Amazon EC2 instances. The plan includes a new 10 Gbps AWS Direct Connect dedicated connection from the on-premises data center to a VPC that is attached to a transit gateway. The migration must occur over encrypted paths be...
To meet the company's requirements for migrating critical workloads from an on-premises data center to Amazon EC2 instances over encrypted paths with the highest throughput, let’s analyze each option based on throughput, security, and scalability.
Option Analysis:
A) Configure a public VIF on the Direct Connect connection. Configure an AWS Site-to-Site VPN connection to the transit gateway as a VPN attachment.
- Why rejected: A public Virtual Interface (VIF) on Direct Connect connects to AWS public services (like S3 or DynamoDB), not to a VPC directly, which does not meet the requirement for secure communication between the on-premises data center and the VPC. Furthermore, Site-to-Site VPN connections generally offer lower throughput compared to Direct Connect and may not provide sufficient performance for a 10 Gbps connection.
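- Reason for rejection: A public VIF doesn't provide direct VPC connectivity, and although the VPN would run over the Direct Connect link rather than the internet, Site-to-Site VPN tunnel throughput (up to 1.25 Gbps per tunnel) falls well short of the 10 Gbps line rate.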
B) Configure a transit VIF on the Direct Connect connection. Configure an IPsec VPN connection to an EC2 instance that is running third-party VPN software.
- Why rejected: A transit VIF on Direct Connect provides a high-performance, dedicated connection to a VPC and allows the use of the AWS Transit Gateway. However, configuring IPsec VPN with third-party VPN software introduces an additional layer of complexity and reduces the throughput. IPsec VPNs generally do not scale well for high-throughput connections like 10 Gbps due to encryption overhead.
- Reason for rejection: Introducing third-party VPN software adds unnecessary complexity and reduces throughput, not suitable for high-throughput requirements.
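To put the throughput argument against the VPN-based options in numbers, the short calculation below compares a single tunnel with the 10 Gbps connection. The per-tunnel figure of 1.25 Gbps is an assumption based on the commonly documented Site-to-Site VPN maximum; actual throughput depends on packet size, cipher, and instance or device capacity.

```python
import math

LINK_GBPS = 10.0             # Direct Connect dedicated connection from the question
ASSUMED_TUNNEL_GBPS = 1.25   # assumption: maximum throughput of one VPN tunnel

# Share of the link that a single tunnel can use.
single_tunnel_share = ASSUMED_TUNNEL_GBPS / LINK_GBPS

# Number of tunnels that would have to be load-balanced to fill the link.
tunnels_needed = math.ceil(LINK_GBPS / ASSUMED_TUNNEL_GBPS)

print(f"one tunnel uses {single_tunnel_share:.1%} of the link")
print(f"tunnels needed to reach {LINK_GBPS:g} Gbps: {tunnels_needed}")
```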
C) Configure MACsec for the Direct Connect connection. Configure a transit VIF to ...