Microsoft Exam Practice Questions - Page 93

Microsoft Practice Questions, Discussions & Exam Topics by our Authors

HOTSPOT - You are designing an enterprise data warehouse in Azure Synapse Analytics that will store website traffic analytics in a star schema. You plan to have a fact table for website visits. The table will be approximately 5 GB. You need to recommend which distribution type and index type to use for the table. The solution must provide the fastest query perfo...

Author: Ethan · Last updated May 27, 2026

You have an Azure Stream Analytics job. You need to ensure that the job has enough streaming units provisioned. You configure monitoring of the SU % Utilization metric. Which two additional metrics should you monitor? Eac...

To ensure that an Azure Stream Analytics job has enough streaming units (SUs) provisioned, it's important to monitor relevant metrics that reflect how the job is handling data processing in real time. You've already configured monitoring of the SU % Utilization metric, which helps track the overall resource utilization. However, additional metrics are needed to ensure that the job is processing data efficiently and that resources are not overloaded or underutilized. Let's evaluate each option based on the key factors: Key Considerations: - Ensure sufficient streaming units to handle data processing needs. - Monitor delays or bottlenecks in processing to identify potential resource issues. --- A) Backlogged Input Events - Explanation: This metric tracks the number of events that are queued up but not yet processed due to resource constraints or high data volume. - Relevance: Monitoring backlogged input events is crucial because a large backlog may indicate that the job isn't able to process incoming events quickly enough. If there's a backlog, it suggests that the current number of streaming units isn't sufficient to handle the load. This directly impacts resource allocation, and addressing backlogged events can help determine if more SUs are needed. - When to use: This is a key metric to monitor alongside SU % Utilization because it indicates a processing bottleneck due to insufficient resources. --- B) Watermark Delay - Explanation: Watermark delay measures the delay in data processing compared to the expected processing time (watermark). - Relevance: While this is an important metric to track, especially for assessing how timely the job is in processing data, it is more focused on the time it takes to finish computations rather than directly addressing resource utilization. Watermark delay might indicate a delay in processing, but it is not as directly tied to resource provisioning as backlogged input events. 
- When to use: This metric is valuable but not as directly linked to the need for additional streaming units. It is more useful for assessing latency rather than resource sufficiency. --- C) Function Events - Explanation: This metric tracks events that are proces...

Author: Sofia · Last updated May 27, 2026

You have an activity in an Azure Data Factory pipeline. The activity calls a stored procedure in a data warehouse in Azure Synapse Analytics and runs daily. You need to v...

To verify the duration of an activity that calls a stored procedure in an Azure Synapse Analytics data warehouse and runs daily within an Azure Data Factory (ADF) pipeline, the primary goal is to track the execution time of that activity. Let’s evaluate each option based on this requirement. Key Considerations: - Track the duration of an activity: We need to identify a method that can provide insights on the duration of the specific activity that executes the stored procedure. - Focus on the activity run and its associated time: The solution should focus on how to monitor the activity run itself and obtain the duration for the specific execution. --- A) Activity Runs in Azure Monitor - Explanation: Azure Monitor can track the performance of Azure resources, including data pipelines and activities in Azure Data Factory. Azure Monitor logs provide detailed insights into activity runs, including success or failure, execution duration, and any associated issues. - Relevance: This is the most relevant option because Azure Monitor can capture details about activity runs in Data Factory, including their start and end times, which can then be used to calculate the duration of each run. You can also set up alerts and track execution history, making it the best fit for this requirement. - When to use: Ideal for tracking the execution time of activities in Data Factory and gaining insights into performance and duration over time. --- B) Activity Log in Azure Synapse Analytics - Explanation: The Activity Log in Azure Synapse Analytics records the management activities for the resources in Synapse Analytics, such as user access and resource modifications. - Relevance: While this log provides information on management actions related to Synapse, it doesn’t track execution times or durations of stored procedures that are called from external services like Data Factory. It primarily logs administrative activities rather than runtime execution details. 
- When to use: This would be useful for monitoring management activities, not fo...
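
The point under option A, that an activity run's start and end times can be used to calculate its duration, can be sketched as follows. This is a minimal illustration in plain Python; the function name and the sample timestamps are hypothetical and are not taken from any Azure Monitor schema.

```python
from datetime import datetime

def run_duration_seconds(start_iso: str, end_iso: str) -> float:
    """Return the elapsed seconds between two ISO 8601 timestamps."""
    start = datetime.fromisoformat(start_iso)
    end = datetime.fromisoformat(end_iso)
    return (end - start).total_seconds()

# Hypothetical start and end times for one daily activity run.
print(run_duration_seconds("2026-05-27T06:00:00+00:00",
                           "2026-05-27T06:04:30+00:00"))
# 270.0
```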

Author: ElectricLionX · Last updated May 27, 2026

You have an Azure Data Factory pipeline that is triggered hourly. The pipeline has had 100% success for the past seven days. The pipeline execution fails, and two retries that occur 15 minutes apart also fail. The third failure returns the following error. ErrorCode=UserErrorFileNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ADLS Gen2 operation failed for: Operation returned an invalid status code 'NotFound'. Account: 'contosoproduksouth'. Filesystem: wwi. Path: 'BIKES/CARBON/year=2021/...

To diagnose the error described in the Azure Data Factory (ADF) pipeline, let's break down the error message and assess each option carefully. Error Breakdown: The error message indicates that the operation failed because the specified path does not exist in the Azure Data Lake Storage Gen2 (ADLS Gen2) account. The path in question is: - Account: 'contosoproduksouth' - Filesystem: 'wwi' - Path: 'BIKES/CARBON/year=2021/month=01/day=10/hour=06' It shows that ADF cannot find the file or directory at the given path. The error code is UserErrorFileNotFound and the storage operation returned the status 'NotFound', which implies that either the directory or file does not exist at the time of the operation. Let's evaluate each option: - A) The parameter used to generate year=2021/month=01/day=10/hour=06 was incorrect. - Reasoning: This option suggests that the error might be caused by an issue with the parameters used to generate the path (incorrect year, month, day, or hour values). However, the error message doesn't suggest that the parameters are malformed. The error is specifically about the path not being found, not about an invalid path format. - Rejected: This option is not a strong candidate since the error points to the absence of the file rather than an issue with parameter formatting. - B) From 06:00 to 07:00 on January 10, 2021, there was no data in wwi/BIKES/CARBON. - Reasoning: This option suggests that the file path is valid, but there was simply no data in the expected location during the specified time frame (06:00 to 07:00). If there's no data at that time, ADF would fai...
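
To make the relationship between the hourly window and the path concrete, here is a minimal sketch of how a `year=/month=/day=/hour=` partition path could be derived from a window start time. The function name is hypothetical, and the real pipeline may build the path differently (for example with ADF expression functions).

```python
from datetime import datetime

def partition_path(base: str, ts: datetime) -> str:
    """Build an hourly partition path in the year=/month=/day=/hour= layout."""
    return f"{base}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"

# Window that starts at 06:00 on January 10, 2021.
window_start = datetime(2021, 1, 10, 6)
print(partition_path("BIKES/CARBON", window_start))
# BIKES/CARBON/year=2021/month=01/day=10/hour=06
```

If the parameter produces a well-formed path like this one and the copy still returns NotFound, the folder for that hour was most likely never written.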

Author: SilverBear · Last updated May 27, 2026

You have an Azure Synapse Analytics job that uses Scala. You need to view the status of the job. W...

To determine the status of an Azure Synapse Analytics job that uses Scala, you need to monitor Spark jobs since Scala typically runs on Apache Spark in Azure Synapse. Let's analyze each option: A) From Synapse Studio, select the workspace. From Monitor, select SQL requests. - Reasoning: This option would show SQL requests, but the job in question is a Scala job, which likely runs on Apache Spark. SQL requests are related to T-SQL or Synapse SQL pools, and they won't provide information on Spark-based jobs. This option is more relevant for monitoring SQL queries rather than Spark jobs. - Rejected: Not relevant for monitoring Scala-based jobs, which run on Spark. B) From Azure Monitor, run a Kusto query against the AzureDiagnostics table. - Reasoning: The AzureDiagnostics table holds logs from various Azure services, but it is more general and doesn't specifically target the Spark job logs or job status. While you could potentially find logs related to Synapse jobs here, it’s not tailored for monitoring Spark applications, and it requires specific queries to drill into relevant logs. - Rejected: This is too general and not optimized for viewing the status of Scala/Spark jobs in Synapse Analytics. C) From Synapse Studio, select the workspace. From Monitor, select Apache Spark applications. - Reasoning: This option is the most relevant. Since Scala j...

Author: Noah · Last updated May 27, 2026

DRAG DROP - You have an Azure Data Lake Storage Gen2 account that contains a JSON file for customers. The file contains two attributes named FirstName and LastName. You need to copy the data from the JSON file to an Azure Synapse Analytics table by using Azure Databricks. A new column must be created that concatenates the FirstName and LastName values. You create the following components: * A destination table in Azure Synapse * An Azure Blob storage container * A ...

Author: Lucas · Last updated May 27, 2026

You have an Azure data factory named ADF1. You currently publish all pipeline authoring changes directly to ADF1. You need to implement version control for the changes made to pipeline artifacts. The solution must ensure that you can apply version control to the resources currently defined in the UX Authoring canvas for ADF1...

To implement version control for the changes made to pipeline artifacts in Azure Data Factory (ADF1), the solution must integrate ADF1 with a version control system (like Git) so that pipeline authoring changes are tracked, managed, and versioned. Let’s evaluate the options: A) From the UX Authoring canvas, select Set up code repository. - Reasoning: This is the correct option to initiate version control for your ADF1 pipelines. By selecting "Set up code repository" from the UX Authoring canvas, you configure the ADF1 instance to connect to a Git repository (such as Azure Repos or GitHub). This enables version control for your pipelines, datasets, and other resources. It’s the necessary step to start using Git integration for versioning. - Selected: This is part of the solution since it sets up the code repository to track changes. B) Create a Git repository. - Reasoning: This is also correct. You need a Git repository (either Azure Repos or GitHub) to track the version history of your pipeline artifacts. The Git repository acts as the storage location for the source control of the ADF1 pipelines. Without creating a Git repository, version control cannot function. - Selected: This is essential because you need a Git repository to store the pipeline version history and enable version control. C) Create a GitHub action. - Reasoning: GitHub actions are used for automating workflows, such as CI/CD processes, but they are not directly required to enable version control within Azure Data Factory. While GitHub Actions can be useful for automating deployment or testing of pipelines, they are not necessary to simply enable version control within the ADF1...

Author: SolarFalcon11 · Last updated May 27, 2026

DRAG DROP - You have an Azure subscription that contains an Azure Synapse Analytics workspace named workspace1. Workspace1 connects to an Azure DevOps repository named repo1. Repo1 contains a collaboration branch named main and a development branch named branch1. Branch1 contains an Azure Synapse pipeline named pipeline1. In workspace1, you complete testing of pipeline1. You need to schedule pipeline1 to run daily at 6 AM. Which four actions should you perform in sequence? To answer, move the appropriate actions ...

Author: Amira · Last updated May 27, 2026

HOTSPOT - You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named Pool1 and an Azure Data Lake Storage account named storage1. Storage1 requires secure transfers. You need to create an external data source in Pool1 that will be used to read .orc files in storage1. How should you c...

Author: Noah · Last updated May 27, 2026

You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named SQLPool1. SQLPool1 is currently paused. You need to restore the ...

To restore the current state of a paused Azure Synapse Analytics dedicated SQL pool (SQLPool1) to a new SQL pool, let's evaluate the provided options: A) Create a workspace. - Reasoning: A workspace in Azure Synapse is a container that holds different data processing and analytics resources, such as SQL pools, Spark pools, and data lakes. Creating a new workspace is not necessary in this scenario, because you already have a dedicated SQL pool (SQLPool1) and are looking to restore its state, not create new resources for data processing. - Rejected: Creating a workspace is irrelevant to the task of restoring SQLPool1 to a new SQL pool. B) Create a user-defined restore point. - Reasoning: A user-defined restore point allows you to mark a specific point in time in your dedicated SQL pool, making it possible to restore to that point later. However, you can only create a restore point when the SQL pool is running. Since SQLPool1 is paused, you would need to resume the pool first before creating a restore point. - Rejected: This option would only be possible if the SQL pool is resumed, so it cannot be the first step in the process. C) Resum...

Author: Ryan · Last updated May 27, 2026

You are designing an Azure Synapse Analytics workspace. You need to recommend a solution to provide double encryption of all the data at rest. Which two components should you include in the recommendation? Each co...

To provide double encryption of all the data at rest in an Azure Synapse Analytics workspace, we need to ensure that the data is encrypted using two layers of encryption: one managed by Azure and one managed by the customer (with customer-managed keys). Let's evaluate the options based on this requirement: A) an X.509 certificate - Reasoning: X.509 certificates are typically used for secure communication and identity management, such as for SSL/TLS or code-signing purposes. While they play a key role in various security scenarios, they are not directly involved in encrypting data at rest in Azure Synapse Analytics. Double encryption usually involves the use of keys rather than certificates. - Rejected: This is not suitable for double encryption of data at rest in Azure Synapse Analytics. B) an RSA key - Reasoning: RSA keys are used for public-key cryptography and can be used for encryption. In the context of double encryption at rest, RSA keys can be used as part of customer-managed keys (CMK), where the customer provides the encryption key, allowing for encryption on top of the encryption Azure already provides (service-managed encryption). This fulfills one layer of encryption in the double encryption scenario. - Selected: RSA keys are essential for implementing customer-managed keys for encryption, which is a core part of the solution for double encryption. C) an Azure virtual network that has a network security group (NSG) - Reasoning: While an Azure virtual network (VNet) with a Network Security Group (NSG) enhances security by controlling network traffic, it is not relevant to data encryption at rest. Network security controls do not address the encryp...

Author: Lina Zhang · Last updated May 27, 2026

You have an Azure Synapse Analytics serverless SQL pool named Pool1 and an Azure Data Lake Storage Gen2 account named storage1. The AllowBlobPublicAccess property is disabled for storage1. You need to create an external data source that can be u...

To create an external data source in an Azure Synapse Analytics serverless SQL pool (Pool1) that allows Azure Active Directory (Azure AD) users to access data from Azure Data Lake Storage Gen2 (storage1), we need to ensure that the external data source can authenticate securely using Azure AD and interact with the storage account. Let’s evaluate each option: A) an external resource pool - Reasoning: An external resource pool is a concept used in Synapse for managing computational resources, particularly for controlling workloads in dedicated SQL pools or for Spark. It is not related to setting up access to external storage like Azure Data Lake Storage Gen2. The requirement is to allow external data access, not manage compute resources. - Rejected: This option is unrelated to setting up external data access. B) an external library - Reasoning: An external library refers to a library that is either pre-installed or custom libraries used to extend the functionality of Synapse, particularly in Spark pools or for custom logic. While libraries are important for extending capabilities, they are not required to create an external data source for accessing data in Azure Data Lake Storage Gen2. - Rejected: This option does not apply to creating an external data source in a serverless SQL pool. C) database scoped credentials - Reasoning: Database scoped credentials are used to store authentication information (like Azure AD identiti...

Author: Chloe · Last updated May 27, 2026

You have an Azure Data Factory pipeline named Pipeline1. Pipeline1 contains a copy activity that sends data to an Azure Data Lake Storage Gen2 account. Pipeline1 is executed by a schedule trigger. You change the copy activity sink to a new storage account and merge the changes into the collaboration branch. After Pipeline1 executes, you disc...

To ensure that data is copied to the new Azure Data Lake Storage Gen2 account after changing the sink in the Azure Data Factory (ADF) pipeline, let’s analyze the given options: A) Publish from the collaboration branch. - Reasoning: In Azure Data Factory, after making changes in the collaboration branch, you need to publish those changes to the published branch to make them active and take effect. Since you made changes to the copy activity (specifically changing the sink to a new storage account) and merged them into the collaboration branch, the changes have not yet been applied to the live pipeline. Publishing ensures that all the changes in the collaboration branch (including changes to the sink in the copy activity) are reflected in the actual pipeline that will execute. - Selected: This is the correct action because publishing the changes from the collaboration branch to the published branch will ensure that the changes are deployed and take effect in the pipeline execution. B) Create a pull request. - Reasoning: A pull request (PR) is used for code review and merging changes from one branch to another in a version control system like Git. While a pull request might be used to merge changes into the collaboration branch, it does not automatically apply changes to the published branch. In the case of Azure Data Factory, after changes are merged into the collaboration branch, you must publish them for them to be applied to the live environment. A PR is not the step that ensures the pipeline changes take effect. - Rejected: While a pull request is...

Author: Ahmed97 · Last updated May 27, 2026

You have an Azure Data Factory pipeline named pipeline1 that is invoked by a tumbling window trigger named Trigger1. Trigger1 has a recurrence of 60 minutes. You need to ensure that pipeline1 will execute only if the previo...

To configure the self-dependency for the tumbling window trigger in Azure Data Factory so that `pipeline1` executes only if the previous execution completes successfully, you need to set the `offset` and `size` properties of the dependency appropriately in the trigger configuration. Let's go through each option and analyze them: 1. Understanding the Terms: - `offset`: Specifies where the dependency window starts relative to the start of the current trigger window. For a self-dependency the offset must be negative, so that the dependency points at an earlier window of the same trigger. - `size`: Defines the length of the dependency window and must be a positive time span. It does not control how long the pipeline runs. With a 60-minute recurrence, the previous window is described by an offset of `-01:00:00` and a size of `01:00:00`. 2. Option Analysis: A) offset: "-00:01:00" size: "00:01:00" - Offset: `-00:01:00` starts the dependency window one minute before the current window, which does not line up with the start of the previous 60-minute window. - Size: `00:01:00` makes the dependency window only one minute long, so it does not span the previous 60-minute window. B) offset: "01:00:00" size: "-01:00:00" - Offset: `01:00:00` is positive, which points at a later window rather than the previous one; a self-dependency requires a negative offset. - Size: `-01:00:00` is an invalid duration since size must be a positive duration that specifies how long the dependency window lasts. - Conclusion: This option is incorrect because the `size` value is invalid. ...
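
As a rough illustration of how `offset` and `size` select a dependency window, the sketch below computes the window as "current window start plus offset, lasting for size". The helper names are hypothetical, the validation rules are a simplified reading of the self-dependency constraints, and this is not ADF code.

```python
from datetime import datetime, timedelta

def parse_timespan(value: str) -> timedelta:
    """Parse a signed "hh:mm:ss" time span such as "-01:00:00"."""
    sign = -1 if value.startswith("-") else 1
    hours, minutes, seconds = (int(p) for p in value.lstrip("-").split(":"))
    return sign * timedelta(hours=hours, minutes=minutes, seconds=seconds)

def dependency_window(window_start: datetime, offset: str, size: str):
    """Return (start, end) of the window a self-dependency waits on."""
    off, length = parse_timespan(offset), parse_timespan(size)
    if off >= timedelta(0):
        raise ValueError("a self-dependency needs a negative offset")
    if length <= timedelta(0):
        raise ValueError("size must be a positive time span")
    start = window_start + off
    return start, start + length

# Current window starts at 06:00; the dependency covers 05:00 to 06:00.
start, end = dependency_window(datetime(2026, 5, 27, 6), "-01:00:00", "01:00:00")
print(start.isoformat(), end.isoformat())
# 2026-05-27T05:00:00 2026-05-27T06:00:00
```

With a 60-minute recurrence, this pair of values makes the dependency window coincide exactly with the previous trigger window.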

Author: Aria · Last updated May 27, 2026

HOTSPOT - You have an Azure Synapse Analytics pipeline named Pipeline1 that contains a data flow activity named Dataflow1. Pipeline1 retrieves files from an Azure Data Lake Storage Gen 2 account named storage1. Dataflow1 uses the AutoResolveIntegrationRuntime integration runtime configured with a core count of 128. You need to optimize the number of cores used by D...

Author: Liam123 · Last updated May 27, 2026

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads: * A workload for data engineers who will use Python and SQL. * A workload for jobs that will run notebooks that use Python, Scala, and SQL. * A workload that data scientists will use to perform ad hoc analysis in Scala and R. The enterprise architecture team at your company identifies the following standards for Databricks environments: * The data engineers must share a cluster. * The job cluster will be managed by using a request process whereby data scientists and data engineers provide...

Let's break down the requirements and solution options: Requirements Recap: 1. Data Engineers: - They need to share a cluster. - The workload involves Python and SQL. 2. Jobs for Notebooks (Data Engineers and Data Scientists): - The notebooks should use Python, Scala, and SQL. - The jobs are managed by a request process, and notebooks are provided for deployment. 3. Data Scientists: - Each data scientist should have their own cluster. - The cluster should automatically terminate after 120 minutes of inactivity. Proposed Solution (Solution): - Standard cluster for each data scientist: - This satisfies the requirement that each data scientist has their own cluster, and clusters are automatically terminated after 120 minutes of inactivity. This fits the requirement for the data scientists. - High Concurrency cluster for the data engineers: - A High Concurrency cluster is ideal for multiple users and workloads, especially when they are working collaboratively. This also ensures that data engineers share a cluster as required. It supports multiple simultaneous jobs and has features for optimal resource sharing. - Standard cluster for jobs: ...

Author: Maya · Last updated May 27, 2026

Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution. After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen. You plan to create an Azure Databricks workspace that has a tiered structure. The workspace will contain the following three workloads: * A workload for data engineers who will use Python and SQL. * A workload for jobs that will run notebooks that use Python, Scala, and SQL. * A workload that data scientists will use to perform ad hoc analysis in Scala and R. The enterprise architecture team at your company identifies the following standards for Databricks environments: * The data engineers must share a cluster. * The job cluster will be managed by using a request process whereby data scientists and data engineers provide packa...

Let's break down the requirements and the proposed solution: Requirements Recap: 1. Data Engineers: - Must share a cluster. - The workload will involve Python and SQL. 2. Jobs for Notebooks: - The notebooks will use Python, Scala, and SQL. - The job cluster will be managed via a request process, with data scientists and data engineers providing packaged notebooks for deployment. 3. Data Scientists: - Each data scientist needs their own cluster. - The cluster must terminate automatically after 120 minutes of inactivity. - There are three data scientists. Proposed Solution: - Standard cluster for each data scientist: - This satisfies the requirement that each data scientist must have their own cluster. Standard clusters are appropriate for interactive work such as ad-hoc analysis, and automatic termination after inactivity can be configured. - High Concurrency cluster for data engineers: - This is a good choice because a High Concurrency cluster allows multiple data engineers to share the same cluster while ensuring proper resource management for concurrent jobs. This satisfies the requirement that data engineers share a cluster. - High Concurrency cluster for jobs: - Issue: The job cluster doesn’t need to be a High Concurrency cluster. Job clusters are typicall...

Author: Andrew · Last updated May 27, 2026

You are designing a folder structure for the files in an Azure Data Lake Storage Gen2 account. The account has one container that contains three years of data. You need to recommend a folder structure that meets the following requirements: * Supports partition elimination for queries by Azure Synapse Analytics serverless SQL pools * Supports f...

To meet the requirements for your Azure Data Lake Storage Gen2 account and ensure optimal querying and data retrieval, let's break down the requirements and analyze each folder structure option: Requirements Recap: 1. Partition Elimination for Queries: - Partitioning by year (`YYYY`), month (`MM`), or day (`DD`) is essential to support partition elimination for queries by Azure Synapse Analytics serverless SQL pools. 2. Fast Data Retrieval for Current Month: - To support fast data retrieval for the current month's data, a structure that makes the current month's data easily accessible and separate from other data is ideal. 3. Simplifies Data Security Management by Department: - Organizing data by department allows security policies to be applied easily to specific departments and their respective data. Option Analysis: A) `Department/DataSource/YYYY/MM/DataFile_YYYYMMDD.parquet` - Department and DataSource are placed at the top level, which makes it easier to apply security by department. - Year (`YYYY`) and Month (`MM`) subfolders enable partition elimination, which is helpful for queries filtering by time. - However, the data is stored at the day level (`YYYYMMDD`), which could result in excessive small files and may hinder efficient query performance. For partition elimination, it is usually better to partition at a coarser level (e.g., year or month) than at the day level. - This structure meets most of the requirements but may not be optimal for fast retrieval of the current month's data. B) `DataSource/Department/YYYYMM/DataFile_YYYYMMDD.parquet` - DataSource and Department are reversed compared to option A, which might make applying security policies slightly more difficult as data security would now need to be managed based on the source first, then department. - The Year-Month (`YYYYMM`) structure is ideal for partition elimination since the queries are often filtered by year and month. - The day-level file naming (`YYYYMMDD`) still applies,...
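
The effect of a department-first, year/month layout can be sketched in a few lines. This is only an illustration of prefix-based pruning; the department and data source names are hypothetical, and a serverless SQL pool performs partition elimination through its own mechanisms rather than through code like this.

```python
from datetime import date

def file_path(department: str, data_source: str, day: date) -> str:
    """Path of one daily file under Department/DataSource/YYYY/MM/."""
    return (f"{department}/{data_source}/{day:%Y}/{day:%m}/"
            f"DataFile_{day:%Y%m%d}.parquet")

def month_prefix(department: str, data_source: str, day: date) -> str:
    """Folder prefix that holds every file for the month containing `day`."""
    return f"{department}/{data_source}/{day:%Y}/{day:%m}/"

# Hypothetical department and data source names.
paths = [file_path("Sales", "Web", d)
         for d in (date(2026, 4, 30), date(2026, 5, 1), date(2026, 5, 27))]
prefix = month_prefix("Sales", "Web", date(2026, 5, 27))
current_month = [p for p in paths if p.startswith(prefix)]
print(current_month)
# ['Sales/Web/2026/05/DataFile_20260501.parquet', 'Sales/Web/2026/05/DataFile_20260527.parquet']
```

Because the department is the first path segment, a single folder-level permission covers all of that department's data, and a single month folder covers the current month.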

Author: Stella · Last updated May 27, 2026

You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named Pool1. Pool1 receives new data once every 24 hours. You have the following function. You have the following query. The query is executed once every 15 minutes and the @parameter value is set to the current date. You need to minimize the time it takes for the query...

To minimize the time it takes for the query to return results, we need to focus on the efficiency of the query execution process, considering how data is stored, indexed, and accessed. The following is an analysis of each option: A) Create an index on the avg_f column. - Explanation: Indexes help speed up query performance by allowing faster retrieval of rows based on the indexed columns. If the query frequently filters or sorts data based on the `avg_f` column, then creating an index on this column could speed up the query. - Consideration: However, indexes on columns that are not frequently queried or used in filtering/sorting won't provide significant performance benefits. You need to ensure that the `avg_f` column is used in the query's WHERE clause, JOIN conditions, or sorting operations to make this option worthwhile. - Conclusion: This could be beneficial if the `avg_f` column is heavily used in queries. B) Convert the avg_c column into a calculated column. - Explanation: A calculated column is derived from other columns and can be used for better performance if the calculation is complex or involves frequent computations. However, calculated columns are typically used when you have repetitive calculations that can be pre-computed, saving time during query execution. - Consideration: If the `avg_c` column is already being calculated in the query (for example, a formula or transformation), making it a calculated column can improve performance. However, this will not necessarily improve the overall performance if the query itself is not complex in terms of calculating the column. - Conclusion: This may not be the most impactful change if the column is already used in the query, and its calculation isn't complex enough to warrant conversion. C) Create an index on the sensorid column. 
- Explanation: If the `sensorid` column is used in the query frequently for filtering, grouping, or joining, creating an index on this column would likely speed up the query performance by reducing the need to scan the entire table to find matching rows. - Consideration: If `sensorid` is one of the most queried columns and used in WHER...

Author: Sophia Clark · Last updated May 27, 2026

You need to design a solution that will process streaming data from an Azure Event Hub and output the data to Azure Data Lake Storage. The solution must ensure that ana...

To design a solution that processes streaming data from Azure Event Hub and outputs the data to Azure Data Lake Storage, while ensuring analysts can interactively query the streaming data, we need to select a solution that supports real-time processing, data storage, and querying capabilities. Let’s evaluate each option based on these requirements: A) Azure Stream Analytics and Azure Synapse notebooks - Explanation: Azure Stream Analytics is a real-time data stream processing service that integrates well with Event Hub and can output data to Azure Data Lake Storage. It can process streaming data, perform real-time analytics, and output the results to various storage options, including Azure Data Lake Storage. Additionally, Azure Synapse Analytics notebooks can be used to interactively query and analyze the data once it's ingested into Data Lake Storage. - Consideration: This solution fits the requirements perfectly since Azure Stream Analytics handles real-time data ingestion, processing, and output to Data Lake, while Synapse notebooks provide an interactive query interface for analysts. - Conclusion: This is the optimal choice because it combines real-time data streaming, storage in Data Lake, and interactive querying for analysts. B) Structured Streaming in Azure Databricks - Explanation: Azure Databricks supports structured streaming, which can process data from Azure Event Hub and output it to Azure Data Lake Storage. Databricks also provides notebooks for interactive querying and analysis. While this option is powerful and flexible, it requires more setup and management compared to Azure Stream Analytics. Databricks is ideal for complex analytics, custom transformations, and advanced machine learning models. - Consideration: While Databricks is a viable solution for processing and querying streaming data, it may be an over-engineered solution if the use case is simpler (i.e., just processing and outputting streaming data with interactive querying). 
Databricks also ty...

Author: NightmareDragon2025 · Last updated May 27, 2026

You are creating an Apache Spark job in Azure Databricks that will ingest JSON-formatted data. You need to convert a nested JSON string into a DataFrame that w...

When dealing with nested JSON data in Apache Spark, you need to use functions that help you work with and transform structured data into a DataFrame with multiple rows, especially when the data is nested and requires flattening or unwrapping. Let's evaluate each option: A) `explode` - Explanation: The `explode` function is used to transform an array or a map into individual rows. If your JSON string contains an array or a nested structure that you want to expand into multiple rows, `explode` is the appropriate function to use. It flattens the nested array structure into individual rows, where each row corresponds to one element of the array. - Consideration: This function is perfect for flattening nested arrays into rows. If the nested JSON contains arrays, `explode` will generate multiple rows from those array elements. - Conclusion: This is the correct choice because it is specifically designed to handle nested arrays in JSON data and convert them into separate rows in the DataFrame. B) `filter` - Explanation: The `filter` function is used to filter data based on certain conditions or predicates. It is not designed for expanding or transforming nested structures. - Consideration: While `filter` can be used to eliminate rows based on conditions, it doesn't help in converting a nested structure into multiple rows. I...
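The row-multiplying behaviour described for `explode` can be illustrated with a minimal sketch. It uses plain Python rather than Spark so that it runs anywhere. The helper `explode_rows`, the field names, and the sample JSON string are hypothetical and are not part of the original question.

```python
import json

def explode_rows(record, array_field):
    """Return one flat row per element of record[array_field].

    Mimics the way explode turns an array column into multiple rows:
    the parent's other fields are repeated on every output row, and a
    missing or empty array produces no rows.
    """
    parent = {k: v for k, v in record.items() if k != array_field}
    items = record.get(array_field) or []
    return [{**parent, array_field: item} for item in items]

# Hypothetical nested JSON string: one order holding an array of items.
raw = '{"order_id": 1, "items": ["apple", "pear", "fig"]}'
rows = explode_rows(json.loads(raw), "items")
for row in rows:
    print(row)
```

The single input record becomes three rows, one per array element, which is the shape the question asks for.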

Author: Nathan · Last updated May 27, 2026

DRAG DROP - You have an Azure subscription that contains an Azure Databricks workspace. The workspace contains a notebook named Notebook1. In Notebook1, you create an Apache Spark DataFrame named df_sales that contains the following columns: * Customer * SalesPerson * Region * Amount You need to identify the three top performing salespersons by amount for a region named HQ. How should you complete the query? To answer, drag the appropriate values to the...

Author: Ahmed · Last updated May 27, 2026

You need to schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure Data Lake Storage Gen2...

To schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure Data Lake Storage Gen2 container, the appropriate trigger type needs to be chosen based on the event-driven nature of the task (i.e., the pipeline should trigger as soon as a new file arrives, rather than being scheduled at fixed times). Let's evaluate each option: A) On-demand - Explanation: An on-demand trigger allows you to manually start the pipeline whenever needed, but it does not automatically trigger based on events or schedules. It requires a user to initiate the execution. - Consideration: This is not suitable for this scenario because the pipeline needs to be triggered automatically when a new file arrives, without manual intervention. - Conclusion: This is not the correct choice for an event-driven execution. B) Tumbling window - Explanation: A tumbling window trigger allows a pipeline to run at regular intervals based on a defined window. It ensures that the pipeline is executed over fixed, non-overlapping time intervals. - Consideration: While tumbling window triggers are useful for periodic batch processing (e.g., processing data every hour or every day), they are not event-based. This trigger would run on a schedule rather than reacting to a new file arriving. - Conclusion: This is n...

Author: Olivia · Last updated May 27, 2026

DRAG DROP - You have a project in Azure DevOps that contains a repository named Repo1. Repo1 contains a branch named main. You create a new Azure Synapse workspace named Workspace1. You need to create data processing pipelines in Workspace1. The solution must meet the following requirements: * Pipeline artifacts must be stored in Repo1 * Source control must be provided for pipeline artifacts. * All development must be performed in a feature bra...

Author: John · Last updated May 27, 2026

You have an Azure subscription that contains an Azure SQL database named DB1 and a storage account named storage1. The storage1 account contains a file named File1.txt. File1.txt contains the names of selected tables in DB1. You need to use an Azure Synapse pipeline to copy data from the selected tables in DB1 to the files in storage1. The solution must meet the following requirements: * The Copy activity in the pipeline must be parameterized to use the data in File1.txt to identify the source and destination of the copy. * C...

In this scenario, we need to copy data from selected tables in an Azure SQL database (DB1) to files in Azure Storage (storage1), based on the contents of a file (File1.txt). The solution must parameterize the Copy activity and enable parallel execution. Let’s evaluate each option: A) Get Metadata - Explanation: The Get Metadata activity retrieves metadata about a file, folder, or table in a data store, such as item name, size, last modified time, child items, or column count. It can describe File1.txt, but it does not read the file's content or run activities in parallel. - Consideration: Although Get Metadata can check whether the file exists or return other details about it, it cannot extract the table names from File1.txt or manage parallel execution of copy activities. - Conclusion: This option is not directly applicable for parameterizing the Copy activity or enabling parallelism. B) Lookup - Explanation: The Lookup activity retrieves data from a data source (such as a SQL table or file) and returns it as activity output that later activities can reference. In this scenario, you could use the Lookup activity to read the contents of File1.txt, which contains the names of the selected tables in DB1. - Consideration: The Lookup activity can read the file and return a list of table names. This list can then drive the Copy activity for each table. The Lookup activity is useful for getting the table names that need to be copied and parameterizing the pipe...
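The pattern discussed here (read a list of table names, then drive one parameterized copy per name in parallel) can be sketched in plain Python. This is an illustration of the control flow only, not pipeline JSON. The function names, the file contents, the destination naming rule, and the fan-out step itself are assumptions, since the excerpt is cut off before it names the activity that iterates over the list.

```python
from concurrent.futures import ThreadPoolExecutor

def lookup_table_names(file_text):
    """Stand-in for a Lookup step: return the non-empty lines of the file."""
    return [line.strip() for line in file_text.splitlines() if line.strip()]

def copy_table(table_name):
    """Stand-in for a parameterized Copy step; returns (source, destination)."""
    # Hypothetical naming rule: each table lands in a file of the same name.
    return (f"DB1.{table_name}", f"storage1/{table_name}.csv")

def run_copies(file_text, max_parallel=4):
    """Fan the copy step out over every table, a few at a time."""
    tables = lookup_table_names(file_text)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() keeps the results in the same order as the input list.
        return list(pool.map(copy_table, tables))

# Hypothetical contents of File1.txt.
file1 = "dbo.Customers\ndbo.Orders\n\ndbo.Products\n"
results = run_copies(file1)
for source, destination in results:
    print(source, "->", destination)
```

The table name is the only parameter, so the source and destination are both derived from the same value that was read from the file.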

Author: Aarav · Last updated May 27, 2026

You have an Azure data factory that connects to a Microsoft Purview account. The data factory is registered in Microsoft Purview. You update a Data Factory pipeline. You need to ensure that...

To ensure that the updated lineage is available in Microsoft Purview after updating an Azure Data Factory pipeline, the correct sequence of actions needs to be taken. Let's evaluate each option: A) Disconnect the Microsoft Purview account from the data factory. - Explanation: Disconnecting the Microsoft Purview account from Azure Data Factory would stop the lineage and metadata integration between the two services. This would be counterproductive as you need the connection intact to ensure lineage is captured and updated correctly. - Consideration: Disconnecting the Purview account would disrupt the integration and the ability to capture any updated lineage from the Data Factory pipeline. - Conclusion: This option is not a valid solution since it would break the integration rather than update the lineage. B) Execute the pipeline. - Explanation: When you execute an Azure Data Factory pipeline, the activities in the pipeline are triggered, and the actual data movement or transformations occur. Once the pipeline execution is completed, Azure Data Factory will automatically update the associated lineage information in Microsoft Purview based on the actions carried out by the pipeline. - Consideration: Executing the pipeline ensures that the actual operations (such as data movement or transformation) are performed and that the lineage of these operations is captured and pushed to Microsoft Purview. - Conclusion: This is the correct ch...

Author: Emma · Last updated May 27, 2026

You have a Microsoft Purview account. The Lineage view of a CSV file is shown in the following exhibit. ...

In Microsoft Purview, data lineage is typically populated based on various activities like data movement, transformations, and scans. Let's evaluate each of the options: A) Manually - Explanation: Data lineage is generally not populated manually in Purview. While manual metadata entry can be performed in certain cases, lineage data is usually generated automatically by tracking how data flows through the system and where it moves. - Consideration: Manually creating lineage would be labor-intensive and error-prone, especially when dealing with complex data flows. Lineage should be automatically tracked through data processing and transformation actions. - Conclusion: This option is unlikely to be the correct one, as lineage is typically generated through automated processes, not manual entry. B) By scanning data stores - Explanation: Scanning data stores is a common way to populate metadata and lineage information in Microsoft Purview. When Purview scans data sources like databases, data lakes, or file storage systems, it collects metadata and automatically tracks data movement and transformations, which contribute to the lineage view. - Consideration: Scanning data stores allows Purview to track where data originates, how it’s transformed, an...

Author: Charlotte · Last updated May 27, 2026

You have an Azure subscription that contains a Microsoft Purview account named MP1, an Azure data factory named DF1, and a storage account named storage1. MP1 is configured to scan storage1. DF1 is connected to MP1 and contains a dataset named DS1. DS1 references a file in storage1. In DF1, you plan to create a pipeline that will process data from DS1. You need to review the schema and lineage information in MP1 for the...

In this scenario, the goal is to review the schema and lineage information in Microsoft Purview (MP1) for data referenced by DS1, a dataset in Azure Data Factory (DF1) that points to a file in storage1. The key requirement is to locate the schema and lineage information related to the data from storage1, which is scanned by Purview. Let's evaluate the options one by one: A) The search bar in the Microsoft Purview governance portal - Explanation: Microsoft Purview's governance portal provides a comprehensive search feature that allows users to find assets, schemas, and lineage information. Since Purview is connected to DF1 and scans storage1, the search functionality in the Purview governance portal allows you to locate the schema and lineage for the data referenced by DS1, as it scans and catalogs all relevant metadata. - Consideration: This is the primary tool for finding schema and lineage data in Purview, as it is specifically designed to handle metadata search, lineage exploration, and asset discovery within the Purview ecosystem. - Conclusion: This is an appropriate and valid method for locating the schema and lineage information for DS1. B) The Storage browser of storage1 in the Azure portal - Explanation: The Storage browser in the Azure portal allows you to view and interact with the contents of storage1, such as browsing files and directories. However, it doesn't provide the lineage or schema information for the data within the files. This functionality is more focused on file management rather than data governance or metadata tracking. - Consideration: While you can see the file in storage1, it does not offer schema or lineage information related to how the data is used, t...

Author: Emma Brown · Last updated May 27, 2026

HOTSPOT - You have an Azure Blob storage account that contains a folder. The folder contains 120,000 files. Each file contains 62 columns. Each day, 1,500 new files are added to the folder. You plan to incrementally load five data columns from each new file into an Azure Synapse Analytics workspace. You need to minimize how long it takes to perform the incremental loads. What...

Author: ThunderBear · Last updated May 27, 2026

DRAG DROP - You are batch loading a table in an Azure Synapse Analytics dedicated SQL pool. You need to load data from a staging table to the target table. The solution must ensure that if an error occurs while loading the data to the target table, all the inserts in that batch are undone. How should you complete the Transact-SQL code? To answer, drag the appropriate values to the correct targets. Each va...

Author: FrozenWolf2022 · Last updated May 27, 2026

HOTSPOT - You have two Azure SQL databases named DB1 and DB2. DB1 contains a table named Table1. Table1 contains a timestamp column named LastModifiedOn. LastModifiedOn contains the timestamp of the most recent update for each individual row. DB2 contains a table named Watermark. Watermark contains a single timestamp column named WatermarkValue. You plan to create an Azure Data Factory pipeline that will incrementally upload into Azure Blob Storage all the rows in Table1 for which the LastModifiedOn column contains a timestamp newer than the most recent value of the WatermarkValue column in Watermark. You need to identify which activities to include in the pipeline. The solution must meet the follow...
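The watermark logic in this question (select the rows newer than the stored watermark, then advance the watermark) can be sketched in plain Python. This is a minimal illustration under assumed sample data, not the pipeline itself; the helper `incremental_rows` and the row values are hypothetical.

```python
from datetime import datetime

def incremental_rows(rows, watermark):
    """Return the rows modified after the watermark, plus the new watermark."""
    changed = [r for r in rows if r["LastModifiedOn"] > watermark]
    # If nothing changed, the watermark stays where it was.
    new_watermark = max((r["LastModifiedOn"] for r in changed), default=watermark)
    return changed, new_watermark

# Hypothetical sample rows standing in for Table1.
table1 = [
    {"id": 1, "LastModifiedOn": datetime(2024, 1, 1, 8, 0)},
    {"id": 2, "LastModifiedOn": datetime(2024, 1, 2, 9, 30)},
    {"id": 3, "LastModifiedOn": datetime(2024, 1, 3, 7, 15)},
]
watermark_value = datetime(2024, 1, 1, 12, 0)

changed, watermark_value = incremental_rows(table1, watermark_value)
print([r["id"] for r in changed])
print(watermark_value)
```

Running the function a second time with the updated watermark returns no rows, which is what keeps each upload incremental.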

Author: Ming · Last updated May 27, 2026

HOTSPOT - You have an Azure Synapse serverless SQL pool. You need to read JSON documents from a file by using the OPENROWSET function. How should you complete the query? To answer, select the app...

Author: Ming88 · Last updated May 27, 2026

You use Azure Data Factory to create data pipelines. You are evaluating whether to integrate Data Factory and GitHub for source and version control. What are two advantages of the integration? Each correc...

When evaluating whether to integrate Azure Data Factory with GitHub for source and version control, there are several advantages that come with using version control. Let's review each option: A) Additional triggers - Explanation: The integration of Azure Data Factory with GitHub primarily focuses on version control and source management for pipeline code, datasets, and other resources. It doesn't directly influence the creation or management of triggers. - Consideration: The integration with GitHub is primarily for source control, and triggers (like scheduled or event-based triggers) are configured within Data Factory itself, not via version control. GitHub integration does not add or modify triggers in pipelines. - Conclusion: This option is not relevant to the integration with GitHub, as it pertains more to pipeline scheduling and execution rather than version control. B) Lower pipeline execution times - Explanation: GitHub integration does not directly affect the performance or execution time of pipelines in Azure Data Factory. The execution times are influenced by factors like data volume, resource allocation, and pipeline design, not by version control integration. - Consideration: While GitHub helps manage the pipeline code and configurations, it does not impact the execution performance or speed of those pipelines. - Conclusion: This option is incorrect because version control via GitHub does not influence pipeline execution times. C) The ability to save without publishing - Explanation: ...

Author: Samuel · Last updated May 27, 2026

DRAG DROP - You have an Azure Synapse Analytics workspace named Workspace1. You perform the following changes: * Implement source control for Workspace1. * Create a branch named Feature based on the collaboration branch. * Switch to the Feature branch. * Modify Workspace1. You need to publish the changes to Azure Synapse. From which...

Author: Sophia Clark · Last updated May 27, 2026

You have two Azure Blob Storage accounts named account1 and account2. You plan to create an Azure Data Factory pipeline that will use scheduled intervals to replicate newly created or modified blobs from account1 to account2. You need to recommend a solution to implement the pipeline. The solution must meet the following requirements: * Ensure that the pipeline only...

To replicate only newly created or modified blobs from account1 to account2 by using Azure Data Factory (ADF), let's evaluate each option against the requirements and the implementation effort: A) Run the Copy Data tool and select Metadata-driven copy task - Explanation: The Metadata-driven copy task in the Copy Data tool generates parameterized pipelines that read the list of objects to copy, and their settings, from an external control table. It is aimed at copying large numbers of objects, such as many tables, rather than at detecting changed blobs by their last modified timestamps. - Consideration: This option requires creating and maintaining a control table, which adds setup effort. For file sources such as Blob Storage, the Copy Data tool's Built-in copy task offers an incremental load setting based on LastModifiedDate, which copies only new or changed files on a schedule without custom code. - Conclusion: This option can be made to work, but it is unlikely to be the lowest-effort choice; if a Built-in copy task option is offered, it is the closer match for copying only new or changed blobs with minimal pipeline creation effort. B) Create a pipeline that contains a Data Flow activity - Explanation: Data Flow activities in ADF are designed for data transformations, not for replicating files based on modification timestamps. Data flows support transformations and business logic, but they are overkill for simply copying blobs that have changed. - Consideration: This solution would involve additional complexity and overhead in terms of designing and configuring the data flow for something as simple as file rep...

Author: Ryan · Last updated May 27, 2026

You have an Azure Data Factory pipeline named pipeline1 that contains a data flow activity named activity1. You need to run pi...

When you run a Data Flow activity in Azure Data Factory, the integration runtime determines the compute that executes it. Let's evaluate each runtime option for this scenario: A) Azure Integration runtime - Explanation: The Azure Integration Runtime (IR) is the runtime for Data Flow activities in Azure Data Factory. Data flows execute on Spark clusters that the Azure IR provisions and manages, so the compute is fully managed by Azure. - Consideration: Data flows run only on an Azure IR; a self-hosted IR cannot execute them, so the Azure Integration Runtime is the appropriate runtime for this scenario. - Conclusion: This is the correct answer, as it is the runtime that executes Data Flow activities in Azure Data Factory. B) Self-hosted integration runtime - Explanation: The Self-hosted Integration Runtime is primarily used for hybrid data movement, where data is transferred between on-premises data sources and Azure. It is also used fo...

Author: Elizabeth · Last updated May 27, 2026

HOTSPOT - You have an Azure subscription that contains an Azure Synapse Analytics workspace named workspace1. Workspace1 contains a dedicated SQL pool named SQLPool1 and an Apache Spark pool named sparkpool1. Sparkpool1 contains a DataFrame named pyspark_df. You need to write the contents of pyspark_df to a table in SQLPool1 by using a PySpark notebo...

Author: Amelia · Last updated May 27, 2026

You have an Azure data factory named ADF1 and an Azure Synapse Analytics workspace that contains a pipeline named SynPipeLine1. SynPipeLine1 includes a Notebook activity. You create a pipeline in ADF1 named ADFPipeline1....

To invoke SynPipeLine1 from ADFPipeline1 in Azure Data Factory (ADF1), let's analyze the available options: A) Web - Explanation: The Web activity in Azure Data Factory is typically used for invoking external services via HTTP requests, such as REST APIs. It is not specifically designed for invoking other Azure Data Factory pipelines or Synapse pipelines. - Consideration: The Web activity is useful for interacting with external services or endpoints, but it is not the right choice for invoking another Azure Synapse Analytics pipeline or another ADF pipeline. - Conclusion: This option is not appropriate, as it's designed for HTTP-based external integrations, not for invoking pipelines within ADF or Synapse. B) Spark - Explanation: The Spark activity is used to execute Apache Spark jobs (such as notebooks or JAR files) within Azure Data Factory or Azure Synapse Analytics. This activity is meant for running Spark-based jobs but is not used for invoking or triggering other pipelines. - Consideration: Since you want to invoke a pipeline and not run a Spark job, the Spark activity is not suitable for this task. - Conclusion: This option is not relevant for triggering another pipeline. C) Custom - Explanation: The Custom activity in Azure Data Factory is used when you need to run custom code in a specific environment, such as running an Azure function or custom logic within a container. - Consideration: While Custom activities are powerful for custom tasks, invoking another pipeline (like SynPipeLine1) doesn't require custom logic and can be done through a simpler mechanism, l...

Author: Rohan · Last updated May 27, 2026

HOTSPOT - You have an Azure data factory that contains the linked service shown in the following exhibit. Use the drop-down menus to select the answer choice that completes each statement based on ...

Author: Ethan · Last updated May 27, 2026

HOTSPOT - In Azure Data Factory, you have a schedule trigger that is scheduled in Pacific Time. Pacific Time observes daylight saving time. The trigger has the following JSON file. Use the drop-down menus to select the answer choice that c...

Author: Amira · Last updated May 27, 2026

You have an Azure Synapse Analytics dedicated SQL pool. You need to create a pipeline that will execute a stored procedure in the dedicated SQL pool and use the returned result set as the input for a downstream activity. The s...

To build a pipeline that executes a stored procedure in an Azure Synapse Analytics dedicated SQL pool and uses the returned result set as the input for a downstream activity, let's evaluate each option: A) U-SQL - Explanation: U-SQL is a data processing language used in Azure Data Lake Analytics (a service that has been retired) for processing big data. It is not related to the Azure Synapse Analytics dedicated SQL pool or to executing stored procedures. - Consideration: U-SQL would not be suitable for executing stored procedures in a dedicated SQL pool and retrieving result sets. - Conclusion: This option is not relevant to your requirements. B) Stored Procedure - Explanation: The Stored Procedure activity in Azure Data Factory and Azure Synapse Analytics executes stored procedures in Azure Synapse SQL pools (formerly SQL Data Warehouse). It runs the procedure, but its output does not expose a returned result set to downstream activities. - Consideration: This activity fits when the procedure only needs to run. When a downstream activity must consume the returned rows, an activity that surfaces the result set in its output, such as Lookup, is the usual choice, since it can call the stored procedure and return the rows. - Conclusion: This option executes the stored procedure with minimal effort but does not by itself meet the requirement to pass the result set downstream. ...

Author: Ming · Last updated May 27, 2026

You have an Azure SQL database named DB1 and an Azure Data Factory data pipeline named pipeline1. From Data Factory, you configure a linked service to DB1. In DB1, you create a stored procedure named SP1. SP1 returns a single row of data that has four columns. You need to add an activity to pipeline1 to execute SP1. The solution must ensure that the values in the columns are stored as pipeline...

In the scenario where we need to execute a stored procedure (SP1) from Azure Data Factory (ADF) and store the result of the execution (i.e., the values of the columns in the returned row) as pipeline variables, let's evaluate each of the available options: A) Script - Explanation: The Script activity in Azure Data Factory allows the execution of custom SQL scripts, including stored procedures. While the Script activity can run SP1, it does not have a straightforward mechanism to automatically store the result set of the stored procedure as pipeline variables. - Consideration: To extract the result set and store it as pipeline variables, additional scripting or processing would be needed. This increases complexity compared to other options. - Conclusion: This option can technically execute the stored procedure, but it's not the most efficient or recommended solution for storing the result set as pipeline variables. B) Copy - Explanation: The Copy activity is used to copy data from a source to a destination. It’s typically used for moving data between different data stores, not for executing stored procedures. The Copy activity does not support executing stored procedures directly or extracting and storing results as pipeline variables. - Consideration: The Copy activity is not designed for interacting with a result set from a stored procedure or manipulating pipeline variables. - Conclusion: This option is...

Author: Arjun · Last updated May 27, 2026

You have an Azure data factory named ADF1. You currently publish all pipeline authoring changes directly to ADF1. You need to implement version control for the changes made to pipeline artifacts. The solution must ensure that you can apply version control to the resources currently defined in the Azure Data Factory Studio for ADF1...

To implement version control for the changes made to pipeline artifacts in Azure Data Factory (ADF1), you need to set up a version control mechanism and configure it to integrate with the current Azure Data Factory environment. Let's evaluate each of the provided options: A) From the Azure Data Factory Studio, run Publish All. - Explanation: The Publish All option in Azure Data Factory Studio publishes pipeline changes and other artifacts to the Data Factory service. However, this does not provide version control; it simply pushes the changes to the live environment. - Consideration: While this action publishes the changes, it does not introduce version control. Therefore, this is not a valid option for implementing version control. - Conclusion: This option is not relevant for implementing version control. B) Create an Azure Data Factory trigger. - Explanation: Triggers in Azure Data Factory are used to schedule and automate the execution of pipelines. They do not relate to version control in any way. - Consideration: Triggers help in orchestrating when a pipeline runs, but they don't manage version control or source code management. - Conclusion: This option is irrelevant for version control purposes. C) Create a Git repository. - Explanation: Creating a Git repository is a crucial step in setting up version control. Azure Data Factory integrates with Git repositories hosted in Azure Repos or GitHub to manage versions of pipeline artifacts. - Consideration: The Git repository stores all the versions of the pipeline artifacts and facilitates collaboration and version tracking. - Conclusion: This is an essential action to implement...

Author: CrimsonViperX · Last updated May 27, 2026

You have an Azure data factory named ADF1 that contains a pipeline named Pipeline1. Pipeline1 must execute every 30 minutes with a 15-minute offset. You need to create a trigger for Pipeline1. The trigger must meet the following requirements: * Backfill data from the beginning of the day to the current time. * If Pipeline1 fails, ensure that the pipeline can re-execute within the same 30-...

To solve this problem, let's break down the requirements: 1. Backfill data from the beginning of the day to the current time: This means the trigger should have the capability to run from a specific start time (e.g., the start of the day) and continue executing at regular intervals. 2. Re-execute within the same 30-minute period if the pipeline fails: This means the pipeline needs to be re-triggered in case of failure during the execution window, ensuring that it doesn't miss the 30-minute window. 3. Ensure that only one concurrent pipeline execution can occur: This suggests that there should be no overlapping executions of the pipeline. We need to manage concurrency. 4. Minimize development and configuration effort: The solution should require minimal effort for setup, configuration, and maintenance. Now, let's analyze each trigger option: A) Schedule Trigger - A schedule trigger allows you to specify a recurring time-based schedule (e.g., every 30 minutes). - However, this type of trigger does not provide built-in backfilling from the start of the day and does not address the requirement to re-trigger the pipeline in case of failure. - Also, it can lead to multiple concurrent executions unless configured carefully, which may not meet the requirement of only allowing one concurrent execution. B) Event-based Trigger - An event-based trigger is used to trigger pipelines based on events like the arrival of a file in a storage location. - While event-based triggers are useful for certain scenarios, they do not meet the requirements rela...
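The backfill requirement can be made concrete with a small sketch that lists the fixed, non-overlapping 30-minute windows from the start of the day up to the current time. It assumes the 15-minute offset means that the first window opens at 00:15; how the offset is expressed in an actual trigger definition is not shown in the excerpt. The helper `backfill_windows` and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta

def backfill_windows(now, interval=timedelta(minutes=30), offset=timedelta(minutes=15)):
    """List the completed (start, end) windows from the start of the day up to now."""
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    windows = []
    start = day_start + offset
    # Only windows that have fully elapsed are eligible to run.
    while start + interval <= now:
        windows.append((start, start + interval))
        start += interval
    return windows

# Hypothetical current time: 02:00 on 1 January 2024.
now = datetime(2024, 1, 1, 2, 0)
windows = backfill_windows(now)
for start, end in windows:
    print(start.time(), "->", end.time())
```

Each window ends exactly where the next one begins, so a failed run can be retried for the same window without overlapping any other window.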

Author: Isabella1 · Last updated May 27, 2026

You have an Azure Data Lake Storage Gen2 account named account1 and an Azure event hub named Hub1. Data is written to account1 by using Event Hubs Capture. You plan to query account1 by using an Apache Spark pool in Azure Synapse Analytics. You need to create a notebook and ingest the data from account1. The solution must meet the following requirements: ...

To meet the requirements for querying data from Azure Data Lake Storage Gen2 and ingesting it into an Apache Spark pool in Azure Synapse Analytics, let's break down the factors involved in selecting the right data format: Requirements: 1. Retrieve multiple rows of records in their entirety: This means we want a data format that allows for efficient reading of multiple records, ideally in a way that optimizes Spark processing. 2. Minimize query execution time: The data format should be optimized for fast read operations, minimizing the time it takes to execute queries. 3. Minimize data processing: The data format should require minimal transformations and decoding during query execution, reducing the need for processing. Analysis of Data Formats: A) Parquet - ORC, Avro - Parquet is a columnar storage format, highly optimized for reading and querying data in big data environments, such as with Spark and Synapse Analytics. It allows for efficient compression and minimizes disk I/O by storing data in columnar format, which significantly speeds up query execution and reduces data processing overhead. - ORC (Optimized Row Columnar) is also a columnar format that provides similar benefits to Parquet. It has more optimized compression techniques and is widely used in the Hadoop ecosystem. - Avro is a row-based storage format that is more suitable for scenarios where data schema evolution and row-level access are critical. While Avro is compact and fast for row-level access, it doesn’t provide the same level of performance and query optimization for large-scale data analysis like columnar formats (Parquet or ...

Author: Daniel · Last updated May 27, 2026

You have an Azure Blob Storage account named blob1 and an Azure Data Factory pipeline named pipeline1. You need to ensure that pipeline1 runs when a file is deleted from a container in blob1. Th...

To trigger pipeline1 when a file is deleted from a container in Azure Blob Storage with minimal development effort, we need to choose the trigger type that matches the scenario. Analysis of Available Trigger Options: A) Schedule Trigger - A schedule trigger runs the pipeline at specified intervals (e.g., daily, hourly). This type of trigger is not suitable for responding to file deletion events as they happen. - It would not minimize the development effort in this case because the pipeline would have to check periodically whether a file had been deleted, which is inefficient and does not respond directly to the deletion event. - Rejected: Not event-driven and inefficient for this scenario. B) Storage Event Trigger - Storage event triggers are designed to run pipelines in response to events in Azure Storage, specifically blob creation and blob deletion. - This trigger can listen for the BlobDeleted event, which is generated when a file is deleted from a container in Blob Storage. - This is a good fit for the requirement because it responds directly to the file deletion event without polling or manual checks. - Selected: I...
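The event-driven behaviour described above amounts to a filter on the incoming storage event: react only to deletions, and only inside the container of interest. The sketch below shows that filter in plain Python. It is not a trigger definition. The helper `should_run_pipeline`, the container name `input`, and the sample events are hypothetical, and the events are shaped like Event Grid blob notifications as an assumption.

```python
def should_run_pipeline(event, container):
    """Return True when the event is a blob deletion inside the given container."""
    is_delete = event.get("eventType") == "Microsoft.Storage.BlobDeleted"
    prefix = f"/blobServices/default/containers/{container}/blobs/"
    return is_delete and event.get("subject", "").startswith(prefix)

# Hypothetical events for a container named "input".
deleted = {
    "eventType": "Microsoft.Storage.BlobDeleted",
    "subject": "/blobServices/default/containers/input/blobs/sales.csv",
}
created = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/input/blobs/sales.csv",
}

print(should_run_pipeline(deleted, "input"))  # True
print(should_run_pipeline(created, "input"))  # False
```

No polling loop appears anywhere in the sketch, which is the effort saving that the explanation attributes to an event-based trigger.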

Author: Zain · Last updated May 27, 2026

HOTSPOT - You have Azure Data Factory configured with Azure Repos Git integration. The collaboration branch and the publish branch are set to the default values. You have a pipeline named pipeline1. You build a new version of pipeline1 in a branch named feature1. From the Data Factory Studio, you select Publish. The source code of which branch will be built, and which branch will cont...

Author: FrozenWolf2022 · Last updated May 27, 2026

DRAG DROP - You have an Azure subscription that contains an Azure data factory. You are editing an Azure Data Factory activity JSON. The script needs to copy a file from Azure Blob Storage to multiple destinations. The solution must ensure that the source and destination files have consistent folder paths. How should you complete the script? To answer, drag the appropriate values to the correct targets. Each...

Author: Noah · Last updated May 27, 2026

You are building a data flow in Azure Data Factory that upserts data into a table in an Azure Synapse Analytics dedicated SQL pool. You need to add a transformation to the data flow. The transformation must specify logic indicating when a row from th...

In this scenario, where you need to add logic that specifies when a row from the input data must be upserted into a sink (an Azure Synapse Analytics dedicated SQL pool), the appropriate transformation type is Alter Row.

Reasoning:

A) Join:
- The Join transformation combines data from two or more sources based on a common key. While joins are essential for combining datasets, they provide no mechanism to control upsert behavior, so this option is not relevant for upserting rows into a sink.
- Use case: Typically used when combining multiple datasets, not for conditional upsert logic.

B) Alter Row:
- The Alter Row transformation is specifically designed to set row-level policies such as insert, update, delete, and upsert. It lets you apply conditional logic based on the values of the incoming data, defining whether a row should be inserted, updated, or ignored based on your requirements.
- For an upsert scenario, you can use Alter Row to define the conditions under which data should be inserted or updated in the sink. This gives you full control over the upsert logic, making it the id...
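The idea of marking rows with an upsert policy before they reach the sink can be sketched in plain Python. This is a toy model, not the data flow engine: the `alter_row` and `apply_to_sink` helpers, the `_policy` field, and the sample customer rows are all hypothetical, and the choice to skip rows that fail the condition is an assumption of this sketch.

```python
def alter_row(rows, upsert_if):
    """Tag each row with 'upsert' when the condition holds, otherwise leave it unmarked."""
    return [{**row, "_policy": "upsert" if upsert_if(row) else None} for row in rows]


def apply_to_sink(table, marked_rows, key):
    """Insert rows with new keys and update rows with existing keys; skip unmarked rows."""
    for row in marked_rows:
        if row["_policy"] != "upsert":
            continue  # Assumption of this sketch: unmarked rows never reach the table.
        data = {k: v for k, v in row.items() if k != "_policy"}
        table[row[key]] = {**table.get(row[key], {}), **data}
    return table


# Existing sink contents, keyed by customer_id.
table = {1: {"customer_id": 1, "email": "old@example.com"}}

incoming = [
    {"customer_id": 1, "email": "new@example.com"},  # existing key: updated
    {"customer_id": 2, "email": "b@example.com"},    # new key: inserted
    {"customer_id": 3, "email": None},               # fails the condition: skipped
]

# Upsert only rows that carry an email address.
marked = alter_row(incoming, upsert_if=lambda row: row["email"] is not None)
table = apply_to_sink(table, marked, key="customer_id")
```

In an actual data flow, the condition would be written as an Upsert if expression on the Alter Row transformation, and the sink would need upsert enabled along with a key column.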

Author: StarryEagle42 · Last updated May 27, 2026

You have an on-premises database named db1 and a self-hosted integration runtime. You have an Azure subscription that contains an Azure Data Lake Storage account named dl1. You need to develop four data pipeline projects that will use Microsoft Power Query to copy data from db1 to dl1. The solution must meet the following requirements: * All pipelines must use th...

In this case, the most appropriate option to meet the requirements is C) Azure Data Factory.

Reasoning:

A) Azure Synapse Analytics:
- Azure Synapse Analytics is a comprehensive analytics service that integrates big data and data warehousing. While it can work with on-premises data sources and copy data to various Azure destinations, its main strength is in data analytics and data warehousing, rather than in building data pipeline projects specifically for Power Query integration.
- Synapse may not be the most suitable solution for simple data movement and transformation tasks, especially when you need separate Git repositories for each project.
- Use case: Typically used for big data analytics and data warehousing scenarios.

B) Azure Logic Apps:
- Azure Logic Apps is used for building workflows that automate tasks and integrate services. While it is excellent for orchestrating workflows between various services (e.g., triggering actions based on events), it is not primarily designed for creating data pipelines that involve copying and transforming data from an on-premises database to Azure Data Lake Storage using Power Query.
- Use case: Ideal for process automation and integrating different services but not for data transformation and storage in data lakes.

C) Azure Data Factory:
- Azure Data Factory is specifically designed for building data pipelines. It supports various transformations and can connect to both cloud and on-premises data sources.
- Self-hosted integration runtime in Azure Data Factory allows you to securely connect to on-premises data sources like db1.
- Power Query integ...

Author: Isabella · Last updated May 27, 2026

What Our Friends Say

Microsoft Practice Questions, Discussions & Exam Topics by our Authors

You have an Azure Stream Analytics job. You need to ensure that the job has enough streaming units provisioned. You configure monitoring of the SU % Utilization metric. Which two additional metrics should you monitor? Eac...

You have an activity in an Azure Data Factory pipeline. The activity calls a stored procedure in a data warehouse in Azure Synapse Analytics and runs daily. You need to v...

You have an Azure Synapse Analytics job that uses Scala. You need to view the status of the job. W...

You have an Azure subscription that contains an Azure Synapse Analytics dedicated SQL pool named SQLPool1. SQLPool1 is currently paused. You need to restore the ...

You are designing an Azure Synapse Analytics workspace. You need to recommend a solution to provide double encryption of all the data at rest. Which two components should you include in the recommendation? Each co...

You have an Azure Synapse Analytics serverless SQL pool named Pool1 and an Azure Data Lake Storage Gen2 account named storage1. The AllowBlobPublicAccess property is disabled for storage1. You need to create an external data source that can be u...

You have an Azure Data Factory pipeline named pipeline1 that is invoked by a tumbling window trigger named Trigger1. Trigger1 has a recurrence of 60 minutes. You need to ensure that pipeline1 will execute only if the previo...

You need to design a solution that will process streaming data from an Azure Event Hub and output the data to Azure Data Lake Storage. The solution must ensure that ana...

You are creating an Apache Spark job in Azure Databricks that will ingest JSON-formatted data. You need to convert a nested JSON string into a DataFrame that w...

You need to schedule an Azure Data Factory pipeline to execute when a new file arrives in an Azure Data Lake Storage Gen2...

You have an Azure data factory that connects to a Microsoft Purview account. The data factory is registered in Microsoft Purview. You update a Data Factory pipeline. You need to ensure that...

You have a Microsoft Purview account. The Lineage view of a CSV file is shown in the following exhibit. ...

HOTSPOT - You have an Azure Synapse serverless SQL pool. You need to read JSON documents from a file by using the OPENROWSET function. How should you complete the query? To answer, select the app...

You use Azure Data Factory to create data pipelines. You are evaluating whether to integrate Data Factory and GitHub for source and version control. What are two advantages of the integration? Each correc...

You have an Azure Data Factory pipeline named pipeline1 that contains a data flow activity named activity1. You need to run pi...

You have an Azure data factory named ADF1 and an Azure Synapse Analytics workspace that contains a pipeline named SynPipeLine1. SynPipeLine1 includes a Notebook activity. You create a pipeline in ADF1 named ADFPipeline1....

HOTSPOT - You have an Azure data factory that contains the linked service shown in the following exhibit. Use the drop-down menus to select the answer choice that completes each statement based on ...

HOTSPOT - In Azure Data Factory, you have a schedule trigger that is scheduled in Pacific Time. Pacific Time observes daylight saving time. The trigger has the following JSON file. Use the drop-down menus to select the answer choice that c...

You have an Azure Synapse Analytics dedicated SQL pool. You need to create a pipeline that will execute a stored procedure in the dedicated SQL pool and use the returned result set as the input for a downstream activity. The s...
