
We offer Microsoft 70-475 exam materials. "Designing and Implementing Big Data Analytics Solutions", also known as the 70-475 exam, is a Microsoft certification. This set of posts, Passing the 70-475 Exam, will help you prepare. The material covers all of the knowledge points of the real exam: 100% real Microsoft 70-475 questions, revised by experts!
Online Microsoft 70-475 free dumps demo below:
NEW QUESTION 1
You have an Apache Storm cluster.
You need to ingest data from a Kafka queue.
Which component should you use to consume data emitted from Kafka?
Answer: C
Explanation: To perform real-time computation on Storm, we create “topologies.” A topology is a graph of a computation, containing a network of nodes called “Spouts” and “Bolts.” In a Storm topology, a Spout is the source of data streams and a Bolt holds the business logic for analyzing and processing those streams.
The org.apache.storm.kafka.KafkaSpout component reads data from Kafka. Example:
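The following is a minimal sketch, not the original example: a Storm topology that consumes a Kafka topic through the storm-kafka KafkaSpout and hands each message to a trivial bolt. The Zookeeper host, topic name, paths, and parallelism values are illustrative assumptions.

```java
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.kafka.KafkaSpout;
import org.apache.storm.kafka.SpoutConfig;
import org.apache.storm.kafka.StringScheme;
import org.apache.storm.kafka.ZkHosts;
import org.apache.storm.spout.SchemeAsMultiScheme;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.tuple.Tuple;

public class KafkaReaderTopology {

    // A terminal bolt that simply prints each message; real business logic goes here.
    public static class PrintBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            System.out.println(tuple.getString(0));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // Terminal bolt: declares no output stream.
        }
    }

    public static void main(String[] args) throws Exception {
        // Zookeeper used by the Kafka cluster (assumed address).
        ZkHosts zkHosts = new ZkHosts("zk0.example.com:2181");

        // Read the "stormtest" topic (assumed name); the last two arguments are
        // the Zookeeper root path for storing offsets and a consumer id.
        SpoutConfig spoutConfig = new SpoutConfig(zkHosts, "stormtest", "/stormtest", "kafkaspout");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        // The KafkaSpout is the source of the data stream...
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        // ...and the bolt holds the processing logic.
        builder.setBolt("print-bolt", new PrintBolt(), 1).shuffleGrouping("kafka-spout");

        StormSubmitter.submitTopology("kafka-reader", new Config(), builder.createTopology());
    }
}
```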
References:
https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-storm-with-kafka https://hortonworks.com/blog/storm-kafka-together-real-time-data-refinery/
NEW QUESTION 2
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
Your company has multiple databases that contain millions of sales transactions. You plan to implement a data mining solution to identify purchasing fraud.
You need to design a solution that mines 10 terabytes (TB) of sales data. The solution must meet the following requirements:
Run the analysis to identify fraud once per week.
Continue to receive new sales transactions while the analysis runs.
Be able to stop computing services when the analysis is NOT running.
Solution: You create a Microsoft Azure HDInsight cluster.
Does this meet the goal?
Answer: B
Explanation: HDInsight cluster billing starts once a cluster is created and stops when the cluster is deleted. Billing is pro-rated per minute, so you should always delete your cluster when it is no longer in use.
NEW QUESTION 3
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have an Apache Spark system that contains 5 TB of data.
You need to write queries that analyze the data in the system. The queries must meet the following requirements:
Use static data typing.
Execute queries as quickly as possible.
Have access to the latest language features.
Solution: You write the queries by using Python.
Does this meet the goal?
Answer: B
Explanation: Python is dynamically typed, so it does not satisfy the static data typing requirement.
NEW QUESTION 4
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to deploy a Microsoft Azure SQL data warehouse and a web application.
The data warehouse will ingest 5 TB of data from an on-premises Microsoft SQL Server database daily. The web application will query the data warehouse.
You need to design a solution to ingest data into the data warehouse.
Solution: You use AzCopy to transfer the data as text files from SQL Server to Azure Blob storage, and then you use Azure Data Factory to refresh the data warehouse database.
Does this meet the goal?
Answer: B
NEW QUESTION 5
Your Microsoft Azure subscription contains several data sources that use the same XML schema. You plan to process the data sources in parallel.
You need to recommend a compute strategy to minimize the cost of processing the data sources. What should you recommend including in the compute strategy?
Answer: A
NEW QUESTION 6
A company named Fabrikam, Inc., has a web app hosted in Microsoft Azure. Millions of users visit the app daily.
All of the user visits are logged in Azure Blob storage. Data analysts at Fabrikam built a dashboard that processes the user visit logs.
Fabrikam plans to use an Apache Hadoop cluster on Azure HDInsight to process queries. The queries will access the data only once.
You need to recommend a query execution strategy. What is the best recommendation to achieve the goal?
More than one answer choice may achieve the goal. Select the BEST answer.
Answer: B
Explanation: File format versatility and intelligent caching: Fast analytics on Hadoop have always come with one big catch: they require up-front conversion to a columnar format like ORCFile, Parquet, or Avro, which is time-consuming, complex, and limits your agility.
With Interactive Query Dynamic Text Cache, which converts CSV or JSON data into optimized in-memory format on-the-fly, caching is dynamic, so the queries determine what data is cached. After text data is cached, analytics run just as fast as if you had converted it to specific file formats.
References:
https://azure.microsoft.com/en-us/blog/azure-hdinsight-interactive-query-simplifying-big-data-analytics-architec
NEW QUESTION 7
You have an analytics solution in Microsoft Azure that must be operationalized.
You have the relevant data in Azure Blob storage. You use an Azure HDInsight cluster to process the data. You plan to process the raw data files by using Azure HDInsight. Azure Data Factory will operationalize the solution.
You need to create a data factory to orchestrate the data movement. Output data must be written back to Azure Blob storage.
Which four actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Answer:
Explanation: 
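A hedged sketch of the kind of pipeline the data factory would run: a Data Factory (v1) pipeline with an HDInsight Hive activity that reads a Blob storage dataset and writes its output back to Blob storage. All names, dates, and the script path are assumptions for illustration.

```json
{
  "name": "ProcessBlobDataPipeline",
  "properties": {
    "activities": [
      {
        "name": "RunHiveScript",
        "type": "HDInsightHive",
        "inputs": [ { "name": "RawBlobDataset" } ],
        "outputs": [ { "name": "ProcessedBlobDataset" } ],
        "linkedServiceName": "HDInsightLinkedService",
        "typeProperties": {
          "scriptPath": "scripts/process.hql",
          "scriptLinkedService": "StorageLinkedService"
        }
      }
    ],
    "start": "2018-01-01T00:00:00Z",
    "end": "2018-01-02T00:00:00Z"
  }
}
```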
NEW QUESTION 8
You are designing a partitioning scheme for ingesting real-time data by using Kafka. Kafka and Apache Storm will be integrated. You plan to use four event processing servers that each run as a Kafka consumer. Each server will have two quad-core processors. You need to identify the minimum number of partitions required to ensure that the load is distributed evenly. How many should you identify?
Answer: B
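As a quick sizing check: Kafka spreads the partitions of a topic across the consumer threads in a group, so the load is only distributed evenly when the partition count is at least (and ideally a multiple of) the total number of consumer threads. Assuming one consumer thread per core, 4 servers × 2 processors × 4 cores = 32 threads, which implies a minimum of 32 partitions.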
NEW QUESTION 9
You have a web application that generates several terabytes (TB) of financial documents each day. The application processes the documents in batches.
You need to store the documents in Microsoft Azure. The solution must ensure that a user can restore the previous version of a document.
Which type of storage should you use for the documents?
Answer: A
NEW QUESTION 10
You have a data warehouse that contains the sales data of several customers.
You plan to deploy a Microsoft Azure data factory to move additional sales data to the data warehouse. You need to develop a data factory job that reads reference data from a table in the source data.
Which type of activity should you add to the control flow of the job?
Answer: B
Explanation: References:
https://docs.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
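A minimal sketch of a Lookup activity, assuming a Data Factory v2 pipeline and hypothetical dataset and table names, following the control-flow-lookup-activity documentation referenced above:

```json
{
  "name": "LookupReferenceData",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT * FROM ReferenceTable"
    },
    "dataset": {
      "referenceName": "ReferenceDataset",
      "type": "DatasetReference"
    },
    "firstRowOnly": false
  }
}
```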
NEW QUESTION 11
You are designing a data-driven workflow in Microsoft Azure Data Factory to copy data from Azure Blob storage to Azure SQL Database.
You need to create the copy activity.
How should you complete the JSON code? To answer, drag the appropriate code elements to the correct targets. Each element may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Answer:
Explanation: 
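As a hedged sketch, a copy activity for a Blob-to-Azure SQL copy pairs a BlobSource with a SqlSink; the dataset names below are assumptions, not values from the question:

```json
{
  "name": "CopyFromBlobToAzureSql",
  "type": "Copy",
  "inputs": [ { "referenceName": "BlobInputDataset", "type": "DatasetReference" } ],
  "outputs": [ { "referenceName": "SqlOutputDataset", "type": "DatasetReference" } ],
  "typeProperties": {
    "source": { "type": "BlobSource" },
    "sink": { "type": "SqlSink", "writeBatchSize": 10000 }
  }
}
```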
NEW QUESTION 12
You have a financial model deployed to an application named finance1. The data from the financial model is stored in several data files.
You need to implement a batch processing architecture for the financial model. You upload the data files and finance1 to a Microsoft Azure Storage account.
Which three components should you create in sequence next? To answer, move the appropriate components from the list of components to the answer area and arrange them in the correct order.
Answer:
Explanation: 
NEW QUESTION 13
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You have a Microsoft Azure subscription that includes Azure Data Lake and Cognitive Services. An administrator plans to deploy an Azure Data Factory.
You need to ensure that the administrator can create the data factory.
Solution: You add the user to the Owner role.
Does this meet the goal?
Answer: B
NEW QUESTION 14
Your company has two Microsoft Azure SQL databases named db1 and db2.
You need to move data from a table in db1 to a table in db2 by using a pipeline in Azure Data Factory. You create an Azure Data Factory named ADF1.
Which two types of objects should you create in ADF1 to complete the pipeline? Each correct answer presents part of the solution.
NOTE: Each correct selection is worth one point.
Answer: AD
Explanation: You perform the following steps to create a pipeline that moves data from a source data store to a sink data store:
Create linked services to link input and output data stores to your data factory.
Create datasets to represent input and output data for the copy operation.
Create a pipeline with a copy activity that takes a dataset as an input and a dataset as an output.
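For example, a minimal linked service definition for one of the Azure SQL databases might look like the following sketch (server, credentials, and names are placeholders, not values from the question):

```json
{
  "name": "AzureSqlDb1LinkedService",
  "properties": {
    "type": "AzureSqlDatabase",
    "typeProperties": {
      "connectionString": "Server=tcp:<server>.database.windows.net,1433;Database=db1;User ID=<user>;Password=<password>;"
    }
  }
}
```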
NEW QUESTION 15
You manage a Microsoft Azure HDInsight Hadoop cluster. All of the data for the cluster is stored in Azure Premium Storage.
You need to prevent all users from accessing the data directly. The solution must allow only the HDInsight service to access the data.
Which five actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.
Answer:
Explanation:
1. Create a Shared Access Signature policy.
2. Save the SAS policy token, storage account name, and container name. These values are used when associating the storage account with your HDInsight cluster.
3. Update the property of core-site.
4. Enter maintenance mode.
5. Restart all services.
References: https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-storage-sharedaccesssignature-permissions
NEW QUESTION 16
Note: This question is part of a series of questions that present the same scenario. Each question in the series contains a unique solution that might meet the stated goals. Some question sets might have more than one correct solution, while others might not have a correct solution.
After you answer a question in this section, you will NOT be able to return to it. As a result, these questions will not appear in the review screen.
You plan to deploy a Microsoft Azure SQL data warehouse and a web application.
The data warehouse will ingest 5 TB of data from an on-premises Microsoft SQL Server database daily. The web application will query the data warehouse.
You need to design a solution to ingest data into the data warehouse.
Solution: You use the bcp utility to export CSV files from SQL Server and then to import the files to Azure SQL Data Warehouse.
Does this meet the goal?
Answer: B
Explanation: If you need the best performance, use PolyBase to import data into Azure SQL Data Warehouse.
References: https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-migrate-data
NEW QUESTION 17
You plan to deploy a storage solution to store the output of Azure Stream Analytics. You plan to store the data for the following three types of data streams:
Unstructured JSON data
Exploratory analytics
Pictures
You need to implement a storage solution for the data stream types.
Which storage solution should you implement for each data stream type? To answer, drag the appropriate storage solutions to the correct data stream types. Each storage solution may be used once, more than once, or not at all. You may need to drag the split bar between the panes or scroll to view content.
NOTE: Each correct selection is worth one point.
Answer:
Explanation: Box 1: Azure Data Lake Store
Stream Analytics supports Azure Data Lake Store. Azure Data Lake Store is an enterprise-wide hyper-scale repository for big data analytic workloads. Data Lake Store enables you to store data of any size, type and ingestion speed for operational and exploratory analytics. Stream Analytics has to be authorized to access the Data Lake Store.
Box 2: Azure Cosmos DB
Stream Analytics can target Azure Cosmos DB for JSON output, enabling data archiving and low-latency queries on unstructured JSON data.
Box 3: Azure Blob Storage
Blob storage offers a cost-effective and scalable solution for storing large amounts of unstructured data in the cloud.
Incorrect Answers:
Azure SQL Database:
Azure SQL Database can be used as an output for data that is relational in nature or for applications that depend on content being hosted in a relational database. Stream Analytics jobs write to an existing table in an Azure SQL Database.
Azure Service Bus Queue:
Service Bus Queues offer a First In, First Out (FIFO) message delivery to one or more competing consumers. Typically, messages are expected to be received and processed by the receivers in the temporal order in which they were added to the queue, and each message is received and processed by only one message consumer.
Azure Table Storage:
Azure Table storage offers highly available, massively scalable storage, so that an application can automatically scale to meet user demand. Table storage is Microsoft’s NoSQL key/attribute store, which one can leverage for structured data with fewer constraints on the schema. Azure Table storage can be used to store data for persistence and efficient retrieval.
References: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs
Thanks for reading the newest 70-475 exam dumps! We recommend you try the PREMIUM Certleader 70-475 dumps in VCE and PDF here: https://www.certleader.com/70-475-dumps.html (102 Q&As Dumps)