AWS Certified Machine Learning - Specialty (MLS-C01)
This quiz randomly generates 30 questions (in 60 minutes) as asked in the AWS Certified Machine Learning - Specialty (MLS-C01) exam. The real MLS-C01 test has 65 questions and a total time of 180 minutes; of these, 15 questions are unscored and only 50 questions are scored. This test randomly generates 30 questions from our question bank. For best results, practice multiple times until you achieve 100% accuracy.

1 / 30
A Machine Learning Specialist is packaging a custom ResNet model into a Docker container so the company can leverage Amazon SageMaker for training. The Specialist is using Amazon EC2 P3 instances to train the model and needs to properly configure the Docker container to leverage the NVIDIA GPUs. What does the Specialist need to do?
- Bundle the NVIDIA drivers with the Docker image.
- Build the Docker container to be NVIDIA-Docker compatible.
- Organize the Docker container's file structure to execute on GPU instances.
- Set the GPU flag in the Amazon SageMaker CreateTrainingJob request body.

2 / 30
When submitting Amazon SageMaker training jobs using one of the built-in algorithms, which common parameters MUST be specified? (Choose three.)
- The training channel identifying the location of training data in an Amazon S3 bucket.
- The validation channel identifying the location of validation data in an Amazon S3 bucket.
- The IAM role that Amazon SageMaker can assume to perform tasks on behalf of the users.
- Hyperparameters in a JSON array as documented for the algorithm used.
- The Amazon EC2 instance class specifying whether training will be run using CPU or GPU.
- The output path specifying where in an Amazon S3 bucket the trained model will persist.

3 / 30
A Machine Learning Specialist is building a model to predict future employment rates based on a wide range of economic factors. While exploring the data, the Specialist notices that the magnitudes of the input features vary greatly. The Specialist does not want variables with a larger magnitude to dominate the model. What should the Specialist do to prepare the data for model training?
- Apply quantile binning to group the data into categorical bins to keep any relationships in the data by replacing the magnitude with distribution.
- Apply the Cartesian product transformation to create new combinations of fields that are independent of the magnitude.
- Apply normalization to ensure each field will have a mean of 0 and a variance of 1 to remove any significant magnitude.
- Apply the orthogonal sparse bigram (OSB) transformation to apply a fixed-size sliding window to generate new features of a similar magnitude.

4 / 30
A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span only 5 to 10 columns. How should the Machine Learning Specialist transform the dataset to minimize query runtime?
- Convert the records to Apache Parquet format.
- Convert the records to JSON format.
- Convert the records to GZIP CSV format.
- Convert the records to XML format.

A columnar format such as Apache Parquet lets Athena scan only the 5 to 10 columns each query touches instead of all 200, and its built-in compression reduces both the amount of data scanned by Amazon Athena and your S3 bucket storage. It's a win-win for your AWS bill. Supported compression formats: GZIP, LZO, SNAPPY (Parquet), and ZLIB.
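A minimal sketch of the conversion step, assuming pandas and pyarrow are installed (plus s3fs for s3:// paths) and using hypothetical bucket and key names:

```python
import pandas as pd

# Read one plaintext CSV file from S3 (s3fs must be installed for s3:// paths).
df = pd.read_csv("s3://example-bucket/raw/records.csv")

# Write it back as Snappy-compressed Parquet; Athena can then scan only the
# columns each query references instead of the full 200-column rows.
df.to_parquet(
    "s3://example-bucket/parquet/records.parquet",
    engine="pyarrow",
    compression="snappy",
    index=False,
)
```

At scale, the same conversion is usually done with AWS Glue or an Athena CREATE TABLE AS SELECT (CTAS) query rather than one file at a time.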
5 / 30
A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided. Based on this information, which model would have the HIGHEST accuracy?
- Long short-term memory (LSTM) model with scaled exponential linear unit (SELU)
- Logistic regression
- Support vector machine (SVM) with non-linear kernel
- Single perceptron with tanh activation function

6 / 30
A Machine Learning Specialist at a company sensitive to security is preparing a dataset for model training. The dataset is stored in Amazon S3 and contains Personally Identifiable Information (PII). The dataset must be accessible from a VPC only and must not traverse the public internet. How can these requirements be satisfied?
- Create a VPC endpoint and apply a bucket access policy that restricts access to the given VPC endpoint and the VPC.
- Create a VPC endpoint and apply a bucket access policy that allows access from the given VPC endpoint and an Amazon EC2 instance.
- Create a VPC endpoint and use Network Access Control Lists (NACLs) to allow traffic between only the given VPC endpoint and an Amazon EC2 instance.
- Create a VPC endpoint and use security groups to restrict access to the given VPC endpoint and an Amazon EC2 instance.

7 / 30
A large consumer goods manufacturer has the following products on sale:
1. 34 different toothpaste variants
2. 48 different toothbrush variants
3. 43 different mouthwash variants
The entire sales history of all these products is available in Amazon S3. Currently, the company is using custom-built autoregressive integrated moving average (ARIMA) models to forecast demand for these products. The company wants to predict the demand for a new product that will soon be launched. Which solution should a Machine Learning Specialist apply?
- Train a custom ARIMA model to forecast demand for the new product.
- Train an Amazon SageMaker DeepAR algorithm to forecast demand for the new product.
- Train an Amazon SageMaker k-means clustering algorithm to forecast demand for the new product.
- Train a custom XGBoost model to forecast demand for the new product.

The Amazon SageMaker DeepAR forecasting algorithm is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNNs). Classical forecasting methods, such as autoregressive integrated moving average (ARIMA) or exponential smoothing (ETS), fit a single model to each individual time series and then use that model to extrapolate the time series into the future. DeepAR instead trains a single model jointly over all of the related time series, so it can learn patterns across products and forecast a newly launched product that has little or no history of its own.

8 / 30
A company is using Amazon Polly to translate plaintext documents to speech for automated company announcements. However, company acronyms are being mispronounced in the current documents. How should a Machine Learning Specialist address this issue for future documents?
- Convert current documents to SSML with pronunciation tags.
- Create an appropriate pronunciation lexicon.
- Output speech marks to guide in pronunciation.
- Use Amazon Lex to preprocess the text files for pronunciation.
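A pronunciation lexicon is the mechanism Amazon Polly provides for this. A minimal sketch with boto3, using the real put_lexicon and synthesize_speech operations but a hypothetical lexicon name and acronym:

```python
import boto3

polly = boto3.client("polly")

# A Pronunciation Lexicon Specification (PLS) document that expands the
# hypothetical acronym "W3C" before synthesis.
lexicon = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""

polly.put_lexicon(Name="acronyms", Content=lexicon)

# Apply the lexicon to every future synthesis request.
response = polly.synthesize_speech(
    Text="The W3C announcement is at noon.",
    VoiceId="Joanna",
    OutputFormat="mp3",
    LexiconNames=["acronyms"],
)
```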
9 / 30
A Machine Learning Specialist is creating a new natural language processing application that processes a dataset comprised of 1 million sentences. The aim is to then run Word2Vec to generate embeddings of the sentences and enable different types of predictions. Here is an example from the dataset: "The quck BROWN FOX jumps over the lazy dog." Which of the following are the operations the Specialist needs to perform to correctly sanitize and prepare the data in a repeatable manner? (Choose three.)
- Perform part-of-speech tagging and keep the action verb and the nouns only.
- Normalize all words by making the sentence lowercase.
- Remove stop words using an English stopword dictionary.
- Correct the typography on "quck" to "quick."
- One-hot encode all words in the sentence.
- Tokenize the sentence into words.

10 / 30
An agency collects census information within a country to determine healthcare and social program needs by province and city. The census form collects responses for approximately 500 questions from each citizen. Which combination of algorithms would provide the appropriate insights? (Select TWO.)
- The factorization machines (FM) algorithm
- The Latent Dirichlet Allocation (LDA) algorithm
- The principal component analysis (PCA) algorithm
- The k-means algorithm
- The Random Cut Forest (RCF) algorithm

PCA and k-means work well together on census data: PCA reduces the roughly 500 correlated survey responses to a small set of uncorrelated components, and k-means then clusters citizens with similar response profiles so needs can be compared by province and city.

11 / 30
A city wants to monitor its air quality to address the consequences of air pollution. A Machine Learning Specialist needs to forecast the air quality in parts per million of contaminants for the next 2 days in the city. As this is a prototype, only daily data from the last year is available. Which model is MOST likely to provide the best results in Amazon SageMaker?
- Use the Amazon SageMaker k-Nearest-Neighbors (kNN) algorithm on the single time series consisting of the full year of data with a predictor_type of regressor.
- Use Amazon SageMaker Random Cut Forest (RCF) on the single time series consisting of the full year of data.
- Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of regressor.
- Use the Amazon SageMaker Linear Learner algorithm on the single time series consisting of the full year of data with a predictor_type of classifier.
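One common way to apply a regressor such as Linear Learner to a single short time series is to reframe forecasting as supervised learning over lagged windows. A minimal local sketch of that reframing, using scikit-learn's LinearRegression as a stand-in for the SageMaker algorithm and synthetic data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for one year of daily air-quality readings (ppm).
rng = np.random.default_rng(0)
series = 50 + 10 * np.sin(np.arange(365) / 30) + rng.normal(0, 2, 365)

# Reframe forecasting as regression: predict day t from the previous 7 days.
window = 7
X = np.array([series[i : i + window] for i in range(len(series) - window)])
y = series[window:]

model = LinearRegression().fit(X, y)

# Forecast the next 2 days, feeding each prediction back in as a lag.
history = list(series[-window:])
for _ in range(2):
    next_val = model.predict(np.array([history[-window:]]))[0]
    history.append(next_val)
print(history[-2:])
```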
12 / 30
An office security agency conducted a successful pilot using 100 cameras installed at key locations within the main office. Images from the cameras were uploaded to Amazon S3 and tagged using Amazon Rekognition, and the results were stored in Amazon ES. The agency is now looking to expand the pilot into a full production system using thousands of video cameras in its office locations globally. The goal is to identify activities performed by non-employees in real time. Which solution should the agency consider?
- Use a proxy server at each local office and for each camera, and stream the RTSP feed to a unique Amazon Kinesis Video Streams video stream. On each stream, use Amazon Rekognition Video and create a stream processor to detect faces from a collection of known employees, and alert when non-employees are detected.
- Use a proxy server at each local office and for each camera, and stream the RTSP feed to a unique Amazon Kinesis Video Streams video stream. On each stream, use Amazon Rekognition Image to detect faces from a collection of known employees, and alert when non-employees are detected.
- Install AWS DeepLens cameras and use the DeepLens_Kinesis_Video module to stream video to Amazon Kinesis Video Streams for each camera. On each stream, use Amazon Rekognition Video and create a stream processor to detect faces from a collection on each stream, and alert when non-employees are detected.
- Install AWS DeepLens cameras and use the DeepLens_Kinesis_Video module to stream video to Amazon Kinesis Video Streams for each camera. On each stream, run an AWS Lambda function to capture image fragments and then call Amazon Rekognition Image to detect faces from a collection of known employees, and alert when non-employees are detected.

13 / 30
A Data Scientist is developing a machine learning model to classify whether a financial transaction is fraudulent. The labeled data available for training consists of 100,000 non-fraudulent observations and 1,000 fraudulent observations. The Data Scientist applies the XGBoost algorithm to the data, resulting in the following confusion matrix when the trained model is applied to a previously unseen validation dataset. The accuracy of the model is 99.1%, but the Data Scientist has been asked to reduce the number of false negatives. Which combination of steps should the Data Scientist take to reduce the number of false negative predictions by the model? (Choose two.)
- Change the XGBoost eval_metric parameter to optimize based on rmse instead of error.
- Increase the XGBoost scale_pos_weight parameter to adjust the balance of positive and negative weights.
- Increase the XGBoost max_depth parameter because the model is currently underfitting the data.
- Change the XGBoost eval_metric parameter to optimize based on AUC instead of error.
- Decrease the XGBoost max_depth parameter because the model is currently overfitting the data.
(A sketch of the scale_pos_weight adjustment appears after question 17.)

14 / 30
Which of the following metrics should a Machine Learning Specialist generally use to compare/evaluate machine learning classification models against each other?
- Recall
- Misclassification rate
- Mean absolute percentage error (MAPE)
- Area Under the ROC Curve (AUC)

15 / 30
A Machine Learning Specialist has created a deep learning neural network model that performs well on the training data but performs poorly on the test data. Which of the following methods should the Specialist consider using to correct this? (Choose three.)
- Decrease regularization.
- Increase regularization.
- Increase dropout.
- Decrease dropout.
- Increase feature combinations.
- Decrease feature combinations.

16 / 30
A Machine Learning Specialist is configuring Amazon SageMaker so multiple Data Scientists can access notebooks, train models, and deploy endpoints. To ensure the best operational performance, the Specialist needs to be able to track how often the Scientists are deploying models, GPU and CPU utilization on the deployed SageMaker endpoints, and all errors that are generated when an endpoint is invoked. Which services are integrated with Amazon SageMaker to track this information? (Choose two.)
- AWS CloudTrail
- AWS Health
- AWS Trusted Advisor
- Amazon CloudWatch
- AWS Config

17 / 30
A monitoring service generates 1 TB of scale metrics record data every minute. A Research team performs queries on this data using Amazon Athena. The queries run slowly due to the large volume of data, and the team requires better performance. How should the records be stored in Amazon S3 to improve query performance?
- CSV files
- Parquet files
- Compressed JSON
- RecordIO
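Returning to question 13: with roughly 100 negatives per positive, increasing XGBoost's scale_pos_weight (commonly set to the negative/positive ratio) makes misses on the rare fraud class cost more, and AUC is a more informative evaluation metric than error at this imbalance. A minimal sketch on synthetic data, assuming a recent xgboost that accepts eval_metric in the constructor:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for the roughly 100:1 imbalanced fraud dataset.
X, y = make_classification(
    n_samples=101_000, weights=[100_000 / 101_000], random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Weight the positive (fraud) class by the negative/positive ratio and
# evaluate with AUC, which is insensitive to the class imbalance.
ratio = (y_train == 0).sum() / (y_train == 1).sum()
model = XGBClassifier(scale_pos_weight=ratio, eval_metric="auc")
model.fit(X_train, y_train)
print(model.score(X_val, y_val))
```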
18 / 30
A Machine Learning Specialist deployed a model that provides product recommendations on a company's website. Initially, the model was performing very well and resulted in customers buying more products on average. However, within the past few months, the Specialist has noticed that the effect of product recommendations has diminished and customers are starting to return to their original habits of spending less. The Specialist is unsure of what happened, as the model has not changed from its initial deployment over a year ago. Which method should the Specialist try to improve model performance?
- The model needs to be completely re-engineered because it is unable to handle product inventory changes.
- The model's hyperparameters should be periodically updated to prevent drift.
- The model should be periodically retrained from scratch using the original data while adding a regularization term to handle product inventory changes.
- The model should be periodically retrained using the original training data plus new data as product inventory changes.

19 / 30
A Machine Learning Specialist is designing a system for improving sales for a company. The objective is to use the large amount of information the company has on users' behavior and product preferences to predict which products users would like based on the users' similarity to other users. What should the Specialist do to meet this objective?
- Build a content-based filtering recommendation engine with Apache Spark ML on Amazon EMR.
- Build a collaborative filtering recommendation engine with Apache Spark ML on Amazon EMR.
- Build a model-based filtering recommendation engine with Apache Spark ML on Amazon EMR.
- Build a combinative filtering recommendation engine with Apache Spark ML on Amazon EMR.

Many developers want to implement the famous Amazon model that was used to power the "People who bought this also bought these items" feature on Amazon.com. This model is based on a method called collaborative filtering: it takes items such as movies, books, and products that were rated highly by a set of users and recommends them to other users who also gave them high ratings. This method works well in domains where explicit ratings or implicit user actions can be gathered and analyzed. A minimal sketch of this method appears after question 30.

20 / 30
An interactive online dictionary wants to add a widget that displays words used in similar contexts. A Machine Learning Specialist is asked to provide word features for the downstream nearest neighbor model powering the widget. What should the Specialist do to meet these requirements?
- Create one-hot word encoding vectors.
- Produce a set of synonyms for every word using Amazon Mechanical Turk.
- Create word embedding vectors that store edit distance with every other word.
- Download word embeddings pre-trained on a large corpus.

21 / 30
A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist has collated a large custom dataset of pictures containing different vehicle makes and models. What should the Specialist do to initialize the model to retrain it with the custom data?
- Initialize the model with random weights in all layers including the last fully connected layer.
- Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
- Initialize the model with random weights in all layers and replace the last fully connected layer.
- Initialize the model with pre-trained weights in all layers including the last fully connected layer.
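A minimal sketch of the pre-trained-weights-plus-new-head pattern in PyTorch, assuming a recent torchvision and a hypothetical number of vehicle classes:

```python
import torch.nn as nn
from torchvision import models

num_classes = 196  # hypothetical number of vehicle make/model classes

# Load a network pre-trained on general objects (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Optionally freeze the pre-trained feature extractor.
for param in model.parameters():
    param.requires_grad = False

# Replace only the last fully connected layer with a freshly initialized
# head sized for the custom vehicle classes; it is trained on the new data.
model.fc = nn.Linear(model.fc.in_features, num_classes)
```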
22 / 30
A Data Scientist wants to gain real-time insights into a data stream of GZIP files. Which solution would allow the use of SQL to query the stream with the LEAST latency?
- Amazon Kinesis Data Analytics with an AWS Lambda function to transform the data.
- AWS Glue with a custom ETL script to transform the data.
- An Amazon Kinesis Client Library application to transform the data and save it to an Amazon ES cluster.
- Amazon Kinesis Data Firehose to transform the data and put it into an Amazon S3 bucket.

23 / 30
A company is running a machine learning prediction service that generates 100 TB of predictions every day. A Machine Learning Specialist must generate a visualization of the daily precision-recall curve from the predictions, and forward a read-only version to the Business team. Which solution requires the LEAST coding effort?
- Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Give the Business team read-only access to S3.
- Generate daily precision-recall data in Amazon QuickSight, and publish the results in a dashboard shared with the Business team.
- Run a daily Amazon EMR workflow to generate precision-recall data, and save the results in Amazon S3. Visualize the arrays in Amazon QuickSight, and publish them in a dashboard shared with the Business team.
- Generate daily precision-recall data in Amazon ES, and publish the results in a dashboard shared with the Business team.

24 / 30
A Machine Learning Specialist working for an online fashion company wants to build a data ingestion solution for the company's Amazon S3-based data lake. The Specialist wants to create a set of ingestion mechanisms that will enable the following future capabilities:
- Real-time analytics
- Interactive analytics of historical data
- Clickstream analytics
- Product recommendations
Which services should the Specialist use?
- AWS Glue as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for real-time data insights; Amazon Kinesis Data Firehose for delivery to Amazon ES for clickstream analytics; Amazon EMR to generate personalized product recommendations.
- Amazon Athena as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for near-real-time data insights; Amazon Kinesis Data Firehose for clickstream analytics; AWS Glue to generate personalized product recommendations.
- AWS Glue as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for historical data insights; Amazon Kinesis Data Firehose for delivery to Amazon ES for clickstream analytics; Amazon EMR to generate personalized product recommendations.
- Amazon Athena as the data catalog; Amazon Kinesis Data Streams and Amazon Kinesis Data Analytics for historical data insights; Amazon DynamoDB streams for clickstream analytics; AWS Glue to generate personalized product recommendations.

25 / 30
An online reseller has a large, multi-column dataset with one column missing 30% of its data. A Machine Learning Specialist believes that certain columns in the dataset could be used to reconstruct the missing data. Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?
- Listwise deletion
- Last observation carried forward
- Multiple imputation
- Mean substitution
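Multiple imputation reconstructs each missing value from the other columns rather than from a single column statistic. A minimal sketch in the same spirit, using scikit-learn's experimental IterativeImputer (a MICE-style imputer) on hypothetical data:

```python
import numpy as np
# IterativeImputer is still experimental and must be enabled explicitly.
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Hypothetical dataset: the third column has missing values (np.nan).
X = np.array([
    [25.0, 3.0, 410.0],
    [31.0, 5.0, np.nan],
    [42.0, 2.0, 390.0],
    [29.0, 4.0, np.nan],
    [55.0, 1.0, 350.0],
])

# Each missing entry is predicted from the other columns via round-robin
# regression, preserving relationships that mean substitution would destroy.
imputer = IterativeImputer(random_state=0)
X_filled = imputer.fit_transform(X)
print(X_filled)
```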
26 / 30
A company is observing low accuracy while training on the default built-in image classification algorithm in Amazon SageMaker. The Data Science team wants to use an Inception neural network architecture instead of a ResNet architecture. Which of the following will accomplish this? (Choose two.)
- Customize the built-in image classification algorithm to use Inception and use this for model training.
- Create a support case with the SageMaker team to change the default image classification algorithm to Inception.
- Bundle a Docker container with TensorFlow Estimator loaded with an Inception network and use this for model training.
- Use custom code in Amazon SageMaker with TensorFlow Estimator to load the model with an Inception network, and use this for model training.
- Download and apt-get install the Inception network code into an Amazon EC2 instance and use this instance as a Jupyter notebook in Amazon SageMaker.

27 / 30
A Mobile Network Operator is building an analytics platform to analyze and optimize a company's operations using Amazon Athena and Amazon S3. The source systems send data in CSV format in real time. The Data Engineering team wants to transform the data to the Apache Parquet format before storing it on Amazon S3. Which solution takes the LEAST effort to implement?
- Ingest CSV data using Apache Kafka Streams on Amazon EC2 instances and use Kafka Connect S3 to serialize data as Parquet.
- Ingest CSV data from Amazon Kinesis Data Streams and use AWS Glue to convert data into Parquet.
- Ingest CSV data using Apache Spark Structured Streaming in an Amazon EMR cluster and use Apache Spark to convert data into Parquet.
- Ingest CSV data from Amazon Kinesis Data Streams and use Amazon Kinesis Data Firehose to convert data into Parquet.

28 / 30
A Machine Learning Specialist is building a model that will perform time series forecasting using Amazon SageMaker. The Specialist has finished training the model and is now planning to perform load testing on the endpoint so they can configure Auto Scaling for the model variant. Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilization during the load test?
- Review SageMaker logs that have been written to Amazon S3 by leveraging Amazon Athena and Amazon QuickSight to visualize logs as they are being produced.
- Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are output by Amazon SageMaker.
- Build custom Amazon CloudWatch Logs and then leverage Amazon ES and Kibana to query and visualize the log data as it is generated by Amazon SageMaker.
- Send Amazon CloudWatch Logs that were generated by Amazon SageMaker to Amazon ES and use Kibana to query and visualize the log data.

29 / 30
The displayed graph is from a forecasting model for testing a time series. Considering the graph only, which conclusion should a Machine Learning Specialist make about the behavior of the model?
- The model predicts both the trend and the seasonality well.
- The model predicts the trend well, but not the seasonality.
- The model predicts the seasonality well, but not the trend.
- The model does not predict the trend or the seasonality well.
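Judging trend versus seasonality by eye, as question 29 asks, mirrors what a seasonal decomposition does programmatically. A minimal sketch using statsmodels on a synthetic series (the data and period are hypothetical):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series with an upward trend plus yearly seasonality.
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
values = np.arange(48) * 2 + 10 * np.sin(np.arange(48) * 2 * np.pi / 12)
series = pd.Series(values, index=idx)

# Split the series into trend, seasonal, and residual components; comparing
# these components between actuals and forecasts shows which part a model
# captures well.
result = seasonal_decompose(series, model="additive", period=12)
print(result.trend.dropna().head())
print(result.seasonal.head())
```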
30 / 30
A Machine Learning Specialist receives customer data for an online shopping website. The data includes demographics, past visits, and locality information. The Specialist must develop a machine learning approach to identify the customer shopping patterns, preferences, and trends to enhance the website for better service and smart recommendations. Which solution should the Specialist recommend?
- Latent Dirichlet Allocation (LDA) for the given collection of discrete data to identify patterns in the customer database.
- A neural network with a minimum of three layers and random initial weights to identify patterns in the customer database.
- Collaborative filtering based on user interactions and correlations to identify patterns in the customer database.
- Random Cut Forest (RCF) over random subsamples to identify patterns in the customer database.
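As a closing illustration of the collaborative filtering approach behind questions 19 and 30, a minimal user-based sketch on a hypothetical ratings matrix using cosine similarity:

```python
import numpy as np

# Hypothetical user-item ratings (rows: users, cols: products; 0 = unrated).
R = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

# Cosine similarity between users: similar rating patterns score high.
norms = np.linalg.norm(R, axis=1, keepdims=True)
sim = (R @ R.T) / (norms * norms.T)

# Predict user 0's rating for item 2 from similar users' ratings of that item.
user, item = 0, 2
weights = sim[user].copy()
weights[user] = 0.0                      # exclude the user themself
rated = R[:, item] > 0                   # only users who rated the item
pred = (weights[rated] @ R[rated, item]) / weights[rated].sum()
print(f"Predicted rating of user {user} for item {item}: {pred:.2f}")
```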