AWS Certified Machine Learning - Specialty (MLS-C01)

This quiz randomly generates 30 questions (to be answered in 60 minutes) modeled on those asked in the AWS Certified Machine Learning - Specialty (MLS-C01) exam. The real MLS-C01 exam has 65 questions and a total time of 180 minutes. Of these, 15 questions are unscored, and only 50 questions count toward the score. The questions here are drawn at random from our question bank. For best results, practice multiple times until you achieve 100% accuracy.
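
As a rough pacing guide, the figures above can be turned into minutes per question. This is only a sketch using the numbers stated in the description; the helper name `minutes_per_question` is made up for illustration.

```python
def minutes_per_question(total_minutes: float, num_questions: int) -> float:
    """Average time available per question (illustrative helper)."""
    return total_minutes / num_questions


# Figures taken from the description above.
exam_pace = minutes_per_question(180, 65)  # real MLS-C01 exam
quiz_pace = minutes_per_question(60, 30)   # this practice quiz

print(f"Real exam: {exam_pace:.2f} min/question")
print(f"This quiz: {quiz_pace:.2f} min/question")
```

The quiz allows slightly less time per question than the real exam, so finishing it comfortably within 60 minutes suggests the exam pace should be manageable.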

1 / 30

A city wants to monitor its air quality to address the consequences of air pollution. A Machine Learning
Specialist needs to forecast the air quality in parts per million of contaminants for the next 2 days in the
city. As this is a prototype, only daily data from the last year is available.
Which model is MOST likely to provide the best results in Amazon SageMaker?

2 / 30

A Machine Learning Specialist is using an Amazon SageMaker notebook instance in a private subnet of a
corporate VPC. The ML Specialist has important data stored on the Amazon SageMaker notebook
instance's Amazon EBS volume, and needs to take a snapshot of that EBS volume. However, the ML
Specialist cannot find the Amazon SageMaker notebook instance's EBS volume or Amazon EC2 instance
within the VPC.
Why is the ML Specialist unable to see the instance in the VPC?

3 / 30

A Machine Learning Specialist kicks off a hyperparameter tuning job for a tree-based ensemble model
using Amazon SageMaker with Area Under the ROC Curve (AUC) as the objective metric. This workflow
will eventually be deployed in a pipeline that retrains and tunes hyperparameters each night to model click-
through on data that goes stale every 24 hours.
With the goal of decreasing the amount of time it takes to train these models, and ultimately to decrease
costs, the Specialist wants to reconfigure the input hyperparameter range(s).
Which visualization will accomplish this?

4 / 30

A Machine Learning Specialist is building a prediction model for a large number of features using linear
models, such as linear regression and logistic regression. During exploratory data analysis, the Specialist
observes that many features are highly correlated with each other. This may make the model unstable.
What should be done to reduce the impact of having such a large number of features?

5 / 30

A Machine Learning Specialist needs to be able to ingest streaming data and store it in Apache Parquet
files for exploration and analysis.
Which of the following services would both ingest and store this data in the correct format?

6 / 30

A Data Science team is designing a dataset repository where it will store a large amount of training data
commonly used in its machine learning models. As Data Scientists may create an arbitrary number of new
datasets every day, the solution has to scale automatically and be cost-effective. Also, it must be possible
to explore the data using SQL.
Which storage scheme is MOST adapted to this scenario?

7 / 30

A Machine Learning Specialist is developing a custom video recommendation model for an application.
The dataset used to train this model is very large with millions of data points and is hosted in an Amazon
S3 bucket. The Specialist wants to avoid loading all of this data onto an Amazon SageMaker notebook
instance because it would take hours to move and will exceed the attached 5 GB Amazon EBS volume on
the notebook instance.
Which approach allows the Specialist to use all the data to train the model?

8 / 30

An office security agency conducted a successful pilot using 100 cameras installed at key locations within
the main office. Images from the cameras were uploaded to Amazon S3 and tagged using Amazon
Rekognition, and the results were stored in Amazon ES. The agency is now looking to expand the pilot into
a full production system using thousands of video cameras in its office locations globally. The goal is to
identify activities performed by non-employees in real time.
Which solution should the agency consider?

9 / 30

A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The Specialist is
using one of the SageMaker built-in algorithms for the training. The dataset is stored in .CSV format and is
transformed into a numpy.array, which appears to be negatively affecting the speed of the training.
What should the Specialist do to optimize the data for training on SageMaker?

10 / 30

A monitoring service generates 1 TB of scale metrics record data every minute. A Research team performs
queries on this data using Amazon Athena. The queries run slowly due to the large volume of data, and the
team requires better performance.
How should the records be stored in Amazon S3 to improve query performance?

11 / 30

A Machine Learning Specialist is building a convolutional neural network (CNN) that will classify 10 types
of animals. The Specialist has built a series of layers in a neural network that will take an input image of an
animal, pass it through a series of convolutional and pooling layers, and then finally pass it through a
dense and fully connected layer with 10 nodes. The Specialist would like to get an output from the neural
network that is a probability distribution of how likely it is that the input image belongs to each of the 10
classes.
Which function will produce the desired output?

12 / 30

A large consumer goods manufacturer has the following products on sale:
1. 34 different toothpaste variants
2. 48 different toothbrush variants
3. 43 different mouthwash variants
The entire sales history of all these products is available in Amazon S3. Currently, the company is using
custom-built autoregressive integrated moving average (ARIMA) models to forecast demand for these
products. The company wants to predict the demand for a new product that will soon be launched.
Which solution should a Machine Learning Specialist apply?

13 / 30

A Data Scientist is working on an application that performs sentiment analysis. The validation accuracy is
poor, and the Data Scientist thinks that the cause may be a rich vocabulary and a low average frequency
of words in the dataset.
Which tool should be used to improve the validation accuracy?

14 / 30

A gaming company has launched an online game where people can start playing for free, but they need to
pay if they choose to use certain features. The company needs to build an automated system to predict
whether or not a new user will become a paid user within 1 year. The company has gathered a labeled
dataset from 1 million users.
The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year)
and 999,000 negative samples (from users who did not use any paid features). Each data sample consists
of 200 features including user age, device, location, and play patterns.
Using this dataset for training, the Data Science team trained a random forest model that converged with
over 99% accuracy on the training set. However, the prediction results on a test dataset were not
satisfactory.
Which of the following approaches should the Data Science team take to mitigate this issue? (Choose
two.)
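
To see why the accuracy figure in this scenario says little on its own, the class counts can be plugged into a quick calculation. This is a minimal sketch: the "always predict negative" baseline is a hypothetical model introduced only for illustration, not the team's random forest.

```python
# Class counts taken from the scenario above.
positives = 1_000      # users who paid within 1 year
negatives = 999_000    # users who did not pay
total = positives + negatives

# Hypothetical baseline: predict "negative" for every user.
true_negatives = negatives
true_positives = 0

accuracy = (true_negatives + true_positives) / total
recall = true_positives / positives  # share of paying users detected

print(f"Baseline accuracy: {accuracy:.1%}")
print(f"Baseline recall on paying users: {recall:.1%}")
```

Under these assumptions the trivial baseline already exceeds 99% accuracy while finding no paying users, so accuracy alone does not show whether a model has learned anything about the minority class.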

15 / 30

A company's Machine Learning Specialist needs to improve the training speed of a time-series forecasting
model using TensorFlow. The training is currently implemented on a single-GPU machine and takes
approximately 23 hours to complete. The training needs to be run daily.
The model accuracy is acceptable, but the company anticipates a continuous increase in the size of the
training data and a need to update the model on an hourly, rather than a daily, basis. The company also
wants to minimize coding effort and infrastructure changes.
What should the Machine Learning Specialist do to the training solution to allow it to scale for future
demand?

16 / 30

A Machine Learning team uses Amazon SageMaker to train an Apache MXNet handwritten digit classifier
model using a research dataset. The team wants to receive a notification when the model is overfitting.
Auditors want to view the Amazon SageMaker log activity report to ensure there are no unauthorized API
calls.
What should the Machine Learning team do to address the requirements with the least amount of code and
fewest steps?

17 / 30

A Machine Learning Specialist has created a deep learning neural network model that performs well on the
training data but performs poorly on the test data.
Which of the following methods should the Specialist consider using to correct this? (Choose three.)

18 / 30

A Machine Learning Specialist deployed a model that provides product recommendations on a company's
website. Initially, the model was performing very well and resulted in customers buying more products on
average. However, within the past few months, the Specialist has noticed that the effect of product
recommendations has diminished and customers are starting to return to their original habits of spending
less. The Specialist is unsure of what happened, as the model has not changed from its initial deployment
over a year ago.
Which method should the Specialist try to improve model performance?

19 / 30

A Data Scientist is developing a machine learning model to classify whether a financial transaction is
fraudulent. The labeled data available for training consists of 100,000 non-fraudulent observations and
1,000 fraudulent observations.
The Data Scientist applies the XGBoost algorithm to the data, resulting in the following confusion matrix
when the trained model is applied to a previously unseen validation dataset. The accuracy of the model is
99.1%, but the Data Scientist has been asked to reduce the number of false negatives.
Which combination of steps should the Data Scientist take to reduce the number of false negative
predictions by the model? (Choose two.)

20 / 30

A Machine Learning Specialist is required to build a supervised image-recognition model to identify a cat.
The ML Specialist performs some tests and records the following results for a neural network-based image
classifier:
Total number of images available = 1,000
Test set images = 100 (constant test set)
The ML Specialist notices that, in over 75% of the misclassified images, the cats were held upside down by
their owners.
Which techniques can be used by the ML Specialist to improve this specific test error?

21 / 30

A Machine Learning Specialist is implementing a full Bayesian network on a dataset that describes public
transit in New York City. One of the random variables is discrete, and represents the number of minutes
New Yorkers wait for a bus given that the buses cycle every 10 minutes, with a mean of 3 minutes.
Which prior probability distribution should the ML Specialist use for this variable?

22 / 30

A Marketing Manager at a pet insurance company plans to launch a targeted marketing campaign on
social media to acquire new customers. Currently, the company has the following data in Amazon Aurora:
Profiles for all past and existing customers
Profiles for all past and existing insured pets
Policy-level information
Premiums received
Claims paid
What steps should be taken to implement a machine learning model to identify potential new customers on
social media?

23 / 30

A Data Science team within a large company uses Amazon SageMaker notebooks to access data stored
in Amazon S3 buckets. The IT Security team is concerned that internet-enabled notebook instances create
a security vulnerability where malicious code running on the instances could compromise data privacy. The
company mandates that all instances stay within a secured VPC with no internet access, and data
communication traffic must stay within the AWS network.
How should the Data Science team configure the notebook instance placement to meet these
requirements?

24 / 30

A Machine Learning Specialist receives customer data for an online shopping website. The data includes
demographics, past visits, and locality information. The Specialist must develop a machine learning
approach to identify the customer shopping patterns, preferences, and trends to enhance the website for
better service and smart recommendations.
Which solution should the Specialist recommend?

25 / 30

A Machine Learning Specialist is working with a large cybersecurity company that manages security
events in real time for companies around the world. The cybersecurity company wants to design a solution
that will allow it to use machine learning to score malicious events as anomalies on the data as it is being
ingested. The company also wants to be able to save the results in its data lake for later processing and
analysis.
What is the MOST efficient way to accomplish these tasks?

26 / 30

A Machine Learning Specialist has completed a proof of concept for a company using a small data sample,
and now the Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker.
The historical training data is stored in Amazon RDS.
Which approach should the Specialist use for training a model using that data?

27 / 30

A Machine Learning Specialist is designing a system for improving sales for a company. The objective is to
use the large amount of information the company has on users' behavior and product preferences to
predict which products users would like based on the users' similarity to other users.
What should the Specialist do to meet this objective?

28 / 30

During mini-batch training of a neural network for a classification problem, a Data Scientist notices that
training accuracy oscillates.
What is the MOST likely cause of this issue?

29 / 30

A Machine Learning Specialist is working with a large company to leverage machine learning within its
products. The company wants to group its customers into categories based on which customers will and
will not churn within the next 6 months. The company has labeled the data available to the Specialist.
Which machine learning model type should the Specialist use to accomplish this task?

30 / 30

A Mobile Network Operator is building an analytics platform to analyze and optimize a company's
operations using Amazon Athena and Amazon S3. The source systems send data in .CSV format in real
time. The Data Engineering team wants to transform the data to the Apache Parquet format before storing
it on Amazon S3.
Which solution takes the LEAST effort to implement?
