10 💻 Second Intermediate Sample Questions

Hi guys, these are the sample questions that Prof. Dabo gave us to practice with. As you may notice, most of them are open questions on fairly superficial theory concepts, with no in-depth math or heavy calculations (matrix products, dot products, etc.). So my suggestion is to review the slides carefully and just learn the basic R commands to run the analyses at a higher level! 🍀

10.1 👨‍🎓 2023/2024 (2nd intermediate)

Exercise 10.1 Basic Understanding:

  1. What does PCA stand for?
  2. Briefly explain the primary objective of Principal Component Analysis.
  3. How does PCA help in dimensionality reduction?
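A minimal base-R sketch of the idea, assuming the built-in USArrests data (50 states, 4 numeric variables) as a stand-in for a real dataset: after a normed PCA, keeping only the first two components replaces 4 columns with 2, and the scores are simply the standardized data multiplied by the first eigenvectors.

```r
# Built-in data: 50 US states x 4 numeric variables (stand-in for a real dataset)
X <- USArrests

# Normed PCA: variables are centered and scaled to unit variance first
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Dimensionality reduction: keep only the first k components (4 columns -> 2)
k <- 2
scores_k <- pca$x[, 1:k]

# The same scores by hand: standardized data times the first k eigenvectors
Z <- scale(X)
manual_scores <- Z %*% pca$rotation[, 1:k]

# Share of the total variance retained by the k-dimensional summary
kept <- sum(pca$sdev[1:k]^2) / sum(pca$sdev^2)
dim(scores_k)
kept
```

The choice k = 2 is arbitrary here; `kept` tells you how much of the total variance the reduced table still carries.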

Exercise 10.2 Library and Data Loading:

  1. Which R library is commonly used for performing PCA?
  2. Write the command to load the FactoMineR library for PCA.
  3. How do you read a dataset into R for PCA analysis?
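A possible sketch, assuming FactoMineR may or may not be installed (it is loaded only if present; otherwise install it once with install.packages("FactoMineR")) and using a temporary CSV written from the built-in USArrests data in place of a real file:

```r
# Load FactoMineR only if it is installed
if (requireNamespace("FactoMineR", quietly = TRUE)) {
  library(FactoMineR)
}

# Write a built-in dataset to a temporary CSV so the example needs no external file
path <- tempfile(fileext = ".csv")
write.csv(USArrests, path)

# Read it back: row.names = 1 turns the first column (state names) into row names,
# so that only the numeric variables remain as columns for the PCA
my_data <- read.csv(path, row.names = 1)
str(my_data)
```

With a real file you would pass its path to read.csv() (or read.table() with the right `sep`) instead of the temporary one.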

Exercise 10.3 Data Preparation:

  1. Explain the importance of scaling or standardizing variables before applying PCA.
  2. Write the R command to standardize a data matrix.
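Standardizing puts variables measured in different units on the same footing; otherwise the variable with the largest variance dominates the first component (compare the raw variances printed first). A minimal sketch with base R's scale(), assuming USArrests as example data:

```r
X <- as.matrix(USArrests)

# Raw variances differ by orders of magnitude because the units differ
round(apply(X, 2, var), 1)

# Standardize: (x - column mean) / column standard deviation
Z <- scale(X, center = TRUE, scale = TRUE)

# Every column now has mean 0 and standard deviation 1
round(colMeans(Z), 10)
apply(Z, 2, sd)
```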

Exercise 10.4 PCA Execution:

  1. What function in R is used to perform PCA?
  2. Provide the basic syntax for running PCA on a dataset named “my_data”.
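Two equivalent ways to run a normed PCA, sketched with USArrests standing in for `my_data`: base R's prcomp() and, if the package is installed, FactoMineR's PCA() (`scale.unit = TRUE` standardizes the variables; `graph = FALSE` just suppresses the automatic plots).

```r
my_data <- USArrests  # stand-in for the exercise's "my_data"

# Base R: scale. = TRUE runs the PCA on standardized variables (normed PCA)
pca <- prcomp(my_data, center = TRUE, scale. = TRUE)
summary(pca)

# FactoMineR equivalent, run only if the package is installed
if (requireNamespace("FactoMineR", quietly = TRUE)) {
  res_pca <- FactoMineR::PCA(my_data, scale.unit = TRUE, graph = FALSE)
  print(res_pca$eig)  # eigenvalues and percentages of variance
}
```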

Exercise 10.5 Interpretation of Results:

  1. How can you access the proportion of variance explained by each principal component in the following R script?
  2. What is the significance of the eigenvalues and eigenvectors in PCA?
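A sketch of where these quantities live in a prcomp() result, assuming USArrests as example data: the squared standard deviations are the eigenvalues of the correlation matrix (the variance carried by each component), their normalized values are the proportions of variance, and the columns of `rotation` are the eigenvectors (the directions of the components).

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Eigenvalues of the correlation matrix = squared standard deviations of the PCs
eig_values <- pca$sdev^2

# Proportion of variance explained by each PC, and its running total
prop_var <- eig_values / sum(eig_values)
cum_var  <- cumsum(prop_var)
round(rbind(eig_values, prop_var, cum_var), 3)

# summary() reports the same proportions (rounded to 5 decimals)
summary(pca)$importance["Proportion of Variance", ]

# Eigenvectors: one column of pca$rotation per component
pca$rotation

# The same decomposition computed directly on the correlation matrix
e <- eigen(cor(USArrests), symmetric = TRUE)
e$values
```

Eigenvectors are only defined up to sign, so a column of `e$vectors` may be the negative of the matching column of `pca$rotation`.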

Exercise 10.6 Selecting Principal Components:

  1. How can you determine the optimal number of principal components to retain in R?
  2. Write the R command to extract the loadings of principal components.
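Two common retention rules and the loadings, sketched on USArrests (the 80% threshold is an illustrative assumption, not a fixed rule; the elbow of the scree plot is the usual visual complement). Multiplying each eigenvector by the square root of its eigenvalue gives, in a normed PCA, the correlations between the variables and the components, which is what FactoMineR reports in `res$var$coord` for a normed PCA.

```r
pca <- prcomp(USArrests, scale. = TRUE)
eig_values <- pca$sdev^2

# Rule 1 (Kaiser): keep components whose eigenvalue exceeds the mean eigenvalue (1 in a normed PCA)
n_kaiser <- sum(eig_values > 1)

# Rule 2: keep the fewest components reaching 80% of the variance (illustrative threshold)
n_80 <- which(cumsum(eig_values) / sum(eig_values) >= 0.80)[1]
c(kaiser = n_kaiser, cum80 = n_80)

# Loadings (eigenvectors) of the retained components; drop = FALSE keeps a matrix
pc_loadings <- pca$rotation[, 1:n_80, drop = FALSE]
pc_loadings

# Eigenvectors scaled by sqrt(eigenvalue): correlations between variables and components
cor_var <- sweep(pca$rotation, 2, pca$sdev, "*")
round(cor_var, 3)
```

The two rules need not agree, which is why the answer is usually justified with more than one criterion.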

Exercise 10.7 The inertia of a centered matrix of n individuals and p quantitative variables is

  1. p
  2. The sum of variances of the p variables
  3. None of the responses are true
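A quick numerical check, assuming the usual uniform weights 1/n and USArrests as example data: for a centered (but not reduced) matrix the inertia equals the sum of the variances, and it becomes p only when the columns are also reduced to unit variance.

```r
n <- nrow(USArrests)

# Centered (not reduced) data matrix
X <- scale(as.matrix(USArrests), center = TRUE, scale = FALSE)

# Inertia with weights 1/n: mean squared distance of the individuals to their centroid
inertia <- sum(X^2) / n

# Sum of the variances computed with the same 1/n convention
variances <- colSums(X^2) / n
c(inertia = inertia, sum_of_variances = sum(variances))

# If the columns are also reduced (unit variance), the inertia becomes p = 4
Z <- scale(as.matrix(USArrests))
sum(Z^2) / (n - 1)   # scale() uses the n - 1 convention, hence n - 1 here
```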

Exercise 10.8 The principal components (coordinates of the individuals) are uncorrelated

  1. TRUE
  2. FALSE
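A one-line check with prcomp() on USArrests (example data assumed): the correlation matrix of the component scores is the identity, so the principal components are pairwise uncorrelated.

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Correlation matrix of the principal-component scores (coordinates of the individuals)
C <- cor(pca$x)
round(C, 10)   # identity matrix: components are pairwise uncorrelated
```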

Exercise 10.9 In a normed PCA, the mean of the eigenvalues is

  1. 1
  2. 2
  3. 3
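Why the value is fixed: in a normed PCA the eigenvalues are those of the correlation matrix R, whose diagonal is all ones, so they sum to trace(R) = p and their mean is p / p = 1. A small check, assuming USArrests (p = 4) as example data:

```r
R <- cor(USArrests)                     # correlation matrix: the matrix diagonalized in a normed PCA
ev <- eigen(R, symmetric = TRUE)$values

sum(diag(R))   # trace of R = p, since every diagonal entry is 1
sum(ev)        # the eigenvalues add up to the trace
mean(ev)       # so their mean is p / p
```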

Exercise 10.10 Let Z be a matrix (50 rows and 4 columns) of centered and reduced (standardized) quantitative data, with correlation matrix R (of dimension 4), three of whose eigenvalues are 2, 1 and 0.4.

  1. Give the maximum number of eigenvalues.
  2. Give the remaining eigenvalues.
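The trace argument gives both answers. A short worked derivation using only the data of the exercise (R is the 4 × 4 correlation matrix of the standardized Z, so its diagonal entries are all 1):

```latex
\begin{aligned}
&R \text{ is } 4 \times 4 \;\Rightarrow\; \text{at most } p = 4 \text{ eigenvalues} \\
&\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = \operatorname{tr}(R) = 1 + 1 + 1 + 1 = 4 \\
&\lambda_{\text{remaining}} = 4 - (2 + 1 + 0.4) = 0.6
\end{aligned}
```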

Exercise 10.11 A dataset X gives, for 23 Charolais and Zebu cattle, 6 different weights in kg: live weight (W_LIV), carcass weight (W_CAR), prime meat weight (W_QUALI), total meat weight (W_TOTAL), fat meat weight (W_FAT) and bone weight (W_BO). It also records the cattle type (Type).

  1. How do you interpret the following correlation matrix plot?

[Figure: corr matrix]

  2. How many components would you choose based on the following figures (giving the eigenvalues and the correlations between the components and the variables)?

[Figure: eig table]

  3. Interpret the following figure:

[Figure: eig table]

Exercise 10.12 Scree Plot:

  1. What is the purpose of a scree plot in PCA?

  2. How do you generate and interpret the following scree plot?

[Figure: scree plot]
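A sketch of how such a plot can be produced from a prcomp() result, assuming USArrests as example data. You read it by looking for the elbow where the eigenvalues stop dropping sharply and, in a normed PCA, by checking which eigenvalues stay above 1. If factoextra is installed, its fviz_eig() draws a similar plot.

```r
pca <- prcomp(USArrests, scale. = TRUE)
ev <- pca$sdev^2   # eigenvalues, already sorted in decreasing order

# Scree plot by hand: eigenvalue against component number
plot(ev, type = "b", xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
abline(h = 1, lty = 2)   # reference line at the mean eigenvalue of a normed PCA

# stats::screeplot() draws the same information directly from the prcomp object
screeplot(pca, type = "lines")
```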

Exercise 10.13 Correspondence Analysis:

  1. Briefly explain the main objective of Correspondence Analysis (CA).
  2. How is CA different from Principal Component Analysis (PCA)?
  3. Provide an example of a scenario where CA would be a suitable analysis.

Exercise 10.14 CA Execution:

  1. Provide the basic syntax for running CA on a contingency table named “my_table”.

Exercise 10.14 Interpretation of Results:

  1. How can you access the row and column scores of the CA results in R?
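FactoMineR's CA(my_table) stores the eigenvalues in `$eig` and the row and column coordinates in `$row$coord` and `$col$coord`. To show what those numbers are without needing the package, here is a from-scratch sketch via the singular value decomposition of the standardized residuals, assuming the built-in HairEyeColor data (hair colour × eye colour, summed over sex) as `my_table`; the helper `ca_svd` is hypothetical, written only for illustration.

```r
# Hair colour x eye colour contingency table (built-in HairEyeColor, summed over Sex)
my_table <- margin.table(HairEyeColor, c(1, 2))
N <- unclass(my_table)

# Correspondence analysis via SVD of the standardized residuals
ca_svd <- function(N) {
  P <- N / sum(N)                          # relative frequencies
  row_mass <- rowSums(P)
  col_mass <- colSums(P)
  E <- row_mass %o% col_mass               # expected frequencies under independence
  S <- (P - E) / sqrt(E)                   # standardized residuals
  dec <- svd(S)
  k <- min(dim(N)) - 1                     # number of non-trivial dimensions
  d <- dec$d[1:k]
  # Principal coordinates: rescale singular vectors by 1/sqrt(mass) and by the singular values
  row_coord <- sweep(dec$u[, 1:k, drop = FALSE] / sqrt(row_mass), 2, d, "*")
  col_coord <- sweep(dec$v[, 1:k, drop = FALSE] / sqrt(col_mass), 2, d, "*")
  dimnames(row_coord) <- list(rownames(N), paste0("Dim", 1:k))
  dimnames(col_coord) <- list(colnames(N), paste0("Dim", 1:k))
  list(eig = d^2, row_coord = row_coord, col_coord = col_coord,
       row_mass = row_mass, col_mass = col_mass)
}

res <- ca_svd(N)
res$eig         # eigenvalues (principal inertias)
res$row_coord   # row scores
res$col_coord   # column scores
```

Signs of the dimensions may differ from FactoMineR's output, since singular vectors are only defined up to sign.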

Exercise 10.15 Visualization:

  1. Write the R command to create a biplot for a Correspondence Analysis result.
  2. How can you visually assess the relationships between rows and columns in a CA plot?
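A base-R sketch of the symmetric map, recomputing a small CA from scratch on the HairEyeColor table (example data assumed); FactoMineR's plot() on a CA result, or factoextra's fviz_ca_biplot(), produce comparable maps. Rows close to each other have similar profiles, and likewise for columns; a row and a column lying in the same direction from the origin tend to be associated, but the raw distance between a row point and a column point is not directly interpretable.

```r
N <- unclass(margin.table(HairEyeColor, c(1, 2)))
P <- N / sum(N)
row_mass <- rowSums(P)
col_mass <- colSums(P)

# Standardized residuals and their SVD
S <- (P - row_mass %o% col_mass) / sqrt(row_mass %o% col_mass)
dec <- svd(S)

# Principal coordinates on the first two dimensions
rows <- sweep(dec$u[, 1:2] / sqrt(row_mass), 2, dec$d[1:2], "*")
cols <- sweep(dec$v[, 1:2] / sqrt(col_mass), 2, dec$d[1:2], "*")

# Symmetric map: rows (blue) and columns (red) on the same plane
lims <- range(rows, cols)
plot(rows, type = "n", xlim = lims, ylim = lims,
     xlab = "Dim 1", ylab = "Dim 2", main = "CA map")
abline(h = 0, v = 0, lty = 2)
text(rows, labels = rownames(N), col = "blue")
text(cols, labels = colnames(N), col = "red")
```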


Exercise 10.16 E3:

  1. Write the R command to extract the contributions of the dimensions in CA (in general form).
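In general form, the contribution (in %) of row i to dimension k is 100 × mass_i × coord_ik² / λ_k, and the share of inertia of dimension k is λ_k / Σλ. In FactoMineR these are read from `res$row$contrib`, `res$col$contrib` and `res$eig`. A from-scratch sketch, assuming the HairEyeColor table as example data:

```r
N <- unclass(margin.table(HairEyeColor, c(1, 2)))
P <- N / sum(N)
row_mass <- rowSums(P)
col_mass <- colSums(P)

# CA via SVD of the standardized residuals
S <- (P - row_mass %o% col_mass) / sqrt(row_mass %o% col_mass)
dec <- svd(S)
k <- min(dim(N)) - 1
eig <- dec$d[1:k]^2

# Share of the total inertia carried by each dimension (in %)
pct_inertia <- 100 * eig / sum(eig)
round(pct_inertia, 1)

# Contribution (%) of row i to dimension k: 100 * mass_i * coord_ik^2 / eig_k
row_coord <- sweep(dec$u[, 1:k, drop = FALSE] / sqrt(row_mass), 2, dec$d[1:k], "*")
row_contrib <- 100 * sweep(row_mass * row_coord^2, 2, eig, "/")
dimnames(row_contrib) <- list(rownames(N), paste0("Dim", 1:k))
round(row_contrib, 1)
```

Each column of `row_contrib` sums to 100, so a row with a contribution well above 100 / (number of rows) is one that drives that dimension.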

Exercise 10.17 Disjunctive table:

  1. Construct the disjunctive table of the following data

library(tibble)
disj_table = tribble(
  ~Var1, ~Var2, ~Var3,
  "CB",  "YB",  "F",
  "CB",  "YV",  "F",
  "CC",  "YB",  "M",
  "CC",  "YM",  "F",
  "CR",  "YV",  "M",
  "CB",  "YB",  "M"
)
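A base-R sketch of the disjunctive (indicator) table for these six observations: each categorical variable becomes one 0/1 column per category, so every row sums to the number of variables (3). The helper name `to_indicators` is hypothetical, used only for illustration; FactoMineR's tab.disjonctif() should do the same job if the package is installed.

```r
# Same six observations as a base-R data frame (categories become factor levels)
obs <- data.frame(
  Var1 = c("CB", "CB", "CC", "CC", "CR", "CB"),
  Var2 = c("YB", "YV", "YB", "YM", "YV", "YB"),
  Var3 = c("F",  "F",  "M",  "F",  "M",  "M"),
  stringsAsFactors = TRUE
)

# One 0/1 indicator column per category of a single variable
to_indicators <- function(f, name) {
  m <- sapply(levels(f), function(lev) as.integer(f == lev))
  colnames(m) <- paste(name, levels(f), sep = "_")
  m
}

# Bind the indicator blocks of all variables side by side
disj <- do.call(cbind, Map(to_indicators, obs, names(obs)))
disj
```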


Exercise 10.18 Chi-Square Test:

  1. What is the role of the chi-square test in Correspondence Analysis?
  2. How can you perform a chi-square test on a CA result in R?
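The chi-square test of independence checks whether the row and column variables of the contingency table are related at all; if independence cannot be rejected, there is little structure for CA to display. CA in fact decomposes χ²/n, the total inertia. A sketch with base R's chisq.test() applied to the same table the CA is run on, assuming the HairEyeColor table as example data:

```r
N <- unclass(margin.table(HairEyeColor, c(1, 2)))

# Chi-square test of independence between hair colour and eye colour
chi <- chisq.test(N)
chi

# Total inertia of the CA = chi-square statistic / grand total
n <- sum(N)
total_inertia <- unname(chi$statistic) / n

# The same quantity from the standardized residuals that CA decomposes
P <- N / n
E <- rowSums(P) %o% colSums(P)
c(total_inertia = total_inertia, from_residuals = sum((P - E)^2 / E))
```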

Exercise 10.19 Clustering:

  1. What is the main goal of clustering algorithms?

Exercise 10.20 K-Means Clustering:

  1. What is the fundamental concept behind K-means clustering?
  2. Explain the meaning of centroids in the context of K-means clustering.
  3. Write the R command to perform K-means clustering on a dataset named “my_data”.
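A minimal sketch, assuming standardized USArrests as `my_data` and an arbitrary k = 3: `nstart = 25` reruns the algorithm from 25 random starts and keeps the best solution, and set.seed() makes the result reproducible. The centroids are the within-cluster means of each variable, which the last line checks for cluster 1.

```r
# Stand-in for "my_data", standardized so no variable dominates the distances
my_data <- scale(USArrests)

set.seed(42)   # k-means starts from random centroids
km <- kmeans(my_data, centers = 3, nstart = 25)

km$cluster   # cluster assigned to each individual
km$centers   # centroids: mean of each variable within each cluster

# Check: the first centroid equals the column means of the points in cluster 1
colMeans(my_data[km$cluster == 1, , drop = FALSE])
```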

Exercise 10.21 Hierarchical Clustering:

  1. Briefly explain how hierarchical clustering works.
  2. Write the R command to conduct hierarchical clustering on a dataset.
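A sketch of agglomerative clustering in base R, assuming standardized USArrests as example data: every individual starts as its own cluster, the two closest clusters are merged repeatedly (here with Ward's criterion), and cutting the resulting dendrogram gives the groups. FactoMineR's HCPC() builds a similar Ward tree on the principal components of a PCA.

```r
my_data <- scale(USArrests)

d  <- dist(my_data, method = "euclidean")   # pairwise distances between individuals
hc <- hclust(d, method = "ward.D2")         # merge the two closest clusters at each step
plot(hc, cex = 0.6)                         # dendrogram

groups <- cutree(hc, k = 3)                 # cut the tree into 3 clusters (arbitrary choice)
table(groups)
```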

Exercise 10.22 Interpretation of Clustering Results:

  1. How do you interpret the following output of a clustering analysis on the cattle data?
[Figure: var_desc_clust plot]

Exercise 10.23 Classification:

  1. What is the main goal of classification?
  2. Provide an example of a real-world application where classification analysis could be beneficial.
  3. How can classification be used in medical diagnosis or fraud detection?
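A toy sketch of binary classification with logistic regression, assuming the built-in mtcars data (predicting manual vs. automatic transmission from weight) in place of a real medical or fraud dataset; in a diagnosis or fraud setting the two labels would be, say, "disease"/"healthy" or "fraud"/"legitimate". The 0.5 threshold is an illustrative choice, and accuracy is computed on the training data, so it is optimistic.

```r
# Binary target: transmission type in mtcars (am: 0 = automatic, 1 = manual)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Logistic regression: models P(manual | weight)
fit <- glm(am ~ wt, data = dat, family = binomial)

# Assign a label to each car using a 0.5 probability threshold
prob <- predict(fit, type = "response")
pred <- factor(ifelse(prob > 0.5, "manual", "automatic"), levels = levels(dat$am))

# Confusion matrix and accuracy
conf <- table(predicted = pred, actual = dat$am)
accuracy <- sum(diag(conf)) / sum(conf)
conf
accuracy
```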

Exercise 10.24 What does PCA stand for?

  1. Primary Component Analysis
  2. Principal Component Algorithm
  3. Principal Component Analysis
  4. Primary Component Algorithm

Exercise 10.25 In PCA, what is the primary goal?

  1. Reduce dimensionality while preserving variance
  2. Increase dimensionality for better visualization
  3. Minimize all components equally
  4. Focus on individual components only

Exercise 10.26 Which R function is commonly used to perform PCA?

  1. kmeans()
  2. PCA()
  3. prcomp()
  4. corresp()

Exercise 10.27 What is the purpose of a scree plot in PCA?

  1. Visualize the clusters in data
  2. Assess the quality of clustering
  3. Evaluate the distribution of data
  4. Display the eigenvalues of principal components

Exercise 10.28 How do you determine the optimal number of principal components to retain in PCA?

  1. Use hierarchical clustering
  2. Examine the scree plot
  3. Apply k-means clustering
  4. Perform a chi-square test

Exercise 10.29 What is the primary application of Correspondence Analysis (CA)?

  1. Reducing dimensionality of numerical data
  2. Analyzing relationships in categorical data
  3. Classifying data points into clusters
  4. Predicting future values in a time series

Exercise 10.30 Which R library is commonly used for Correspondence Analysis?

  1. cluster
  2. caret
  3. ca
  4. factoextra
  5. FactoMineR

Exercise 10.31 What is the role of the chi-square test in Correspondence Analysis?

  1. Assess the significance of relationships
  2. Determine the optimal number of clusters
  3. Evaluate the distribution of data
  4. Visualize the proximity between data points

Exercise 10.32 What is the primary goal of clustering algorithms?

  1. Dimensionality reduction
  2. Classification
  3. Grouping similar data points
  4. Visualization of data

Exercise 10.33 Which R function is commonly used for K-means clustering?

  1. hierarch()
  2. PCA()
  3. kmeans()
  4. prcomp()

Exercise 10.34 How can you visually assess relationships between rows and columns in a clustering plot?

  1. Scree plot
  2. Dendrogram
  3. Silhouette plot
  4. Biplot

Exercise 10.35 What is the primary goal of a classification algorithm?

  1. Group similar data points
  2. Predict numerical values
  3. Assign labels to data points
  4. Visualize high-dimensional data

Exercise 10.36 Which algorithm is commonly used for binary classification tasks? (More than one answer may be correct.)

  1. Decision Trees
  2. K-means
  3. LDA
  4. Logistic Regression

Exercise 10.37 Which metric is commonly used to evaluate the performance of a classification model?

  1. R-squared
  2. Mean Absolute Error
  3. Silhouette Score
  4. Accuracy

10.2 Solutions

Answer to Question 10.7:

  1. p
  2. The sum of variances of the p variables (correct)
  3. None of the responses are true

Answer to Question 10.8:

  1. TRUE (correct)
  2. FALSE

Answer to Question 10.9:

  1. 1 (correct)
  2. 2
  3. 3

Answer to Question 10.30:

  1. cluster
  2. caret
  3. ca
  4. factoextra
  5. FactoMineR

Answer to Question 10.32:

  1. Dimensionality reduction
  2. Classification
  3. Grouping similar data points (correct)
  4. Visualization of data

Answer to Question 10.33:

  1. hierarch()
  2. PCA()
  3. kmeans() (correct)
  4. prcomp()

Answer to Question 10.34:

  1. Scree plot
  2. Dendrogram
  3. Silhouette plot
  4. Biplot

Answer to Question 10.35:

  1. Group similar data points
  2. Predict numerical values
  3. Assign labels to data points (correct)
  4. Visualize high-dimensional data

Answer to Question 10.36:

  1. Decision Trees (correct)
  2. K-means
  3. LDA (correct)
  4. Logistic Regression (correct)

Answer to Question 10.37:

  1. R-squared
  2. Mean Absolute Error
  3. Silhouette Score
  4. Accuracy (correct)