Lecture 15: Review

class: center, middle

# Machine Learning

## Review
### III-Summer 2019

---

# Question

What does an agent learn in a reinforcement learning problem? What inputs and outputs does it use to do that?

---

# Question

What does an agent learn in a reinforcement learning problem? What inputs and outputs does it use to do that?

Answer: The agent learns a **policy**. It gets observations about its environment as inputs, can perform actions (which are its "outputs"), and also obtains reward information as additional input.
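
The interaction described in the answer can be sketched as a loop. This is only an illustration: `ToyCorridor`, `run_episode`, and the fixed `always_right` policy are hypothetical names made up for this slide, not part of any library or of the course code.

```python
class ToyCorridor:
    """Hypothetical environment: states 0..3, reward 1.0 on reaching state 3."""

    def __init__(self):
        self.state = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to 0..3
        self.state = max(0, min(3, self.state + action))
        reward = 1.0 if self.state == 3 else 0.0
        done = self.state == 3
        return self.state, reward, done


def run_episode(policy, max_steps=100):
    env = ToyCorridor()
    observation, total_reward = env.state, 0.0
    for _ in range(max_steps):
        action = policy(observation)                   # output: an action
        observation, reward, done = env.step(action)   # inputs: observation and reward
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(observation):
    # A fixed (not learned) policy, used only to exercise the loop
    return +1


print(run_episode(always_right))
```

A learning agent would replace `always_right` with a policy that it adjusts based on the rewards it receives.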

---

# Question

What is the epsilon-greedy strategy for action selection in Reinforcement Learning? Why would you use it?
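
As a hint for the "what" part, one common way to write the strategy is sketched here. The function name `epsilon_greedy` and the dictionary representation of the Q-values are assumptions made for this example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """q_values: dict mapping action name -> estimated value in the current state."""
    if rng.random() < epsilon:
        # Explore: pick any action uniformly at random
        return rng.choice(list(q_values))
    # Exploit: pick the action with the highest estimated value
    return max(q_values, key=q_values.get)


# Hypothetical Q-values for a single state
print(epsilon_greedy({"left": 0.2, "right": 0.9}, epsilon=0.1))
```

With `epsilon=0` the choice is always greedy; with `epsilon=1` it is always random.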

---

# Question

Given the Q-table below, which action would the policy defined by this table select in state 2?

State | Walk left | Walk right | Jump
------|-----------|------------|--------
1     |   2.313   | 1.337      | 6.1
2     |   1.5     | -2.8       | 0.24
3     |  -4.1     | 2.4        | 0.0
4     |  -2.6     | 3.4        | -1
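
To check an answer, the table can be encoded and queried for the greedy action. This is a minimal sketch; the names `q_table` and `greedy_action` are made up for this example, and it assumes the policy picks the action with the highest Q-value.

```python
# Q-table from the slide: state -> {action: Q-value}
q_table = {
    1: {"Walk left": 2.313, "Walk right": 1.337, "Jump": 6.1},
    2: {"Walk left": 1.5, "Walk right": -2.8, "Jump": 0.24},
    3: {"Walk left": -4.1, "Walk right": 2.4, "Jump": 0.0},
    4: {"Walk left": -2.6, "Walk right": 3.4, "Jump": -1.0},
}


def greedy_action(table, state):
    """Return the action with the largest Q-value in the given state."""
    row = table[state]
    return max(row, key=row.get)


print(greedy_action(q_table, 2))
```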

---

# Question

Given the Q-table below, your agent is in state 2 and performs the action "walk left", which results in a reward of 0.7 and leads to state 3. How will the Q-table change under the Q-update rule below, with a learning rate (alpha) of 0.5 and a discount factor (gamma) of 0.75?

$$
Q(s,a) \leftarrow (1-\alpha) Q(s,a) + \alpha \cdot (R(s) + \gamma \cdot \max_{a'} Q(T(s,a),a'))
$$

State | Walk left | Walk right | Jump
------|-----------|------------|--------
1     |   2.313   | 1.23       | 6.1
2     |   1.5     | -2.8       | 0.24
3     |  -4.1     | 2.4        | 0.0
4     |  -2.6     | 3.4        | -1
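
The update can be checked numerically with a short sketch. The names `q_table` and `q_update` are made up for this example; it assumes that the reward term in the rule is the 0.7 received for this step and that T(s,a) is state 3.

```python
# Q-table from this slide: state -> {action: Q-value}
q_table = {
    1: {"Walk left": 2.313, "Walk right": 1.23, "Jump": 6.1},
    2: {"Walk left": 1.5, "Walk right": -2.8, "Jump": 0.24},
    3: {"Walk left": -4.1, "Walk right": 2.4, "Jump": 0.0},
    4: {"Walk left": -2.6, "Walk right": 3.4, "Jump": -1.0},
}


def q_update(table, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-update in place and return the new Q(state, action)."""
    best_next = max(table[next_state].values())   # max over actions in the next state
    target = reward + gamma * best_next
    table[state][action] = (1 - alpha) * table[state][action] + alpha * target
    return table[state][action]


new_value = q_update(q_table, 2, "Walk left", reward=0.7, next_state=3,
                     alpha=0.5, gamma=0.75)
print(f"{new_value:.3f}")
```

By hand: 0.5 · 1.5 + 0.5 · (0.7 + 0.75 · 2.4) = 0.75 + 1.25 = 2.0. Only the entry for state 2, "walk left" changes.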

---

# Question

Answer the following based on what we discussed in class and the literature provided to you:

a) What is the difference between a parameter and a hyperparameter?

b) In your explanation and comparison, provide examples of each (parameters and hyperparameters) in the context of architectures/algorithms seen in this course or that you know.

c) For your chosen algorithm, explain the hyperparameters, the tuning methods, and the challenges associated with them.

---

# Question

What is the goal of semi-supervised learning? In which situations is semi-supervised learning considered a good alternative? Provide an explanation and use examples based on class discussion. Compare semi-supervised learning to unsupervised learning and supervised learning.

---

# Question

What is the difference between inductive and transductive learning? Why is inductive learning associated with supervised learning and transductive learning with semi-supervised learning?

---

# Question

.left-column[
<img src="/CI-2600/assets/img/binaryClassification.png" width="100%"/>
]

.right-column[
You want to use Lloyd's algorithm to cluster this data (with k=2) and have initialized the cluster centers. How would the data currently be classified (approximately)?

Update the cluster centers, i.e., perform one step of Lloyd's algorithm (approximately).
]
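
One step of Lloyd's algorithm can be sketched as an assignment followed by a center update. The points and initial centers here are made up for illustration and are not the data from the figure; `assign` and `update` are hypothetical helper names.

```python
def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) for each point."""
    def sq_dist(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points]


def update(points, labels, centers):
    """Move each center to the mean of its assigned points (unchanged if it has none)."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centers.append(tuple(sum(coord) / len(members)
                                     for coord in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (4.0, 2.0)]   # made-up data
centers = [(1.0, 1.0), (5.0, 1.0)]                           # made-up initial centers
labels = assign(points, centers)             # step 1: assign points to nearest center
centers = update(points, labels, centers)    # step 2: recompute the centers
print(labels, centers)
```

Repeating both steps until the assignments stop changing gives the full algorithm.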

---

# Question

a) What is the probabilistic background behind random sampling, and why can it perform as well as grid search? What assumptions must hold for it to perform this well?

b) Based on your experience with grid search or what you have read about it, what are the potential drawbacks of this technique? What are its benefits?
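
As a starting point for part a), one standard probabilistic argument can be checked with a few lines. It assumes that trials are drawn independently and uniformly from the search space, and that the "good" configurations make up a fixed fraction of that space. The function name `prob_at_least_one_hit` is made up for this example.

```python
def prob_at_least_one_hit(top_fraction, n_trials):
    """P(at least one of n independent uniform draws lands in the top fraction)."""
    return 1.0 - (1.0 - top_fraction) ** n_trials


for n in (10, 30, 60):
    print(n, round(prob_at_least_one_hit(0.05, n), 3))
```

Under these assumptions, 60 random trials land in the best 5% of the space at least once with a probability of about 95%, regardless of the number of hyperparameters.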

---

# Question

G. Rind R. is the CEO of a startup that wants to make a dating app targeted at gay people. He offers to pay you 100 000 USD for a machine learning system that takes people's Facebook pictures to predict if they are gay.

Briefly discuss ethical and legal concerns you have about this assignment.

---

# Question

For your TFIA you want to perform an experiment. For which of the following cases would you need informed consent from your participants? Why or why not?

- Participants play a game, and you measure how many points they get.
- Participants play a game, and you ask them about the emotions they experienced.
- You analyze blood samples from the participants for indications of hereditary diseases using Machine Learning.
- You collect data from Twitter and analyze which topics Costa Ricans feel positively/negatively about.