class: center, middle

# Machine Learning

## Support Vector Regression & Semi-Supervised Learning

### III-Verano 2019

---

# Support Vector Regression
Fundamentals of SVR
---
class: medium

# Background

* A regression task involves approximating a mapping function from inputs to a continuous output.
* When a support vector machine (SVM) is used to solve a regression problem, the approach is called support vector regression (SVR).

---
class: mmedium

# Key Concepts

* Here are some key concepts we should recall:
  * Kernel: a function used to map lower-dimensional data into a higher-dimensional space.
  * Hyperplane: in SVM, the separating line between the classes. In SVR, it is the line that predicts the continuous values.
  * Boundary lines: in SVM, the two lines that form the margin. Support vectors lie on the boundary, or outside but close to it. The same holds in SVR.
  * Support vectors: the data points closest to the boundary.

---

# What is the main difference between SVR and regression?

* In regression, we try to minimize the error.
* In SVR, our task is to fit the error within a certain threshold.

---

# SVR
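A minimal sketch of this idea, assuming scikit-learn is available (the data here is synthetic and illustrative only): with noiseless linear data, a linear SVR fits a tube of width epsilon around the data, and the training residuals stay within that tube.

```python
# Minimal SVR sketch with scikit-learn (synthetic data, illustrative only).
import numpy as np
from sklearn.svm import SVR

# Noiseless linear data: y = 2x + 1
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 * X.ravel() + 1

# epsilon defines the tube: errors inside it incur no penalty
model = SVR(kernel="linear", C=10.0, epsilon=0.1)
model.fit(X, y)

residuals = np.abs(y - model.predict(X))
print("max |residual|:", residuals.max())       # stays near or below epsilon
print("support vectors:", len(model.support_))  # points on/outside the tube
```

Note how this differs from ordinary least squares: the fit is not penalized for any residual smaller than epsilon, so only the points on or outside the tube (the support vectors) shape the solution.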
---

# Behavior of SVR

* The objective in SVR is to consider all the points that are within the boundary lines.
* How does that compare to SVM?

---
class: mmedium

# Understanding the Mechanics

* The boundaries are "lines" at a distance e from the hyperplane (strictly speaking, not e but epsilon).
* Assume the hyperplane is a straight line that goes across the y-axis.
* The equation of the hyperplane can then be written as follows:

\[wx+b=0\]

---
class: mmedium

# Understanding the Mechanics

* The boundaries are "lines" at a distance e from the hyperplane (strictly speaking, not e but epsilon).
* Assume the hyperplane is a straight line that goes across the y-axis.
* The equations of the hyperplane and the upper boundary can then be written as follows:

\[wx+b=0\]
\[wx+b=+e\]

---
class: mmedium

# Understanding the Mechanics

* The boundaries are "lines" at a distance e from the hyperplane (strictly speaking, not e but epsilon).
* Assume the hyperplane is a straight line that goes across the y-axis.
* The equations of the hyperplane and both boundaries can then be written as follows:

\[wx+b=0\]
\[wx+b=+e\]
\[wx+b=-e\]

---
class: medium

# What does this mean for SVR?

* The previous equations show that the predictions of our linear hyperplane satisfy

\[-e \le y - wx - b \le +e\]

* This allows us to state that, ideally,

\[y - wx - b = 0\]

---

# Summary

* What SVR attempts to achieve is to identify a decision boundary with an assigned epsilon, which defines a margin of tolerance within which no penalty is given to errors.
* Doing this allows us to find the data points closest to the hyperplane (the support vectors) within that boundary.

---

# Introduction to Semi-Supervised Learning

* Semi-supervised learning (SSL) is seen by some as a middle ground between unsupervised learning (UL) and supervised learning (SL).
* This is why, when the types of learning are presented, often only SL and UL are mentioned.

---

# The beginning

* The SSL task starts with a series of labeled data points, as well as some data points for which the label is unknown.
* What is the goal?
---

# The beginning

* The SSL task starts with a series of labeled data points, as well as some data points for which the label is unknown.
* What is the goal?
* To classify some of the unlabeled data using the labeled information.

---
class: medium

# The beginning

* The SSL task starts with a series of labeled data points, as well as some data points for which the label is unknown.
* What is the goal?
* To classify some of the unlabeled data using the labeled information.
* This is why some consider SSL a supervised-like task, because you are classifying data points by assigning labels; others see it as closer to unsupervised learning.

---
class: medium

# Applications of SSL

* Speech Analysis: labeling audio files is typically very labor-intensive, requiring a lot of human resources. Applying SSL techniques can improve traditional speech-analysis models.
* Protein Sequence Classification: inferring the function of proteins typically requires expert knowledge.
* Web Content Classification: organizing the corpus of knowledge (billions of web pages) will advance different segments of AI. However, such a task requires human intervention to classify the content.

---
class: small

# Essentials of an SSL Task

* There are a few essential characteristics a problem should have to be effectively solvable using SSL.
* Sizable Unlabeled Dataset: the number of unlabeled instances should be significant. Otherwise, use supervised learning.
* Input-Output Proximity Symmetry: SSL operates by inferring labels for unlabeled data based on proximity to labeled data points.
* Relatively Simple Labeling & Low-Dimensional Nature of the Problem: in SSL scenarios, it is important that inferring the labels doesn't become a problem more complicated than the original one.
  * AKA the "Vapnik Principle": in order to solve a problem, we should not pick an intermediate problem of a higher order of complexity.
* Problems whose datasets have high dimensionality or a large number of attributes are likely to be very challenging for SSL algorithms, as the labeling task becomes very complex.

---

# Transductive Learning

* Introduced by Vapnik.
* Consists basically of accurately predicting on an unlabeled test set using a labeled training set.
* Commonly used in situations where labeled data is limited and the dataset is finite.

---

# Transductive SVM

* Originally introduced in information retrieval.
* The objective was to classify documents within a database.
* Although it can be seen as an inductive problem, it is different. Why?

---
class: mmedium

# Transductive SVM

* Originally introduced in information retrieval.
* The objective was to classify documents within a database.
* Although it can be seen as an inductive problem, it is different. Why?
* The rules learned do not need to be general, because the data is not infinite; it is actually finite.

---
class: mmedium

# Transductive SVM

* Originally introduced in information retrieval.
* The objective was to classify documents within a database.
* Although it can be seen as an inductive problem, it is different. Why?
* The rules learned do not need to be general, because the data is not infinite; it is actually finite.
* The algorithm can observe the test data during training.

---

# TSVM

* Consider a set of labeled samples S, which are part of a training set.
* They are binary labeled {-1, 1},
* and have attributes within a space denoted X.
* The algorithm separates the training data from the test data.

---
class: medium

# TSVM

* After splitting, the algorithm has access to the following:
  * the training vectors,
  * the training labels,
  * the unlabeled test vectors.
* However, it DOES NOT have access to the test labels.
* It uses what it does have to produce predictions.
* The goal is to minimize the fraction of erroneous predictions!
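TSVM itself is not shipped with scikit-learn, but the setting above can be sketched with a related semi-supervised technique: self-training with an SVM base learner, which also exploits the unlabeled points during training. The data below is synthetic, and unlabeled points are marked with `-1` per scikit-learn's convention.

```python
# Hedged sketch: not TSVM, but self-training with an SVM base learner,
# a related semi-supervised approach. Synthetic 2-D data.
import numpy as np
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

rng = np.random.default_rng(0)

# Two well-separated 2-D clusters, 40 points each
X = np.vstack([rng.normal(loc=-2.0, scale=0.3, size=(40, 2)),
               rng.normal(loc=+2.0, scale=0.3, size=(40, 2))])
y_true = np.array([0] * 40 + [1] * 40)

# Keep only 3 labels per class; mark the rest as unlabeled (-1)
y_partial = np.full(80, -1)
y_partial[:3] = 0
y_partial[40:43] = 1

# SVC needs probability=True so self-training can rank its confidence
model = SelfTrainingClassifier(SVC(kernel="linear", probability=True))
model.fit(X, y_partial)

# Transductive-style check: how well did we recover the unlabeled points?
mask = y_partial == -1
acc = (model.predict(X[mask]) == y_true[mask]).mean()
print("accuracy on unlabeled points:", acc)
```

As in the TSVM setting, the quantity of interest is the error on the finite set of unlabeled points that were visible during training, not generalization to unseen data.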
---

# Applications of TSVM

* Protein Stability Prediction
* Disease Gene Prediction
* Information Retrieval

---

# References

* [Semi-supervised Learning](http://www.acad.bg/ebook/ml/MITPress-%20SemiSupervised%20Learning.pdf)
* [Understanding Support Vector Machine Regression](https://www.mathworks.com/help/stats/understanding-support-vector-machine-regression.html)