# (Study notes) Basic concepts of machine learning

1. Data: an introductory example (the famous iris dataset)
• Dataset: the data as a whole
• Data representation
• Each row of the data is one sample, and each row is itself a feature vector
• Every column except the last is one feature of the sample: X_{i,j} (essentially a matrix X)
• The last column is usually the label, written y (essentially a vector)
• Examples of the notation:
• Feature space: each sample is essentially a point in the space formed by the object's features
• The essence of a classification task is to partition the feature space (this holds regardless of the dimension of the space)
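
The X / y representation described above can be sketched with scikit-learn's bundled iris data (assuming scikit-learn is installed):

```python
# Minimal sketch of the data representation above: X is a matrix whose rows
# are samples and columns are features; y is the label vector.
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data      # feature matrix: each row is one sample's feature vector
y = iris.target    # label vector: one class label per sample

print(X.shape)     # (150, 4): 150 samples, 4 features each
print(y.shape)     # (150,)
print(X[0])        # the first sample, a point in 4-D feature space
```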

2. Basic tasks in machine learning
• Classification
• Binary classification (choose one of two): e.g., deciding whether an email is spam, whether something is risky, whether a stock rises or falls
• Multi-class tasks: e.g., handwritten digit recognition, image recognition, event risk rating (many complex problems can be cast as multi-class tasks, though multi-class methods are not necessarily the best solution)
• Relationship between the two: some algorithms only support binary classification, but a multi-class task can be decomposed into binary tasks; other algorithms handle multi-class tasks natively
• Regression
• Data format:
• Characteristic: the prediction result is not a category but a continuous numeric value [e.g., house price prediction, market analysis, stock prediction, student grades]
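
The contrast between the two task types can be sketched with scikit-learn's toy datasets (library assumed available; the model choices here are illustrative, not prescribed by the notes):

```python
# Classification predicts a discrete class label; regression predicts a
# continuous number. Both are illustrated on bundled toy datasets.
from sklearn.datasets import load_iris, load_diabetes
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification: the output is one of a finite set of categories
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(X[:1]))   # a class label

# Regression: the output is a continuous numeric value
X, y = load_diabetes(return_X_y=True)
reg = LinearRegression().fit(X, y)
print(reg.predict(X[:1]))   # a real number, not a class
```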

3. Classification of machine learning algorithms
• Supervised learning: the training data given to the machine has "marks" or "answers" (labels)
• Unsupervised learning: the training data given to the machine has no "marks" or "answers"
• Cluster analysis: grouping unlabeled data into classes
• Dimensionality reduction
• Feature extraction: for example, a credit card's credit rating has nothing to do with how fat or thin a person is, so that feature should be discarded
• Feature compression: compress high-dimensional feature vectors to low dimensions while minimizing information loss [e.g., PCA] (benefits: greatly improves algorithm efficiency with little effect on accuracy; enables visualization [humans cannot grasp information in more than four dimensions])
• Anomaly detection: some individual points do not reflect the overall characteristics of the sample, while machine learning seeks exactly those general characteristics. Finding these special points with a dedicated algorithm is anomaly detection.
• Semi-supervised learning: part of the data has "marks" or "answers" and the rest does not. This situation is quite common (labels go missing for various reasons)
• Typical approach: first process the data with unsupervised methods, then use supervised methods for model training and prediction
• Reinforcement learning: take actions based on the surrounding environment, and improve the way of acting based on the results of those actions
(e.g., robotics, autonomous driving, AlphaGo)
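
Feature compression with PCA, the example named above, can be sketched as follows (scikit-learn assumed; iris is reduced from 4-D to 2-D, e.g. for visualization):

```python
# PCA compresses the 4-D iris features to 2-D while keeping most of the
# variance (information) in the data.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)

print(X_2d.shape)                       # (150, 2)
print(pca.explained_variance_ratio_)    # fraction of variance each axis keeps
```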

4. Other classifications of machine learning
• Batch learning
• Advantage: simple; the model does not change after training
• Problem: the business environment changes (e.g., what counts as spam shifts as new spam evolves in character and style). Solution: retrain in batch
• Disadvantage: each round of batch retraining requires a huge amount of computation

• Online learning
• Advantage: reflects new environmental changes in a timely manner
• Problem: new (abnormal) data may push the model in a bad direction
• Solution: strengthen monitoring of incoming data (anomaly detection)
• Also: suitable for environments where the data volume is so huge that batch learning is simply infeasible
• Parametric learning
• Learn the parameters of a model function; once the parameters are learned, the original data is no longer needed
• Assumes the relationship between the features and the target can be described by some statistical model; the task is then to learn that model's parameters
• Non-parametric learning
• Makes no statistical assumptions about the model
• "Non-parametric" does not mean there are no parameters
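
The defining property of parametric learning noted above (once the parameters are learned, the data can be discarded) can be sketched with a plain NumPy line fit; the linear data here is synthetic, chosen only for illustration:

```python
# Parametric learning sketch: assume the model y = a*x + b, learn the two
# parameters from data, then discard the data and predict from (a, b) alone.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 3.0 * x + 1.0                  # data generated from known parameters

a, b = np.polyfit(x, y, deg=1)     # learn slope and intercept
del x, y                           # the original data is no longer needed

print(round(a, 3), round(b, 3))    # recovered parameters: 3.0 and 1.0
print(a * 10.0 + b)                # predict for a new input using only a, b
```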
5. Philosophical thoughts (philosophical thinking related to machine learning)
• Occam's razor principle [simpler is better]
• Do not pile unnecessary assumptions and complications onto the problem to be solved
• In the field of machine learning, what counts as "simple"?
• No free lunch theorem
• It can be derived rigorously: averaged over all possible problems, any two algorithms have the same expected performance. But for a specific problem, some algorithms are better than others.
• It is meaningless to discuss the quality of an algorithm in isolation from a specific problem.
• Faced with a specific problem, try a variety of algorithms and compare them experimentally.
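
The comparative-experiment advice above can be sketched with cross-validation (scikit-learn assumed; the two models compared here are arbitrary choices for illustration):

```python
# Evaluate two different algorithms on the same problem with 5-fold
# cross-validation, rather than judging them in the abstract.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
for model in (KNeighborsClassifier(), LogisticRegression(max_iter=1000)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, scores.mean())
```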
6. Technical environment installation and configuration
• Scientific computing package management tool: Anaconda