
Modeling data in high-dimensional spaces
As you saw, we can represent real-world observations by redefining them as a function of different features. The speed of an object, for example, is a function of the distance it traveled and the time it took to travel it. Similarly, the color of a pixel on your TV screen is a function of the red, green, and blue intensity values that make up that pixel. These elements are what data scientists call the features, or dimensions, of your data. When our observations come with labels, that is, a known target value for each observation, we are dealing with a supervised learning task, because we can check what our model has learned against what is truly the case. When our observations are unlabeled, a common approach is to calculate the distances between observation points to find groups of similar observations in our data. This is known as unsupervised ML. In this manner, we can start building a model of a real-world phenomenon simply by representing it using informative features.
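To make these ideas concrete, here is a minimal sketch in plain Python. The observations, the pixel value, and the helper names (`speed`, `euclidean`) are all made up for illustration and are not taken from any library. It shows speed expressed as a function of two features, a pixel as a three-dimensional feature vector, and the distance between observations that an unsupervised method might use to judge similarity.

```python
import math

# Hypothetical observations: each one is a feature vector (distance_m, time_s).
observations = [
    (100.0, 10.0),
    (105.0, 10.0),
    (400.0, 80.0),
]

def speed(distance_m, time_s):
    """Speed expressed as a function of two features: distance and time."""
    return distance_m / time_s

def euclidean(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# One derived value per observation.
speeds = [speed(d, t) for d, t in observations]

# A pixel is a point in a three-dimensional feature space (red, green, blue).
pixel = (255, 128, 0)

# Without labels, distances tell us which observations resemble each other.
d01 = euclidean(observations[0], observations[1])
d02 = euclidean(observations[0], observations[2])

print(speeds)
print(d01, d02)
```

Here the first two observations lie much closer to each other than either does to the third, so a distance-based grouping would tend to place them together. In a supervised setting, each observation would additionally carry a known label against which a model's predictions could be checked.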