
Feeding a neural network
Essentially, all of the data that enters and propagates through the network is represented by a mathematical structure known as a tensor. This applies to audio data, images, video, and any other data we can think of to feed our data-hungry network. In mathematics (https://en.wikipedia.org/wiki/Mathematics), a tensor is defined as an abstract geometric (https://en.wikipedia.org/wiki/Geometry) entity that maps aggregations of vectors in a multi-linear (https://en.wikipedia.org/wiki/Linear_map) manner to a resulting tensor. In fact, vectors and scalars are considered simpler forms of tensors. In Python, tensors are described by three specific properties, as follows:
- Rank: This denotes the number of axes. A matrix is said to have rank 2, as it represents a two-dimensional tensor. In Python libraries, this is often indicated as ndim.
- Shape: The shape of a tensor can be checked by reading the shape attribute of a NumPy n-dimensional array (which is how a tensor is represented in Python). This will return a tuple of integers, indicating the size of the tensor along each axis. The length of this tuple equals the rank.
- Content: This refers to the type of data that's stored in the tensor, and can be checked by reading the dtype attribute of a tensor of interest (calling Python's built-in type() on a NumPy array only reports the container class, numpy.ndarray). This will return data types such as float32, uint8, float64, and so on. String values are the exception: they are first converted into vector representations before being represented as a tensor.
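The three properties listed above can be inspected directly on a NumPy array. The following is a minimal sketch, not taken from the original text: the array named images and its dimensions (a batch of four 28 x 28 grayscale images) are hypothetical values chosen for illustration. In NumPy, the element type is read from the dtype attribute:

```python
import numpy as np

# Hypothetical example data: a batch of 4 grayscale images, 28 x 28 pixels each
images = np.zeros((4, 28, 28), dtype=np.uint8)

rank = images.ndim      # number of axes
shape = images.shape    # size along each axis
content = images.dtype  # type of the stored elements

print(rank)     # 3
print(shape)    # (4, 28, 28)
print(content)  # uint8

# type() reports the container class rather than the element type
print(type(images))  # <class 'numpy.ndarray'>

# Scalars, vectors, and matrices are tensors of rank 0, 1, and 2
scalar = np.array(5.0)
vector = np.array([1.0, 2.0, 3.0])
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2
```

Note that the length of the shape tuple always matches the rank, so a rank 3 tensor reports three sizes.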
The following is a tensor graph. Don't worry about the complex diagram; we will look at what it means later:
