<aside> 💡 Total max: 10 points + 1 point (diligence bonus)

</aside>

Tabular Data, Dense Neural Networks

Deadline

<aside> 💡 UPD: March 19, 2023, 11:59 PM (GMT+2)

</aside>

Prerequisites

  1. Install Jupyter Lab/Notebook and the required libraries/frameworks → Follow Installing Jupyter Notebook → Install PyTorch (a quick environment check is sketched after this list)

  2. Get familiar with NumPy arrays → Follow Python Numpy Array Tutorial

  3. Generate a sizeable tabular dataset with 10 features, 1 regression target, and 150,000 rows → Use sklearn.datasets.make_friedman1 (a generation sketch is given after this list)

    <aside> 💡 Warning: each dataset, its features, and its target must be unique for every student. Copy-pasted datasets immediately receive 0 points for all copies.

    </aside>
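Once steps 1–2 are done, a quick sanity check along these lines (a minimal sketch; printing versions is just for your own reference) confirms the environment works:

```python
# Quick environment check: the core libraries import and report their versions.
import numpy as np
import sklearn
import torch

print("NumPy:", np.__version__)
print("scikit-learn:", sklearn.__version__)
print("PyTorch:", torch.__version__)

# Minimal NumPy refresher: arrays have a shape and support vectorized ops.
a = np.arange(6).reshape(2, 3)
print(a.shape, a.mean(axis=0))  # (2, 3) [1.5 2.5 3.5]
```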
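For step 3, a minimal generation sketch; the `noise` level and `random_state` below are placeholders, and you must pick your own values so that your dataset is unique:

```python
from sklearn.datasets import make_friedman1

# make_friedman1 returns 10 features by default (5 informative, 5 noise)
# and a single non-linear regression target.
X, y = make_friedman1(
    n_samples=150_000,
    n_features=10,
    noise=1.0,         # placeholder: choose your own noise level
    random_state=42,   # placeholder: use your own seed so the dataset is unique
)
print(X.shape, y.shape)  # (150000, 10) (150000,)
```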

Tasks

  1. Fit the target using Linear Regression (LR) (1 point) → Use sklearn's LinearRegression. Remember this result as the baseline. Plot learning curves. (A baseline sketch is given after this task list.)
  2. Fit the target with all combinations of DNN depth (1, 5, or 25 hidden layers), optimizer (SGD, RMSProp, or Adam, with default parameters except the learning rate), and learning rate (0.1, 0.01, or 0.001), using the same loss function (MSE).

Plot learning curves (R², MAPE, and MSE, where MSE is the loss function) for every combination: 27 charts in total (7 points)

| Hidden layers | SGD | RMSProp | Adam |
| --- | --- | --- | --- |
| 1 | LR = 0.1, 0.01, 0.001 | LR = 0.1, 0.01, 0.001 | LR = 0.1, 0.01, 0.001 |
| 5 | LR = 0.1, 0.01, 0.001 | LR = 0.1, 0.01, 0.001 | LR = 0.1, 0.01, 0.001 |
| 25 | LR = 0.1, 0.01, 0.001 | LR = 0.1, 0.01, 0.001 | LR = 0.1, 0.01, 0.001 |

(Try writing nested for loops over the grid of all settings; a sketch is given after this task list.)

  3. Compare the DNN results with the baseline (LR). Make conclusions: (2 points)

  1. What models and combinations of hyper-parameters worked best? Why?
  2. Influence of optimizer?
  3. Influence of learning rate?
  4. Influence of DNN architecture (number of layers)?
  5. Influence of the data distribution (linear or non-linear)?

The conclusions should be technically correct and specific to your results rather than general.
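A minimal sketch of the task 1 baseline, assuming `X` and `y` from the generation step in Prerequisites; the split ratio and seed are arbitrary placeholders:

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hold out a test set so every DNN run can be compared on the same split.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

baseline = LinearRegression().fit(X_train, y_train)
y_pred = baseline.predict(X_test)

# Baseline metrics to compare every DNN configuration against.
print("R2:  ", r2_score(y_test, y_pred))
print("MAPE:", mean_absolute_percentage_error(y_test, y_pred))
print("MSE: ", mean_squared_error(y_test, y_pred))
```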
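And a rough sketch of the task 2 grid in PyTorch, assuming the `X_train`/`y_train` split from the baseline sketch above. The hidden width, epoch count, and full-batch updates are placeholder choices; the point is only the nested-loop structure over all 27 settings:

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
X_train_t = torch.tensor(X_train, dtype=torch.float32, device=device)
y_train_t = torch.tensor(y_train, dtype=torch.float32, device=device).unsqueeze(1)

def make_mlp(n_hidden_layers, width=64):
    # width is a placeholder; any reasonable hidden size works for the sketch
    layers = [nn.Linear(10, width), nn.ReLU()]
    for _ in range(n_hidden_layers - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

optimizers = {"SGD": torch.optim.SGD, "RMSProp": torch.optim.RMSprop, "Adam": torch.optim.Adam}
loss_fn = nn.MSELoss()

# Nested loops over the full 3 x 3 x 3 grid: 27 runs in total.
for n_layers in (1, 5, 25):
    for opt_name, opt_cls in optimizers.items():
        for lr in (0.1, 0.01, 0.001):
            model = make_mlp(n_layers).to(device)
            optimizer = opt_cls(model.parameters(), lr=lr)
            for epoch in range(20):  # placeholder epoch count
                optimizer.zero_grad()
                loss = loss_fn(model(X_train_t), y_train_t)
                loss.backward()
                optimizer.step()
                # TODO: record train/test R2, MAPE, and MSE here for the learning curves
            print(f"{n_layers} layers, {opt_name}, lr={lr}: final train MSE={loss.item():.4f}")
```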

How to Plot Learning Curves

TensorBoard tutorial: https://www.tensorflow.org/tensorboard/tensorboard_in_notebooks

<aside> 💡 Spoiler: this is just an example; your curves will be different. Also, your plots must show three Train/Test learning-curve pairs: R², MAPE, and MSE. That is 6 curves on 1 plot in total, or 3 side-by-side plots, each with one Train/Test pair per metric.

</aside>

```python
…
history = model.fit(x_train, y_train)
…
# plot learning curves
print(history.history.keys())
```
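The snippet above is Keras-style: `model.fit` returns a `history` object with per-epoch metrics. With a manual PyTorch training loop you record the metrics yourself. Below is a minimal matplotlib sketch of the required layout, where `history` is a hypothetical dict you fill once per epoch during training:

```python
import matplotlib.pyplot as plt

# `history` is a hypothetical per-epoch metric store: append one value to each
# list at the end of every epoch, e.g. history["train_r2"].append(...).
history = {
    "train_r2": [], "test_r2": [],
    "train_mape": [], "test_mape": [],
    "train_mse": [], "test_mse": [],
}

# Three side-by-side plots, one Train/Test pair per metric.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, metric in zip(axes, ("r2", "mape", "mse")):
    ax.plot(history[f"train_{metric}"], label="Train")
    ax.plot(history[f"test_{metric}"], label="Test")
    ax.set_title(metric.upper())
    ax.set_xlabel("Epoch")
    ax.legend()
plt.tight_layout()
plt.show()
```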