Energy Sector-Auto Encoders
Energy Sector: Auto-encoders, an Unsupervised Approach
This article demonstrates an unsupervised deep learning method on a dataset from the energy domain, with the aim of uncovering the dataset's hidden patterns. Unsupervised learning is a kind of machine learning that focuses on spotting trends and patterns in data; the target variable does not need to be identified or labelled, and there is no supervised training step. The techniques used here can determine whether an underlying structure exists in the data and, if such structure is present, can reveal clusters, anomalous observations, and so on.
In deep unsupervised learning the target is unknown, since the model is unsupervised. Deep learning techniques can still be used to find the relationship between the target and the predictor variables, and this relationship can then be used to draw conclusions about the outcome of the dataset (the outcome being the predicted value of the target).
The business issue under study concerns the energy drawn from solar panels; it belongs to the energy industry, and renewable energy sources in particular. Here, solar electricity and solar panels are the subject of investigation. A solar panel uses silicon photovoltaic cells to convert light into electrical power. It has developed quite a reputation as a result of its capacity to draw electricity directly from the sun, and it is widely regarded as a long-term energy solution.
The business problem relates to this sector of renewable energy. Consider a scenario with two sites for power generation: site 1 has solar panels installed and shows constant benefits in the form of savings from the electricity board. Similar solar panels are being considered for site 2, which has a different geographic position. Before installing panels or investing money at site 2, it is imperative to know whether site 2 will also function well. This analysis gives a clear-cut idea of the investment and the power needs at that location.
Problem Statement
Predict the electricity that can be generated by solar panels installed at a new site B on any future date, given the historical data generated by the panels installed at sites A and B' during the last M months, where B and B' are close by, i.e. they share the same geographic location.
Dataset: The dataset is based on solar panels. Solar Dataset shape: (239538, 10)
The data spans December 2019 to October 2020 for a particular site. The shape tells us that there are 10 columns and roughly 0.23 million rows. This data is converted to NumPy arrays for further processing; a snippet of the array is given here.
array([[2019., 12., 4., ..., 6., 0., 0.],
       [2019., 12., 4., ..., 6., 1., 0.],
       [2019., 12., 4., ..., 6., 2., 0.],
       ...,
       [2020., 10., 4., ..., 17., 57., 0.],
       [2020., 10., 4., ..., 17., 58., 0.],
       [2020., 10., 4., ..., 17., 59., 0.]])
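The conversion step can be sketched as follows. The column names and values below are illustrative assumptions, not the article's actual dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset: date/time predictors plus power.
# Column names are assumptions for illustration only.
df = pd.DataFrame({
    "Year":      [2019.0, 2019.0, 2020.0],
    "Month":     [12.0, 12.0, 10.0],
    "DayOfWeek": [4.0, 4.0, 4.0],
    "Hour":      [6.0, 6.0, 17.0],
    "Minute":    [0.0, 1.0, 59.0],
    "Power":     [0.0, 0.0, 0.0],
})

# Convert the DataFrame to a float NumPy array for further processing.
arr = df.to_numpy(dtype=float)
print(arr.shape)   # (3, 6)
```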
Sunlight is the primary source of energy for solar panels. The dataset includes columns such as Year, Month, Quarter, Day of Week, Day of Month, Day of Year, Weekday of Year, Hour, and Minute, which may account for seasonality and other factors that affect electricity generation. These columns are the predictors. The graph below represents a typical day of electricity production.
The graph makes it obvious that the curve is roughly bell-shaped (approximately Gaussian). The majority of the energy is produced between 6:00 and 18:00 (time is expressed in minutes), when the sun is shining. Power starts weakly at first, builds progressively, typically reaches its peak value around midday, and then gradually fades, tracing out a bell curve. The pattern can therefore be expected to look similar everywhere, regardless of location; only the amount of electricity produced varies. The quantity of power is also influenced by factors such as the climate, the time of year, and the amount of sunshine.
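As a rough numeric sketch of this daily pattern, the output over a day can be modelled as a Gaussian bump centred on solar noon. All parameters here (peak minute, spread, peak output) are assumptions for illustration, not values from the article's data.

```python
import numpy as np

# One reading per minute of the day: 0 .. 1439.
minutes = np.arange(1440)

# Assumed parameters: peak at solar noon (minute 720), a two-hour spread,
# and a peak output of 100 arbitrary power units.
peak_minute, spread, peak_power = 720, 120.0, 100.0

# Gaussian-shaped generation curve, zeroed outside daylight (6:00-18:00).
power = peak_power * np.exp(-((minutes - peak_minute) ** 2) / (2 * spread ** 2))
power[(minutes < 360) | (minutes > 1080)] = 0.0

print(int(minutes[np.argmax(power)]))   # minute of peak generation -> 720
```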
Auto-encoders (AE)
Auto-encoders are a special type of feed-forward neural network that can be used for unsupervised modelling. The algorithm compresses the original data and then reconstructs it from the compressed representation. This is done with two components, an encoder and a decoder, where the encoder compresses and the decoder reconstructs. The main objective is to reproduce the input at the output with as little distortion as possible. Auto-encoders work very well under conditions where output and input are similar to each other (e.g. image denoising).
Auto-encoders can also be used as a feature-extraction tool, especially when high-dimensional data is present. Traditional auto-encoders are generally used for image datasets but can be applied to other datasets as well. This article demonstrates two variants: one based on LSTM layers and one using a traditional (dense) auto-encoder. Following the example above, the output for the new site will be estimated from a given site using auto-encoders; the output of the new site should be similar to the old one.
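To make the compress-then-reconstruct idea concrete before the Keras models in the cases below, here is a minimal toy sketch: a linear auto-encoder trained with plain gradient descent on synthetic data containing redundancy. Everything in this block is illustrative and unrelated to the solar dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 2] = X[:, 0] + X[:, 1]      # redundant columns: the data really lives
X[:, 3] = X[:, 0] - X[:, 1]      # in a 2-dimensional subspace

W_enc = rng.normal(scale=0.1, size=(4, 2))   # encoder: 4 inputs -> 2-dim code
W_dec = rng.normal(scale=0.1, size=(2, 4))   # decoder: code -> 4 outputs

baseline = np.mean(X ** 2)                   # error of an all-zero reconstruction
lr = 0.02
for _ in range(3000):
    code = X @ W_enc                         # encode: compress
    X_hat = code @ W_dec                     # decode: reconstruct
    err = X_hat - X
    W_dec -= lr * (code.T @ err) / len(X)    # gradient step on the decoder
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)   # and on the encoder

final = np.mean((X - (X @ W_enc) @ W_dec) ** 2)
print("reconstruction MSE:", round(final, 4), "baseline:", round(baseline, 4))
```

Because the four columns really carry only two dimensions of information, a 2-dimensional code is enough for the reconstruction error to fall far below the baseline.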
Case 1: Using LSTM-based auto-encoders, we attempt to determine the output of site B from the historical data provided for the two locations A and B' from December 2019 to October 2020. The geographic locations of B and B' are the same, and no information is known about the target site B; the result must come entirely from the historical data of sites A and B'.
The data are first normalised, with the normalised data for site A being 'X1' and 'X2' and for site B' being 'X3' and 'X4'. Apart from a slight difference in X2[:9] and X4[:9], which contain the power values of these locations, the datasets are identical to one another. X1 and X2 are fed to the LSTM auto-encoder. 'Linear' activation is utilised to forecast values as close as possible to the actuals, while Mean Absolute Error is employed as the loss function.
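The normalisation and reshaping step can be sketched as below. The min-max scaling and the array shapes are assumptions about the preprocessing, since the article does not show this code.

```python
import numpy as np

def minmax_scale(X):
    """Scale each column of X to the [0, 1] range (a common normalisation)."""
    X = X.astype(float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    rng = np.where(hi - lo == 0, 1.0, hi - lo)   # guard against constant columns
    return (X - lo) / rng

# Stand-in for site A's predictor matrix (samples x 10 columns).
X1 = np.random.default_rng(1).uniform(0, 60, size=(500, 10))
X1s = minmax_scale(X1)

# LSTM layers expect 3-D input: (samples, timesteps, features).
X1s_lstm = X1s.reshape(X1s.shape[0], X1s.shape[1], 1)
print(X1s_lstm.shape)   # (500, 10, 1)
```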
The code that follows is only an example of how to create an LSTM-based AE model. Tuning can be carried out to improve outcomes: hyperparameter tweaking with different optimizers, additional layers, and varying numbers of epochs may yield a more accurate approximation.
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense, Dropout
from keras.layers import RepeatVector
from keras.layers import TimeDistributed
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras import regularizers
import keras

# define model
model = Sequential()
model.add(LSTM(256, activation='linear', input_shape=(X1s.shape[1], 1),
               activity_regularizer=regularizers.l1(10e-5),
               return_sequences=False))
model.add(RepeatVector(X1s.shape[1]))
model.add(LSTM(256, activation='linear', return_sequences=True))
model.add(TimeDistributed(Dense(1, activation='linear')))
adam = keras.optimizers.Adam(lr=0.001)
model.compile(optimizer=adam, loss='MAE')
model.summary()
earlyStopping = EarlyStopping(monitor='val_loss', patience=30, verbose=0, mode='min')
mcp_save = ModelCheckpoint('sola-001.mdl_wts.hdf5', save_best_only=True, monitor='val_loss', mode='min')
history = model.fit(X1s, X2s, epochs=500, batch_size=1024,
                    callbacks=[earlyStopping, mcp_save], validation_data=(X3s, X4s))
_________________________________________________________________
Layer (type)                         Output Shape        Param #
=================================================================
lstm_8 (LSTM)                        (None, 256)         264192
_________________________________________________________________
repeat_vector_4 (RepeatVector)       (None, 10, 256)     0
_________________________________________________________________
lstm_9 (LSTM)                        (None, 10, 256)     525312
_________________________________________________________________
time_distributed_4 (TimeDistributed) (None, 10, 1)       257
=================================================================
Total params: 789,761
Trainable params: 789,761
Non-trainable params: 0
The plot below shows the forecast for site B, based on the data from sites A and B'. Following the reasoning above, B should lie fairly close to B' in the plot. The prediction for site B is carried out using the LSTM-based auto-encoder model and the historical data provided for sites A and B'; three arbitrary days are plotted.
The graph below displays the variation in power and peak power over a period of 300 days at all 3 locations (A, B', and B).
Case 2: Traditional Auto-encoder Model
from keras.models import Model
from keras.layers import Input, Dense
from keras.callbacks import EarlyStopping, ModelCheckpoint
from keras import regularizers
import keras

input_dim = X1s.shape[1]
encoding_dim = 10
input_layer = Input(shape=(input_dim,))
encoder = Dense(encoding_dim, activation="linear",
                activity_regularizer=regularizers.l1(10e-5))(input_layer)
decoder = Dense(input_dim, activation='linear')(encoder)
encoder_model = Model(input_layer, encoder)   # standalone encoder, if needed
autoencoder = Model(inputs=input_layer, outputs=decoder)
adam = keras.optimizers.Adam(lr=0.001)
earlyStopping = EarlyStopping(monitor='val_loss', patience=30, verbose=0, mode='min')
mcp_save = ModelCheckpoint('sola-002.mdl_wts.hdf5', save_best_only=True, monitor='loss', mode='min')
autoencoder.compile(optimizer=adam, loss='MAE')
autoencoder.summary()
Model: "functional_3"
________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 10)] 0
_________________________________________________________________
dense (Dense) (None, 10) 110
_________________________________________________________________
dense_1 (Dense) (None, 10) 110
=================================================================
Total params: 220
Trainable params: 220
Non-trainable params: 0
Evaluate the model after training the LSTM / regular AE model:
from sklearn.metrics import mean_absolute_error, mean_squared_error
import math
mean_absolute_error(a, pd), math.sqrt(mean_squared_error(a, pd)), mean_absolute_error(act, p), math.sqrt(mean_squared_error(act, p))
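As a minimal, self-contained illustration of these metric calls, consider the made-up arrays below; they are stand-ins for the article's variables (a, pd, act, p), not the actual site data.

```python
import math
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual vs. predicted peak power for three days.
a_demo  = [100.0, 95.0, 110.0]   # actual peak power
pd_demo = [90.0, 100.0, 105.0]   # predicted peak power

mae = mean_absolute_error(a_demo, pd_demo)             # (10 + 5 + 5) / 3
rmse = math.sqrt(mean_squared_error(a_demo, pd_demo))  # sqrt((100 + 25 + 25) / 3)
print(round(mae, 3), round(rmse, 3))                   # 6.667 7.071
```

Note that taking the square root of the mean squared error yields the RMSE, which is what the table below reports.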
Models        MAE – Peak Power   RMSE – Peak Power   MAE – Instant Power   RMSE – Instant Power
LSTM AE       16.4253            36.7570             1.8405                12.2318
Regular AE    16.4159            38.4563             2.8593                17.7155
where pd – predicted peak power for the day; p – predicted instant power for the day;
a – actual peak power for the day; act – actual instant power for the day.
Conclusion
Lastly, a plot comparing B' and B over a randomly chosen 30-day window is shown.
From the foregoing, it is clear that the model has consistently predicted peak values close to the actuals observed at site B' at the appropriate time on most days, with only a few significant differences on other days. This article demonstrates the effectiveness of auto-encoders and how they can be used to estimate unknown outputs. To train the model and obtain better approximations, more data points should be employed; domain-specific variables should also be included to capture site-specific behaviour; and further tuning may yield better results.