dataframe to CNN

CodingPython 2022. 1. 4. 14:42

https://github.com/Futuremine97/PythonPrac1/blob/main/feature_CNN.ipynb


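The goal is to take data in pandas DataFrame format and make predictions with a time-series regression model built on a Keras RNN. There are one or more independent variables X (features, or predictors) and one or more dependent variables y.

I referred to this post: https://towardsdatascience.com/how-to-convert-pandas-dataframe-to-keras-rnn-and-back-to-pandas-for-multivariate-regression-dcc34c991df9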


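import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator

y2 = ['sensor1']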

Define the y variable to specify what we want to predict.

plt.plot(range(len(df)),df[y2]);

Wrap the column name in [] and plot it from the df object.

test_size = int(len(df) * 0.2) # the test data will be 20% (0.2) of the entire data
train = df.iloc[:-test_size,:].copy() 
# the copy() here is important, it will prevent us from getting: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_index,col_indexer] = value instead
test = df.iloc[-test_size:,:].copy()

print(train.shape, test.shape)

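Split the data into training and test sets.

Without .copy(), a slice can still refer to the original DataFrame's data, so pandas cannot tell whether a later assignment is meant to change the original and raises SettingWithCopyWarning. .copy() creates an independent object, so changing one no longer changes the other.

A DataFrame is mutable, so the copy is what guarantees that an edit to train or test modifies only that object and not df.

This seems to be done so that the test and train data are fully separate.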
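To make the effect of .copy() concrete, here is a minimal sketch on a hypothetical five-row frame. The values and the sensor2 column are made up for illustration; only sensor1 appears in the notebook. It splits the frame the same way as above and then checks that editing the copy leaves df alone.

```python
import pandas as pd

# Hypothetical toy frame; values and 'sensor2' are illustrative only.
df = pd.DataFrame({
    "sensor1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "sensor2": [10.0, 20.0, 30.0, 40.0, 50.0],
})

test_size = int(len(df) * 0.2)          # 20% of 5 rows -> 1 row
train = df.iloc[:-test_size, :].copy()  # independent copy of the first 4 rows
test = df.iloc[-test_size:, :].copy()   # independent copy of the last row

# Editing the copy does not touch the original DataFrame.
train.loc[0, "sensor1"] = -999.0

print(train.shape, test.shape)
print(df.loc[0, "sensor1"])     # still the original value
print(train.loc[0, "sensor1"])  # the edited value
```

Because the split is by position, the last rows of the series end up in test, which keeps the time order intact.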

Xscaler = MinMaxScaler(feature_range=(0, 1)) # scale so that all the X data will range from 0 to 1
Xscaler.fit(X_train)
scaled_X_train = Xscaler.transform(X_train)
print(X_train.shape)
Yscaler = MinMaxScaler(feature_range=(0, 1))
Yscaler.fit(y_train)
scaled_y_train = Yscaler.transform(y_train)
print(scaled_y_train.shape)
scaled_y_train = scaled_y_train.reshape(-1) # remove the second dimension from y so the shape changes from (n,1) to (n,)
print(scaled_y_train.shape)

scaled_y_train = np.insert(scaled_y_train, 0, 0)
scaled_y_train = np.delete(scaled_y_train, -1)

We manipulate y a little so that its shape can be used with Keras.

Here n is the number of rows.

scaled_y_train = np.insert(scaled_y_train, 0, 0)

This inserts a 0 at the first position, which pushes every y value one step forward.

Removing the last time step (np.delete(scaled_y_train, -1)) then keeps the length the same as before.
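The insert-then-delete shift is easy to check on a small array. The sketch below uses four made-up values in place of scaled_y_train and is only meant to show what the two NumPy calls do.

```python
import numpy as np

# Made-up stand-in for scaled_y_train.
y = np.array([0.1, 0.2, 0.3, 0.4])

shifted = np.insert(y, 0, 0)      # put a 0 in front: length grows by one
shifted = np.delete(shifted, -1)  # drop the last element: length is restored

print(y)        # original order
print(shifted)  # every value moved one position later, 0 in front
print(y.shape == shifted.shape)
```

Each value now sits one index later than before, and the last original value is discarded.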

This scaling step raised an error that I wanted to fix.

The scaler expects y to be two-dimensional, so selecting it with double brackets, [['sensor1']], made it work!!!
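The difference between single and double brackets shows up directly in the shape. This is a small sketch on a hypothetical frame (the values and sensor2 are invented); scikit-learn scalers expect 2-D input, which is why the DataFrame form works.

```python
import pandas as pd

# Hypothetical toy data; only the column name 'sensor1' comes from the post.
train = pd.DataFrame({"sensor1": [0.5, 1.5, 2.5], "sensor2": [3.0, 4.0, 5.0]})

as_series = train["sensor1"]    # single brackets -> pandas Series, 1-D
as_frame = train[["sensor1"]]   # double brackets -> DataFrame, 2-D

print(as_series.shape)  # one dimension
print(as_frame.shape)   # rows x 1 column

# A 1-D array can also be made 2-D explicitly with reshape(-1, 1).
as_column = as_series.to_numpy().reshape(-1, 1)
print(as_column.shape)
```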

plt.figure(figsize=(50,4))
plt.plot(train.index,train[y2],label='Train');
plt.plot(test.index,test[y2],label='test')
plt.legend();

x = tf.keras.utils.normalize(df, axis=1) # x becomes a tensor

x = df.astype('float32')
#--> These two lines are also important... I found the answer on Stack Overflow (it was a data type error)
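The post does not say what the data type error was. As an assumption, the sketch below supposes the columns had mixed numeric dtypes and shows how astype('float32') brings them to one dtype. The frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with mixed dtypes (int64 and float64).
df = pd.DataFrame({"sensor1": [1, 2, 3], "sensor2": [0.5, 1.5, 2.5]})
print(df.dtypes)

# Cast every column to float32, a dtype Keras models commonly work with.
x = df.astype("float32")
print(x.dtypes)
```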

X_train = train.drop('sensor1',axis=1).copy()
y_train = train[['sensor1']].copy() # the double brackets here are to keep the y in a dataframe format, otherwise it will be pandas Series
# print(X_train.shape, y_train.shape)

Xscaler = MinMaxScaler(feature_range=(0, 1)) # scale so that all the X data will range from 0 to 1

Scale the data to the range 0 to 1.

Xscaler.fit(X_train)
scaled_X_train = Xscaler.transform(X_train)
print(X_train.shape)
Yscaler = MinMaxScaler(feature_range=(0, 1))
Yscaler.fit(y_train)
scaled_y_train = Yscaler.transform(y_train)
print(scaled_y_train.shape) # 18 rows, 1 column (only this one column was taken out of train)
scaled_y_train = scaled_y_train.reshape(-1) # remove the second dimension from y so the shape changes from (n,1) to (n,)
print(scaled_y_train.shape)

scaled_y_train = np.insert(scaled_y_train, 0, 0)
scaled_y_train = np.delete(scaled_y_train, -1) #--> drops the last element so the length stays the same
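What MinMaxScaler computes can be verified by hand. The sketch below uses three made-up y values, compares the scaler's output with (y - min) / (max - min), and uses inverse_transform to return to the original units, which is how scaled predictions would be converted back later. It is an illustration, not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up target column, shape (n, 1) as the scaler expects.
y = np.array([[10.0], [20.0], [40.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(y)  # fit and transform in one call

# The same result computed by hand, column by column.
manual = (y - y.min(axis=0)) / (y.max(axis=0) - y.min(axis=0))

# inverse_transform maps scaled values back to the original units.
restored = scaler.inverse_transform(scaled)

print(scaled.ravel())
print(np.allclose(scaled, manual))
print(np.allclose(restored, y))
```

Keeping a separate scaler for y, as the notebook does with Yscaler, is what makes this inverse step possible for the target alone.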

n_input = 17 #how many samples/rows/timesteps to look in the past in order to forecast the next sample (must match the generator's window length)
n_features= X_train.shape[1] # how many predictors/Xs/features we have to predict y
b_size = 32 # Number of timeseries samples in each batch

generator = TimeseriesGenerator(scaled_X_train, scaled_y_train, length=n_input, batch_size=b_size)

#--> A generator I had not seen at AIFFEL. It seems to play the role of the input data. Its first batch has shape (1, 17, 35), which looks like a tensor.

It is a 3-D array with 1 sample, 17 time steps, and 35 features.
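To see where a shape like (1, 17, 35) comes from, the windowing can be imitated in plain NumPy. This is a sketch of my understanding of TimeseriesGenerator with default sampling and stride, not its actual implementation: window i covers rows i to i+length-1, and its target is the y value at row i+length. The helper make_windows and the random data are hypothetical.

```python
import numpy as np

def make_windows(data, targets, length):
    """Pair each window data[i:i+length] with the target at index i+length."""
    n_samples = len(data) - length
    X = np.stack([data[i:i + length] for i in range(n_samples)])
    y = np.array([targets[i + length] for i in range(n_samples)])
    return X, y

rng = np.random.default_rng(0)
data = rng.random((18, 35))  # 18 rows, 35 features, as in the post
targets = rng.random(18)     # one y value per row

X, y = make_windows(data, targets, length=17)
print(X.shape)  # (samples, time steps, features)
print(y.shape)
```

With 18 rows and a window of 17, only one window fits, which is why there is a single sample. Under this reading, the one-step shift applied to y earlier means the target paired with a window is the original y value of the window's last row.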

print(generator[0][0].shape)

model = Sequential()
model.add(LSTM(150, activation='relu', input_shape=(n_input, n_features)))

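#--> The first layer of the model needs its input shape specified, here (n_input, n_features)

n_input: the number of samples (rows, time steps) to look back on in order to predict the next sample.

n_features: one feature is one observed value per time step.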

model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.summary()

This is a time-series model.

model.fit(generator,epochs=15)

loss_per_epoch = model.history.history['loss']
plt.plot(range(len(loss_per_epoch)),loss_per_epoch);

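The second loss graph is from the run where 2 of the 16 columns were used as test data.

The loss drops quickly. In exchange, overfitting also seems to set in quickly...

With three columns used as test data, heavy overfitting occurs.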

Attribution, NonCommercial, NoDerivatives
