
Heavily mixed signal differentiation from an open set of backgrounds via CNN.

Summary

I am currently attempting to detect a signal from background noise. The signal is pretty well known, but the background has a lot of variability. I have since learned that this problem is called Open Set Recognition. Another complicating factor is that the signal mixes with the background noise (think of a transparent piece of glass in front of scenery, or picking out the sound of a pin drop in an office). When I started this project, the state of the art in this space seemed to be generating spectrograms and feeding them to a CNN, and that is the path I followed. I am at a point where I think I have overcome most of the initial problems one might encounter, but the results are still not good enough to deploy in the project. Here is the overall list of steps I went through: 1. Generated 17,000 ground-truth "signals" and 17,000 backgrounds (negatives, or other classes, depending on which network scheme I am training). 2. Generated separate test samples (not training samples, but...

Full text

Heavily mixing signal differentiation from Open Set of backgrounds via CNN

Asked 5 years, 9 months ago · Modified today · Viewed 131 times

I am currently attempting to detect a signal from background noise. The signal is pretty well known, but the background has a lot of variability. I've since come to know this problem as Open Set Recognition. Another complicating factor is that the signal mixes with the background noise (think of a transparent piece of glass in front of scenery in a picture, or picking out the sound of a pin drop in an office space). When I started this project, the state of the art in this space seemed to be generating spectrograms and feeding them to a CNN, and that is the path I've followed. I'm at a place where I think I've overcome most of the initial problems you might encounter, but I'm still not getting good enough results for a project solution. Here are the overall steps I've gone through:

1. Generate 17,000 ground-truth "signals" and 17,000 backgrounds (negatives, or other classes, depending on which network scheme I'm training).
2. Generate separate test samples (not training samples but external model-validation samples, a "blind test"), where I take the backgrounds and randomly overlay the signal into them at various intensities.

My first attempt was a pre-built library training solution (ImageAI) with a ResNet50 base model. That solution is a multiclass classifier, so I had 400 samples each of the signal plus 5 other classes that were the background. It did not work well at classifying the signal; I don't think I ever got it off the ground, for two reasons: a) my spectrogram images were not optimised (way too large), and b) I couldn't adjust the image input shape via the library. It mostly just ended up classifying everything as one background class.

I then started building my own neural nets, first to make sure my spectrogram shape matched the input shape of the CNN, and second to test various network schemes to see what worked best. The first net I built was a simple feed-forward net with a couple of dense layers.
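The overlay in step 2 can be sketched as a power-matched mix, assuming raw 1-D audio arrays; `mix_at_snr` is my own illustrative helper, not code from the post:

```python
import numpy as np

def mix_at_snr(signal, background, snr_db):
    """Overlay `signal` onto `background` at a target signal-to-noise
    ratio in dB, scaling the signal relative to the background power."""
    sig_power = np.mean(signal ** 2)
    bg_power = np.mean(background ** 2)
    # scale so that scaled-signal power / background power == 10^(snr_db/10)
    target_ratio = 10 ** (snr_db / 10)
    scale = np.sqrt(target_ratio * bg_power / sig_power)
    return background + scale * signal
```

Sweeping `snr_db` downward then yields blind-test cases of increasing difficulty.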
This trains to a val_acc of .9998. It (like everything else I try) produces poor results on my blind tests, in the range of 60% true positives.

```python
# Imports assumed (tf.keras); the original post did not show them.
from tensorflow.keras import backend as K
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D,
                                     Dense, Dropout, Flatten, GaussianDropout,
                                     GaussianNoise, MaxPooling2D)

def build(width, height, depth, classes):
    # initialize the model along with the input shape to be
    # "channels last" and the channels dimension itself
    model = Sequential()
    inputShape = (height, width, depth)
    chanDim = -1
    # if we are using "channels first", update the input shape
    # and channels dimension
    if K.image_data_format() == "channels_first":
        inputShape = (depth, height, width)
        chanDim = 1
    # give the input shape to the first layer (the original passed it
    # to the first Dense layer, after a shapeless Flatten)
    model.add(Flatten(input_shape=inputShape))
    model.add(Dense(512, activation="relu"))
    model.add(Dense(128, activation="relu"))
    model.add(Dense(32, activation="relu"))
    # sigmoid classifier
    model.add(Dense(classes))
    model.add(Activation("sigmoid"))
    # return the constructed network architecture
    return model
```

I then tried a "VGG Light" model. Again, it trains to .9999 but gives me only 62% true positives on my blind tests.

```python
def build(width, height, depth, classes):
    # initialize the model along with the input shape to be
    # "channels last" and the channels dimension itself
    model = Sequential()
    inputShape = (height, width, depth)
    chanDim = -1
    # if we are using "channels first", update the input shape
    # and channels dimension
    if K.image_data_format() == "channels_first":
        inputShape = (depth, height, width)
        chanDim = 1

    # CONV => RELU => POOL
    model.add(Conv2D(32, (3, 3), padding="same", input_shape=inputShape))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(3, 3)))
    model.add(Dropout(0.25))

    # (CONV => RELU) * 2 => POOL
    model.add(Conv2D(64, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(64, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))

    # (CONV => RELU) * 2 => POOL
    model.add(Conv2D(128, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(128, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(GaussianNoise(0.05))

    # first (and only) set of FC => RELU layers
    model.add(Flatten())
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(512))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(128))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(GaussianDropout(0.5))

    # sigmoid classifier
    model.add(Dense(classes))
    model.add(Activation("sigmoid"))
    # return the constructed network architecture
    return model
```

I then tried a "full VGG" net. This again trains to .9999 but gives only a 63% true-positive rate on the blind test.

```python
def build(width, height, depth, classes):
    # initialize the model along with the input shape to be
    # "channels last" and the channels dimension itself
    model = Sequential()
    inputShape = (height, width, depth)
    chanDim = -1
    # if we are using "channels first", update the input shape
    # and channels dimension
    if K.image_data_format() == "channels_first":
        inputShape = (depth, height, width)
        chanDim = 1

    # CONV => RELU => POOL
    model.add(Conv2D(64, (3, 3), padding="same", input_shape=inputShape))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(3, 3)))
    # model.add(Dropout(0.25))

    # (CONV => RELU) * 2 => POOL
    model.add(Conv2D(128, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(128, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # model.add(Dropout(0.25))

    # (CONV => RELU) * 2 => POOL
    model.add(Conv2D(256, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(256, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # model.add(Dropout(0.25))

    # (CONV => RELU) * 2 => POOL
    model.add(Conv2D(512, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(512, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # model.add(Dropout(0.25))

    # (CONV => RELU) * 2 => POOL
    model.add(Conv2D(1024, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(Conv2D(1024, (3, 3), padding="same"))
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=chanDim))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # model.add(Dropout(0.25))
    model.add(GaussianNoise(0.1))

    # first (and only) set of FC => RELU layers
    model.add(Flatten())
    model.add(Dense(8192))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(4096))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))
    model.add(Dense(1024))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(GaussianDropout(0.5))

    # sigmoid classifier
    model.add(Dense(classes))
    model.add(Activation("sigmoid"))
    # return the constructed network architecture
    return model
```

All of the above are trained with binary_crossentropy in Keras. I've tried multi-class versions of these models as well, but when tested on the blind test they usually pick the background rather than the signal. I've also experimented with autoencoders, trying to get the autoencoder to rebuild the signal well and then compare against known results, but I haven't been successful yet; I'd be willing to give it another try if people think that might produce better results.
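For what it's worth, the autoencoder idea above reduces to thresholding reconstruction error: a model trained only on the clean signal should reconstruct signal-bearing inputs with lower error than unfamiliar backgrounds. A minimal, model-free sketch of the scoring side (pure NumPy; the function names are illustrative, not from the post):

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Per-sample mean squared reconstruction error for a batch of
    spectrograms (reduces over every axis except the batch axis)."""
    return np.mean((x - x_hat) ** 2, axis=tuple(range(1, x.ndim)))

def calibrate_threshold(signal_errors, percentile=95):
    """Pick a decision threshold as a high percentile of the errors the
    autoencoder makes on held-out clean-signal examples."""
    return np.percentile(signal_errors, percentile)

def looks_like_signal(x, x_hat, threshold):
    # Flag as "signal" when the autoencoder reproduces the input closely.
    return reconstruction_error(x, x_hat) < threshold
```

Here `x_hat` would come from `autoencoder.predict(x)` for whatever autoencoder is trained.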
In the beginning I ran into unbalanced-classification problems (I was a noob), but in all the models shown above the classes have the same number of samples. I'm at the point where the larger VGG models, trained on 34,000 samples, take days to train, and I don't see any better results than from a basic feed-forward NN that trains in 4 minutes. Does anyone see the path forward here?

Tags: convolutional-neural-networks, keras, architecture

Asked Apr 17, 2020 at 0:58 by Mecho Engineer · Edited May 18, 2022 at 8:57 by nbro

Comments:

Gerry P (Apr 17, 2020 at 8:23): Have you tried using an adjustable learning rate? Try ReduceLROnPlateau while monitoring validation loss; documentation is at keras.io/callbacks. Sometimes the validation-loss surface is like going down into an increasingly narrow valley: you can get further down into the valley if you reduce the learning rate. I also recommend using an established model like MobileNet; it has about 4 million parameters, so it is much faster than VGG and about as accurate.

Gerry P (Apr 17, 2020 at 8:29): Also, in line 5 you state "This trains to .9998 val_acc." I think you meant training accuracy, correct?

Gerry P (Apr 17, 2020 at 16:40): Some questions about your problem: when you say you have a signal, what kind of signal is it? Is it a waveform like an electrical signal? Is it an image embedded in noisy pixels? If it is an electrical signal embedded in noise, use a narrowband bandpass filter. If it is an image with noisy pixels, use a noise-cancelling autoencoder.

Mecho Engineer (Apr 18, 2020 at 6:46): @GerryP To answer some of your questions: yes, .9998 is acc and val_acc with the training data set.
My 60% (.6) accuracy is with data the model has never seen before. I have never used adjustable learning rates, but I have used very low learning rates (1e-6) with similar results. The problem is an audio signal in background audio "noise", where the background is extremely varied and often at similar or greater amplitude than the signal. As for using an established model, is that training a custom set with MobileNet as a base?

2 Answers

Answer by Gerry P (answered Apr 18, 2020 at 16:59, edited 17:07):

Thanks for the answers. If you are processing an audio signal, I think applying a low-pass filter (LPF) would help enhance the signal-to-noise ratio, especially if the noise occupies a large part of the spectrum. If the audio is human speech, the majority of the energy is within the 300 Hz to 3 kHz region, so a low-pass filter with a cutoff frequency of 3 kHz would eliminate noise in the higher part of the spectrum. You could implement the LPF as a pre-processing function. I am not knowledgeable on the implementation, but a search should get you the info you need; I did find an article here. If I recall, the process is to convert the time-domain signal to the frequency domain using an FFT, set a cutoff point, and convert back to the time domain. I also know there are ways to implement that directly in the time domain. Hope this helps.

I am also surprised that you achieve a high validation accuracy while your test-set accuracy is so low. Your validation data should be data the network has not seen before, just like your test data. The only thing I can think of is that the test data has a very different probability distribution than the training and validation data. How were the various data sets (train, test, validate) selected?
The best choice is to select these randomly, using something like sklearn's train_test_split or Keras's ImageDataGenerator flow_from_directory. Hope this helps.

Comment by Mecho Engineer (Apr 19, 2020 at 5:07): Thanks for the additional help. This application is not human speech, unfortunately, and I believe any filtering I could do on the background noise would effectively filter the signal as well. With respect to train, test, and validate: I get train and test from train_test_split. To validate, I take audio clips of the signal, overlay them at various times over background audio (at various intensities), and then run the composited audio clip through the model.predict function. The model has been trained on both the signal and the background, just not together.

Answer by Mecho Engineer (answered Aug 25, 2020 at 1:16):

To anyone who reads this: I still haven't solved this completely. At the moment I'm doing a lot better with much cleaner data, a loss metric that matches what I'm after (F1 score), a very deep model (a custom Inception-ResNet-v2), and a custom learning-rate function that depends on the training round's F1 score. Every training round I compute an F1 score on test sets at various signal-to-noise dB levels, from which I compute a model wellness score that determines whether the model is good enough. Pretty close.
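The FFT-based low-pass filter outlined in the first answer (FFT, zero the bins above the cutoff, inverse FFT) can be sketched as follows; `fft_lowpass` is an illustrative name, not code from the thread:

```python
import numpy as np

def fft_lowpass(x, sample_rate, cutoff_hz):
    """Zero out frequency components above `cutoff_hz` and invert the
    FFT, following the forward-transform/cutoff/inverse-transform recipe
    described in the answer above."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)
    spectrum[freqs > cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(x))
```

For speech, a `cutoff_hz` of 3000 would match the 300 Hz to 3 kHz band the answer mentions.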

2 $\begingroup$ I am currently attempting to detect a signal from background noise. The signal is pretty well known but the background has a lot of variability. I've since come to know this problem as Open Set Recognition. Another complicating factor is that the signal mixes with the background noise (think equivalent to a transparent piece of glass in-front of scenery for a picture, or picking out the sound of a pin drop in an office space). When I started this project, it seemed like the current state of the art in this space was generating Spectrograms and feeding them to a CNN and this is the path I've followed. I'm at a place where I think I've overcome most of the initial problems you might encounter but I'm still not getting good enough results for a project solution. Here's the overall steps I've gone through: Generate 17000 ground truth "signals" and 17000 backgrounds (negatives or other classes depending on what nn scheme I'm training) Generate separate test samples (not training samples but external model validation samples: "blind test") where I take the backgrounds and randomly overlay the signal into it at various intensities. My first attempt was with a pre-built library training solution (ImageAI) with resnet50 base model. This solution is a multiclass classifier so I had 400 each of the signal + 5 other classes that were the background. It did not work well at classifying the signal. I don't think I ever got this off the ground for two reasons a) My spectrogram pictures were not optimised (waay to large) and b) I couldn't adjust the image input shape via the library. It mostly just ended up classifying one background class. I then started building my own neural nets. The first reason to make sure my spectrogram input shape was matched in the input shape of the CNN. The second reason was to test various neural net schemes to see what worked best. The first net I built was a simple feed forward net with a couple of dense layers. 
This trains to .9998 val_acc. It (like the rest of what I try) produces poor results on my blind tests, in the range of 60% true positive. def build(width, height, depth, classes): # initialize the model along with the input shape to be # "channels last" and the channels dimension itself model = Sequential() inputShape = (height, width, depth) chanDim = -1 # if we are using "channels first", update the input shape # and channels dimension if K.image_data_format() == "channels_first": inputShape = (depth, height, width) chanDim = 1 model.add(Flatten()) model.add(Dense(512, input_shape=(inputShape),activation="relu")) model.add(Dense(128, activation="relu")) model.add(Dense(32, activation="relu")) # sigmoid classifier model.add(Dense(classes)) model.add(Activation("sigmoid")) # return the constructed network architecture return model I then try a "VGG Light" model. Again, trains to .9999 but gives me only 62% true positive results on my blind tests def build(width, height, depth, classes): # initialize the model along with the input shape to be # "channels last" and the channels dimension itself model = Sequential() inputShape = (height, width, depth) chanDim = -1 # if we are using "channels first", update the input shape # and channels dimension if K.image_data_format() == "channels_first": inputShape = (depth, height, width) chanDim = 1 # CONV => RELU => POOL model.add(Conv2D(32, (3, 3), padding="same", input_shape=inputShape)) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(3, 3))) model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(64, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(64, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(128, (3, 3), 
padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(128, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) model.add(Dropout(0.25)) model.add(GaussianNoise(.05)) # first (and only) set of FC => RELU layers model.add(Flatten()) model.add(Dense(1024)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(Dropout(0.5)) model.add(Dense(512)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(Dropout(.5)) model.add(Dense(128)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(GaussianDropout(0.5)) # sigmoid classifier model.add(Dense(classes)) model.add(Activation("sigmoid")) # return the constructed network architecture return model I then try a "full VGG" net. This again trains to .9999 but only a blind test true positive result of 63%. def build(width, height, depth, classes): # initialize the model along with the input shape to be # "channels last" and the channels dimension itself model = Sequential() inputShape = (height, width, depth) chanDim = -1 # if we are using "channels first", update the input shape # and channels dimension if K.image_data_format() == "channels_first": inputShape = (depth, height, width) chanDim = 1 #CONV => RELU => POOL model.add(Conv2D(64, (3, 3), padding="same", input_shape=inputShape)) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(3, 3))) #model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(128, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(128, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) #model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(256, (3, 3), padding="same")) 
model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(256, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) #model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(512, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(512, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) #model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(1024, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(1024, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) #model.add(Dropout(0.25)) model.add(GaussianNoise(.1)) # first (and only) set of FC => RELU layers model.add(Flatten()) model.add(Dense(8192)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(Dropout(0.5)) model.add(Dense(4096)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(Dropout(0.5)) model.add(Dense(1024)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(GaussianDropout(0.5)) # sigmoid classifier model.add(Dense(classes)) model.add(Activation("sigmoid")) # return the constructed network architecture return model All of the above are binary_crossentropy trained in keras. I've tried multi-class with these models as well but when testing them on the blind test they usually pick the background rather than the signal. I've also messed around with Autoencoders to try and get the encoder to rebuild the signal well and then compare to known results but haven't been successful yet though I'd be willing to give it another try if everyone thought that might produce better results. 
In the beginning I ran into unbalanced classification problems (I was noob) but under all the models shown above the classes all have the same number of samples. I'm at the point where the larger VGG models trained on 34,000 samples is taking days and I don't see any better results than a basic, feed forward NN that takes 4 minutes to train. Does anyone see the path forward here? convolutional-neural-networks keras architecture Share Improve this question Follow edited May 18, 2022 at 8:57 nbro 43.2k 14 14 gold badges 121 121 silver badges 222 222 bronze badges asked Apr 17, 2020 at 0:58 Mecho Engineer 21 1 1 bronze badge $\endgroup$ 4 1 $\begingroup$ have you tried using an adjustable learning rate? Try using ReduceLROnPlateau while monitoring validation loss. documentation is at keras.io/callbacks . Sometimes the validation loss surface is like going down into an increasingly narrower valley. You can get further down into the valley if you reduce the learning rate. I also recommend to use an established model like MobileNet it has about 4 million parameters so it is much faster than VGG and about as accurate. $\endgroup$ Gerry P – Gerry P 2020-04-17 08:23:12 +00:00 Commented Apr 17, 2020 at 8:23 1 $\begingroup$ Also in line 5 you state "This trains to .9998 val_acc." I think you meant training accuracy correct? $\endgroup$ Gerry P – Gerry P 2020-04-17 08:29:15 +00:00 Commented Apr 17, 2020 at 8:29 1 $\begingroup$ Having questions about your problem When you say you have a signal, what kind of signal is it? Is it a waveform like an electrical signal? Is it an image embedded in noisy pixels? If it is an electrical signal embeded in noise use a narrowband bandpass filter. If it is an image with noisy pixels use a noise cancelling auto encoder. $\endgroup$ Gerry P – Gerry P 2020-04-17 16:40:42 +00:00 Commented Apr 17, 2020 at 16:40 $\begingroup$ @GerryP To answer some of your questions. Yes, .9998 is acc and val_acc with the training data set. 
My 60% (.6) accuracy is with data the model has never seen before. I have never used adjustable learning rates but I have used very low learning rates (1e-6) with similar results. The problem is an audio signal in background audio "noise" where the background is extremely varied and often at similar or greater amplitude levels to the signal. For using an established model, is this training a custom set with MobileNet as a base? $\endgroup$ Mecho Engineer – Mecho Engineer 2020-04-18 06:46:54 +00:00 Commented Apr 18, 2020 at 6:46 Add a comment |

2 $\begingroup$ I am currently attempting to detect a signal from background noise. The signal is pretty well known but the background has a lot of variability. I've since come to know this problem as Open Set Recognition. Another complicating factor is that the signal mixes with the background noise (think equivalent to a transparent piece of glass in-front of scenery for a picture, or picking out the sound of a pin drop in an office space). When I started this project, it seemed like the current state of the art in this space was generating Spectrograms and feeding them to a CNN and this is the path I've followed. I'm at a place where I think I've overcome most of the initial problems you might encounter but I'm still not getting good enough results for a project solution. Here's the overall steps I've gone through: Generate 17000 ground truth "signals" and 17000 backgrounds (negatives or other classes depending on what nn scheme I'm training) Generate separate test samples (not training samples but external model validation samples: "blind test") where I take the backgrounds and randomly overlay the signal into it at various intensities. My first attempt was with a pre-built library training solution (ImageAI) with resnet50 base model. This solution is a multiclass classifier so I had 400 each of the signal + 5 other classes that were the background. It did not work well at classifying the signal. I don't think I ever got this off the ground for two reasons a) My spectrogram pictures were not optimised (waay to large) and b) I couldn't adjust the image input shape via the library. It mostly just ended up classifying one background class. I then started building my own neural nets. The first reason to make sure my spectrogram input shape was matched in the input shape of the CNN. The second reason was to test various neural net schemes to see what worked best. The first net I built was a simple feed forward net with a couple of dense layers. 
This trains to .9998 val_acc. It (like the rest of what I try) produces poor results on my blind tests, in the range of 60% true positive. def build(width, height, depth, classes): # initialize the model along with the input shape to be # "channels last" and the channels dimension itself model = Sequential() inputShape = (height, width, depth) chanDim = -1 # if we are using "channels first", update the input shape # and channels dimension if K.image_data_format() == "channels_first": inputShape = (depth, height, width) chanDim = 1 model.add(Flatten()) model.add(Dense(512, input_shape=(inputShape),activation="relu")) model.add(Dense(128, activation="relu")) model.add(Dense(32, activation="relu")) # sigmoid classifier model.add(Dense(classes)) model.add(Activation("sigmoid")) # return the constructed network architecture return model I then try a "VGG Light" model. Again, trains to .9999 but gives me only 62% true positive results on my blind tests def build(width, height, depth, classes): # initialize the model along with the input shape to be # "channels last" and the channels dimension itself model = Sequential() inputShape = (height, width, depth) chanDim = -1 # if we are using "channels first", update the input shape # and channels dimension if K.image_data_format() == "channels_first": inputShape = (depth, height, width) chanDim = 1 # CONV => RELU => POOL model.add(Conv2D(32, (3, 3), padding="same", input_shape=inputShape)) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(3, 3))) model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(64, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(64, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(128, (3, 3), 
padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(128, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) model.add(Dropout(0.25)) model.add(GaussianNoise(.05)) # first (and only) set of FC => RELU layers model.add(Flatten()) model.add(Dense(1024)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(Dropout(0.5)) model.add(Dense(512)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(Dropout(.5)) model.add(Dense(128)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(GaussianDropout(0.5)) # sigmoid classifier model.add(Dense(classes)) model.add(Activation("sigmoid")) # return the constructed network architecture return model I then try a "full VGG" net. This again trains to .9999 but only a blind test true positive result of 63%. def build(width, height, depth, classes): # initialize the model along with the input shape to be # "channels last" and the channels dimension itself model = Sequential() inputShape = (height, width, depth) chanDim = -1 # if we are using "channels first", update the input shape # and channels dimension if K.image_data_format() == "channels_first": inputShape = (depth, height, width) chanDim = 1 #CONV => RELU => POOL model.add(Conv2D(64, (3, 3), padding="same", input_shape=inputShape)) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(3, 3))) #model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(128, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(128, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) #model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(256, (3, 3), padding="same")) 
model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(256, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) #model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(512, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(512, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) #model.add(Dropout(0.25)) # (CONV => RELU) * 2 => POOL model.add(Conv2D(1024, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(Conv2D(1024, (3, 3), padding="same")) model.add(Activation("relu")) model.add(BatchNormalization(axis=chanDim)) model.add(MaxPooling2D(pool_size=(2, 2))) #model.add(Dropout(0.25)) model.add(GaussianNoise(.1)) # first (and only) set of FC => RELU layers model.add(Flatten()) model.add(Dense(8192)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(Dropout(0.5)) model.add(Dense(4096)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(Dropout(0.5)) model.add(Dense(1024)) model.add(Activation("relu")) model.add(BatchNormalization()) model.add(GaussianDropout(0.5)) # sigmoid classifier model.add(Dense(classes)) model.add(Activation("sigmoid")) # return the constructed network architecture return model All of the above are binary_crossentropy trained in keras. I've tried multi-class with these models as well but when testing them on the blind test they usually pick the background rather than the signal. I've also messed around with Autoencoders to try and get the encoder to rebuild the signal well and then compare to known results but haven't been successful yet though I'd be willing to give it another try if everyone thought that might produce better results. 
In the beginning I ran into unbalanced-classification problems (I was a noob), but in all the models shown above the classes have equal numbers of samples.

I'm at the point where the larger VGG models, trained on 34,000 samples, take days to train, and I don't see any better results than from a basic feed-forward NN that trains in 4 minutes.

Does anyone see the path forward here?

convolutional-neural-networks keras architecture

asked Apr 17, 2020 at 0:58 by Mecho Engineer; edited May 18, 2022 at 8:57 by nbro

$\begingroup$ Have you tried using an adjustable learning rate? Try ReduceLROnPlateau while monitoring validation loss; documentation is at keras.io/callbacks. Sometimes the validation loss surface is like going down into an increasingly narrow valley, and you can get further down into the valley if you reduce the learning rate. I also recommend using an established model like MobileNet: it has about 4 million parameters, so it is much faster than VGG and about as accurate. $\endgroup$ – Gerry P Commented Apr 17, 2020 at 8:23

$\begingroup$ Also, in line 5 you state "This trains to .9998 val_acc." I think you meant training accuracy, correct? $\endgroup$ – Gerry P Commented Apr 17, 2020 at 8:29

$\begingroup$ A few questions about your problem: when you say you have a signal, what kind of signal is it? Is it a waveform like an electrical signal? Is it an image embedded in noisy pixels? If it is an electrical signal embedded in noise, use a narrowband bandpass filter. If it is an image with noisy pixels, use a noise-cancelling autoencoder. $\endgroup$ – Gerry P Commented Apr 17, 2020 at 16:40

$\begingroup$ @GerryP To answer some of your questions: yes, .9998 is both acc and val_acc on the training data set. My 60% (.6) accuracy is on data the model has never seen before. I have never used adjustable learning rates, but I have used very low learning rates (1e-6) with similar results. The problem is an audio signal in background audio "noise", where the background is extremely varied and often at similar or greater amplitude than the signal. For using an established model, is this training a custom set with MobileNet as a base? $\endgroup$ – Mecho Engineer Commented Apr 18, 2020 at 6:46
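The blind-test construction described in the question, overlaying the signal onto backgrounds at various intensities, can be sketched as follows. This is a minimal NumPy sketch; parameterizing intensity as a target SNR in dB is my assumption, since the question only says "various intensities":

```python
import numpy as np

def mix_at_snr(signal, background, snr_db):
    """Overlay `signal` onto `background`, scaled to a target SNR in dB."""
    sig_power = np.mean(signal ** 2)
    bg_power = np.mean(background ** 2)
    # choose a scale so that 10*log10(scaled_sig_power / bg_power) == snr_db
    target_sig_power = bg_power * 10 ** (snr_db / 10)
    scale = np.sqrt(target_sig_power / sig_power)
    return background + scale * signal

# Example: a 440 Hz "signal" buried in Gaussian noise at -6 dB SNR
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000, endpoint=False)
sig = np.sin(2 * np.pi * 440 * t)
bg = rng.normal(0.0, 1.0, t.shape)
mixed = mix_at_snr(sig, bg, -6.0)
```

Sweeping `snr_db` from, say, +6 down to -18 dB would give a blind set that probes how deep into the background a model can still find the signal.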


2 Answers

Answer (score 0), by Gerry P (answered Apr 18, 2020 at 16:59; edited Apr 18, 2020 at 17:07):

Thanks for the answers. If you are processing an audio signal, I think applying a low-pass filter (LPF) would help enhance the signal-to-noise ratio, especially if the noise component occupies a large part of the spectrum. If the audio is human speech, the majority of the energy is within the 300 Hz to 3 kHz region, so a low-pass filter with a cutoff frequency of 3 kHz would eliminate noise in the higher part of the spectrum. You could implement the LPF as a pre-processing function. I am not knowledgeable on the implementation, but a search should get you the info you need; I did find an article here. If I recall, the process is to convert the time-domain signal to the frequency domain using an FFT, set a cutoff point, and then convert back to the time domain. I also know there are ways to implement this directly in the time domain. Hope this helps.

I am also surprised that your test-set accuracy is so low if you achieve a high validation accuracy. Your validation data should be data the network has not seen before, just like your test data. The only thing I can think of is that the test data has a very different probability distribution than the training and validation data. How were the various data sets (train, test, validate) selected? The best choice is to select them randomly, using something like sklearn's train_test_split or Keras's ImageDataGenerator flow_from_directory. Hope this helps.

Comment by Mecho Engineer (Apr 19, 2020 at 5:07): Thanks for the additional help. This application is not human speech, unfortunately; I believe that any filtering I could do for the background noise would effectively filter the signal as well. With respect to train, test, and validate: I get train and test from train_test_split. To validate, I take audio clips of the signal, overlay them at various times over background audio (at various intensities), and then run the composited audio clip through the model.predict function. The model has been trained on both the signal and the background, but just not together.
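The FFT → cutoff → inverse-FFT process the answer describes can be sketched in a few lines. This is a deliberate brick-wall filter for illustration only (it causes ringing; a proper filter design, e.g. a Butterworth via `scipy.signal.butter`, behaves better in practice), and the function name is mine, not the answer's:

```python
import numpy as np

def fft_lowpass(x, sample_rate, cutoff_hz):
    """Zero frequency bins above cutoff_hz and reconstruct the signal.

    Mirrors the process described above: FFT -> set a cutoff -> inverse FFT.
    """
    spectrum = np.fft.rfft(x)                              # one-sided spectrum of a real signal
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sample_rate)   # bin frequencies in Hz
    spectrum[freqs > cutoff_hz] = 0.0                      # discard everything above the cutoff
    return np.fft.irfft(spectrum, n=len(x))                # back to the time domain
```

For example, a 440 Hz tone mixed with a 6 kHz tone, filtered at 3 kHz, comes back with the 6 kHz component removed and the 440 Hz component intact.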

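The blind-test construction described in the comment (overlaying the signal over background audio at a random time and various intensities) can be sketched as follows. The function name and the SNR-in-dB convention are my own framing, not the author's:

```python
import numpy as np

def overlay_at_snr(signal, background, snr_db, rng=None):
    """Mix `signal` into `background` at a random offset and a target SNR.

    The signal is scaled so that 10*log10(P_signal / P_background) == snr_db,
    then added in place at a random start time. Both inputs are 1-D arrays;
    `signal` must be no longer than `background`.
    """
    rng = rng or np.random.default_rng()
    p_sig = np.mean(signal ** 2)
    p_bg = np.mean(background ** 2)
    scale = np.sqrt(p_bg / p_sig * 10 ** (snr_db / 10.0))  # gain applied to the signal
    start = rng.integers(0, len(background) - len(signal) + 1)
    mixed = background.copy()
    mixed[start:start + len(signal)] += scale * signal
    return mixed
```

Sweeping `snr_db` over a range (e.g. -10 dB to +10 dB) then gives the "various intensities" test sets the comment mentions.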
Answer (score 0), by Mecho Engineer (answered Aug 25, 2020 at 1:16):

To anyone who reads this: I still haven't solved this completely. At the moment I'm doing a lot better with much cleaner data, a loss metric that matches what I'm after (F1 score), a very deep model (a custom Inception-ResNet-v2), and a custom learning-rate function that depends on the training round's F1 score. Every training round I also compute an F1 score for test sets at various signal-to-noise dB levels, from which I compute a model wellness score that determines whether the model is good enough. Pretty close.
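The two ingredients this answer names — scoring each round by F1, and tying the learning rate to the previous round's F1 — can be sketched framework-agnostically. The schedule below is purely illustrative; the author does not give the exact function:

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Binary F1 from 0/1 arrays: harmonic mean of precision and recall."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def lr_from_f1(base_lr, f1, floor=1e-5):
    """Hypothetical schedule: large steps while F1 is poor, fine steps as it
    approaches a usable score. Any monotone-decreasing map would do."""
    return max(floor, base_lr * (1.0 - f1))
```

In a Keras training loop this would live in a callback's `on_epoch_end`, computing F1 on a held-out set and setting the optimizer's learning rate before the next round.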

Tags: convolutional-neural-networks, keras, architecture
