
What do the authors of this paper mean by the bias term in this picture of a neural network implementation?

Summary

I am reading a paper that implements a deep deterministic policy gradient algorithm for portfolio management. My question concerns the specific neural network implementation shown in this picture (the figure is on page 14 of the paper). The first three steps are convolutional layers. After the input tensor has been reduced to a vector, the authors append a small yellow square entry called the cash bias, and then apply a softmax operation. The paper does not explain in detail what this bias parameter is; it only states that it is added before the softmax. This makes me suspect that it is a standard processing step, but I don't know whether it is a learnable model parameter or simply a scalar constant concatenated to the vector before the softmax. I have two questions: 1) When the authors write softmax, is it safe to assume they mean just the softmax function, with no learnable parameters? Or do they mean a fully connected linear layer with a softmax activation? 2) If it's the latter, how is the output dimension determined, and what is the role of the cash bias in that case?

Full text

What do the authors of this paper mean by the bias term in this picture of a neural network implementation?

Asked May 9, 2020 by Mike

I am reading a paper implementing a deep deterministic policy gradient algorithm for portfolio management. My question is about a specific neural network implementation they depict in a picture (paper, picture is on page 14). The first three steps are convolutions. Once they have reduced the initial tensor to a vector, they add a little yellow square entry to the vector, called the cash bias, and then they apply a softmax operation.

The paper does not go into any detail about what this bias term could be; it just says that the bias is added before the softmax. This makes me think that it is perhaps a standard step, but I don't know whether it is a learnable parameter or just a scalar constant concatenated to the vector prior to the softmax.

I have two questions:

1) When they write softmax, is it safe to assume that this is just a softmax function, with no learnable parameters? Or is it meant to depict a fully connected linear layer with a softmax activation?

2) If it's the latter, then I can interpret the cash bias as a constant term concatenated to the vector before the fully connected layer, simply adding one more feature for the cash asset. However, if softmax means just the function, then what is this cash bias? It must be a constant, but I don't see what the use of that would be: how can you pick a constant scalar and be confident it will have the intended impact on the softmax output, biasing the network to put some weight on that feature (cash)?

Any comments/interpretations are appreciated!

Tags: convolutional-neural-networks, papers, algorithmic-bias, softmax

3 Answers

Answered Nov 8, 2021 by Pastela:

On the cash bias: I think this is simply the money that is still available at time t=50 and has not yet been invested.

Answered May 18, 2023 by Alex101:

Yes, the softmax is just a softmax, applied to the 12 values from the final linear layer.

As far as the "cash bias" goes, I'm not sure whether the "cash" part has special significance for the authors, but the "bias" part is standard. The inputs to the last layer are 11-dimensional vectors, yet the last layer has 12 parameters: one for each input value, plus one additive parameter called the bias.

That is, assuming the input vector is $(x_1, x_2, ..., x_{11})$, the output will be:

$$ y = w_0 + x_1 w_1 + x_2 w_2 + ... + x_{11} w_{11} $$

The weight $w_0$ is called a bias. If you remove it, then the all-zeros input $(0, ..., 0)$ is always mapped to the output $0$. This might not be desirable and might prevent the layer from converging. Adding a bias $w_0$ allows the layer to map any input to any output. Note that $w_0$ is not a true constant: it is a trainable parameter like all the others.

Answered Feb 13, 2024 by NikoNyrh:

On page 8 they say:

"In the experiments of the paper, the 11 most-volumed non-cash assets are preselected for the portfolio. Together with the cash, Bitcoin, the size of the portfolio, m + 1, is 12. This number is chosen by experience and can be adjusted in future experiments. For markets with large volumes, like the foreign exchange market, m can be as big as the total number of available assets."

They have 12 possible actions in the policy: buy one or more of the 11 non-cash assets, or sell them and hold more cash. It seems that the bias towards the cash position is a learnable parameter.

Although, now that I have read that quote a few times, it seems that they treat Bitcoin as cash?! At first I thought they meant a currency like USD, but in effect the outcome is the same: as you increase the cash bias, the softmax activation gets biased towards that policy.
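The additive bias described in the second answer above can be sketched in a few lines of NumPy. The weights and the 0.5 bias value here are invented for illustration; in a real network w0 is trained like any other parameter, not fixed.

```python
import numpy as np

# A linear layer mapping an 11-dimensional input to one output value,
# with weights w_1..w_11 and an additive bias w_0.
rng = np.random.default_rng(0)
w = rng.normal(size=11)  # w_1 .. w_11 (made-up values)

def linear(x, w, w0):
    # y = w_0 + x_1*w_1 + ... + x_11*w_11
    return w0 + x @ w

x_zero = np.zeros(11)
print(linear(x_zero, w, 0.0))  # without a bias, the zero input always maps to 0.0
print(linear(x_zero, w, 0.5))  # with a bias, the zero input maps to w_0: 0.5
```

This is exactly the point made in the answer: without $w_0$ the zero vector is pinned to output zero, while the bias lets the layer shift its output freely.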
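The claim in the last answer, that increasing the cash bias shifts the softmax output toward the cash position, can be checked numerically with made-up asset scores (none of these values come from the paper):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Eleven hypothetical non-cash asset scores, plus one appended "cash" entry.
asset_scores = np.linspace(-1.0, 1.0, 11)

for cash_bias in (0.0, 1.0, 3.0):
    logits = np.append(asset_scores, cash_bias)
    weights = softmax(logits)  # 12 portfolio weights that sum to 1
    print(f"cash_bias={cash_bias}: cash weight={weights[-1]:.3f}")
```

The cash weight grows monotonically with the bias, so whether the bias is a trained parameter or a hand-picked constant, its mechanical effect on the allocation is the same.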