Inputs are first passed through a fully connected layer and then to the two-layer residual multi-head attention block shown in Fig. 7. Residual networks (He et al., 2016) incorporate feedforward shortcut connections to keep neurons from suffering exploding or vanishing gradients during training. The fully connected layers in the r