Generate a fused fp16 LayerNorm plus SiLU kernel over the last dimension for exactly one input tensor x. Do not use affine gamma or beta parameters: y = (x - mean(x)) / sqrt(var(x) + 1e-5), out = y * sigmoid(y).
