Generate a CUDA kernel for fp16 GELU activation using the tanh approximation.
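Below is a minimal sketch of such a kernel. It assumes contiguous `__half` buffers already on the device and an element-wise (not fused) operation. It uses the standard tanh form GELU(x) ≈ 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))). Every name in it (`gelu_tanh_fp16_kernel`, `launch_gelu_tanh_fp16`, `gelu_tanh_fp16_host`, `gelu_tanh_ref`, `CUDA_CHECK`) is hypothetical and chosen for illustration only. Values are loaded and stored as fp16 but the arithmetic is done in fp32.

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <climits>
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical error-checking helper: abort with a message on any CUDA error.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
// fp16 in/out; intermediate math in fp32 for accuracy.
__global__ void gelu_tanh_fp16_kernel(const __half* __restrict__ in,
                                      __half* __restrict__ out, int n) {
    constexpr float kSqrt2OverPi = 0.7978845608028654f;
    constexpr float kCoeff = 0.044715f;
    // Grid-stride loop: correct for any grid size, including a capped one.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        float x = __half2float(in[i]);
        float inner = kSqrt2OverPi * (x + kCoeff * x * x * x);
        out[i] = __float2half(0.5f * x * (1.0f + tanhf(inner)));
    }
}

// Launch on device pointers; n == 0 is a no-op (a 0-block grid is invalid).
void launch_gelu_tanh_fp16(const __half* d_in, __half* d_out, int n,
                           cudaStream_t stream = 0) {
    if (n <= 0) return;
    constexpr int kBlock = 256;
    int blocks = n / kBlock + (n % kBlock != 0 ? 1 : 0);  // overflow-safe ceil
    int grid = std::min(blocks, 4096);  // grid-stride loop covers the rest
    gelu_tanh_fp16_kernel<<<grid, kBlock, 0, stream>>>(d_in, d_out, n);
}

// fp32/fp64 host reference for checking results.
float gelu_tanh_ref(float x) {
    const double xd = x;
    const double inner = 0.7978845608028654 * (xd + 0.044715 * xd * xd * xd);
    return static_cast<float>(0.5 * xd * (1.0 + std::tanh(inner)));
}

// Host convenience wrapper: float -> fp16, run kernel, fp16 -> float.
std::vector<float> gelu_tanh_fp16_host(const std::vector<float>& x) {
    assert(x.size() <= static_cast<size_t>(INT_MAX));  // kernel indexes with int
    const int n = static_cast<int>(x.size());
    std::vector<float> y(x.size());
    if (n == 0) return y;
    std::vector<__half> h(n);
    for (int i = 0; i < n; ++i) h[i] = __float2half(x[i]);
    __half *d_in = nullptr, *d_out = nullptr;
    const size_t bytes = static_cast<size_t>(n) * sizeof(__half);
    CUDA_CHECK(cudaMalloc(&d_in, bytes));
    CUDA_CHECK(cudaMalloc(&d_out, bytes));
    CUDA_CHECK(cudaMemcpy(d_in, h.data(), bytes, cudaMemcpyHostToDevice));
    launch_gelu_tanh_fp16(d_in, d_out, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(h.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));
    for (int i = 0; i < n; ++i) y[i] = __half2float(h[i]);
    return y;
}
```

Computing in fp32 keeps the cubic term and `tanhf` accurate. In pure fp16, `x*x*x` overflows to infinity once |x| exceeds roughly 40. A possible refinement, not shown here, is to process two elements per thread with `__half2` loads and stores to improve memory throughput.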
