Required: Extensive experience in optimizing AI chip architectures and AI systems, familiarity with the mainstream heterogeneous computing software and hardware architectures in the industry, and end-to-end capability spanning applications, system software, and chips. Hands-on experience with at least one of the following areas: numerical computation, compilation, algorithm and chip co-design, runtime, shared memory. Knowledge of AI industry application scenarios, familiarity with mainstream models and algorithm development trends, and the ability to derive chip-level requirements from them. Experience in analyzing workload sensitivity to micro-architecture features, evaluating performance trade-offs, and recommending improvements to both micro-architecture and application software for optimal efficiency. Familiarity with how different compute, memory, and communication configurations, as well as hardware and software implementation choices, affect AI acceleration performance. Experience with GPU compute APIs such as CUDA or OpenCL, and the ability to use GPU/NPU-optimized libraries to improve performance. Experience in the development of deep learning frameworks, compilers, or system software. Strong background in compilers and optimization techniques; experience with LLVM/MLIR is a plus. Experience in software development using C/C++ and Python.
Desired: Relevant experience in several of the following sub-fields: AI application algorithms, frameworks, runtimes, modeling and simulation, and compilers. In-depth understanding of the methods, platforms, and tools used by leading AI vendors, and experience in turning application and academic research results into commercial products. Experience with GPU acceleration using AMD or NVIDIA GPUs. Experience in developing inference backends and compilers for GPUs or NPUs. Experience with AI/ML inference frameworks such as ONNX Runtime, IREE, or TVM. Experience deploying AI models in production environments.