About Me
Hi, my name is Zhiyu Ding. I am an undergraduate student at Southwest Petroleum University, majoring in Data Science and Big Data Technology. My main interests are high-performance computing, parallel programming, CUDA, Linux, and engineering-oriented optimization.
What interests me most is not just getting a program to run, but tracing a real performance problem from profiling and bottleneck analysis to memory access strategy, parallel design, and final speedup validation.
This site serves both as my personal homepage and as a long-term knowledge base for CUDA, supercomputing practice, Linux, and everyday engineering notes.
Download my CV (Updated via my online resume)
Education

Southwest Petroleum University
School of Computer Science and Software Engineering · Data Science and Big Data Technology
GPA 4.13 / 5.0, ranked 1 / 66 in major
2023 - Present
Experience

AlphaFold3 Inference Performance Optimization
ASC25 Student Supercomputer Challenge
Deployed and optimized AlphaFold3 on an Intel Xeon plus NVIDIA A100 heterogeneous platform, focusing on JAX compilation latency, GPU execution strategy, and CPU-side numerical stability. Achieved speedups of 1.2x to 5.3x while preserving result quality.
2025.01 - 2025.02

Parallel Optimization for an Oil Spill Prediction Model
2024 Ocean Computing Challenge Finals
Implemented hybrid MPI plus OpenMP parallelization for a two-dimensional oil spill prediction model, then used VTune to improve load balance, communication, and memory behavior. Reached a 2482.14x speedup on 2 nodes / 128 CPU cores and won the national third prize.
2024.07 - 2024.08

Tecorigin Deep Learning Operator Optimization
OpenAtom Competition · Operator Development Challenge
Identified I/O bottlenecks in a convolution forward operator and optimized it with SPM memory usage, double-buffered asynchronous pipelining, SIMD data reordering, and cost-model-driven tuning, delivering a 3.7x overall speedup.
2024.06 - 2024.09

PCG Optimization on the New Generation Sunway Platform
Domestic CPU Parallel Application Challenge
Reworked the SpMV, dot product, and preconditioning stages with DMA / LDM dataflow optimization, kernel fusion, and reduction tuning, reducing total runtime from 1287 seconds to 32.5 seconds.
2024.02 - 2024.04
