- Simulated the functionality and performance of the GPU TLB, MMU, and cache bus interface in modern C++ within the full-chip
- 10,000 lines of code
- Implemented a CUDA C++ framework to dissect the microarchitecture of NVIDIA GPUs. Analyzed cache latency, bandwidth, and Network-on-Chip utilization using benchmarking kernels
- Proposed a configurable hash function for MMU cache bank interleaving, and a unified reversible hash function across
- Designed a feature that doubles end-to-end bandwidth in the architecture, supporting a 2x speedup of MMA operations on texture memory
- Performed performance validation and analysis. Developed CUDA and OpenCL kernels exposing multiple address access
Intel
Cloud System Software Intern, DCAI Cloud Engineering
February 2022 - June 2023
Shenzhen Research Institute of Big Data
Software Engineer Intern
August 2021 - January 2022
Skills
Languages
Chinese, English
Skills
Adobe Flash, Amazon Elastic Compute Cloud, Amazon S3, Amazon Web Services, Application Programming Interfaces (APIs), Architecture, Artificial Intelligence, Assembly and Installation, Autoscaling, AWS Lambda, Benchmarking Skills, Big Data, Boolean Algebra, Booting (BIOS), Buffers, Caching, Cloud Computing, Cloud Engineering, Cloud Platform System, Computer Architecture, Computer Programming, Continuous Integration, C++ (Programming Language), Creation of Textures, Cycle Time Variation, Dashboards, Data Analysis, Data Processing, Distributed Systems, Docker, Flask (Web Framework), Graphics Processing Unit (GPU), Information Technology, Jenkins, Knowledge of Engineering, Linux, Linux Kernel, Load Balancing, Memory Management, Microarchitecture, Microservices, Microsoft Access, Networking Skills, Network Performance, Nginx, Nvidia CUDA, OpenCL, OpenMP, Operational Systems, Parallel Computing, Payment Systems, Performance Management, PostgreSQL, Python (Programming Language), Quick EMUlator (QEMU), Real Time Data, Redis, Restful APIs, Scientific Computing, Simulations, Software Debugging, Software Engineering, Spinnaker, Storage Systems, Subsystems, System Programming, System Software, User Authentication, Verilog, X86-64