Avery Huang

Location

Pittsburgh, PA

Education

Carnegie Mellon University

August 2024 - December 2025

degree

Master's

major

Computer Science

Chinese University of Hong Kong

Work Experience

Moore Threads Tech

C GPU Architect Architecture Memory Subsystem

July 2023 - August 2024

overview

- Simulated function and performance of GPU TLB, MMU, and cache bus interface in modern C++ within the full-chip - 10000 lines of code
- Implemented a CUDA C++ framework to dissect the micro-architecture of Nvidia GPUs. Analyzed cache latency, bandwidth, and Network-on-Chip utilization through benchmarking kernels
- Proposed a configurable hash function for MMU cache bank interleaving, and a unified reversible hash function across
- Designed a double end-to-end bandwidth feature on arch to support 2x speedup of MMA operation on texture memory
- Performed performance validation and analysis. Developed CUDA and OpenCL kernels exposing multiple address access

Intel

Cloud System Software Intern DCAI Cloud Engineering

February 2022 - June 2023

Shenzhen Research Institute of Big Data

Software Engineer Intern

August 2021 - January 2022

Skills

Languages

Chinese, English

Skills

Adobe Flash, Amazon Elastic Compute Cloud, Amazon S3, Amazon Web Services, Application Programming Interfaces (APIs), Architecture, Artificial Intelligence, Assembly and Installation, Autoscaling, AWS Lambda, Benchmarking Skills, Big Data, Boolean Algebra, Booting (BIOS), Buffers, Caching, Cloud Computing, Cloud Engineering, Cloud Platform System, Computer Architecture, Computer Programming, Continuous Integration, C++ (Programming Language), Creation of Textures, Cycle Time Variation, Dashboards, Data Analysis, Data Processing, Distributed Systems, Docker, Flask (Web Framework), Graphics Processing Unit (GPU), Information Technology, Jenkins, Knowledge of Engineering, Linux, Linux Kernel, Load Balancing, Memory Management, Microarchitecture, Microservices, Microsoft Access, Networking Skills, Network Performance, Nginx, Nvidia CUDA, OpenCL, OpenMP, Operational Systems, Parallel Computing, Payment Systems, Performance Management, PostgreSQL, Python (Programming Language), Quick EMUlator (QEMU), Real Time Data, Redis, RESTful APIs, Scientific Computing, Simulations, Software Debugging, Software Engineering, Spinnaker, Storage Systems, Subsystems, System Programming, System Software, User Authentication, Verilog, X86-64