Avery Huang


Location

Pittsburgh, PA
Education
    Carnegie Mellon University
    August 2024 - December 2025
    degree: Master's
    major: Computer Science
    Chinese University of Hong Kong
Work Experience
    Moore Threads Tech
    C GPU Architect Architecture Memory Subsystem
    July 2023 - August 2024
    overview
    - Simulated the function and performance of the GPU TLB, MMU, and cache bus interface in modern C++ within the full-chip - 10000 lines of code
    - Implemented a CUDA C++ framework to dissect the microarchitecture of NVIDIA GPUs. Analyzed cache latency, bandwidth, and Network-on-Chip utilization through benchmarking kernels
    - Proposed a configurable hash function for MMU cache bank interleaving, and a unified reversible hash function across
    - Designed a doubled end-to-end bandwidth feature in the architecture to support a 2x speedup of MMA operations on texture memory
    - Performed performance validation and analysis. Developed CUDA and OpenCL kernels exposing multiple address access
    Intel
    Cloud System Software Intern DCAI Cloud Engineering
    February 2022 - June 2023
    Shenzhen Research Institute of Big Data
    Software Engineer Intern
    August 2021 - January 2022
Skills
Languages
Chinese, English
Skills
Adobe Flash, Amazon Elastic Compute Cloud, Amazon S3, Amazon Web Services, Application Programming Interfaces (APIs), Architecture, Artificial Intelligence, Assembly and Installation, Autoscaling, AWS Lambda, Benchmarking Skills, Big Data, Boolean Algebra, Booting (BIOS), Buffers, Caching, Cloud Computing, Cloud Engineering, Cloud Platform System, Computer Architecture, Computer Programming, Continuous Integration, C++ (Programming Language), Creation of Textures, Cycle Time Variation, Dashboards, Data Analysis, Data Processing, Distributed Systems, Docker, Flask (Web Framework), Graphics Processing Unit (GPU), Information Technology, Jenkins, Knowledge of Engineering, Linux, Linux Kernel, Load Balancing, Memory Management, Microarchitecture, Microservices, Microsoft Access, Networking Skills, Network Performance, Nginx, Nvidia CUDA, OpenCL, OpenMP, Operational Systems, Parallel Computing, Payment Systems, Performance Management, PostgreSQL, Python (Programming Language), Quick EMUlator (QEMU), Real Time Data, Redis, RESTful APIs, Scientific Computing, Simulations, Software Debugging, Software Engineering, Spinnaker, Storage Systems, Subsystems, System Programming, System Software, User Authentication, Verilog, X86-64