Please use this identifier to cite or link to this item: http://rudar.ruc.dk/handle/1800/7445

Title: Parallel programming with CUDA
Other Titles: Parallelprogrammering med CUDA
Authors: Nielsen, Jon
Mosegaard, Truls
Advisor: Helsgaun, Keld
Keywords: parallel
programming
programmering
cuda
gpu
c
nbody
n-body
barnes-hut
barnes hut
openmp
Examination Date: 22-Mar-2012
Issue Date: 10-Apr-2012
Abstract: This report documents our master's thesis project on parallel programming with CUDA, the NVIDIA GPU architecture with support for general-purpose computing. The purpose of the thesis is to uncover the qualities of CUDA as a parallel computing platform and to determine the possibilities and limitations of its ability to handle different types of algorithms. We examine this through a case study of two algorithms used in the computationally intensive field of n-body simulations. In the report we present the topics of the thesis through chapters containing overviews of the relevant theory. On this basis we investigate how CUDA performs on the embarrassingly parallel n-body all-pairs algorithm, as well as on the Barnes-Hut algorithm, which is partially irregular with regard to parallelization because of its data structure. We have found that CUDA performs exceptionally well on n-body all-pairs, observing up to a 100x speed-up with an optimized GPU implementation compared to an implementation running on a computer with 16 CPU cores. The CUDA implementation of the Barnes-Hut algorithm also shows increased performance, as the most costly part of the algorithm is parallelizable. We find that although it is possible to implement an irregular algorithm in CUDA, doing so successfully requires an understanding of CUDA programming and the CUDA model of parallelism. We conclude that CUDA performs well on massively parallel problems and can be useful for irregular problems as well. Programming for it can be complex when optimizing or when the algorithm is not easily parallelized. The platform offers good performance and the potential to accelerate suitable applications.
URI: http://rudar.ruc.dk/handle/1800/7445
Subject: Thesis
Education: Datalogi / Computer Science - Master thesis
Appears in Collections: Datalogi rapporter / Computer Science Projects
Projektrapporter og specialer / Projectreports and master thesis

Files in This Item:

File              Description                Size        Format
appendix_e.zip    Appendix E: Source Code    42.98 MB    Zip archive
zreport.pdf                                  3.58 MB     Adobe PDF


This item is protected by original copyright

Items in RUDAR are protected by copyright, with all rights reserved, unless otherwise indicated.