• 13.04.2011: Version 0.1.4 for CUDA 3.2 was just made available on the homepage
  • 21.12.2009: A release candidate of version 0.1.4 for CUDA 3.0 was just released in our newsgroup
  • 04.05.2009: Version 0.1.2 was just released
  • 28.02.2009: Version 0.1.1 has a bug that requires all CUDA kernels to be compiled with the option "-DNVCC". This will be fixed in the next release.
  • 16.01.2009: The paper "CuPP -- A framework for easy CUDA integration" was accepted for HIPS 2009.
  • 09.01.2009: Version 0.1.1 was released (release name: Daiquiri)

CuPP is our newly developed C++ framework designed to ease the integration of NVIDIA's GPGPU system CUDA into existing C++ applications. CuPP provides interfaces for recurring tasks that are easier to use than the standard CUDA interfaces.


Description

The CuPP framework consists of five highly interwoven parts, some of which replace existing CUDA counterparts, while others offer new functionality.

  • Device management: Device management is not done implicitly by associating a thread with a device, as CUDA does. Instead, the developer is required to create a device handle (cupp::device), which is passed to all CuPP functions that use the device, e.g. kernel calls and memory allocation.
  • Memory management: Two different memory management concepts are available. 
    • One is identical to that offered by CUDA, except that exceptions are thrown when an error occurs instead of error codes being returned. To ease development with this basic approach, a Boost-library-compliant shared pointer for global memory is supplied.
    • The second type of memory management uses a class called cupp::memory1d. Objects of this class represent a linear block of global memory. The memory is allocated when the object is created and freed when the object is destroyed. Data can be transferred to the memory from any data structure supporting iterators.
  • C++ kernel call: The CuPP kernel call is implemented as a C++ functor (cupp::kernel), which adds call-by-reference-like semantics to basic CUDA kernel calls. This can be used to pass data structures such as cupp::vector to a kernel, so the device can modify them.
  • Support for classes: Using a technique called "type transformations" generic C++ classes can easily be transferred to and from device memory.
  • Data structures: Currently, only a std::vector wrapper offering automatic memory management is supplied. This class also implements a feature called lazy memory copying, which minimizes memory transfers between device and host memory. No other data structures are supplied at present, but they can be added easily.

A detailed description of all functions can be found in [Bre08a] or at our documentation website.

For further information please contact one of the following members of our staff:
Jens Breitbart
Prof. Dr. Claudia Fohry


  • [Bre09a] J. Breitbart: CuPP -- A framework for easy CUDA integration, HIPS 2009, workshop held in conjunction with IPDPS 2009, Rome, Italy, May 2009.
  • [Bre08b] J. Breitbart: Case studies on GPU usage and data structure design, Master's thesis, University of Kassel, 2008.
  • [Bre08a] J. Breitbart: A framework for easy CUDA integration in C++ applications, Diploma thesis, University of Kassel, 2008.