Version | Network Installer | Local Installer |
---|---|---|
Windows 8.1, Windows 7, Win Server 2012 R2, Win Server 2008 R2 | EXE (8.0MB) | EXE (939MB) |
cuFFT Patch | ZIP (52MB), README | |
Windows Getting Started Guide |
Windows FAQ
Q: Where is the notebook installer?
A: Previous releases of the CUDA Toolkit had separate installation packages for notebook and desktop systems. Beginning with CUDA 7.0, these packages have been merged into a single package that is capable of installing on all supported platforms.
Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation, so an internet connection is required.
Q: Where do I get the GPU Deployment Kit (GDK) for Windows?
A: The installers give you the option to install the GDK. If you only want to install the GDK, use the Network Installer for efficiency.
Q: Where can I find old versions of the CUDA Toolkit?
A: Older versions of the toolkit can be found on the Legacy CUDA Toolkits page.
Q: Is cuDNN included as part of the CUDA Toolkit?
A: cuDNN is our library for Deep Learning frameworks, and can be downloaded separately from the cuDNN home page.
Version | Network Installer | Local Package Installer | Runfile Installer |
---|---|---|---|
Fedora 21 | RPM (3KB) | RPM (1GB) | RUN (1.1GB) |
OpenSUSE 13.2 | RPM (3KB) | RPM (1GB) | RUN (1.1GB) |
OpenSUSE 13.1 | RPM (3KB) | RPM (1GB) | RUN (1.1GB) |
RHEL 7, CentOS 7 | RPM (10KB) | RPM (1GB) | RUN (1.1GB) |
RHEL 6, CentOS 6 | RPM (18KB) | RPM (1GB) | RUN (1.1GB) |
SLES 12 | RPM (3KB) | RPM (1.1GB) | RUN (1.1GB) |
SLES 11 (SP3) | RPM (3KB) | RPM (1.1GB) | RUN (1.1GB) |
SteamOS 1.0-beta | | | RUN (1.1GB) |
Ubuntu 14.10 | DEB (3KB) | DEB (1.5GB) | RUN (1.1GB) |
Ubuntu 14.04* | DEB (10KB) | DEB (902MB) | RUN (1.1GB) |
Ubuntu 12.04 | DEB (3KB) | DEB (1.3GB) | RUN (1.1GB) |
GPU Deployment Kit | Included in Installer | Included in Installer | RUN (4MB) |
cuFFT Patch | TAR (122MB), README | | |
Linux Getting Started Guide |
* Includes POWER8 cross-compilation tools.
Linux FAQ
Q: Where can I find the CUDA 7 Toolkit for my Jetson TK1?
A: Jetson TK1 is not supported by the CUDA 7 Toolkit. Please download the CUDA 6.5 Toolkit for Jetson TK1 instead.
Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation, so an internet connection is required to use this installer.
Q: Is cuDNN included as part of the CUDA Toolkit?
A: cuDNN is our library for Deep Learning frameworks, and can be downloaded separately from the cuDNN home page.
Version | Network Installer | Local Package Installer | Runfile Installer |
---|---|---|---|
Ubuntu 14.10 | DEB (3KB) | DEB (588MB) | |
Ubuntu 14.04 | DEB (3KB) | DEB (588MB) | |
GPU Deployment Kit | n/a | n/a | RUN (1.7MB) |
cuFFT Patch | TAR (105MB), README | | |
Linux Getting Started Guide |
Linux Power8 FAQ
Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation, so an internet connection is required to use this installer.
Q: Is cuSOLVER available for the POWER8 architecture?
A: The initial release of the CUDA 7.0 toolkit omitted the cuSOLVER library from the installer. On May 29, 2015, new CUDA 7.0 installers were posted for the POWER8 architecture that included the cuSOLVER library. If you downloaded the CUDA 7.0 toolkit for POWER8 on or earlier than this date, and you need to use cuSOLVER, you will need to download the latest installer and re-install.
Version | Network Installer | Local Installer |
---|---|---|
10.9, 10.10 | DMG (0.4MB) | PKG (977MB) |
cuFFT Patch | TAR (104MB), README | |
Mac Getting Started Guide |
Mac FAQ
Q: What is the difference between the Network Installer and the Local Installer?
A: The Local Installer has all of the components embedded into it (toolkit, driver, samples). This makes the installer very large, but once downloaded, it can be installed without an internet connection. The Network Installer is a small executable that will only download the necessary components dynamically during the installation, so an internet connection is required to use this installer.
Q: Is cuDNN included as part of the CUDA Toolkit?
A: cuDNN is our library for Deep Learning frameworks, and can be downloaded separately from the cuDNN home page.
Q: What do I do if the Network Installer fails to run with the error message "The package is damaged and can't be opened. You should eject the disk image"?
A: Check that your security preferences are set to allow apps downloaded from anywhere to run. This setting can be found under: System Preferences > Security & Privacy > General
Resources
- CUDA Documentation/Release Notes
- MacOS Tools
- Training
- Sample Code
- Forums
- Archive of Previous CUDA Releases
- FAQ
- Open Source Packages
- Submit a Bug
- Tarball and Zip Archive Deliverables
If you work with compute-intensive workloads on graphics processors, you are probably already familiar with NVIDIA's CUDA technology. CUDA lets you use the power of the GPU to process data and accelerate computation.
To work with CUDA, you need to install a dedicated set of tools: the Nvidia Cuda Toolkit. This article walks through installing the CUDA Toolkit on the Windows 7 operating system.
The first step is to download the Cuda Toolkit distribution. You can find it on the official NVIDIA site or from other sources. Make sure you choose the version for Windows 7 and the correct architecture for your system.
Once the download finishes, run the installer and follow the on-screen instructions. Installation usually involves a few steps, including accepting the license agreement, choosing the installation folder, and installing optional components.
What is the Nvidia Cuda Toolkit?
CUDA (Compute Unified Device Architecture) technology makes the capabilities of the graphics processor available for general-purpose computation. It gives programmers higher performance and efficiency for workloads that can be parallelized.
The Nvidia Cuda Toolkit bundles a compiler, libraries, documentation, and other utilities that help developers build, debug, and optimize programs that use CUDA.
Key capabilities of the Nvidia Cuda Toolkit:
- Development of high-performance applications using the CUDA programming language.
- Parallel data processing on the graphics processor.
- Optimization of application performance.
- Support for a range of operating systems, including Windows 7.
- Extension of the graphics processor's capabilities to execute complex mathematical operations and algorithms.
Using the Nvidia Cuda Toolkit lets developers create more efficient, higher-performance software that can use the power of the graphics processor to process large volumes of data and perform complex computations.
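To make the idea of parallel data processing concrete, here is a minimal, hypothetical CUDA C++ vector-add sketch (the kernel name vecAdd and array sizes are ours, not part of the toolkit). It compiles with nvcc and needs a CUDA-capable GPU and driver to run:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements; the grid covers the whole array.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a, *b, *c;
    // Managed (unified) memory keeps the example short: the same pointers
    // are valid on both host and device.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;  // round up
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Every element-wise addition runs as its own GPU thread, which is exactly the kind of parallelism the article describes.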
Description and Advantages
NVIDIA's CUDA (Compute Unified Device Architecture) GPU line provides a powerful platform for general-purpose computing on the GPU. It lets developers take advantage of parallel computation and achieve higher performance than is possible with the central processor (CPU) alone.
Installing the Nvidia Cuda Toolkit gives developers faster data processing and improved application efficiency and speed. The technology also offers a broad set of libraries and tools for developing and optimizing computational workloads such as machine learning, scientific research, data analysis, and much more.
Key advantages of the Nvidia Cuda Toolkit:
- Parallel computation: the ability to use the graphics processor to carry out large volumes of computation simultaneously, substantially accelerating data processing.
- Versatility: computational applications can be built for many domains, from scientific research to graphics and machine learning.
- Efficient resource use: offloading parallel computation to the GPU can significantly speed up complex tasks, optimize resource usage, and reduce the load on the central processor.
- A rich toolset: the included libraries, header files, and utilities simplify application development and optimization and give easy access to GPU functionality.
- Multi-platform support: the Cuda Toolkit is available for a range of operating systems, including Windows 7, letting developers work on their preferred platform.
Overview
Features:
- C/C++ compiler
- Visual Profiler
- GPU-accelerated BLAS library
- GPU-accelerated FFT library
- GPU-accelerated Sparse Matrix library
- GPU-accelerated RNG library
- Additional tools and documentation
Highlights:
- Easier Application Porting
- Share GPUs across multiple threads
- Use all GPUs in the system concurrently from a single host thread
- No-copy pinning of system memory, a faster alternative to cudaMallocHost()
- C++ new/delete and support for virtual functions
- Support for inline PTX assembly
- Thrust library of templated performance primitives such as sort, reduce, etc.
- Nvidia Performance Primitives (NPP) library for image/video processing
- Layered Textures for working with same size/format textures at larger sizes and higher performance
- Faster Multi-GPU Programming
- Unified Virtual Addressing
- GPUDirect v2.0 support for Peer-to-Peer Communication
- New & Improved Developer Tools
- Automated Performance Analysis in Visual Profiler
- C++ debugging in CUDA-GDB for Linux and MacOS
- GPU binary disassembler for Fermi architecture (cuobjdump)
- Parallel Nsight 2.0 now available for Windows developers with new debugging and profiling features.
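The "no-copy pinning" highlight above refers to cudaHostRegister(), which page-locks an allocation you already own instead of allocating pinned memory via cudaMallocHost(). A minimal hedged sketch (error checks elided; on some platforms the pointer and size must be page-aligned):

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;
    // An ordinary host allocation...
    float* host = static_cast<float*>(malloc(bytes));

    // ...pinned in place, with no copy, so it can feed fast asynchronous
    // transfers just like memory from cudaMallocHost().
    cudaHostRegister(host, bytes, cudaHostRegisterDefault);

    float* dev;
    cudaMalloc(&dev, bytes);
    cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, 0);
    cudaStreamSynchronize(0);

    cudaHostUnregister(host);  // unpin before freeing
    cudaFree(dev);
    free(host);
    return 0;
}
```

This is useful when the buffer comes from third-party code and cannot be allocated with cudaMallocHost() in the first place.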
What’s New:
- Added a new API, cudaGraphNodeSetEnabled(), to allow disabling nodes in an instantiated graph. Support is limited to kernel nodes in this release. A corresponding API, cudaGraphNodeGetEnabled(), allows querying the enabled state of a node.
- Full release of 128-bit integer (__int128) data type including compiler and developer tools support. The host-side compiler must support the __int128 type to use this feature.
- Added ability to disable NULL kernel graph node launches.
- Added new NVML public APIs for querying functionality under Wayland.
- Added L2 cache control descriptors for atomics.
- Large CPU page support for UVM managed memory.
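To illustrate the graph-node APIs listed above, here is a hedged sketch of disabling and re-enabling a kernel node in an instantiated graph. Graph construction is elided; graphExec and kernelNode are assumed to exist (graphExec from cudaGraphInstantiate(), kernelNode a kernel node of the source graph):

```cuda
#include <cuda_runtime.h>

void toggleNode(cudaGraphExec_t graphExec, cudaGraphNode_t kernelNode) {
    // Disable the node: subsequent launches of graphExec behave as if the
    // kernel were absent, without re-instantiating the graph.
    cudaGraphNodeSetEnabled(graphExec, kernelNode, 0);

    // Query the enabled state with the corresponding getter.
    unsigned int enabled = 1;
    cudaGraphNodeGetEnabled(graphExec, kernelNode, &enabled);
    // enabled is now 0.

    // Re-enable before the next launch if the kernel is needed again.
    cudaGraphNodeSetEnabled(graphExec, kernelNode, 1);
}
```

Because the toggle applies to the executable graph, it avoids the cost of rebuilding or re-instantiating the graph topology.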
1.3. CUDA Compilers
11.6
- VS2022 Support: CUDA 11.6 officially supports the latest VS2022 as host compiler. A separate Nsight Visual Studio installer 2022.1.1 must be downloaded from here. A future CUDA release will have the Nsight Visual Studio installer with VS2022 support integrated into it.
- New instructions in public PTX: New instructions for bit mask creation — BMSK and sign extension — SZEXT are added to the public PTX ISA. You can find documentation for these instructions in the PTX ISA guide: BMSK and SZEXT.
- Unused Kernel Optimization: In CUDA 11.5, unused kernel pruning was introduced with the potential benefits of reducing binary size and improving performance through more efficient optimizations. This was an opt-in feature but in 11.6, this feature is enabled by default. As mentioned in the 11.5 blog here, there is an opt-out flag that can be used in case it becomes necessary for debug purposes or for other special situations.
- $ nvcc -rdc=true user.cu testlib.a -o user -Xnvlink -ignore-host-info
- In addition to the -arch=all and -arch=all-major options added in CUDA 11.5, NVCC introduced -arch=native in CUDA 11.5 update 1. The -arch=native option is a convenient way to let NVCC determine the right target architecture to compile the CUDA device code for, based on the GPU installed in the system. This can be particularly helpful for testing when applications are run on the same system they are compiled on.
- Generate PTX from nvlink: Using the following command line, device linker, nvlink will produce PTX as an output in addition to CUBIN:
- nvcc -dlto -dlink -ptx
- Device linking by nvlink is the final stage in the CUDA compilation process. Applications that have multiple source translation units have to be compiled in separate compilation mode. LTO (introduced in CUDA 11.4) allowed nvlink to perform optimizations at device link time instead of at compile time so that separately compiled applications with several translation units can be optimized to the same level as whole program compilations with a single translation unit. However, without the option to output PTX, applications that cared about forward compatibility of device code could not benefit from Link Time Optimization or had to constrain the device code to a single source file.
- With the option for nvlink that performs LTO to generate the output in PTX, customer applications that require forward compatibility across GPU architectures can span across multiple files and can also take advantage of Link Time Optimization.
- Bullseye support: NVCC-compiled source code now works with the code coverage tool Bullseye. Coverage is reported only for CPU (host) functions; code coverage for device functions is not supported through Bullseye.
- INT128 developer tool support: CUDA C++ support for the 128-bit integer type was added in 11.5. In this release, the developer tools support the data type as well. With the latest version of libcu++, the __int128 data type is supported by math functions.
cuSOLVER
New Features:
- New singular value decomposition (GESVDR) is added. GESVDR computes a partial spectrum with random sampling, an order of magnitude faster than GESVD.
- libcusolver.so no longer links libcublas_static.a; instead, it depends on libcublas.so. This reduces the binary size of libcusolver.so. However, it breaks backward compatibility. The user has to link libcusolver.so with the correct version of libcublas.so.
cuSPARSE
New Features:
- New Tensor Core-accelerated Block Sparse Matrix — Matrix Multiplication (cusparseSpMM) and introduction of the Blocked-Ellpack storage format.
- New algorithms for CSR/COO Sparse Matrix — Vector Multiplication (cusparseSpMV) with better performance.
- Extended functionalities for cusparseSpMV:
- Support for the CSC format.
- Support for regular/complex bfloat16 data types for both uniform and mixed-precision computation.
- Support for mixed regular-complex data type computation.
- Support for deterministic and non-deterministic computation.
- New algorithm (CUSPARSE_SPMM_CSR_ALG3) for Sparse Matrix — Matrix Multiplication (cusparseSpMM) with better performance especially for small matrices.
- New routine for Sampled Dense Matrix — Dense Matrix Multiplication (cusparseSDDMM) which deprecated cusparseConstrainedGeMM and provides better performance.
- Better accuracy of cusparseAxpby, cusparseRot, cusparseSpVV for bfloat16 and half regular/complex data types.
- All routines support NVTX annotation for enhancing the profiler time line on complex applications.
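As a sketch of the generic SpMV API these notes refer to, here is a hedged example computing y = alpha*A*x + beta*y for a CSR matrix. The device arrays are assumed to already exist, error checks are omitted, and the algorithm enum name (CUSPARSE_SPMV_ALG_DEFAULT) is the one used by recent cuSPARSE versions:

```cuda
#include <cuda_runtime.h>
#include <cusparse.h>

void spmvCsr(cusparseHandle_t handle, int rows, int cols, int nnz,
             int* csrRowPtr, int* csrColInd, float* csrVal,
             float* x, float* y) {
    float alpha = 1.0f, beta = 0.0f;

    // Wrap the raw device arrays in opaque descriptors.
    cusparseSpMatDescr_t matA;
    cusparseDnVecDescr_t vecX, vecY;
    cusparseCreateCsr(&matA, rows, cols, nnz, csrRowPtr, csrColInd, csrVal,
                      CUSPARSE_INDEX_32I, CUSPARSE_INDEX_32I,
                      CUSPARSE_INDEX_BASE_ZERO, CUDA_R_32F);
    cusparseCreateDnVec(&vecX, cols, x, CUDA_R_32F);
    cusparseCreateDnVec(&vecY, rows, y, CUDA_R_32F);

    // Query and allocate the external workspace the routine needs.
    size_t bufSize = 0;
    void* buf = nullptr;
    cusparseSpMV_bufferSize(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                            &alpha, matA, vecX, &beta, vecY,
                            CUDA_R_32F, CUSPARSE_SPMV_ALG_DEFAULT, &bufSize);
    cudaMalloc(&buf, bufSize);

    // y = alpha * A * x + beta * y
    cusparseSpMV(handle, CUSPARSE_OPERATION_NON_TRANSPOSE,
                 &alpha, matA, vecX, &beta, vecY,
                 CUDA_R_32F, CUSPARSE_SPMV_ALG_DEFAULT, buf);

    cudaFree(buf);
    cusparseDestroySpMat(matA);
    cusparseDestroyDnVec(vecX);
    cusparseDestroyDnVec(vecY);
}
```

The same descriptor-plus-workspace pattern applies to cusparseSpMM and cusparseSDDMM mentioned above.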
Deprecations:
- cusparseConstrainedGeMM has been deprecated in favor of cusparseSDDMM.
- cusparseCsrmvEx has been deprecated in favor of cusparseSpMV.
- COO Array of Structure (CooAoS) format has been deprecated including cusparseCreateCooAoS, cusparseCooAoSGet, and its support for cusparseSpMV.
Known Issues:
- cusparseDestroySpVec, cusparseDestroyDnVec, cusparseDestroySpMat, cusparseDestroyDnMat, cusparseDestroy with NULL argument could cause segmentation fault on Windows.
Resolved Issues:
- cusparseAxpby, cusparseGather, cusparseScatter, cusparseRot, cusparseSpVV, cusparseSpMV now support zero-size matrices.
- cusparseCsr2cscEx2 now correctly handles empty matrices (nnz = 0).
- cusparseXcsr2csr_compress now uses 2-norm for the comparison of complex values instead of only the real part.
NPP
New Features:
- New APIs added to compute the Distance Transform using the Parallel Banding Algorithm (PBA):
  - nppiDistanceTransformPBA_xxxxx_C1R_Ctx(), where xxxxx specifies the input and output combination: 8u16u, 8s16u, 16u16u, 16s16u, 8u32f, 8s32f, 16u32f, 16s32f
  - nppiSignedDistanceTransformPBA_32f_C1R_Ctx()
Resolved issues:
- Fixed an issue in which Label Markers added the zero pixel as an object region.
NVJPEG
New Features:
- The nvJPEG decoder added a new API to support region-of-interest (ROI) based decoding for the batched hardware decoder:
  - nvjpegDecodeBatchedEx()
  - nvjpegDecodeBatchedSupportedEx()
cuFFT
Known Issues:
- cuFFT planning and plan estimation functions may not restore the correct context, affecting CUDA driver API applications.
- Plans with strides, primes larger than 127 in the FFT size decomposition, and a total transform size (including strides) bigger than 32GB produce incorrect results.
Resolved Issues:
- Previously, reduced performance of power-of-2 single precision FFTs was observed on GPUs with sm_86 architecture. This issue has been resolved.
- Large prime factors in size decomposition and real to complex or complex to real FFT type no longer cause cuFFT plan functions to fail.
CUPTI
Deprecation early notice: the following functions are scheduled to be deprecated in 11.3 and will be removed in a future release:
- NVPW_MetricsContext_RunScript and NVPW_MetricsContext_ExecScript_Begin from the header nvperf_host.h.
- cuptiDeviceGetTimestamp from the header cupti_events.h.
Complete release notes can be found here.
Conda package archives, channel "main" (each file is named platform/cuda-toolkit-version-0.tar.bz2, e.g. win-64/cuda-toolkit-12.2.2-0.tar.bz2):

Version | Platforms |
---|---|
cuda-12.2.2 | win-64, linux-64, linux-aarch64, linux-ppc64le |
cuda-12.2.1 | win-64, linux-64, linux-aarch64, linux-ppc64le |
cuda-12.2.0 | win-64, linux-64, linux-aarch64, linux-ppc64le |
cuda-12.1.1 | win-64, linux-64, linux-aarch64, linux-ppc64le |
cuda-12.1.0 | win-64, linux-64, linux-aarch64, linux-ppc64le |
cuda-12.0.1 | win-64, linux-64, linux-aarch64, linux-ppc64le |
cuda-12.0.0 | win-64, linux-64, linux-aarch64, linux-ppc64le |
cuda-11.8.0 | win-64, linux-64, linux-aarch64, linux-ppc64le |
cuda-11.7.0 | win-64, linux-64, linux-aarch64, linux-ppc64le |
cuda-11.6.2 | win-64, linux-64, linux-aarch64, linux-ppc64le |
cuda-11.6.1 | win-64 |
cuda-11.6.0 | win-64 |
cuda-11.5.2 | win-64 |
cuda-11.5.1 | win-64 |
cuda-11.5.0 | win-64 |
cuda-11.4.4 | win-64 |
cuda-11.4.3 | win-64 |
cuda-11.4.2 | win-64 |
cuda-11.4.1 | win-64 |
cuda-11.4.0 | win-64 |