CuLab
GPU Toolkit for LabVIEW
Release Notes
V3.0.4.67
General description
This is a major update which brings lots of new features and improvements.
Backward Compatibility
This version breaks compatibility with previous versions of the toolkit.
New Features

Added “Computer Vision” palette with the following new functions:

CU_CV_GrayMorpholgy.vi – Performs grayscale morphological transformations.

Supported operations: Erode, Dilate

Supported input types: U8, U16, SGL.

Batch mode supported: False.


CU_CV_Resample.vi – Performs resampling operation.

Supported input types: All CV types.

Batch mode supported: True.


CU_CV_Extract.vi – Extracts a portion of the input image.

Supported input types: All CV types.

Batch mode supported: True.

Images are represented as T2D tensors.

Grayscale images use the U8, I8, U16, I16, or SGL datatypes, while color images (e.g. ARGB) use the U32 datatype.

Some functions also support batch mode, in which case the batch of images is provided as a T3D tensor.
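To illustrate the Erode and Dilate operations listed for CU_CV_GrayMorpholgy.vi, here is a minimal CPU-side sketch in plain Python. It assumes a flat square structuring element and treats the image as a list of rows (a stand-in for a T2D tensor). The function name, the `size` parameter, and the border handling are assumptions made for illustration and do not describe the toolkit's actual interface.

```python
def gray_morphology(image, op, size=3):
    """Grayscale erode/dilate with a flat square structuring element.

    image: list of rows; op: "erode" (window minimum) or "dilate" (window maximum).
    Border pixels use only the neighbours that fall inside the image (assumption).
    """
    pick = {"erode": min, "dilate": max}[op]
    r = size // 2
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            ]
            row.append(pick(window))
        out.append(row)
    return out


img = [
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
dilated = gray_morphology(img, "dilate")  # bright block grows to fill the image
eroded = gray_morphology(img, "erode")    # bright block is removed entirely
```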



Added new functions to the “Signal Operation” subpalette:

CU_Digital_Down_Conversion.vi – Performs digital down conversion.

Supported input types: SGL, DBL, CSG, CDB.

Batch mode supported: True.


CU_Convolution_Batch.vi – Performs batch convolution operation.

Supports 1D MCH-MCH and MCH-1CH modes.

Batched instances in the CU_Convolution polymorphic VI have been removed.


Added support for 2D convolution to CU_Convolution.vi.

Supported input types: SGL, DBL.
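The notes do not describe how CU_Digital_Down_Conversion.vi works internally. As a rough illustration of what digital down conversion generally involves (mix to baseband, low-pass filter, decimate), here is a minimal plain-Python sketch. The function name, its parameters, and the simple block-averaging filter are assumptions for illustration only.

```python
import cmath
import math


def digital_down_convert(samples, fs, f_center, decim):
    """Shift f_center to 0 Hz, then average and keep one value per `decim` samples."""
    # Mix with a complex exponential to move the band of interest to baseband.
    mixed = [
        s * cmath.exp(-2j * math.pi * f_center * n / fs)
        for n, s in enumerate(samples)
    ]
    # Block average acts as a crude low-pass filter combined with decimation.
    out = []
    for start in range(0, len(mixed) - decim + 1, decim):
        block = mixed[start:start + decim]
        out.append(sum(block) / decim)
    return out


fs, f_center, decim = 1000.0, 250.0, 4
tone = [math.cos(2 * math.pi * f_center * n / fs) for n in range(16)]
baseband = digital_down_convert(tone, fs, f_center, decim)
# A real cosine at f_center becomes a constant of amplitude 0.5 at baseband.
```

A production implementation would normally use a properly designed FIR low-pass filter instead of a block average.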



Changed the “FFT” subpalette name to “Transforms” and added the following functions:

CU_Hilbert_Transform.vi – Computes the fast Hilbert transform of the input Tensor.

Accepted input Tensor types: SGL, DBL

Supported dimensionalities: T1D.


CU_Analytic_Signal.vi – Computes the complex Analytic Signal of the real-valued input Tensor.

Accepted input Tensor types: SGL, DBL

Return Tensor types: CSG, CDB

Supported dimensionalities: T1D.
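For readers unfamiliar with the relationship between the two transforms above, the following plain-Python sketch builds an analytic signal with the common frequency-domain method: zero the negative-frequency bins and double the positive ones. The imaginary part of the result is a Hilbert transform of the input under that convention. The helper names and the naive O(n²) DFT are assumptions for illustration, and the toolkit's sign convention and implementation may differ.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (fine for short illustrative inputs)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * math.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out


def analytic_signal(x):
    """Return the complex analytic signal of a real-valued sequence."""
    n = len(x)
    gain = [0.0] * n
    gain[0] = 1.0                        # keep DC unchanged
    for m in range(1, (n + 1) // 2):
        gain[m] = 2.0                    # double positive frequencies
    if n % 2 == 0:
        gain[n // 2] = 1.0               # keep the Nyquist bin unchanged
    spectrum = dft(x)
    return dft([s * g for s, g in zip(spectrum, gain)], inverse=True)


n = 16
x = [math.cos(2 * math.pi * 2 * k / n) for k in range(n)]
z = analytic_signal(x)
# Real part reproduces the cosine; imaginary part is the matching sine.
```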



Added “Boolean Operation” palette with the following functions.

CU_Boolean_2in.vi – compound function for different binary (two-input) logical operations.

Supported Boolean Operations:

AND

OR

XOR

NAND

NOR

XNOR

Select X

Select Y



CU_Boolean_Not.vi

CU_And_Array_Elements.vi

CU_Or_Array_Elements.vi
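The element-wise behaviour of the two-input operations listed above can be sketched in plain Python as follows. The dictionary and function names are hypothetical. The meaning assigned to “Select X” and “Select Y” (pass the X or Y input through unchanged) is an assumption, since the notes do not define them.

```python
BOOLEAN_OPS = {
    "AND": lambda x, y: x and y,
    "OR": lambda x, y: x or y,
    "XOR": lambda x, y: x != y,
    "NAND": lambda x, y: not (x and y),
    "NOR": lambda x, y: not (x or y),
    "XNOR": lambda x, y: x == y,
    "Select X": lambda x, y: x,  # assumed meaning: return the X input
    "Select Y": lambda x, y: y,  # assumed meaning: return the Y input
}


def boolean_2in(op, xs, ys):
    """Apply the named two-input operation element by element."""
    f = BOOLEAN_OPS[op]
    return [f(x, y) for x, y in zip(xs, ys)]


xs = [False, False, True, True]
ys = [False, True, False, True]
xor_result = boolean_2in("XOR", xs, ys)
nor_result = boolean_2in("NOR", xs, ys)

# Reductions comparable in spirit to the And/Or Array Elements functions.
all_true = all(xs)
any_true = any(xs)
```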


Added “Comparison Operation” palette with the following functions.

CU_Compare_1_Input.vi – compound function for different unary (single-input) comparison operations.

Supported Comparison Operations:
1. Equal To 0?
2. Not Equal To 0?
3. Greater Than 0?
4. Greater Or Equal To 0?
5. Less than 0?
6. Less Or Equal To 0?
Supports all Tensor types, except complex types.


CU_Compare_2_Inputs.vi – compound function for different binary (two-input) comparison operations.

Supported Comparison Operations:

Equal?

Not Equal?

Greater?

Greater Or Equal?

Less?

Less Or Equal?


Supports all Tensor types.

Accepts Comparison with a Constant.


CU_In_Range_and_Coerce.vi

Supports all Tensor types, except complex types.

Supports all tensor dimensionalities except T0D.


CU_Max_Min.vi

Supports all Tensor types, except complex types.

Supports tensor dimensionalities: T1D, T2D.
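As an illustration of what an in-range-and-coerce operation such as CU_In_Range_and_Coerce.vi typically computes, the plain-Python sketch below clamps each element to a range and reports whether it was inside the range. The function name, the argument order, and the choice to treat both limits as inclusive are assumptions, because the notes do not specify them.

```python
def in_range_and_coerce(values, lower, upper):
    """Return (coerced values, in-range flags) for each element.

    Both limits are treated as inclusive here (an assumption).
    """
    coerced = [min(max(v, lower), upper) for v in values]
    in_range = [lower <= v <= upper for v in values]
    return coerced, in_range


coerced, flags = in_range_and_coerce([-2, 0, 5, 11], lower=0, upper=10)
# coerced -> [0, 0, 5, 10]
# flags   -> [False, True, True, False]
```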



Added “Lookup” subpalette in “Array” palette with the following functions.

CU_Array_Lookup_by_Index.vi – Returns a Tensor containing the elements of the input Tensor specified by the Index Tensor.

Supports all tensor dimensionalities except T0D.

Supports all Tensor types.


CU_Array_Lookup_by_Bool.vi – Returns a Tensor containing the elements of the input Tensor whose corresponding entries in the Boolean input Tensor are 1 (TRUE).

Supports all tensor dimensionalities except T0D.

Supports all Tensor types, except complex types.
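The two lookup styles described above correspond to the familiar gather and mask operations. A minimal 1D sketch in plain Python, with hypothetical function names, might look like this:

```python
def lookup_by_index(data, indices):
    """Gather: pick the elements of `data` at the given positions."""
    return [data[i] for i in indices]


def lookup_by_bool(data, mask):
    """Mask: keep the elements of `data` whose mask entry is true."""
    return [v for v, keep in zip(data, mask) if keep]


data = [10, 20, 30, 40, 50]
picked = lookup_by_index(data, [4, 0, 2])        # [50, 10, 30]
kept = lookup_by_bool(data, [1, 0, 0, 1, 1])     # [10, 40, 50]
```

Note that the mask form returns a result whose length depends on the data, which is why the output size is not known in advance.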



Added new function in “Array” palette.

CU_Replace_Array_Elemenets_by_Index_Batch.vi – Replaces elements in the input Tensor with elements from SubTensor at the indices specified in IndexTensor.

Supports tensor dimensionalities: T1D, T2D.

Accepts all input Tensor types.
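The notes do not spell out the batch semantics of this function. One plausible reading, sketched below in plain Python, is that each row of a T2D input is one batch item, the same index list applies to every row, and the SubTensor supplies one row of replacement values per batch item. All of this, including the function name, is an assumption for illustration.

```python
def replace_by_index_batch(tensor, indices, sub):
    """Return a copy of `tensor` with sub[b][k] written to tensor[b][indices[k]]."""
    out = [list(row) for row in tensor]  # copy so the input stays untouched
    for row, new_values in zip(out, sub):
        for i, v in zip(indices, new_values):
            row[i] = v
    return out


tensor = [
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
result = replace_by_index_batch(tensor, indices=[3, 1], sub=[[7, 8], [5, 6]])
# result -> [[0, 8, 0, 7], [1, 6, 1, 5]]
```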



Added new functions in “Numeric” palette.

CU_Add_Broadcast.vi – Performs broadcast addition of T2D with T1D.

Supported input types: SGL, DBL, CSG, CDB.


CU_Multiply_Broadcast.vi – Performs broadcast multiplication of T2D with T1D.

Supported input types: SGL, DBL, CSG, CDB.
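Broadcasting a T1D tensor against a T2D tensor means reusing the 1D operand for every row (or column) of the 2D operand. The plain-Python sketch below assumes the 1D length matches the number of columns and is applied to each row. The notes do not state which axis the toolkit uses, so the axis choice and function names here are assumptions.

```python
def _check(matrix, vector):
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("vector length must match the number of columns")


def add_broadcast(matrix, vector):
    """Add `vector` to every row of `matrix`."""
    _check(matrix, vector)
    return [[m + v for m, v in zip(row, vector)] for row in matrix]


def multiply_broadcast(matrix, vector):
    """Multiply every row of `matrix` element-wise by `vector`."""
    _check(matrix, vector)
    return [[m * v for m, v in zip(row, vector)] for row in matrix]


matrix = [[1, 2, 3], [4, 5, 6]]
summed = add_broadcast(matrix, [10, 20, 30])       # [[11, 22, 33], [14, 25, 36]]
scaled = multiply_broadcast(matrix, [1j, 2, 0.5])  # complex values work the same way
```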



Added option for swapping inputs in tensor-constant operations for CU_Subtract.vi and CU_Divide.vi.

Added the following function in “Conversion” subpalette.

CU_To_U64.vi


Added “Utilities” palette with the following function:

Get_Exec_Time.vi – returns the execution time.


Added the following function in “Device Management” subpalette.

CU_Get_CUDA_Version.vi – returns CUDA version.

CU_Reset_GPU.vi – Destroys all allocations and resets all states.


Added GPU info tool in help menu.

A Runtime licensing requirement has been added to this version of the toolkit.
Optimizations

Greatly optimized data movement between CPU and GPU, which significantly improves overall toolkit performance (30-40% for common benchmarks).

Optimized the execution of numeric conversion functions.

Optimized the execution of CU_Array_Subset.vi

Significantly improved the performance of CU_Tensor_Create_Push.vi and CU_Tensor_Push.vi.

Optimized the memory, context, and other resource management functionalities.

Other optimizations.
Extended Functionalities

The FIR filter specification has been incorporated into CU_Rational_Resample.vi for both single-channel and multi-channel (batch) inputs.

The following functions have been updated to accept a constant as a second input.

CU_Logarithm_Base_X.vi

CU_Power_Of_X.vi


All numeric conversion functions now support all tensor types.

Error Dialog Box returns the full path for call chain.

CU_Tensor_Create_Push.vi and CU_Tensor_Push.vi check if Input Tensor and CPU Data Array dimensions match before pushing the data to GPU.

Automated the process of adding dependency DLLs when building applications.

The help file was updated to reflect the updated functionalities.
Bug Fixes

Fixed input tensor types for T1D:DBL instance of CU_Inverse_Tangent_2_input.vi.

The functionality of T4D instances of CU_Array_Subset.vi has been corrected.

Fixed array max dimension (65535) issue in CU_Power_Spectrum.vi.

Changed the CU_Decimate_Single_Shot.vi connector pane to match the LabVIEW Decimate Single Shot VI.

Renamed the following functions to conform with the common naming conventions.

From “CU_Square Root.vi” to “CU_Square_Root.vi”

From “CU_Add Array Elements.vi” to “CU_Add_Array_Elements.vi”


Fixed a bug that caused incorrect results in CU_Square.vi for complex inputs.

Fixed a CPU memory leak in CU_Tensor_Destroy.vi.

Other minor fixes.
V2.2.1.55
Backward Compatibility
This is a minor update which does not break backward compatibility with v6.x.x versions of the toolkit.
Features

Added “Boolean Operation” palette with the following functions.

CU_Boolean_2in.vi – compound function for different binary (two-input) logical operations.

Supported Boolean Operations:

AND

OR

XOR

NAND

NOR

XNOR

Select X

Select Y


CU_Boolean_Not.vi

CU_And_Array_Elements.vi

CU_Or_Array_Elements.vi



Added “Comparison Operation” palette with the following functions.

CU_Compare_1_Input.vi – compound function for different unary (single-input) comparison operations.

Supported Comparison Operations:

Equal To 0?

Not Equal To 0?

Greater Than 0?

Greater Or Equal To 0?

Less than 0?

Less Or Equal To 0?


Supports all Tensor types, except complex types.


CU_Compare_2_Inputs.vi – compound function for different binary (two-input) comparison operations.

Supported Comparison Operations:

Equal?

Not Equal?

Greater?

Greater Or Equal?

Less?

Less Or Equal?


Supports all Tensor types.

Accepts Comparison with a Constant.


CU_In_Range_and_Coerce.vi

Supports all Tensor types, except complex types.

Supports all tensor dimensionalities except T0D.


CU_Max_Min.vi

Supports all Tensor types, except complex types.

Supports tensor dimensionalities: T1D, T2D.



Added “Lookup” subpalette in “Array” palette with the following functions.

CU_Array_Lookup_by_Index.vi – Returns a Tensor containing the elements of the input Tensor specified by the Index Tensor.

Supports all tensor dimensionalities except T0D.

Supports all Tensor types.


CU_Array_Lookup_by_Bool.vi – Returns a Tensor containing the elements of the input Tensor whose corresponding entries in the Boolean input Tensor are 1 (TRUE).

Supports all tensor dimensionalities except T0D.

Supports all Tensor types, except complex types.



Added the following function in “Array” palette.

CU_Replace_Array_Elemenets_by_Index_Batch.vi – Replaces elements in the input Tensor with elements from SubTensor at the indices specified in IndexTensor.

Supports tensor dimensionalities: T1D, T2D.

Accepts all input Tensor types



Added “Utilities” palette with the following function:

Get_Exec_Time.vi – Returns the relative current time in seconds, as well as the time difference between the executions of two instances of this VI.


Changed the “FFT” subpalette name to “Transforms” and added the following functions.

CU_Hilbert_Transform.vi – Computes the fast Hilbert transform of the input Tensor.

Accepted input Tensor types: SGL, DBL

Supported dimensionalities: T1D


CU_Analytic_Signal.vi – Computes the complex Analytic Signal of the real-valued input Tensor.

Accepted input Tensor types: SGL, DBL

Return Tensor types: CSG, CDB

Supported dimensionalities: T1D


Optimizations

Significantly improved the performance of CU_Tensor_Create_Push.vi and CU_Tensor_Push.vi.

Optimized execution of the CU_to_SGL.vi function.

Other optimizations.
Extended Functionalities

Error Dialog Box returns the full path for call chain.

CU_Tensor_Create_Push.vi and CU_Tensor_Push.vi check if Input Tensor and CPU Data Array dimensions match before pushing the data to GPU.

The help file was updated to reflect the updated functionalities.
Bug Fixes

Fixed array max dimension (65535) issue in CU_Power_Spectrum.vi.

Changed the CU_Decimate_Single_Shot.vi connector pane to match the LabVIEW Decimate Single Shot VI.

Changed the name of "CU_Add Array Elements.vi" to "CU_Add_Array_Elements.vi".

Other minor fixes.
V2.1.1.50
Backward Compatibility
This is a minor update which does not break backward compatibility with the previous version of the toolkit.
Features

Added Signal Operation subpalette with the following functions.

CU_Decimate_Single_Shot.vi

Supports 1Ch-1Ch, MCh-1Ch, and MCh-MCh modes.

Accepts (SGL, DBL, CSG, CDB) Tensor Types.


CU_Rational_Resample.vi

Supports single channel mode.

Accepts (SGL, DBL, CSG, CDB) Tensor Types.


CU_Convolution.vi

Supports 1Ch-1Ch, MCh-1Ch, and MCh-MCh modes.

Accepts (SGL, DBL, CSG, CDB) Tensor Types.


Cross_Correlation.vi

Supports single channel mode.

Accepts (SGL, DBL, CSG, CDB) Tensor Types.



Added a function to the Complex subpalette.

CU_Interleaved_to_Complex.vi.

Description: Converts interleaved IQ samples into a complex representation. Designed to minimize data copy overhead during conversion.

Supports all tensor dimensionalities except T0D.

Accepted input datatypes (I8, U8, I16, U16, I32, U32, I64, U64, SGL, DBL)

Supported output datatypes (CSG, CDB)
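Interleaved IQ data stores the in-phase and quadrature samples alternately in one flat array. The plain-Python sketch below shows the conversion such a function performs conceptually. It assumes the order I, Q, I, Q, ..., and the function name is hypothetical. Unlike the toolkit function, this sketch makes no attempt to avoid data copies.

```python
def interleaved_to_complex(samples):
    """Pair consecutive (I, Q) values into complex numbers."""
    if len(samples) % 2 != 0:
        raise ValueError("interleaved IQ data must contain an even number of values")
    return [complex(i, q) for i, q in zip(samples[0::2], samples[1::2])]


iq = [1, 2, 3, 4, -5, 6]  # I0, Q0, I1, Q1, I2, Q2
z = interleaved_to_complex(iq)
# z -> [(1+2j), (3+4j), (-5+6j)]
```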



Added the following functions to the Device Management subpalette.

CU_Get_GPU_List.vi

Returns the list of NVIDIA GPUs available on the PC.


CU_Get_GPU_Properties.vi

Returns the properties for the selected GPU ID.


CU_Set_GPU.vi

Sets the selected GPU to be active.



Extended Functionalities

CU_Sine.vi, CU_Cosine.vi and CU_exponential.vi accept tensors with complex data representations (CSG, CDB) as input.


The help file was updated to reflect the updated functionalities.

Bug Fixes

Fixed array max dimension issue in CU_Transpose_2D_Array.vi.

Fixed “Dest in” memory allocation issue in CU_Power_Spectrum.vi.

Fixed an issue where CU_Square_Root.vi returned incorrect results when the imaginary part is 0 and when run in in-place mode.

Other minor fixes.
V2.0.1.39
General Description
This version of CuLab toolkit brings new functionalities and improves the performance of existing ones.
Backward Compatibility
This is a minor update which does not break backward compatibility with v6.x.x versions of the toolkit.
Features

Added Trigonometric functions

Sine

Cosine

Tangent

Cotangent

Inverse Sine

Inverse Cosine

Inverse Tangent

Inverse Tangent 2 Input (atan2)

Inverse Cotangent

Sine & Cosine

Sinc


Added Exponential functions

Exponential

Exponential Arg -1

Logarithm Base 10

Logarithm Base 2

Logarithm Base X

Natural Logarithm

Natural Logarithm Arg +1

Power Of 10

Power Of 2

Power Of X

Yth Root of X


Added Hyperbolic functions

Hyperbolic Sine

Hyperbolic Cosine

Hyperbolic Tangent

Hyperbolic Cotangent

Inverse Hyperbolic Sine

Inverse Hyperbolic Cosine

Inverse Hyperbolic Tangent

Inverse Hyperbolic Cotangent


Added missing complex APIs

Complex to Polar

Polar to Complex

Polar to Re/Im

Re/Im to Polar


Added function for Quotient & Remainder

Added support for Tensor-Constant operations for binary numeric operations. This allows a CPU-based constant to be used as the second operand.

Added new function generation functions

Ramp Pattern

Sine Pattern

Power Spectrum


Added support for missing numeric types for numeric operations

Redesigned colors for tensor wires to make them distinguishable across numeric types and dimensionality

Added batched versions of FFT and IFFT

Added spectrum shifting functionality for single-channel and batched FFT
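Spectrum shifting usually means reordering FFT output so that the zero-frequency bin sits in the middle instead of at index 0. The plain-Python sketch below follows that common convention. Whether the toolkit uses exactly this ordering is an assumption, and the function name is hypothetical.

```python
def spectrum_shift(spectrum):
    """Move the negative-frequency half in front of the positive half."""
    mid = (len(spectrum) + 1) // 2  # number of non-negative frequency bins
    return spectrum[mid:] + spectrum[:mid]


# Bin labels in standard FFT order for odd and even lengths.
odd_bins = [0, 1, 2, -2, -1]
even_bins = [0, 1, -2, -1]
shifted_odd = spectrum_shift(odd_bins)    # [-2, -1, 0, 1, 2]
shifted_even = spectrum_shift(even_bins)  # [-2, -1, 0, 1]

# For the batched case, the same shift would be applied to each row.
batch = [odd_bins, [10, 11, 12, 13, 14]]
shifted_batch = [spectrum_shift(row) for row in batch]
```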
Optimizations

Greatly improved the performance of Real-to-Complex (R2C and D2Z) FFTs

Optimized execution times for Complex functions

Other optimizations
Bug Fixes

Fixed automatic instance selection issue in Array Max Min polymorphic function

Fixed memory leakage issue

Fixed an error in BLAS benchmarking example

Corrected typos

Other bug fixes