Microsoft released a .NET Kinect interface in early 2011 to much celebration from the hacker community. Its release presented an opportunity to use LabVIEW's built-in .NET functionality to design a simple interface that would allow LabVIEW users to experiment with the Microsoft Kinect. This interface was rapidly developed and demonstrated at NI Week, both in the Hack the Kinect and Other Cool Sensors segment and on the Clearpath Husky in the Robotics Pavilion.
The Microsoft Kinect has both audio and video interfaces. This API only provides access to the Depth and Video images (with the capability to easily include the Skeletal Tracking component in a future release). It is not a full-featured API; the release is intended as a starting point for developers interested in using the Kinect in their applications. The library has been tested in both LabVIEW 2010 and 2011 and requires the NI Vision library to operate.
There are two primary objects of concern in the API: the Runtime and the ImageStream. The Runtime object provides a simple method of initializing the Kinect to output whatever streams are desired. It contains both an Initialize and an Uninitialize method, which must be called before and after execution, respectively. You can also access the various ImageStream objects as properties of the Runtime. This, of course, limits the number of Kinects you can currently operate with the API to one.
The ImageStream objects can then be Opened with whatever settings the user desires. For this API, only the configuration with 320x240 Depth and 640x480 Video was tested. Other configurations are, of course, possible but have not been tested. The framework is in place for those configurations to operate with only minor modifications.
Once the appropriate ImageStream objects are Opened, the application can register event callbacks which will be executed whenever a frame is ready. The SubVIs Handle Depth Image.vi and Handle Video Image.vi (which are not used in the main application) serve as callbacks for the DepthFrameReady and VideoFrameReady events, respectively. Each time the Kinect signals that a new frame is ready, the corresponding callback is executed.
The actual processes within these .NET event callbacks are very simple. The event arguments are processed from the ImageFrame object and PlanarImage object and converted into an IMAQ image. The image is then available to the user. The current design uses a Queue as a structure to store images as they arrive. These Queues can then be managed as desired (as will be demonstrated in a later section) to keep memory usage to a minimum and allow for the smoothest camera operation.
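The queue-per-stream idea can be sketched in text form. The following Python is only an illustration, not the library's LabVIEW code: the FrameBuffer class, its method names, and the drop-oldest policy when the buffer is full are all assumptions made for the example.

```python
from collections import deque
from threading import Lock


class FrameBuffer:
    """Bounded buffer that keeps only the most recent frames (hypothetical)."""

    def __init__(self, max_frames=2):
        # A deque with maxlen discards its oldest entry when full, which caps
        # memory use if the consumer falls behind (assumed policy).
        self._frames = deque(maxlen=max_frames)
        self._lock = Lock()

    def on_frame_ready(self, frame):
        """Stand-in for a frame-ready callback: store the converted image."""
        with self._lock:
            self._frames.append(frame)

    def pop_oldest(self):
        """Return the oldest buffered frame, or None if the buffer is empty."""
        with self._lock:
            return self._frames.popleft() if self._frames else None

    def __len__(self):
        with self._lock:
            return len(self._frames)


buffer = FrameBuffer(max_frames=2)
for frame_id in range(5):  # five frames arrive before anyone reads
    buffer.on_frame_ready(f"frame-{frame_id}")
```

With a buffer size of two, only the last two frames survive the burst, so the reader always sees recent data and memory use stays bounded.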
A new Event Callback could be registered for the Skeletal information. It would follow the same architecture as the other two events and would allow the user to interface with that part of the Microsoft Kinect API. The Skeletal Frame is not included in this release.
Many LabVIEW users find it convenient to use the Abort button when testing their applications. It is VERY important that all .NET references are closed properly when accessing the Kinect. If the references aren't closed, crashes and unrecoverable errors (which require a restart of LabVIEW) can occur. Always save your applications when working with .NET references, since they may be leaked if the application is terminated prematurely.
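The cleanup requirement is the familiar "always release the resource" pattern. As a rough text-language analogy (the FakeRuntime class below is a hypothetical stand-in, not the Kinect Runtime or the LabVIEW library), the uninitialize step is placed where it runs even if the acquisition loop fails:

```python
from contextlib import contextmanager


class FakeRuntime:
    """Hypothetical stand-in for the Kinect Runtime object (not the real API)."""

    def __init__(self):
        self.initialized = False

    def initialize(self):
        self.initialized = True

    def uninitialize(self):
        self.initialized = False


@contextmanager
def kinect_session(runtime):
    """Initialize on entry and always uninitialize on exit."""
    runtime.initialize()
    try:
        yield runtime
    finally:
        # Runs on normal exit and when an exception propagates.
        runtime.uninitialize()


runtime = FakeRuntime()
try:
    with kinect_session(runtime) as rt:
        assert rt.initialized
        raise RuntimeError("simulated failure in the acquisition loop")
except RuntimeError:
    pass
```

The LabVIEW equivalent is to end the loop with a Stop control so that the close and Uninitialize calls still execute; the Abort button halts the VI before any such cleanup code can run.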
Two Get functions are included with the API which allow the user to access the image frames as they become available. These frames are, as mentioned previously, being loaded into a queue which is accessible to the user. The Get Video Image.vi and Get Depth Image.vi functions allow for the simple management of those queues.
Because images become available asynchronously with respect to the Get calls the user makes on the front end, both buffer overflow and buffer underflow are possible when operating the front-end VIs. This is handled by providing the user with the Wait For Next Image? boolean input. The following is a simple explanation of the values obtained given the state of the buffer and the value of that boolean variable:
This design applies to both the Depth and Video Get functions and should be followed if the Skeletal Frame is added to avoid incongruous architectures.
The Image Is New? output provides a simple boolean value which tells the user whether the returned image is a new one. The user is also provided with a simple Flip Image? input which flips the image to the normal, non-mirrored camera orientation (the Kinect output is naturally flipped, so you view yourself as if in a mirror). This option should only minimally affect performance, and both options (as well as the buffer size for each queue) can be tuned to give the best output for an individual system.
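A minimal sketch of how such a Get function might behave is given here in Python. It is not the library's implementation: the ImageGetter class, the timeout parameter, and the choice to return the previous image with a false "is new" flag when the queue is empty are assumptions made for illustration.

```python
import queue


class ImageGetter:
    """Text sketch of a Get function with a wait flag and an is-new output."""

    def __init__(self, frames):
        self._frames = frames  # queue.Queue filled by the frame callback
        self._last = None      # most recent image handed to the caller

    def get(self, wait_for_next, flip=False, timeout=1.0):
        """Return (image, is_new). Images are lists of pixel rows."""
        try:
            if wait_for_next:
                image = self._frames.get(timeout=timeout)  # block for a frame
            else:
                image = self._frames.get_nowait()          # never block
            self._last, is_new = image, True
        except queue.Empty:
            image, is_new = self._last, False              # reuse the old image
        if flip and image is not None:
            image = [row[::-1] for row in image]           # mirror left-right
        return image, is_new


frames = queue.Queue(maxsize=4)
frames.put([[1, 2, 3], [4, 5, 6]])
getter = ImageGetter(frames)

first = getter.get(wait_for_next=False)
second = getter.get(wait_for_next=False)
flipped = getter.get(wait_for_next=False, flip=True)
```

In this sketch the first call returns a new image, while the second and third find the queue empty and return the previous image marked as not new. With waiting enabled and frames arriving regularly, every returned image is new.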
The Demo provides a basic overview of the proper use of the various functions in the library, as well as a performance monitor. The two images (Depth and Video) are shown to the user via IMAQ Picture Controls, and the basic Flip and Wait options are presented as buttons. The New Image? indicators for each queue are also shown as a reference (notice that when the application is set to wait for next images, the New Image? indicators will always be true).
The performance measurements show iteration delay, mean delay over 100 iterations and approximate FPS. The iteration time is displayed on the graph at the bottom. The mean delay is calculated as a standard moving average.
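These measurements can be sketched in a few lines of Python. The LoopTimer class and its names are hypothetical, and deriving FPS as 1000 divided by the mean delay in milliseconds is an assumption about how the approximate figure could be obtained, not a description of the demo's block diagram.

```python
from collections import deque


class LoopTimer:
    """Tracks per-iteration delay, its moving average, and approximate FPS."""

    def __init__(self, window=100):
        self._delays = deque(maxlen=window)  # most recent delays, in ms

    def add(self, delay_ms):
        self._delays.append(delay_ms)

    @property
    def mean_delay_ms(self):
        # Simple moving average over at most `window` samples.
        return sum(self._delays) / len(self._delays) if self._delays else 0.0

    @property
    def fps(self):
        mean = self.mean_delay_ms
        return 1000.0 / mean if mean > 0 else 0.0


timer = LoopTimer(window=100)
for delay_ms in (30.0, 35.0, 40.0, 35.0):
    timer.add(delay_ms)
```

For the four sample delays above, the mean is 35 ms, which corresponds to roughly 28.6 frames per second.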
Notice that the code is divided into three fairly well-defined sections. The topmost section is the performance measurements:
The next section down is the 3D Conversions (described in the next section):
The final (and most important) section demonstrates the program flow for using the API:
A conversion from the Depth Image to a 3D representation of that image was devised for the Clearpath Robotics Husky robot demoed in the Robotics Pavilion at NI Week, and it is included as an easter egg inside the library and demo. It is disabled by default, but if the Diagram Disable Structure is removed you can access the 3D representation of the depth image.
This 3D conversion is particularly convenient for robotics applications where obstacle avoidance is desired.
A single optimization parameter is included in this function which removes zero-valued pixels (points which are too close to or too far from the Kinect to appear in the Depth Image) from the processing. This often substantially improves the performance of the 3D conversion.
A single configuration parameter is included in this function which allows for the custom application of the camera Focal Length. This value depends on the resolution of the camera and is configured correctly for the current depth image settings. If the resolution of the depth image is changed, the focal length parameter will need to be reset.
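The role of these two parameters can be illustrated with a standard pinhole back-projection. This is a sketch under stated assumptions, not the library's conversion: the function name, the principal point at the image center, and the toy focal length of 2.0 pixels are all chosen for the example only.

```python
def depth_to_points(depth, focal_length_px, skip_zeros=True):
    """Back-project a depth image (list of rows) to (x, y, z) points.

    Uses the pinhole model with the principal point assumed to be at the
    image center. The focal length is expressed in pixels.
    """
    height, width = len(depth), len(depth[0])
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if skip_zeros and z == 0:
                continue  # no valid depth reading at this pixel
            x = (u - cx) * z / focal_length_px
            y = (v - cy) * z / focal_length_px
            points.append((x, y, float(z)))
    return points


# Toy 3x3 depth image in millimeters; zeros mark invalid pixels.
depth_mm = [
    [0, 1000, 0],
    [1000, 2000, 1000],
    [0, 1000, 0],
]
points = depth_to_points(depth_mm, focal_length_px=2.0)
```

Skipping the zero-valued pixels reduces the nine input pixels to five output points here. Because the focal length is in pixel units in this model, it has to change along with the image resolution, which is consistent with the note above about resetting the parameter.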
Although the API works very well, initial setup can sometimes cause issues. The following is a list of suggested steps for addressing problems when first running the demo, as well as some known issues:
Significant expansions are necessary to make this a fully implemented Kinect API. The following is a short list of known improvements and feature additions which could be added at a later date by any interested parties:
Example code from the Example Code Exchange in the NI Community is licensed with the MIT license.