As a component of this project, I have worked on developing an easy-to-use system that allows users to control various processes within the Industrial Abstraction System. As previously mentioned, the motivating goal of this development is to integrate organic input with procedural systems, in order to make environments more interactive and to create animations with the potential to present more depth and emotion. The Organic Motion Camera is the most direct way to achieve these goals, and it is a major focus of this stage of project development.
Here is a video showing the Organic Motion Camera system in use.
Initially, the system must run through a simple setup procedure in order to correctly interpret data from the current user. This system still needs improvement, but it is coming along with some pleasing results.
General Function
The system uses built-in skeletal tracking to process hand positions for input control. Each hand has an offset point that the user can modify. The difference between the processed hand position and its respective offset point is the control data that can be most effectively used for system control. With the camera system, these offsets are interpreted as rate-of-change values: holding a hand at a constant distance from its offset keeps the camera moving at a corresponding constant speed.
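The rate-of-change scheme above could be sketched roughly as follows. This is a minimal illustration, not the actual system code; the names (`control_vector`, `step_camera`, `gain`) and the per-frame integration step are assumptions.

```python
def control_vector(hand_pos, offset):
    """Control data: the difference between the tracked hand position
    and its offset point (both as 3-tuples)."""
    return tuple(h - o for h, o in zip(hand_pos, offset))

def step_camera(camera_pos, hand_pos, offset, gain=0.5, dt=1.0 / 30.0):
    """Interpret the control vector as a rate of change: a constant
    distance from the offset moves the camera at a constant speed.
    gain and dt (frame time) are illustrative parameters."""
    v = control_vector(hand_pos, offset)
    return tuple(c + gain * d * dt for c, d in zip(camera_pos, v))
```

Because the camera integrates the control vector every frame, the hand's displacement acts like a velocity rather than a position, which is what allows unbounded camera travel from bounded hand motion.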
Initially, these offset points were predetermined, which required the user to stand at a specific distance with hands resting at certain points. This created many problems: it was difficult to completely stop camera movement, and hand movements would often intersect and thus interfere with the data taken from the skeleton.
To solve these problems, I implemented a process in which each control point is only active when the user points the corresponding hand forward, and inactive when the hand points upward. While a hand is in the inactive position, its offset point is set to the hand's current position, so the control point is the zero vector. When the hand is in the active position, the offset remains at the last inactive point. Additionally, if a user's hand motions intersect, they can simply reset the offset points to avoid collision.
Both hands operate independently, so a user can choose to only pan or only move instead of doing both at once. It is also easy to stop movement, as the user need only put their hands up, a natural stopping pose.
The RGB video display is used as a reference guide for the user: red circles indicate that a hand is in the inactive position, white circles are the active offsets, and the green and blue circles are the active hand positions. This video, along with the control sliders, is normally contained in a separate frame in order to keep the program display clean for animation capturing.
Data Processing
The technique used for gesture recognition is to count the samples from the depth map that fall within a specified radius of the associated hand points. These counts are then used to determine whether the hand is pointed forward (fewer samples) or pointed up (more samples) based on a determined threshold.
The setup process is necessary to establish a linear function that determines this threshold based on the current distance of each hand. First the close position is set, where the threshold value is placed between the sample counts taken from the active and inactive positions. After the back position is set, these state thresholds, along with the average depths at which the samples were taken, are used to determine a linear function that calculates the threshold for each hand depending on its current depth.
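The calibration described above amounts to fitting a line through two (depth, threshold) points. A minimal sketch, with hypothetical names, might look like this:

```python
def midpoint_threshold(active_count, inactive_count):
    """Place the state threshold halfway between the sample counts
    observed in the active and inactive poses at one depth."""
    return (active_count + inactive_count) / 2.0

def threshold_function(near_depth, near_threshold, far_depth, far_threshold):
    """Fit a linear function mapping hand depth to sample-count
    threshold, from the close and back calibration positions."""
    slope = (far_threshold - near_threshold) / (far_depth - near_depth)

    def threshold_at(depth):
        return near_threshold + slope * (depth - near_depth)

    return threshold_at
```

The linear model captures the fact that the hand covers fewer depth pixels the farther it is from the sensor, so a single fixed count threshold would misclassify poses as the user moves.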
There is a lot of noise in both the depth and skeleton data from the Kinect sensor. To mitigate unwanted motion, the data taken from each hand is stored in a matrix of previous values, and the current control value is obtained by averaging the samples in these arrays. Though this smoothing is adjustable, the more samples per matrix, the greater the latency in response time.
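This history-and-average scheme is a standard moving-average filter; a minimal sketch (class name and window parameter assumed) could be:

```python
from collections import deque

class MovingAverageFilter:
    """Smooth a stream of position tuples by averaging the last n
    samples. Larger n gives smoother output but adds latency,
    since old samples keep influencing the current value."""

    def __init__(self, n):
        self.samples = deque(maxlen=n)  # history of recent samples

    def update(self, value):
        self.samples.append(value)
        k = len(self.samples)
        return tuple(sum(s[i] for s in self.samples) / k
                     for i in range(len(value)))
```

With a 30 Hz skeleton stream, a window of n frames delays the response by roughly n/2 frames, which is the smoothing-versus-latency trade-off noted above.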
The amount of noise from the Kinect sensor can still be problematic, as the control points occasionally jump, affecting not only the current motion of the camera but also the number of samples taken from the depth map to determine the state of each hand. For example, when in motion, the system sometimes takes additional samples that exceed the given threshold, which resets the offset point and abruptly stops the camera motion. Simple solutions to mitigate this false recognition of an inactive state are to increase the state threshold, or to store Boolean values of the current state in a sample array, where the state only changes if the array contains entirely identical (all true or all false) values.
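The second mitigation is effectively a debouncer; a sketch of that idea (names and window size are my own, not from the system) follows:

```python
from collections import deque

class StateDebouncer:
    """Hold the reported hand state steady until the last n raw
    readings all agree, filtering single-frame sensor glitches."""

    def __init__(self, n, initial=False):
        self.history = deque(maxlen=n)  # recent raw Boolean readings
        self.state = initial            # debounced output state

    def update(self, raw_active):
        self.history.append(raw_active)
        # Switch only when the window is full and unanimous.
        if len(self.history) == self.history.maxlen and len(set(self.history)) == 1:
            self.state = self.history[0]
        return self.state
```

A single noisy frame that crosses the threshold then cannot reset the offset point, since one disagreeing reading breaks the unanimity the state change requires.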
Evaluation
This system is likely to be the cornerstone of this project in the context of Computer Graphics Development. I plan to write a technical paper focusing on user interaction with this system through a virtual environment.
I ran a sample user evaluation to identify problems with the current system and to receive feedback on added functionality and ease of use. General feedback was positive, though I definitely need to work on ease of use, on features to assist navigation, and on hand signals to reset the position of the camera if the user becomes lost.
For the tests, each user went through the setup process and was then allowed to use the system freely in order to get comfortable with the controls. After they indicated they were ready, they ran through a sequence of objectives requiring them to move to various positions on a map, look at certain targets, and demonstrate control of the zoom-in and zoom-out functionality.
Though this system takes much more concentration and is far more challenging to use than a mouse/keyboard camera, I found users to be more engaged with the virtual environment, as there is no physical hardware mediating between them and their actions. The camera motions are simple enough that users should eventually be able to use the system somewhat subconsciously, entering a mindset of engagement where their thoughts are more directly translated into action.
Mouse/keyboard control is more intuitive and comfortable to use, and users were able to complete the tasks more easily and efficiently than with the Kinect system. Such immediate control seems to let users explore environments much more quickly, leaving less time to absorb the information presented. The Organic Motion Camera system is more challenging and requires more careful concentration, but I believe its function creates a unique user experience and engages the user more deeply with the given environment. Successful use of the Kinect system for control requires a more direct, willful suspension of disbelief, and thus engages the user more directly with the given virtual environment. This is essentially a major goal of this project: to successfully integrate the motivating components of technical development, artistic interaction, and environmental/industrial education.
This assumption will likely be the focus of the system paper that I will be writing. The next phase of testing will try to show that use of the Kinect system is more engaging, and that users absorb more information from the environment through its use.