"VIVR" (Vibrating Virtual Reality) is a research project of the SOPI research group at Aalto University.
In this project, we are implementing a Virtual Reality Musical Instrument (VRMI) focusing on music interaction between the performer and the 3D environment.
We focus on the 3D environment rather than on a 3D model of a particular instrument because we want to exploit the immersive quality of VR. The 3D environment can therefore be seen as a resonating body that the musician is trapped within and excites from the inside.
Within this vision, we are focusing on four factors that are crucial for relating immersion and music interaction: 3D audio, spatial interaction, the performer and the sound world.
I'm starting this blog to share my experiences in developing an instrument for VR and to keep track of how ideas develop over the process.
The technology we are working with is an HTC Vive setup as hardware, Unity 3D as game engine and SuperCollider as sound engine. The communication between the HTC Vive and Unity 3D is done through the SteamVR SDK. The communication between Unity 3D and SuperCollider is done via OSC.
In this blog I won't post the full code; instead, I'll post snippets of what is relevant to each topic.
// 2018.04.11 The beginning of the idea
Since I started at Media Lab Helsinki I have been interested in sound and movement interaction. I'd like to graduate this academic year, so I was looking for a thesis idea. Back in 2014 or so, when I used VR for the first time, it didn't really excite me. Even though VR is still largely focused on zombie-killing games, I have to admit that nowadays some applications bring genuinely new experiences out of it.
When using the VR setup at school, we joked about how disconnected from reality we would get. This immersion made me think that VR could be a perfect platform for an audiovisual installation. I thought of different ideas for how this installation could work. The most interesting one involved a sentient environment that would react audiovisually to user input. The objective would be to establish an audiovisual communication between the user and the environment until both resonated on the same level.
At the same time, SOPI began to work on a VR musical instrument. When we exchanged ideas, the leader of the group, Koray Tahiroğlu, thought it would be useful to join forces and come up with a VR musical instrument that would have a certain degree of autonomy. In our discussions, we looked into the affordances of VR and how these could benefit music interaction. Combining this with the ideas proposed by previous research in the field, we came to the conclusion that 3D audio, spatial interaction, the performer and the sound world were essential factors in the relation between VR and music interaction.
// 2018.04.27 3D audio as a music feature
One of the first tryouts I did with Unity and SuperCollider was a simple test of sound spatialisation. I tried three different Ambisonic quarks/plugins in SuperCollider: ATK, SC-HOA and AmbIEM. Although all three implementations have their own potential, I found that SC-HOA and ATK were the most faithful to the sound source; AmbIEM seemed to filter the sound source more than the other two. Even though SC-HOA is especially good and well documented, it seems to be taking a while for the SuperCollider developers to release it as an official plug-in. In the meantime, and because I'm working on a university computer where having unsupported software like SuperCollider is already a hassle, I have decided to use ATK.
It was during this simple test that I came across my first problem. My first attempt was to use Unity's Cartesian coordinates directly. After trying to figure out why the translation from Cartesian to spherical coordinates wouldn't work, I realised that the axes of the ambisonic spherical coordinate system are oriented differently from Unity's. Therefore, the variables should be as follows:
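As a hedged illustration of that axis swap, here is a language-neutral Python sketch rather than the project's C# code. It assumes Unity's left-handed frame (x right, y up, z forward) and the ATK convention (x forward, y left, z up, with azimuth 0 at the front and positive to the left). The function name is hypothetical.

```python
import math

def unity_to_atk_spherical(x, y, z):
    """Convert a Unity position (x right, y up, z forward) to
    (azimuth, elevation, radius) in radians, assuming the ATK convention:
    azimuth 0 = front, positive to the left; elevation positive upwards."""
    # Re-express the point in a right-handed frame: forward, left, up.
    fwd, left, up = z, -x, y
    radius = math.sqrt(fwd * fwd + left * left + up * up)
    azimuth = math.atan2(left, fwd)
    # Guard against a source located exactly at the listener.
    if radius > 0.0:
        elevation = math.asin(max(-1.0, min(1.0, up / radius)))
    else:
        elevation = 0.0
    return azimuth, elevation, radius
```

For example, a source one metre to the right in Unity, (1, 0, 0), comes out at an azimuth of -pi/2, which is "right" in the assumed ATK convention.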
Nevertheless, when working in VR we expect the user to move, or at least to rotate their head. Thus, we should get the user's position and head rotation and apply them to the function above.
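A minimal sketch of making a source position head-relative before the spherical conversion follows. It is in Python and uses a hypothetical function name. It assumes that only the head's yaw matters (Unity's `transform.eulerAngles.y`, in degrees, where a yaw of 90 turns forward into right) and ignores pitch and roll.

```python
import math

def head_relative(source, head_pos, head_yaw_deg):
    """Express a Unity world position relative to the listener's head.

    Assumes only yaw (rotation about Unity's y axis, in degrees) is used;
    pitch and roll are ignored in this sketch."""
    dx = source[0] - head_pos[0]
    dy = source[1] - head_pos[1]
    dz = source[2] - head_pos[2]
    # Undo the head's yaw: rotate the offset by -yaw about the y axis.
    a = math.radians(head_yaw_deg)
    local_x = dx * math.cos(a) - dz * math.sin(a)
    local_z = dx * math.sin(a) + dz * math.cos(a)
    return local_x, dy, local_z
```

With the user facing world +x (yaw 90), a source at world (1, 0, 0) ends up straight in front, at roughly (0, 0, 1).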
The above idea works for representing external sound sources. But for VIVR we wanted the user to be inside the instrument. Thus, my approach to making 3D audio a musical feature was to give the user control over the sound spatialisation. In this manner, the user would be able to freely move the sound around the 3D space. Unity's Physics.Raycast comes in really handy here. Since almost every 3D object one adds to the environment has a collider, we can spatialise the audio along the surfaces of those colliders. Unity's function is as follows:
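The Unity call itself cannot be run outside the engine. As a rough stand-in, the following Python sketch computes what a raycast reports (the hit point and the distance) when the collider is a single flat wall. The function name and the default range are hypothetical.

```python
def raycast_plane(origin, direction, plane_point, plane_normal, max_distance=100.0):
    """Return (hit_point, distance) where a ray meets a plane, or None.

    A stand-in for what Physics.Raycast reports (RaycastHit.point and
    RaycastHit.distance) when the collider is a flat wall."""
    denom = sum(d * n for d, n in zip(direction, plane_normal))
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the wall
    diff = [p - o for p, o in zip(plane_point, origin)]
    t = sum(d * n for d, n in zip(diff, plane_normal)) / denom
    # Scale by the direction's length so the distance is in world units.
    length = sum(d * d for d in direction) ** 0.5
    distance = t * length
    if t < 0 or distance > max_distance:
        return None  # wall is behind the controller or out of range
    hit = tuple(o + t * d for o, d in zip(origin, direction))
    return hit, distance
```

The hit point is what gets spatialised. The distance is reused for the amplitude and proximity mapping.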
Notice that the above function refers to the left controller. The same script should be duplicated, renamed "RayCastLocationR" and added to the other HTC Vive controller. As can be seen, I have added the OSC messages in this excerpt. I'm using this OSC script by Thomas Fredericks to receive and send OSC in Unity.
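For readers curious about what actually travels over the wire, here is a small Python sketch that encodes an OSC 1.0 message with float32 arguments. Strings are NUL-terminated and padded to 4 bytes, and the floats are big-endian. The "/rayCastLocationL" address is a hypothetical counterpart of the "/rayCastLocationR" address mentioned in this blog.

```python
import struct

def _osc_string(s):
    """Encode an OSC string: ASCII bytes, NUL-terminated, padded to 4 bytes."""
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def osc_message(address, *floats):
    """Build an OSC 1.0 message whose arguments are all float32."""
    type_tags = "," + "f" * len(floats)
    payload = b"".join(struct.pack(">f", f) for f in floats)
    return _osc_string(address) + _osc_string(type_tags) + payload

# Usage (not executed here): send a hit point to sclang's default port.
# import socket
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.sendto(osc_message("/rayCastLocationL", x, y, z), ("127.0.0.1", 57120))
```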
One useful parameter that raycasting provides is the distance between the collision point and the ray's origin. This parameter can be mapped to the ambisonic proximity feature as well as to the amplitude. This gives a better virtual representation of how sound sources differ in volume and filtering depending on their distance.
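A hedged Python sketch of such a distance mapping follows. The `linlin` helper mimics SuperCollider's clipping `linlin`. The distance range and output values are illustrative assumptions, not VIVR's settings. The proximity value is simply the distance clipped to a safe range.

```python
def linlin(x, in_min, in_max, out_min, out_max):
    """Linear range mapping with clipping, like SuperCollider's linlin."""
    x = min(max(x, in_min), in_max)
    return out_min + (x - in_min) / (in_max - in_min) * (out_max - out_min)

# Hypothetical range: raycast hits between 0.5 m and 30 m away.
MIN_DIST, MAX_DIST = 0.5, 30.0

def distance_to_params(distance):
    """Map a raycast distance to (amplitude, proximity distance).

    Closer hits are louder; the proximity value is the distance itself,
    clipped so it stays inside the assumed range."""
    amp = linlin(distance, MIN_DIST, MAX_DIST, 1.0, 0.05)
    proximity = min(max(distance, MIN_DIST), MAX_DIST)
    return amp, proximity
```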
The following code excerpt displays the DSP implementation of the ambisonic module as well as the updated OSC function.
Finally, we can push the potential of the ATK further by also using it as a processing module. Following the chain input (n-channel signal) -> encoding (n-channel signal) -> decoding (W, X, Y, Z channel signal) -> process (W, X, Y, Z channel signal) -> encoding (W, X, Y, Z channel signal) -> spatial transformation (W, X, Y, Z channel signal) -> decoding (n-channel signal), one can distribute the signal processing spatially. This produces an output with a vibrating depth and a sonic texture that cannot be achieved by processing the signal before the encoding or after the decoding. In VIVR, this chain is implemented by adding a delay unit before the second encoding.
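The first encoding stage of such a chain can be sketched per sample in Python. This assumes the traditional first-order (FuMa-style) weighting, with W scaled by 1/sqrt(2) and azimuth positive to the left. It is an illustration of first-order encoding, not ATK's own code.

```python
import math

def encode_foa(sample, azimuth, elevation):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Traditional weighting: W scaled by 1/sqrt(2); azimuth 0 = front,
    positive to the left; elevation positive upwards (radians)."""
    w = sample / math.sqrt(2.0)
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z
```

A source straight ahead only feeds W and X, while one hard left only feeds W and Y.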
At the same time, because the action happens within an environment, it's useful to add a reverb module after the ambisonic processing.
// 2018.05.07 Setting up to deform the environment
The concept of VIVR is to make users feel that they are inside the instrument. That's why the environment looks like the inside of a room. Nevertheless, as soon as users get their hands on VIVR, they realise that the notion of a cubic room can be completely destroyed. I set up a space of 6 surfaces (floor, ceiling and 4 side walls) using custom 3D shapes based on the combination of two tutorials I found on the site http://catlikecoding.com/unity/tutorials/ .
On this website, Jasper Flick wrote the Rounded Cube tutorial on how to make rounded cubes whose shape depends on the X, Y and Z sizes and a roundness parameter. By setting these parameters in the Unity Editor to X Size=55, Y Size=2, Z Size=55 and Roundness=2, one gets a procedurally generated wall.
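The rounding idea from that tutorial can be sketched per vertex in Python. Each grid vertex is pulled towards an inner box shrunk by the roundness. It is then pushed back out by exactly the roundness along the direction it came from. This is a paraphrase of the technique, not the tutorial's C# code.

```python
import math

def rounded_vertex(x, y, z, size, roundness):
    """Place a box grid vertex on a rounded-box surface.

    Finds the nearest point of an inner box shrunk by `roundness`, then puts
    the vertex `roundness` away from it, towards the original vertex.
    Returns (vertex, normal)."""
    p = (x, y, z)
    inner = []
    for coord, extent in zip(p, size):
        # Clamp each coordinate into [roundness, extent - roundness].
        if coord < roundness:
            inner.append(roundness)
        elif coord > extent - roundness:
            inner.append(extent - roundness)
        else:
            inner.append(coord)
    offset = [c - i for c, i in zip(p, inner)]
    length = math.sqrt(sum(o * o for o in offset))
    if length == 0.0:
        return p, (0.0, 0.0, 0.0)  # vertex already on the inner box
    normal = tuple(o / length for o in offset)
    vertex = tuple(i + n * roundness for i, n in zip(inner, normal))
    return vertex, normal
```

With the wall settings above, each grid vertex would go through `rounded_vertex(x, y, z, (55, 2, 55), 2)`.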
Using the script from the Rounded Cube tutorial inside the Mesh Deformation tutorial, instead of the Cube Sphere proposed by the author, one can deform the walls according to user input. The core scripts of this tutorial are MeshDeformer.cs, attached to the Game Object that one wants to deform, and MeshDeformerInput.cs, attached to the Game Object that is intended to be the deformation source. The MeshDeformerInput.cs script has two parameters to control: the force applied to the deformed object, and a force offset that makes the deformation follow the direction of the force input. On the deformed object, the MeshDeformer.cs script has the public parameters Spring Force and Damping. Spring Force sets how strongly the vertices are pulled back towards their original positions, which shapes their back-and-forth bouncing. Damping sets how quickly, and therefore how smoothly, this bouncing dies out.
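A Python sketch of the deformation loop described in that tutorial follows. Forces fall off with squared distance and push vertices away from the force point. Each vertex is then sprung back to its original position and damped. It paraphrases the technique and leaves out the tutorial's scale handling.

```python
import math

def force_point(hit_point, hit_normal, force_offset):
    """Move the force origin slightly off the surface along its normal."""
    return [p + n * force_offset for p, n in zip(hit_point, hit_normal)]

def add_force(vertices, velocities, point, force, dt):
    """Push each vertex away from `point`, attenuated by squared distance."""
    for i, v in enumerate(vertices):
        d = [a - b for a, b in zip(v, point)]
        sq = sum(c * c for c in d)
        attenuated = force / (1.0 + sq)
        length = math.sqrt(sq)
        if length > 0.0:
            velocities[i] = [vel + c / length * attenuated * dt
                             for vel, c in zip(velocities[i], d)]

def update(original, displaced, velocities, spring_force, damping, dt):
    """Spring each displaced vertex back towards its original position."""
    for i in range(len(displaced)):
        disp = [a - b for a, b in zip(displaced[i], original[i])]
        vel = [v - d * spring_force * dt for v, d in zip(velocities[i], disp)]
        vel = [v * (1.0 - damping * dt) for v in vel]  # damping bleeds energy
        velocities[i] = vel
        displaced[i] = [p + v * dt for p, v in zip(displaced[i], vel)]
```

Setting `spring_force` to 0 leaves a dent in place once damping has stopped the vertices, which is how freezing a mesh by zeroing its spring force behaves.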
In VIVR, the MeshDeformerInput.cs script is attached to each HTC Vive controller, but the force comes from the FFT values of the sound sources. I'm passing the audio output of my sound sources through an FFT chain to read raw FFT values, a Loudness chain and an Amplitude tracker. The values of these modules are multiplied together and scaled by a forceIndex variable to get the total FFT magnitude that is output as force. A few code examples help explain this better.
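A hedged Python sketch of that force computation follows. A naive DFT stands in for SuperCollider's FFT chain. The bin magnitudes are summed, multiplied by the loudness and amplitude readings, and scaled by `force_index`. The function names are hypothetical.

```python
import cmath
import math

def dft(samples):
    """Naive DFT returning the non-negative-frequency bins (stand-in for FFT)."""
    n = len(samples)
    return [sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples))
            for k in range(n // 2 + 1)]

def fft_force(bins, loudness, amplitude, force_index):
    """Sum the bin magnitudes, multiply by the loudness and amplitude
    readings, and scale by force_index to get one deformation force."""
    total_magnitude = sum(abs(b) for b in bins)
    return total_magnitude * loudness * amplitude * force_index
```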
On the Unity side, the MeshDeformerControllerInputL.cs script receives the FFT values via OSC, sets them as force and sends them to each Game Object that has a MeshDeformer.cs script attached. We have to do the same for the MeshDeformerControllerInputR.cs script, but changing the OSC receiving address to "/fft2" and the sending address to "/rayCastLocationR".
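The receiving side can be sketched in Python as a minimal OSC decoder that handles float arguments only, plus a dispatcher. The dispatcher applies the first float of a matching message as force to every registered deformer. The `Deformer` class is a hypothetical stand-in for an object carrying MeshDeformer.cs, and "/fft1" is a hypothetical left-controller address.

```python
import struct

def _read_string(data, offset):
    """Read a NUL-terminated, 4-byte-padded OSC string starting at offset."""
    end = data.index(b"\x00", offset)
    text = data[offset:end].decode("ascii")
    next_offset = end + 1
    next_offset += -next_offset % 4  # skip padding to the next 4-byte boundary
    return text, next_offset

def parse_osc(data):
    """Decode an OSC message whose arguments are all float32 ('f')."""
    address, offset = _read_string(data, 0)
    tags, offset = _read_string(data, offset)
    args = []
    for tag in tags[1:]:  # skip the leading ','
        if tag != "f":
            raise ValueError("unsupported OSC type tag: " + tag)
        args.append(struct.unpack(">f", data[offset:offset + 4])[0])
        offset += 4
    return address, args

class Deformer:
    """Hypothetical stand-in for a Game Object with MeshDeformer.cs."""
    def __init__(self):
        self.force = 0.0

def dispatch(data, address, deformers):
    """Apply the first float of a matching message as force to all deformers."""
    addr, args = parse_osc(data)
    if addr == address and args:
        for d in deformers:
            d.force = args[0]
        return True
    return False
```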
By using the FFT magnitude as the deformation force, it is possible to get visual feedback that matches the sound output, thus making the experience more coherent and immersive. Still, there are many features of music and visual interaction that have to be taken into account, and I'll save those for future entries.
// 2018.05.15 Enhancing interaction I
So far I have covered basic 3D audio and mesh deformation based on movement interaction. Nevertheless, to make this interaction more meaningful to users, I added several parameter controls based on the same kind of interaction. Thus, while users distort the environment and move the sound around, they can also play with the reverberation and with the buffer position and grain size of a granular synth, or freeze the whole audio through an FFT module.
If you have been reading this blog, you might have noticed that I have barely mentioned the DSP modules I am using as sound sources. This is because at the moment they are placeholders. Once our interaction system fully makes sense, we would like to use the capabilities of a VR workstation to develop heavier sound-source modules.
The following code example displays the DSP modules I am currently using.
For each controller I created 3 synths: a grain synth, an FFT Freeze synth and an Ambisonics synth. This way users have independent musical control with each hand. The granular synth is controlled in two different ways. One mapping follows the RayCastLocation, so users change the buffer position depending on where they point. In this OSC definition I am also measuring how far the controller is from the user and how fast it moves. These values control the amplitude and lag time of both the grain module and the Ambisonics module. The other mapping of the grain synth is based on the rate of change of the FFT loudness measurement: the bigger the difference between the new loudness value and the old one, the bigger the grain size and the trigger rate.
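Both mappings can be sketched in Python as follows. All ranges are illustrative assumptions. Which of reach and speed drives amplitude and which drives lag time is also assumed, since the text only says that together they control both.

```python
def linlin(x, in_min, in_max, out_min, out_max):
    """Linear range mapping with clipping, like SuperCollider's linlin."""
    x = min(max(x, in_min), in_max)
    return out_min + (x - in_min) / (in_max - in_min) * (out_max - out_min)

class GrainMapper:
    """Hypothetical mapping from controller data to grain-synth parameters.

    All ranges below are illustrative assumptions, not VIVR's values."""
    def __init__(self):
        self.last_loudness = 0.0

    def from_raycast(self, hit_x, reach, speed):
        # Horizontal hit position (-25..25 m) selects the buffer position (0..1).
        pos = linlin(hit_x, -25.0, 25.0, 0.0, 1.0)
        # Arm reach (0..1 m) sets amplitude; speed (0..3 m/s) sets lag time.
        amp = linlin(reach, 0.0, 1.0, 0.1, 1.0)
        lag = linlin(speed, 0.0, 3.0, 1.0, 0.05)  # faster moves, quicker response
        return pos, amp, lag

    def from_loudness(self, loudness):
        # The bigger the jump in loudness, the bigger the grains and trigger rate.
        delta = abs(loudness - self.last_loudness)
        self.last_loudness = loudness
        grain_size = linlin(delta, 0.0, 20.0, 0.02, 0.4)
        trig_rate = linlin(delta, 0.0, 20.0, 5.0, 40.0)
        return grain_size, trig_rate
```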
Notice that in the following excerpt I have edited OSCdef(\amp) to focus only on the loudness interaction. I did not include OSCdef(\rayCastLocationR), as it is the same as the left one apart from the synth IDs.
The FFT Freeze module required some mixing. It is activated just by pressing the grip of a Vive controller, which sends a value of 1 or 0 to SuperCollider, yet it affects the amplitude of the other two modules. When it is active, the amplitude of the FFT module rises, as does that of the Ambisonics module, and both decrease when it isn't active. Furthermore, because users can freeze not only the sound but also the 3D meshes, it's useful to keep track of which mesh is being deformed so that we can keep an audio loop in it. The 3D meshes are frozen simply by setting the spring force to 0.
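A Python sketch of that gain behaviour follows, using a simple one-pole lag towards target gains. The target values and the smoothing factor are hypothetical. The text only states that the FFT and Ambisonics gains rise while the freeze is active and fall otherwise.

```python
class FreezeMixer:
    """Hypothetical mixer that ramps gains when the FFT freeze toggles.

    The gain targets are illustrative assumptions."""
    TARGETS = {True: {"fft": 1.0, "ambi": 1.0}, False: {"fft": 0.0, "ambi": 0.6}}

    def __init__(self, smoothing=0.1):
        self.smoothing = smoothing  # fraction of the gap closed per control tick
        self.gains = dict(self.TARGETS[False])
        self.frozen = False

    def set_freeze(self, value):
        """Grip pressed (1) or released (0), as sent from Unity."""
        self.frozen = bool(value)

    def tick(self):
        """Move each gain one step towards its target (a one-pole lag)."""
        for name, target in self.TARGETS[self.frozen].items():
            self.gains[name] += (target - self.gains[name]) * self.smoothing
        return dict(self.gains)
```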
The following excerpt displays how this freezing happens inside Unity. Once more, the same script should be adapted for the second controller.
With the values sent by Unity, we can keep track in SuperCollider of which meshes are frozen and by which controller. As mentioned, this is useful for resolving which mesh should be frozen and when. To keep it clear for myself, I wrote some boolean conditions. This will probably seem unnecessary when working with just two controllers, but in future entries this logic will make more sense.
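One possible form of that bookkeeping, sketched in Python, is shown below. The first controller to freeze a mesh owns it, and only that controller can release it. The ownership rule, the mesh names and the returned action strings are all hypothetical.

```python
class FreezeTracker:
    """Hypothetical bookkeeping of which controller froze which mesh."""
    def __init__(self):
        self.frozen_by = {}  # mesh name -> controller ("L" or "R")

    def update(self, controller, mesh, frozen):
        """Handle a freeze message: `frozen` is 1 on grip press, 0 on release."""
        owner = self.frozen_by.get(mesh)
        if frozen and owner is None:
            self.frozen_by[mesh] = controller  # first controller claims the mesh
            return "start_loop"
        if not frozen and owner == controller:
            del self.frozen_by[mesh]  # only the owner can release it
            return "stop_loop"
        return "ignore"  # frozen by the other controller, or not ours to release
```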
All these small additions to the direct interaction between the user and the environment make the whole experience more immersive. Giving users control over parameters that they would expect to change, without requiring extra effort, helps bring about a more intuitive and effective interaction.
In the following entry I will cover how interaction can be enhanced further by giving the environment some sort of autonomy.
// 2018.06.02 Enhancing interaction II
Giving the system some autonomy, so that it acts on its own, can bring a deeper level of connection with the user. By measuring the user's activity, one can set the system to act or react in different ways to stimulate that activity. For example, when user activity diminishes, the environment can propose new musical gestures. On the other hand, when the user seems engaged enough, the environment can reinforce this engagement by adding new layers of sound or visual interaction.
So far in VIVR I have implemented these two types of system autonomy. In this entry, I'll explain how the system reacts when it senses that the user is no longer engaged. In the next one, I'll describe my approach to sustaining a higher level of engagement. In both cases it's necessary to give the environment its own synths and 3D objects.
I'll start with the SuperCollider synth. In this case, the environment holds a version of the granular synth similar to the controllers' one, but with amplitude modulation, because LFOs are what give it movement and life. However, this synth does not go through the FFT freezing module.
One can determine how active the user is at any moment by measuring the average activity of both controllers. If the user is inactive for a period that seems too long to be considered a musical silence, the environment starts acting, thus prompting the user to resume musical activity.
In the following excerpt I calculate the average magnitude of the controllers. If it is too low, a counter starts; if this counter reaches a certain amount of time X, the environment starts acting on its own.
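The same logic is sketched in Python below. The two controller magnitudes are averaged, idle time is accumulated while the average stays below a threshold, and a flag is raised once the idle time reaches the timeout. The threshold and timeout values are hypothetical stand-ins for "X".

```python
class ActivityMonitor:
    """Detect user inactivity from both controllers' movement magnitudes.

    `threshold` and `timeout` (seconds) are hypothetical values."""
    def __init__(self, threshold=0.05, timeout=8.0):
        self.threshold = threshold
        self.timeout = timeout
        self.idle_time = 0.0

    def update(self, mag_left, mag_right, dt):
        """Return True when the environment should start acting on its own."""
        average = (mag_left + mag_right) / 2.0
        if average < self.threshold:
            self.idle_time += dt  # keep counting while the user stays still
        else:
            self.idle_time = 0.0  # any real movement resets the counter
        return self.idle_time >= self.timeout
```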
Finally, in the same way that I did with the controllers, I send the total FFT magnitude values to Unity.
In Unity I created a new 3D Capsule without a Mesh Renderer, so it is invisible. This object holds two scripts, a movement script and a Mesh Deformer script. Thus, the object can be active all the time, but it only becomes noticeable once it starts moving. The Mesh Deformer script is the same as the one used for the controllers, except for the OSC address. The EnviornmentCamMovement.cs script is rather simple: I get the scaled amplitude values from the environmentAmpMod1 synth and set them as position and rotation axes.
In this case, when the environment senses that the user has lost engagement, it starts travelling and slowly deforming itself, hence the LFOs. The amplitude modulator of the environment's granular module is a sine oscillator, so the sound fades in and out in a way that is easy to notice. I intend this sonic gesture to induce new, slow gestures from the user that will give rise to a new musical section.
In the following entry I'll describe how I set new levels of interaction when the user's level of activity is high.
// 2018.07.09 Enhancing interaction III
Lately, I've been working on and revising this project, and I realized that some events and functions would work better if done differently. This entry explains how VIVR's third level of interaction works and updates some functions that have been presented before.
Up until now, the sonic landscape of VIVR has held just two synths that the user can interact with, plus the environment's synth. This can be enough if one is just looking for a quick experience for users to interact with. However, when one is aiming for user engagement, rewarding the user's effort with something that extends the experience so far turns out to be effective.
In VIVR's case, I measure the user's level of activity by how long they keep a high level of acceleration in their movements. When a high level of acceleration is kept for a certain amount of time X, a series of 4 new polygons appears inside the environment. These polygons hold their own instances of the granular synth, each with a different sample. However, the mapping of these synths' sound parameters differs from the controllers'. Instead of reacting to the raycast hit location of the controllers, they react to the user's position in the space. An arpeggiator drives their trigger rate and pitch scale so they can offer the user a different form of musical interaction.
In addition, two new sub-levels of interaction are tied to the user's sustained acceleration. When, within this parent level, the user sustains high acceleration for another period, the polygons start rotating around the user. Moreover, when this sustained acceleration exceeds a value X, the polygons rotate not only in one dimension but in three. Since the instrument uses 3D audio, these rotations not only create a new sonic landscape through their interaction with the user's position; they also bring new transformations to the 3D sonic environment.
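A possible state machine for these levels is sketched in Python below. Each sustained streak of high acceleration raises the level once, up to rotation around the user. Above a stronger threshold, the rotation switches to three dimensions. The thresholds, the hold time and the choice to never drop below level 2 are all hypothetical.

```python
class InteractionLevels:
    """Hypothetical state machine for the sustained-acceleration levels.

    Level 1: polygons appear; level 2: rotation around the user;
    level 3: 3D rotation while acceleration exceeds `strong`."""
    def __init__(self, high=2.0, strong=6.0, hold=3.0):
        self.high, self.strong, self.hold = high, strong, hold
        self.level = 0
        self.sustained = 0.0

    def update(self, acceleration, dt):
        if acceleration >= self.high:
            self.sustained += dt
        else:
            self.sustained = 0.0  # the streak is broken
        if self.level < 2 and self.sustained >= self.hold:
            self.level += 1
            self.sustained = 0.0  # the next level needs its own streak
        if self.level >= 2:
            # Switch between 1D and 3D rotation depending on current strength.
            self.level = 3 if acceleration > self.strong else 2
        return self.level
```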
The following excerpt displays an updated version of the granular synth and the one that holds the LFO's that direct the polygons' movement.
The following code excerpt shows the creation of the polygons' synths, the arpeggiator patterns and how the different levels of interaction are triggered by the controllers' acceleration.
On the Unity side, each polygon is an empty object that holds RoundedCube.cs, MeshDeformer.cs, MeshRenderedSwitch.cs and Polygon[index]PosSendOSC.cs scripts, one PosSendOSC script per polygon index. All of them are children of a parent that holds the PolyginRotate.cs script. I have already posted the RoundedCube.cs and MeshDeformer.cs scripts. The only difference in the polygon game objects is that they use a value of 2 for all four public variables of the RoundedCube.cs script: xSize, ySize, zSize and Roundness.
The following script shows how the mesh renderer of the polygons is activated.
The next script displays how the location of the polygons is sent via OSC. It is simpler than the controllers' script because it sends continuously.
And here is how the polygons rotate in a group. This script belongs to the parent that holds the 4 polygons.
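The group rotation can be sketched in Python as rotating every polygon by the same angle around a vertical axis through the user. The sign convention matches Unity's y rotation, where +90 degrees turns forward into right. The 3D variant would add similar rotations about the other two axes. The function names are hypothetical.

```python
import math

def rotate_y(point, center, angle_deg):
    """Rotate a point around a vertical axis through `center`
    (Unity frame: +90 degrees turns forward (z) into right (x))."""
    a = math.radians(angle_deg)
    dx, dz = point[0] - center[0], point[2] - center[2]
    return (center[0] + dx * math.cos(a) + dz * math.sin(a),
            point[1],
            center[2] - dx * math.sin(a) + dz * math.cos(a))

def rotate_group(points, center, speed_deg_per_s, dt):
    """Advance every polygon of the group by the same angle around `center`."""
    step = speed_deg_per_s * dt
    return [rotate_y(p, center, step) for p in points]
```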
Coming back to the SuperCollider side, the following excerpt shows how the parameters of the polygons' synths are handled. It also displays a couple of functions that set boolean variables so the system knows whether the user is pointing at the polygons or at the walls. Moreover, I had to integrate some mixing functions to keep a balanced output level when the freezing module is activated.
Finally, to apply the deforming force or the freezing state to the game object the controllers are pointing at, another series of boolean functions is set up in Unity. The following script belongs to the left controller, but the same should be applied to the right controller.
This series of entries has presented how one can enhance musical development by adding different layers of interaction that are triggered by a series of special actions. In the next entry, I will add and update some of the DSP functions, which will also make new sonic transformations possible.
// 2018.07.10 Update and new audio features
A month or so ago I was able to demo the then-current state of the project. Many people came to try it out and gave feedback on what else could be implemented. Many users agreed that two functions were missing: a looper module and a way to switch to contrasting samples on the controllers' synths.
The following excerpt displays the looper function and how it is activated when the user presses the trigger.
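A minimal looper can be sketched in Python over a plain list of samples. It assumes that pressing the trigger starts recording and releasing it starts the loop, which the text does not specify. The real module would record into a SuperCollider buffer instead.

```python
class Looper:
    """Minimal looper: record while the trigger is held, then loop it.

    Assumes press = start recording, release = start looping."""
    def __init__(self, max_len):
        self.max_len = max_len
        self.buffer = []
        self.recording = False
        self.play_pos = 0

    def trigger(self, pressed):
        """Start recording on press; stop and start looping on release."""
        if pressed and not self.recording:
            self.buffer = []
            self.recording = True
        elif not pressed and self.recording:
            self.recording = False
            self.play_pos = 0

    def process(self, sample):
        """Take one input sample and return the loop's output sample."""
        if self.recording:
            if len(self.buffer) < self.max_len:
                self.buffer.append(sample)
            return 0.0  # nothing is looped while recording
        if not self.buffer:
            return 0.0
        out = self.buffer[self.play_pos]
        self.play_pos = (self.play_pos + 1) % len(self.buffer)
        return out
```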
Changing to a contrasting sample could easily be done by changing the buffer index with a button press. Nonetheless, at SOPI we thought that applying spatial interaction to this feature would bring not only a divergent change sound-wise. It could also be extended so that the two contrasting samples of each controller are exponentially interpolated according to the user's position in the space. To do so, I created two more instances of the granular synth for the controllers and routed their outputs to an xFader module that interpolates between these 4 synths.
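A hedged Python sketch of position-driven exponential crossfading follows. It computes gains for each controller's original and contrasting synth. The exponential curve, its steepness `k` and the layout (the left pair following x, the right pair following z) are hypothetical choices, not VIVR's xFader.

```python
import math

def curve(t, k=4.0):
    """Exponential curve on 0..1: curve(0) = 0, curve(1) = 1."""
    return (math.exp(k * t) - 1.0) / (math.exp(k) - 1.0)

def clamp01(v):
    return min(max(v, 0.0), 1.0)

def xfade_gains(x, z, k=4.0):
    """Gains for the four granular synths from the user's normalised
    position (x, z in 0..1). Hypothetical layout: the left controller's
    pair follows x, the right controller's pair follows z."""
    u = curve(clamp01(x), k)
    v = curve(clamp01(z), k)
    return {"L_orig": 1.0 - u, "L_contrast": u,
            "R_orig": 1.0 - v, "R_contrast": v}
```

Each pair always sums to 1, so moving across the room trades one sample for the other without changing the overall level of that controller's pair.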
This implementation requires no changes to the Unity code because it uses data that was already being sent.
Because the Unity code is divided into too many scripts, I will avoid posting all of them here. Nevertheless, the following text box contains an up-to-date version of the full SuperCollider code.
// 2018.07.11 Conclusion... for now
Specialization and categorical delineation are strong within computer music research, where developments in musical practice are partitioned by their related musical technology and by the separation of performer, instrument and environment. In contrast, when one considers entity relationships and the features common to VR environments, the lines between these factors get blurred. All these actors become active agents that feed back into each other through musical content and interaction. In this manner, it is the interplay between the environment and the user, not either of them separately, that drives the musical content.
The notion of music interaction in VR has been the main focus of our ideas while developing VIVR. In our design process we found that 3D audio, bodily and spatial interaction, and a relation of autonomy between the user and the environment are considerations that strongly support music interaction in VR. This combination addresses three questions: how the musician's presence becomes an entity that has to collaborate with the environment to create music, how the musician's bodily and spatial interactions are translated into musical responses, and how 3D audio becomes a new musical feature for the musician.
Nevertheless, further development of Virtual Reality Musical Instruments should be supported in order to bring this platform to concerts and performances, so as to study the discrepancies between the experiences of VRMI performers and their audiences.