// 2018.04.11
"VIVR” (Vibrating Virtual Reality) is a research project at the
SOPI research group in Aalto University.
In this project, we are implementing a Virtual Reality Musical Instrument focusing on music interaction
between the
performer and the 3D environment.
Our attention focuses on the 3D environment and not on a particular 3D model of an instrument because we want
to exploit
the immersion feature of VR. Therefore, a 3D environment can be seen as a resonating body the musician is trapped
within
and that excites from the inside.
Within this vision, we are focusing on fours factors that are crucial in order to relate immersion and music
interaction:
3D audio, spatial interaction, the performer and the sonic world.
I'm starting this blog in order to share my experiences in developing an instrument for VR and to keep track of how the ideas develop over the process.
The technology we are working with is an HTC Vive as hardware, Unity 3D as the game engine and SuperCollider as the sound engine. The communication between the HTC Vive and Unity 3D is done through the SteamVR SDK, and the communication between Unity 3D and SuperCollider is done via OSC.
Throughout this blog I will post code snippets of whatever might be relevant to the topic.
// 2018.04.11 The beginning of the idea
Since I started at Media Lab Helsinki I have been interested in sound and movement interaction. I'd like to graduate this academic year, so I was looking for a thesis idea. Back in 2014 or so, when I used VR for the first time, it didn't really grab me. Even though VR is still pretty much focused on zombie-killing games, I have to admit that nowadays there are some applications that bring new experiences out of it.
When using the VR setup at school, we made jokes about how disconnected we would get from reality. This immersion made me think that VR could be a perfect platform for an audiovisual installation. I thought of different ideas of what this installation could be. The one that sounded most interesting involved a sentient environment that would react audiovisually to user input. The objective would be to try to establish an audiovisual communication between the user and the environment until both resonate on the same level.
At the same time, SOPI began to work on a VR musical instrument. When exchanging ideas, the leader of the group, Koray Tahiroğlu, thought that it would be useful to combine forces and come up with a VR musical instrument that would have a certain degree of autonomy. In our discussions we looked into the affordances of VR and how these could benefit music interaction. In combination with the ideas proposed by previous research in the field, we came to the conclusion that 3D audio, spatial interaction, the performer and the sound world were essential factors in the relation between VR and music interaction.
// 2018.04.27 3D audio as a music feature
One of the first try-outs I did with Unity and SuperCollider was a simple test of sound spatialisation. I tried three different Ambisonics quarks/plugins in SuperCollider: ATK, SC-HOA, and AmbIEM. Although these three implementations each have their own potential, I found that SC-HOA and ATK were the most faithful to the sound source; AmbIEM seemed to filter the sound source more than the other two. Even though SC-HOA is especially good and well documented, I feel it is taking a while for the SuperCollider developers to release it as an official plug-in. In the meantime, and because I'm working on a university computer where having unsupported software like SuperCollider is already a hassle, I have decided to use the ATK.
It was during this simple test that I came across my first problem. My first attempt was to use Unity's Cartesian coordinates directly. After trying to figure out why the translation from Cartesian to spherical coordinates wouldn't work, I realised that the axes are laid out differently in the spherical convention, so the variables have to be remapped accordingly. Furthermore, when working in VR we expect the user to move, or at least rotate the head, so we should also get the user's position and head rotation and apply them to the conversion, as in the sketch below.
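A minimal sketch of this conversion, assuming a hypothetical helper class and the HMD transform provided by SteamVR:

// Hypothetical sketch: convert a source position from Unity's left-handed
// Cartesian space into listener-relative spherical coordinates (azimuth,
// elevation, distance) before sending them to SuperCollider.
using UnityEngine;

public static class SphericalUtil
{
    // 'head' is the HMD transform (user position and head rotation).
    public static Vector3 ToSpherical(Vector3 sourcePosition, Transform head)
    {
        // Express the source relative to the listener's position and head rotation.
        Vector3 local = Quaternion.Inverse(head.rotation) * (sourcePosition - head.position);

        float distance = local.magnitude;
        // Azimuth: angle on the horizontal plane, counter-clockwise from straight ahead (+z).
        float azimuth = Mathf.Atan2(-local.x, local.z);
        // Elevation: angle above the horizontal plane (+y is up in Unity).
        float elevation = Mathf.Atan2(local.y, new Vector2(local.x, local.z).magnitude);

        return new Vector3(azimuth, elevation, distance);
    }
}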
The above idea works when representing external sound sources, but for VIVR we wanted the user to be inside the instrument. My approach to turning 3D audio into a musical feature was therefore to give the user control over the sound spatialisation: in this manner, the user can freely move the sound around the 3D space. Unity's Physics.Raycast comes in really handy here. Considering that almost every 3D object one would add to the environment has a collider, we can spatialise the audio along the surfaces of those colliders. A sketch of the Unity side is as follows:
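(A minimal version with hypothetical names, assuming the OSC and OscMessage classes from the UnityOSC script mentioned below.)

// Hypothetical reconstruction of a "RayCastLocationL"-style script: cast a ray
// from the left controller and send the hit point and distance to SuperCollider.
using UnityEngine;

public class RayCastLocationL : MonoBehaviour
{
    public OSC osc;                 // OSC sender component from the UnityOSC script
    public float maxDistance = 100f;

    void Update()
    {
        RaycastHit hit;
        // The script is attached to the left controller, so we cast along its forward axis.
        if (Physics.Raycast(transform.position, transform.forward, out hit, maxDistance))
        {
            OscMessage msg = new OscMessage();
            msg.address = "/rayCastLocationL";
            msg.values.Add(hit.point.x);
            msg.values.Add(hit.point.y);
            msg.values.Add(hit.point.z);
            msg.values.Add(hit.distance);   // used later for proximity/amplitude mapping
            osc.Send(msg);
        }
    }
}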
Notice that the above sketch refers to the left controller. The same script should be renamed "RayCastLocationR" and added to the other HTC Vive controller. As can be seen, I have included the OSC messages in this excerpt. I'm using this OSC script by Thomas Fredericks to send and receive OSC in Unity.
One useful parameter that raycasting provides is the distance between the collision point and the raycast source. This parameter can be mapped to the Ambisonics proximity feature as well as to the amplitude, which gives a better virtual representation of how sound sources change in volume and are filtered depending on their distance. The following code excerpt sketches the DSP implementation of the ambisonic module as well as the updated OSC function.
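A minimal sketch, using SuperCollider's built-in first-order B-format UGens as a stand-in for the ATK chain and approximating the proximity effect with a distance-dependent gain and low-pass filter; the scalings and the ~ambiSynthL handle are assumptions.

(
SynthDef(\ambiL, {
    arg in = 0, out = 0, azimuth = 0, elevation = 0, distance = 1, lag = 0.1;
    var src, w, x, y, z, amp;
    src = In.ar(in, 1);
    // Closer sources are louder and brighter, distant ones quieter and duller.
    amp = (1 / distance.lag(lag).max(0.5)).clip(0, 2);
    src = LPF.ar(src * amp, distance.lag(lag).linexp(0, 20, 12000, 1500));
    // Encode to first-order B-format at the given direction.
    #w, x, y, z = PanB.ar(src, azimuth.lag(lag), elevation.lag(lag));
    // 2D decode to stereo for monitoring (a full decoder would go here).
    Out.ar(out, DecodeB2.ar(2, w, x, y));
}).add;

// OSC responder fed by the Unity raycast: /rayCastLocationL sends x, y, z, distance.
// For brevity the conversion here uses the raw hit point (the head transform
// discussed earlier is omitted).
OSCdef(\rayCastLocationL, { |msg|
    var px = msg[1], py = msg[2], pz = msg[3], dist = msg[4];
    var azimuth = atan2(px.neg, pz);
    var elevation = atan2(py, (px.squared + pz.squared).sqrt);
    ~ambiSynthL.set(\azimuth, azimuth, \elevation, elevation, \distance, dist);
}, '/rayCastLocationL');
)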
Finally, we can push the potential of the ATK further by using it as a processing module at the same time. Following the chain input (n-channel signal) -> encoding (B-format W, X, Y, Z signal) -> decoding (n-channel signal) -> processing (n-channel signal) -> re-encoding (B-format W, X, Y, Z signal) -> spatial transformation (B-format W, X, Y, Z signal) -> decoding (output channels), one can distribute the signal processing spatially, and thus obtain an output with a vibrating depth and sonic texture that is not possible to achieve by processing the signal only before the first encoding or after the final decoding. In the case of VIVR, this chain is implemented by adding a delay unit before the second encoding. At the same time, because the action happens within an environment, it is useful to add a reverb module after the ambisonic process.
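A sketch of that chain, again with core UGens (DecodeB2, DelayN, PanB, Rotate2, FreeVerb) standing in for the ATK:

(
SynthDef(\ambiProcess, { arg in = 0, out = 0, rotate = 0, mix = 0.3;
    var w, x, y, z, quad, delayed, re, sig;
    #w, x, y, z = In.ar(in, 4);                       // first-order B-format bus
    quad = DecodeB2.ar(4, w, x, y);                   // decode to a virtual quad array
    // A different delay per virtual channel distributes the processing spatially.
    delayed = quad.collect { |chan, i| DelayN.ar(chan, 0.5, 0.05 * (i + 1)) };
    // Re-encode each delayed channel back at its own direction, then sum.
    re = delayed.collect { |chan, i| PanB.ar(chan, (i * 0.5pi) + 0.25pi, 0) }.sum;
    #w, x, y, z = re;
    #x, y = Rotate2.ar(x, y, rotate);                 // spatial transformation (rotation)
    sig = DecodeB2.ar(2, w, x, y);                    // final stereo decode
    sig = FreeVerb.ar(sig, mix, room: 0.7);           // room reverb after the ambisonic stage
    Out.ar(out, sig);
}).add;
)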
// 2018.05.07 Setting up to deform the environment
The concept of VIVR is to make users feel that they are inside the instrument; that's why the environment feels like being inside a room. Nevertheless, as soon as users get their hands on VIVR they realise that the notion of a cube-shaped room can be completely destroyed. I set up a space of six surfaces (floor, ceiling and four side walls) using custom 3D shapes based on the combination of two tutorials I found at http://catlikecoding.com/unity/tutorials/ . On that site, Jasper Flick wrote the Rounded Cube tutorial on how to make rounded cubes whose shape changes according to the X, Y, Z size and roundness parameters. By setting these parameters in the Unity Editor to X Size = 55, Y Size = 2, Z Size = 55 and Roundness = 2, one gets a procedural 3D-shaped wall.
Using the script from the Rounded Cube tutorial inside the Mesh Deformation tutorial, instead of the Cube Sphere proposed by the author, one can deform the walls according to user input. The core scripts of this tutorial are MeshDeformer.cs, attached to the Game Object one wants to deform, and MeshDeformerInput.cs, attached to the Game Object that is intended to be the deformation source. The MeshDeformerInput.cs script has two parameters to control: the force applied to the deformed object, and a force offset that makes the deformation follow the direction of the force input. On the deformed object, the MeshDeformer.cs script has public parameters for Spring Force and Damping: Spring Force sets how strongly displaced vertices jump back towards their original position, and Damping sets how smoothly this bouncing settles. A simplified version of that per-vertex update is sketched below.
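(This is a simplified sketch in the spirit of the tutorial, not the tutorial code itself.)

// Per-vertex spring/damping update: the spring force pulls a displaced vertex back
// towards its original position, and damping controls how quickly the bouncing settles.
using UnityEngine;

public class SimpleSpringVertex
{
    public Vector3 originalPosition;
    public Vector3 displacedPosition;
    Vector3 velocity;

    public void Step(float springForce, float damping, float deltaTime)
    {
        Vector3 displacement = displacedPosition - originalPosition;
        velocity -= displacement * springForce * deltaTime;   // spring pulls back
        velocity *= 1f - damping * deltaTime;                  // damping removes energy
        displacedPosition += velocity * deltaTime;
    }
}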
In VIVR the MeshDeformerInput.cs script is attached to each HTC Vive controller, but the force comes from the FFT values of the sound sources. I'm passing the audio output of my sound sources through an FFT chain to read raw FFT values, a Loudness chain and an Amplitude tracker. The values of each module are multiplied together and scaled by a forceIndex variable in order to get the total magnitude that is output as force. A few code examples help to explain this better.
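A sketch of the analysis side, assuming Unity listens on port 6969 and the /fftForce1 reply name:

(
~unity = NetAddr("127.0.0.1", 6969);   // Unity's OSC receive port (assumed)

SynthDef(\fftForceL, { arg in = 0;
    var sig, chain, loud, amp, force;
    sig = In.ar(in, 1);
    chain = FFT(LocalBuf(1024), sig);
    loud = Loudness.kr(chain);                           // perceptual loudness in sones
    amp = Amplitude.kr(sig);                             // plain amplitude follower
    force = loud * amp * \forceIndex.kr(20);             // scaled total magnitude
    SendReply.kr(Impulse.kr(30), '/fftForce1', force);   // 30 updates per second
}).add;

// Forward the measurement to Unity as /fft1.
OSCdef(\fftForward1, { |msg|
    ~unity.sendMsg("/fft1", msg[3].asFloat);             // msg[3] is the reported value
}, '/fftForce1');
)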
On the Unity side, the MeshDeformerControllerInputL.cs script receives the FFT values from OSC, sets them as force and sends them to each Game Object with a MeshDeformer.cs script attached to it. We have to do the same in the MeshDeformerControllerInputR.cs script, changing the OSC receiving address to "/fft2" and the sending address to "/rayCastLocationR".
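A sketch of what such a script could look like, assuming the OSC classes from the Fredericks script and the AddDeformingForce method of the tutorial's MeshDeformer:

using UnityEngine;

public class MeshDeformerControllerInputL : MonoBehaviour
{
    public OSC osc;
    public float forceOffset = 0.1f;
    float force;

    void Start()
    {
        osc.SetAddressHandler("/fft1", OnForce);   // force value coming from SuperCollider
    }

    void OnForce(OscMessage message)
    {
        force = message.GetFloat(0);
    }

    void Update()
    {
        RaycastHit hit;
        if (Physics.Raycast(transform.position, transform.forward, out hit))
        {
            MeshDeformer deformer = hit.collider.GetComponent<MeshDeformer>();
            if (deformer != null)
            {
                // Offset along the normal so the deformation follows the push direction.
                Vector3 point = hit.point + hit.normal * forceOffset;
                deformer.AddDeformingForce(point, force);
            }
        }
    }
}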
By using the FFT magnitude as the deformation force it is possible to get visual feedback that matches the sound output, making the experience more coherent and immersive. Still, there are many features of music and visual interaction that have to be taken into account, and I'll save those for future entries.
// 2018.05.15 Enhancing interaction I
So far I have covered basic 3D audio and mesh deformation based on movement interaction. Nevertheless, in order to make this interaction more meaningful to users, I added several different parameter controls based on the same kind of interaction. Thus, at the same time that users distort the environment and move the sound around, they can play with the reverberation, the buffer position and grain size of a granular synth, or freeze the whole audio through an FFT module.
If you have been reading this blog, you might have noticed that I have barely mentioned the DSP modules I am using as sound sources. This is because at the moment they are placeholders. Once we have a complete picture of our interaction system, we would like to use the capabilities of a VR workstation to develop heavier sound source modules. The following code example sketches the DSP modules I am currently using.
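(A minimal sketch of the left-hand pair; bus and buffer names are assumptions, and the ambisonic module sketched earlier sits at the end of the chain.)

(
~grainBusL = Bus.audio(s, 1);   // routing bus towards the freeze and ambisonic stages

SynthDef(\grainL, { arg out = 0, buf = 0, pos = 0.5, grainSize = 0.1, trigRate = 10, amp = 0.5, lag = 0.1;
    var sig = GrainBuf.ar(
        1,
        Impulse.kr(trigRate.lag(lag)),
        grainSize.lag(lag),
        buf,
        1,                                   // playback rate
        pos.lag(lag)                         // normalised buffer position
    );
    Out.ar(out, sig * amp.lag(lag));
}).add;

SynthDef(\freezeL, { arg in = 0, out = 0, freeze = 0, amp = 1;
    var sig = In.ar(in, 1);
    var chain = FFT(LocalBuf(2048), sig);
    chain = PV_MagFreeze(chain, freeze);     // hold the current spectrum while freeze == 1
    Out.ar(out, IFFT(chain) * amp);
}).add;
)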
For each controller I created three synths: a grain synth, an FFT freeze synth and an Ambisonics one. This way users have independent musical control in each hand. The granular synth is controlled in two different ways. One follows the RayCastLocation, so depending on where users are pointing they can change the buffer position. In this OSC definition I am also measuring how far the controller is from the user and how fast it is moving; this is used to control the amplitude and lag time of both the grain module and the Ambisonics module. The other mapping of the grain synth is based on the rate of change of the FFT loudness measurement: the bigger the difference between the new loudness value and the old one, the bigger the grain size and the trigger rate.
Notice that in the following excerpt I have edited OSCdef(\amp) to focus just on the loudness interaction, and that I did not include OSCdef(\rayCastLocationR), as it is the same as the left one with the synths' IDs changed.
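A sketch of that loudness mapping; the reply address, scaling ranges and the ~grainSynthL handle are assumptions:

(
~oldLoudnessL = 0;

OSCdef(\amp, { |msg|
    var loud = msg[3];
    var diff = (loud - ~oldLoudnessL).abs;
    ~oldLoudnessL = loud;
    // Bigger jumps in loudness -> longer grains and denser triggering.
    ~grainSynthL.set(
        \grainSize, diff.linlin(0, 10, 0.05, 0.5),
        \trigRate,  diff.linlin(0, 10, 5, 40)
    );
}, '/loudnessL');
)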
The FFT freeze module required some mixing: even though it is activated just by pressing the grip of a Vive controller, which sends a value of 1 or 0 to SuperCollider, it does affect the other two modules' amplitudes. When it is active, the amplitude of the FFT module rises, as does that of the Ambisonics module, and both decrease when it isn't active. Furthermore, because the user is able to freeze not only the sound but also the 3D meshes, it is useful to keep track of which mesh is being deformed so we can keep an audio loop in it. The freezing of the 3D meshes is simply done by turning the spring force down to 0.
The following excerpt sketches how this freezing happens inside Unity. Once more, the same script should be adapted for the second controller.
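(A hypothetical sketch, assuming the pre-2.0 SteamVR_Controller API and the OSC classes used earlier; the /freeze1 address and restored spring value are assumptions.)

using UnityEngine;

public class FreezeControllerL : MonoBehaviour
{
    public OSC osc;
    SteamVR_TrackedObject trackedObj;

    SteamVR_Controller.Device Controller
    {
        get { return SteamVR_Controller.Input((int)trackedObj.index); }
    }

    void Awake()
    {
        trackedObj = GetComponent<SteamVR_TrackedObject>();
    }

    void Update()
    {
        RaycastHit hit;
        if (!Physics.Raycast(transform.position, transform.forward, out hit)) return;

        MeshDeformer deformer = hit.collider.GetComponent<MeshDeformer>();
        if (deformer == null) return;

        if (Controller.GetPressDown(SteamVR_Controller.ButtonMask.Grip))
        {
            deformer.springForce = 0f;          // frozen: vertices stop springing back
            SendFreeze(1);
        }
        else if (Controller.GetPressUp(SteamVR_Controller.ButtonMask.Grip))
        {
            deformer.springForce = 20f;         // restore a non-zero spring force
            SendFreeze(0);
        }
    }

    void SendFreeze(int state)
    {
        OscMessage msg = new OscMessage();
        msg.address = "/freeze1";
        msg.values.Add(state);
        osc.Send(msg);
    }
}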
With the values sent by Unity we can keep track in SuperCollider of which meshes are frozen and by which controller. As mentioned, this is useful for resolving which mesh should be frozen, and when. In order to keep it clear to myself, I wrote some boolean conditions. They probably seem unnecessary when working with just two controllers, but in future entries this logic will make more sense.
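A rough sketch of that bookkeeping; the addresses, synth handles and the exact conditions are assumptions:

(
~frozenByL = false;
~frozenByR = false;

OSCdef(\freeze1, { |msg|
    ~frozenByL = (msg[1] == 1);
    if(~frozenByL and: { ~frozenByR.not }) {
        ~freezeSynthL.set(\freeze, 1);      // only the left controller holds a freeze
    };
    if(~frozenByL.not and: { ~frozenByR.not }) {
        ~freezeSynthL.set(\freeze, 0);      // nothing frozen any more
        ~freezeSynthR.set(\freeze, 0);
    };
}, '/freeze1');
)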
All these small additions to the direct interaction between the user and the environment make the whole experience more immersive. Giving users control over parameters that they would expect to change, without requiring extra effort, helps bring about a more intuitive and effective interaction.
In the following entry I will cover how interaction can be enhanced further by giving the environment some
sort of autonomy.
// 2018.06.02 Enhancing interaction II
Giving the system some autonomy, so that it acts on its own, can bring a deeper level of connection with the user. One can set the system to act or react in different ways, based on measurements of the user's activity, in order to stimulate that activity. For example, when user activity diminishes, the environment can propose new musical gestures. On the contrary, when the user seems engaged enough, the environment can reinforce this engagement by adding new layers of sound or visual interaction.
So far in VIVR I have implemented these two types of system autonomy. In this entry I'll explain how the system reacts when it feels that the user is no longer engaged, and in the next one I'll describe my approach to keeping a higher level of engagement. In both cases it is necessary to give the environment its own synths and 3D objects.
I'll start with the SuperCollider synth. In this case the environment holds a similar version of the granular synth, but with amplitude modulation, because LFOs are what give it movement and life. This synth does not go through the FFT freezing module.
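A minimal sketch of such a synth; the modulation rates are assumptions:

(
SynthDef(\environmentGrain, { arg out = 0, buf = 0, amp = 0.4, modRate = 0.05;
    var mod, pos, sig;
    mod = SinOsc.kr(modRate).range(0, 1);            // slow fade in and out
    pos = LFNoise1.kr(0.02).range(0.0, 1.0);         // drifting buffer position
    sig = GrainBuf.ar(1, Impulse.kr(12), 0.2, buf, 1, pos);
    Out.ar(out, sig * mod * amp);
}).add;
)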
One can determine how active the user is at any moment by measuring the average rate of activity of both controllers. If the user is not active for a period of time that seems too long to be considered a musical silence, the environment starts acting, thus inducing the user to restart the musical activity. In the following excerpt I calculate the average of the controllers' magnitudes. If it is too low, a counter starts; if this counter reaches a given amount of time, the environment starts acting on its own.
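A sketch of that inactivity detector; the addresses, the 0.05 threshold and the 10-second window are assumptions:

(
~magL = 0; ~magR = 0; ~silentSeconds = 0;

OSCdef(\controllerMagL, { |msg| ~magL = msg[1] }, '/controllerMagL');
OSCdef(\controllerMagR, { |msg| ~magR = msg[1] }, '/controllerMagR');

~activityWatcher = Routine {
    loop {
        var average = (~magL + ~magR) / 2;
        if(average < 0.05) {
            ~silentSeconds = ~silentSeconds + 1;
        } {
            ~silentSeconds = 0;
        };
        if(~silentSeconds >= 10) {
            ~environmentSynth.set(\amp, 0.4);   // wake the environment up
        };
        1.wait;
    }
}.play;
)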
Finally, in the same way that I did with the controllers, I send the total FFT magnitude values to Unity.
In Unity I created a new 3D Capsule without a Mesh Renderer, so it is invisible. This object holds two scripts: a movement script and a Mesh Deformer script. Thus the object can be active all the time, and it only becomes noticeable once it starts moving. The Mesh Deformer script is the same as the one used for the controllers, only with the OSC address changed. The EnviornmentCamMovement.cs script is rather simple: I take the scaled amplitude values from the environmentAmpMod1 synth and set them as axes of position and rotation.
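A hypothetical sketch of that movement script; the OSC address and scaling factors are assumptions:

using UnityEngine;

public class EnviornmentCamMovement : MonoBehaviour
{
    public OSC osc;
    public float positionScale = 5f;
    public float rotationScale = 90f;
    float amp;

    void Start()
    {
        osc.SetAddressHandler("/environmentAmp", OnAmp);
    }

    void OnAmp(OscMessage message)
    {
        amp = message.GetFloat(0);
    }

    void Update()
    {
        // The scaled amplitude drives both position and rotation axes.
        transform.localPosition = new Vector3(amp, 0f, amp) * positionScale;
        transform.localRotation = Quaternion.Euler(0f, amp * rotationScale, 0f);
    }
}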
In this case, when the environment feels that the user has lost engagement, it starts travelling and deforming itself in a slow manner, hence the LFOs. The amplitude modulator that feeds the environment's granular module is a sine oscillator, so the sound fades in and out in a way that is easy to notice. I intend this sonic gesture to induce new, slow gestures from the user that give way to a new musical part.
In the following entry I'll describe how I set new levels of interaction when the user's level of activity is
high.
// 2018.07.09 Enhancing interaction III
Lately I've been working on and revising this project, and I realised that some events and functions would have worked better in a different way. This entry explains how VIVR's third level of interaction works and updates some functions that have been presented before.
Up until now, the sonic landscape of VIVR has held just the two synths the user can interact with, plus the environment's one. This can be enough if one is just after a quick experience users can interact with. Nevertheless, when one is looking for user engagement, it turns out to be effective to reward the user's effort with something that extends the experience built up so far.
In VIVR's case, I measure the user's level of activity based on how long the user keeps a high level of acceleration in their movements. When a high level of acceleration is sustained for a given amount of time, a series of four new polygons appears inside the environment. These polygons hold their own instances of the granular synth, each with a different sample. However, the mapping of these synths' sound parameters is different from the controllers': instead of reacting to the raycast hit location of the controllers, they react to the user's position in the space. An arpeggiator is set on their trigger rate and pitch scale, so they provide the user with a different form of musical interaction.
In addition, two new sub-levels of interaction are attached to the user's acceleration measure. When, within this parent level, the user sustains high acceleration for another period of time, the polygons start rotating around the user. Moreover, when this sustained acceleration exceeds a higher threshold, the polygons rotate not just in one dimension but in three. Considering that the instrument uses 3D audio, these rotations not only provide a new sonic landscape through their interaction with the user's position; they also bring new transformations to the 3D sonic environment.
The following excerpts sketch an updated version of the granular synth and of the synth that holds the LFOs directing the polygons' movement, the creation of the polygons' synths, the arpeggiator patterns, and how the different levels of interaction are triggered by the controllers' acceleration.
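(A compressed sketch of that layer; \polyGrain stands for the updated granular synth with a \rate control, and the buffers, thresholds, scale and OSC addresses are assumptions.)

(
~polygonSynths = 4.collect { |i|
    Synth(\polyGrain, [\buf, ~polygonBuffers[i], \amp, 0]);
};

// Arpeggiate trigger rate and playback rate over a minor pentatonic scale.
~polygonArp = Pbind(
    \type, \set,
    \id, Pseq(~polygonSynths.collect(_.nodeID), inf),
    \args, #[\trigRate, \rate],
    \trigRate, Pseq([4, 8, 12, 16], inf),
    \rate, Pseq(Scale.minorPentatonic.ratios, inf),
    \dur, 0.25
).play;

// Level logic: sustained high acceleration reveals the polygons, more of it rotates them.
~accelHighSeconds = 0;
OSCdef(\accel, { |msg|
    var accel = msg[1];
    // Assuming roughly ten acceleration messages per second.
    if(accel > 2.0) { ~accelHighSeconds = ~accelHighSeconds + 0.1 } { ~accelHighSeconds = 0 };
    if(~accelHighSeconds > 5) {
        ~polygonSynths.do(_.set(\amp, 0.3));
        ~unity.sendMsg("/polygonsVisible", 1);
    };
    if(~accelHighSeconds > 10) {
        ~unity.sendMsg("/polygonsRotate", 1);
    };
}, '/accelL');
)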
On the Unity side, each polygon is an empty object that holds a RoundedCube.cs, a MeshDeformer.cs, a MeshRenderedSwitch.cs and a Polygon++"polygon index"++PosSendOSC.cs script. All of them sit under a parent that holds the PolyginRotate.cs script. I have already posted the RoundedCube.cs and MeshDeformer.cs scripts; the only difference in the polygon game objects is that they use a value of 2 for the four public variables of the RoundedCube.cs script: xSize, ySize, zSize and Roundness.
The following script shows how the mesh renderer of the polygons is activated.
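(A hypothetical sketch, assuming the OSC handler classes used earlier and the /polygonsVisible address.)

using UnityEngine;

public class MeshRenderedSwitch : MonoBehaviour
{
    public OSC osc;
    MeshRenderer meshRenderer;

    void Start()
    {
        meshRenderer = GetComponent<MeshRenderer>();
        meshRenderer.enabled = false;                          // hidden by default
        osc.SetAddressHandler("/polygonsVisible", OnSwitch);
    }

    void OnSwitch(OscMessage message)
    {
        meshRenderer.enabled = message.GetInt(0) == 1;
    }
}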
The next script shows how the location of the polygons is sent via OSC. This is a simpler script than the controllers' one because the communication is constant.
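(A sketch for the first polygon; the address is an assumption.)

using UnityEngine;

public class Polygon1PosSendOSC : MonoBehaviour
{
    public OSC osc;

    void Update()
    {
        // Unlike the controllers, the polygon simply streams its position every frame.
        OscMessage msg = new OscMessage();
        msg.address = "/polygon1Pos";
        msg.values.Add(transform.position.x);
        msg.values.Add(transform.position.y);
        msg.values.Add(transform.position.z);
        osc.Send(msg);
    }
}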
And here is how the polygons rotate in a group. This script belongs to the parent that holds the 4 polygons.
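(A sketch of what that parent script could look like; the rotation OSC addresses and speed are assumptions.)

using UnityEngine;

public class PolyginRotate : MonoBehaviour
{
    public OSC osc;
    public float speed = 20f;       // degrees per second
    bool rotate;
    bool threeDimensional;

    void Start()
    {
        osc.SetAddressHandler("/polygonsRotate", OnRotate);
        osc.SetAddressHandler("/polygonsRotate3D", OnRotate3D);
    }

    void OnRotate(OscMessage message)   { rotate = message.GetInt(0) == 1; }
    void OnRotate3D(OscMessage message) { threeDimensional = message.GetInt(0) == 1; }

    void Update()
    {
        if (!rotate) return;
        // One axis at the first sub-level, all three at the highest level.
        Vector3 axis = threeDimensional ? new Vector3(1f, 1f, 1f) : Vector3.up;
        transform.Rotate(axis.normalized * speed * Time.deltaTime);
    }
}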
Coming back to the SuperCollider side, the following excerpt sketches how the parameters of the polygons' synths are handled. It also shows a couple of functions that set the boolean variables, so the system knows whether the user is pointing at the polygons or at the walls. Moreover, I had to integrate some mixing functions in order to keep a balanced output level when activating the freezing module.
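(A rough sketch of the routing boolean and the gain compensation; addresses, synth handles and gain values are assumptions.)

(
~pointingAtPolygonL = false;

OSCdef(\pointingL, { |msg|
    ~pointingAtPolygonL = (msg[1] == 1);   // 1 = polygon, 0 = wall
}, '/pointingAtPolygonL');

~setFreezeMix = { |freezeOn|
    if(freezeOn) {
        ~freezeSynthL.set(\amp, 1.0);
        ~ambiSynthL.set(\amp, 0.8);        // lift the ambisonic layer with it
        ~grainSynthL.set(\amp, 0.3);       // pull the grain layer back to keep balance
    } {
        ~freezeSynthL.set(\amp, 0.0);
        ~ambiSynthL.set(\amp, 0.5);
        ~grainSynthL.set(\amp, 0.6);
    };
};
)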
Finally, in order to apply the deforming force or the freezing state to the game object the controllers are pointing at, another series of boolean functions is set in Unity. The following script belongs to the left controller, but the same should be applied to the right controller.
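(A hypothetical sketch of that routing; the "Polygon" tag, the address and the way force and freeze state arrive are assumptions.)

using UnityEngine;

public class ControllerTargetRouterL : MonoBehaviour
{
    public OSC osc;
    public float forceOffset = 0.1f;
    public float force;              // set by the FFT OSC handler
    public bool freezeRequested;     // set by the grip-press script

    void Update()
    {
        RaycastHit hit;
        if (!Physics.Raycast(transform.position, transform.forward, out hit)) return;

        bool isPolygon = hit.collider.CompareTag("Polygon");
        SendPointingState(isPolygon);

        MeshDeformer deformer = hit.collider.GetComponent<MeshDeformer>();
        if (deformer == null) return;

        if (freezeRequested)
        {
            deformer.springForce = 0f;                       // freeze only the pointed mesh
        }
        else
        {
            Vector3 point = hit.point + hit.normal * forceOffset;
            deformer.AddDeformingForce(point, force);        // deform only the pointed mesh
        }
    }

    void SendPointingState(bool isPolygon)
    {
        OscMessage msg = new OscMessage();
        msg.address = "/pointingAtPolygonL";
        msg.values.Add(isPolygon ? 1 : 0);
        osc.Send(msg);
    }
}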
This series of entries has presented how one can enhance the musical development by adding different layers of interaction that are triggered by a series of special actions. In the next entry I will add and update some of the DSP functions, which will also make new sonic transformations possible.
// 2018.07.10 Update and new audio features
A month or so ago I was able to demo the then-current state of the project. Many people came to try it out and gave feedback on what else could be implemented. Many users agreed that two functions worth adding were a looper module and a way to switch to contrasting samples on the controllers' synths.
The following excerpt displays the looper function and how it is activated when the user presses the trigger.
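(A minimal sketch of such a looper; the /trigger1 address, the four-second buffer and the bus names are assumptions.)

(
~loopBufL = Buffer.alloc(s, (s.sampleRate * 4).asInteger, 1);

SynthDef(\loopRecL, { arg in = 0, buf = 0, run = 1;
    RecordBuf.ar(In.ar(in, 1), buf, run: run, loop: 1);
}).add;

SynthDef(\loopPlayL, { arg out = 0, buf = 0, amp = 0.8;
    Out.ar(out, PlayBuf.ar(1, buf, loop: 1) * amp);
}).add;

// Record while the trigger is held, loop the buffer on release.
OSCdef(\looperL, { |msg|
    if(msg[1] == 1) {
        ~loopRecL = Synth(\loopRecL, [\in, ~grainBusL, \buf, ~loopBufL]);
    } {
        ~loopRecL !? { ~loopRecL.free; ~loopRecL = nil };
        ~loopPlayL !? { ~loopPlayL.free };
        ~loopPlayL = Synth(\loopPlayL, [\buf, ~loopBufL]);
    };
}, '/trigger1');
)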
Fulfilling the idea of switching to a contrasting sample could easily be done by changing the buffer index with a button action. Nonetheless, in SOPI we thought that applying spatial interaction to this feature could bring more than a divergent change in sound: it can be extended so that the two contrasting samples of each controller are exponentially interpolated according to the user's position in the space. To do so, I created two more instances of the granular synth for each controller and routed their output to an xFader module that interpolates between these four synths.
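A sketch of the crossfade stage; the bus assignments, the /userPosition address and the assumed 0-4 m range are placeholders:

(
SynthDef(\xFader, { arg inA = 0, inB = 2, out = 0, pos = 0;
    var a = In.ar(inA, 1);
    var b = In.ar(inB, 1);
    // pos is expected in 0..1; XFade2 wants -1..1, so remap after the exponential curve.
    Out.ar(out, XFade2.ar(a, b, pos.lag(0.2) * 2 - 1));
}).add;

// Map the user's z position exponentially onto the fade for both controllers.
OSCdef(\userPos, { |msg|
    var z = msg[3].clip(0.01, 4);
    var fade = z.linexp(0.01, 4, 0.01, 1);
    ~xFaderL.set(\pos, fade);
    ~xFaderR.set(\pos, fade);
}, '/userPosition');
)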
This implementation brings no changes to the Unity code because we are using data that was already being sent.
Because the Unity code is divided across many scripts, I will avoid posting all of them here. Nevertheless, the following text box contains an up-to-date version of the full SuperCollider code.
// 2018.07.11 Conclusion... for now
Specialisation and categorical delineation are strong within computer music research, where developments in musical practice are partitioned by their related musical technology and by the separation of performer, instrument and environment. On the contrary, when considering entity relationships and factoring out common features in VR environments, the lines between these factors get blurred. All these actors become active agents that feed back into each other through musical content and interaction. In this manner, it is the interplay between the environment and the user that drives the musical content, not either of them separately.
The notion of music interaction in VR has been the main focus of our ideas while developing VIVR. In our design process we found that 3D audio, bodily and spatial interaction, and a relation of autonomy between the user and the environment are considerations that strongly support music interaction in VR. The result of this combination answers the questions of how the musician's presence becomes an entity that has to work in collaboration with the environment to create music, how the musician's bodily and spatial interactions are translated into musical responses, and how 3D audio becomes a new musical feature for the musician.
Nevertheless, further development of Virtual Reality Musical Instruments should be supported in order to bring this platform to concerts and performances, so as to study the discrepancies between the experiences of VRMIs' performers and their audiences.