They asked people to conduct however they wanted in order to build a data set, with a focus on the relationship between motion and loudness.
25 subjects conducted along to a recording. A Kinect was used to capture the motion data, and libxtract was used to measure loudness in the recordings.
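To make "measure loudness" concrete, here is a minimal sketch of a frame-wise loudness curve. It does not use libxtract's API; it swaps in a plain RMS level in dB as a stand-in, and the frame size, hop size and eps floor are assumed values, not the study's settings.

```python
import numpy as np

def rms_level_db(signal, frame_size=1024, hop=512, eps=1e-10):
    """Frame-wise RMS level in dB.

    A simple loudness proxy used for illustration only; it is not libxtract's
    loudness feature. frame_size, hop and eps are assumed values.
    """
    signal = np.asarray(signal, dtype=float)
    # Number of full frames that fit (at least one, even for short signals).
    n_frames = 1 + max(0, (len(signal) - frame_size) // hop)
    levels = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop : i * hop + frame_size]
        rms = np.sqrt(np.mean(frame ** 2))
        # eps avoids log10(0) on silent frames.
        levels[i] = 20.0 * np.log10(rms + eps)
    return levels
```

Scaling a signal by a factor of 10 raises every frame's level by 20 dB, which is the kind of change a loudness descriptor should track.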
Each subject listened to the recording twice and then conducted it three times.
Joint descriptors: velocity, acceleration and jerk, plus distance to the torso.
General descriptors: quality of motion and maximum hand height.
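The kinematic descriptors are successive time derivatives of a joint's position. The sketch below estimates them with finite differences (numpy.gradient). It assumes a 30 frames-per-second skeleton stream, (T, 3) position arrays in metres, and y as the vertical axis; the function and key names are hypothetical, and "quality of motion" is left out because its definition isn't given here.

```python
import numpy as np

# Assumption: skeleton frames arrive at roughly 30 per second.
FPS = 30.0

def motion_descriptors(hand, torso, fps=FPS):
    """Per-frame kinematic descriptors for one hand joint (hypothetical helper).

    hand, torso: arrays of shape (T, 3) holding x, y, z positions in metres,
    with y assumed to be the vertical axis.
    """
    hand = np.asarray(hand, dtype=float)
    torso = np.asarray(torso, dtype=float)
    dt = 1.0 / fps
    # Successive finite-difference derivatives along the time axis.
    vel = np.gradient(hand, dt, axis=0)
    acc = np.gradient(vel, dt, axis=0)
    jerk = np.gradient(acc, dt, axis=0)
    return {
        "speed": np.linalg.norm(vel, axis=1),
        "acceleration": np.linalg.norm(acc, axis=1),
        "jerk": np.linalg.norm(jerk, axis=1),
        "distance_to_torso": np.linalg.norm(hand - torso, axis=1),
        "max_hand_height": hand[:, 1].max(),
    }
```

Jerk is a third derivative, so in practice the raw Kinect positions would likely need smoothing first; differentiating sensor noise three times amplifies it considerably.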
They looked for descriptors highly correlated with loudness and found none. Some participants said they didn't follow the dynamics, and 8 subjects were removed.
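One way to search for "descriptors highly correlated with loudness" is to compute the Pearson correlation of each descriptor series against the loudness curve and keep those above a cutoff. This is a sketch only: the 0.7 threshold is an arbitrary assumption, and it assumes every series has already been resampled to a common frame rate and aligned in time.

```python
import numpy as np

def highly_correlated(descriptors, loudness, threshold=0.7):
    """Return (name, r) pairs whose |Pearson r| with loudness meets the threshold.

    descriptors: dict of name -> 1-D series, aligned frame by frame with loudness.
    threshold: hypothetical cutoff, not a value from the study.
    """
    loudness = np.asarray(loudness, dtype=float)
    scores = []
    for name, series in descriptors.items():
        r = np.corrcoef(np.asarray(series, dtype=float), loudness)[0, 1]
        if abs(r) >= threshold:
            scores.append((name, r))
    # Strongest relationships first.
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)
```

Pooling all subjects into one such search can wash out relationships that hold only for some of them, which is consistent with finding no strong correlate overall.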
Some users used hand height to show loudness, while others used larger gestures, so the users were separated into two groups.
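The split into two groups could be approximated per user by checking which strategy tracks loudness more closely. The rule below is hypothetical and is not the authors' grouping procedure: it compares the absolute correlation of hand height and of a "gesture size" series (an assumed descriptor, e.g. distance to torso) with loudness, and labels each user by the stronger one.

```python
import numpy as np

def assign_group(hand_height, gesture_size, loudness):
    """Hypothetical rule: label a user by whichever strategy tracks loudness better."""
    r_height = np.corrcoef(hand_height, loudness)[0, 1]
    r_size = np.corrcoef(gesture_size, loudness)[0, 1]
    return "hand_height" if abs(r_height) >= abs(r_size) else "gesture_size"

def split_users(users):
    """users: dict of user id -> (hand_height, gesture_size, loudness) arrays."""
    groups = {"hand_height": [], "gesture_size": []}
    for user_id, (height, size, loud) in users.items():
        groups[assign_group(height, size, loud)].append(user_id)
    return groups
```

Fitting a separate model per group (or per user) is one response to the observation that a single general model may not fit everyone.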
They were able to find tendencies across users. However, a general model may not be the right approach.
How did they choose users?
People with no musical training were in the group that used hand height for loudness.