Sounding the Net: Interview with Jesse Gilbert

Peter Traub:In InteraXis, you work with distributed performers in a manner seemingly similar to previous works such as Finding Time. Is InteraXis a refinement of the technologies and aesthetic ideas you were working on in previous works, and if so, how does it refine those ideas?

Jesse Gilbert: I'd say that interaXis represents the culmination of a line of inquiry into distributed performance techniques that I have been investigating since 1997. Finding Time did have similar elements in terms of its use of a distributed graphical score, but structurally it was quite different. While Finding Time was primarily a work designed for an on-line audience, interaXis privileged the audience perspective in the performance space while still trying to provide an on-line experience of some kind. In the latter case, real-time streaming was employed as part of a broader technological strategy aimed at enabling an interaction between two ensembles in two performance spaces; the former was primarily concerned with mixing disparate streams together and providing a structural framework that allowed us to constitute an ensemble on multiple continents.

I think that interaXis reflects an increasing desire on my part to move away from looking to Internet streaming as a replacement for traditional broadcast media (which presently it does quite poorly) and to move more towards a notion of constructing site-specific installations that are network enabled to create interactions between people in those spaces that are accessible to a live audience. Not only do I feel that the work can be presented more fully in a designed environment (as opposed to the undefined at-home experience where there may not even be dedicated speakers to listen through) but that the audience for on-site events is potentially much greater. I am still quite interested in the network, but less for what it might represent than for what it can actually do.

interaXis was a milestone for me in that it synthesized the work that I had been pursuing in developing a graphical language for improvising musicians with an entirely different stream of work - the use of audio spatialization as a means of providing the off-site musicians with a mobile "presence" in the performance space. The latter concept had been developed in the Transient series of netcast performances I organized with Tim Boykett of the Austrian collective Time's Up, who had been experimenting at that time with spatial processing as a strategy of presentation for on-line works. This move away from a proscenium or stereo pair model towards a multi-channel, dynamic and algorithmically controlled model is something that was developed further in interaXis, and continues to occupy me today.

    PT: I'm wondering if you could discuss the spatialization aspect a little bit more? In a situation with multiple performers at remote locations being broadcast into a performance space, how is their location in the spatialized audio considered? Also, I'm curious if you're familiar at all with Chris Chafe's work with the SoundWire group (http://ccrma.stanford.edu/groups/soundwire/) at CCRMA. They wrote a paper a while back on (if I understand it correctly) using net latencies between performance spaces to create a common reverb between the spaces.

    JG: I am not familiar with the SoundWire project but at first glance it seems quite interesting, although conceptually a bit parallel to what we were doing in interaXis.

    The spatialization of the remote performers was not designed as a means to recreate their physical presence (i.e. mapping their positions on the stage to specific spatial positions in the virtual sound field) but rather as a way to create a dynamic sound field. I don't think I mentioned that the on-site musicians were being amplified through a distinct sound system that was intentionally static - their audio did not move at all. So while local sound was fixed in identifiable positions in the room, the remote audio was being moved through a spatial processing system that essentially can divide a room into zones and move audio according to generalized rules that are evaluated by the software at run-time. These were fairly simple -- crossfades executed within a certain time range (i.e. speed of movement), variable moments of stasis, rules for determining the next spatial target (i.e. location). Through the course of the piece I could recall various presets that would produce variable behaviors that I thought would work with the kinds of ensemble configurations that were being indicated by the score (i.e. a duet section vs. the entire ensemble playing together).
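As a rough illustration of the kind of rules described above (crossfades within a time range, variable moments of stasis, a rule for choosing the next spatial target, and recallable presets), here is a minimal Python sketch. It is not the software used in interaXis, which was built in Max/MSP/Jitter; the names `Preset`, `plan_moves`, and `crossfade_gains`, the zone numbering, and all timing values are assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Preset:
    """Hypothetical rule set; field names and values are illustrative only."""
    fade_range: tuple   # (min, max) crossfade time in seconds, i.e. speed of movement
    hold_range: tuple   # (min, max) moment of stasis in seconds
    zones: tuple        # zone indices the remote audio may occupy


def plan_moves(preset, start_zone, count, rng):
    """Return a list of (from_zone, to_zone, fade_seconds, hold_seconds)."""
    moves = []
    current = start_zone
    for _ in range(count):
        # Rule for the next spatial target: any allowed zone except the current one.
        candidates = [z for z in preset.zones if z != current]
        target = rng.choice(candidates)
        fade = rng.uniform(*preset.fade_range)
        hold = rng.uniform(*preset.hold_range)
        moves.append((current, target, fade, hold))
        current = target
    return moves


def crossfade_gains(progress):
    """Linear crossfade: (source gain, target gain) for progress in 0..1."""
    progress = min(1.0, max(0.0, progress))
    return (1.0 - progress, progress)


# Two assumed presets: slow movement for a duet, faster for the full ensemble.
duet = Preset(fade_range=(4.0, 8.0), hold_range=(2.0, 6.0), zones=(0, 1))
tutti = Preset(fade_range=(0.5, 2.0), hold_range=(0.0, 1.0), zones=(0, 1, 2, 3))

moves = plan_moves(tutti, start_zone=0, count=5, rng=random.Random(1))
```

Recalling a different preset during the piece would simply mean calling `plan_moves` with another `Preset`, which changes the behavior without changing the rules themselves.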

    I also did not mention that an effort was made to mirror the positioning of the speakers at Engine27 in the concert hall at CalArts. Although this was not entirely possible it did mean that the spatial system was consistent between the spaces, so that a sound in speaker A would appear stage left in both spaces.

    As I wrote earlier, I am less interested in trying to create a wholly "accurate" reproduction of sonic reality between the sites than I am in using spatial processing to indicate the impact of the network on a visceral level to an audience. I find spatial systems intriguing as a means of "unbalancing" a listener's ears, taking them out of the comfort zone of familiar ways of hearing. For many reasons this seems to me a good parallel to the experience of performing across the network, where the temporal dislocations and the non-presence of the body force performers to reach for each other across data space. This is not always comfortable for a performer who is used to a certain level of reinforcement or feedback from their collaborating musicians, but it can produce very interesting results.

PT: In terms of the streaming aspect of InteraXis, were you using the common Internet, or Internet 2? I ask because Internet 2 seems to be a popular research point for people working on low-latency audio and video collaboration and performance. Are you interested in low-latency collaboration and working over a medium such as Internet 2 which is still unavailable to the general public? If so, why? (and if not, why not?)

JG: We were using the common Internet because Internet 2 was not available at any of the sites that we were presenting the piece in. I have not worked with Internet 2 networks and frankly low-latency is not a critical issue to me. No matter how low it can be taken, a network interaction can never be said to be real-time in the way that a traditional concert can, and frankly I'm not sure that's the point. For a performer or audience member, the moment of the performance is the real-time moment, and latent events become creative components that can be layered with the live. I think accepting that working on the network involves the representation of asynchronous moments in time is much more interesting than trying to obsessively re-create traditional modes or methods of performance.

I don't have a moral objection to the use of Internet 2 technology because it is not publicly available - I think research in the field is important and for those who are interested in using this new network it is a valuable resource. The private aspect likely also increases the reliability of the streams, which is something that I think is critical in developing a diverse set of practices for streaming performance.

    PT: A lot of people working with sound over the net are concerned with ideas of audience and composer and playing with those lines. Works such as Chris Brown's 'Eternal Network Music', Jason Freeman's 'N.A.G. (Network Auralization for Gnutella)', and Max Neuhaus's 'Auracle' (all of these being works that are also discussed in the article) play with notions of collaboration amongst end users and not necessarily performers or people with musical skills. This doesn't really seem to be a focus in your work (this is not a criticism of course, just an observation of difference), and I'm curious if such issues are important to you? My impression of your work is that it is, to some degree, about using technology and the net to open up the spaces and possibilities available to performers, as opposed to bringing the untrained users into the musical world through the interactive possibilities that the net offers. Would you agree with this observation? And if this is the case, I'm curious if you could discuss your thinking about performance, the net, and the encompassing scope of your work? (I hope that is not too general :-) )

    JG: This is a big question, but I think it's accurate to say that I am less concerned in my own work with end users, though I know and admire all of the pieces that you mention. I think there is a place for a hands-on experience of networked art and I'm sure that the audience gets something out of that which my work will never provide them.

    I don't think it's been a conscious choice to move away from that kind of work, but I do come out of a concert music background and therefore have an admiration for performers and their specific talents. I am often driven by a desire to hear the musical result of certain combinations of players, and to create environments that are conducive to a mutual exploration. While I'm interested in the results of the experiments of casual users as an indication of a given system's usefulness or creative potential, I also think there is a place for the artist here. I say that with a specific agenda in mind -- that I think in general artists in this culture are valued less and less. And while I think there is a natural creative spark and impulse in all of us, I think the category of artist is still valid, as an indication of a kind of internal commitment and dedication, a recognition of sacrifice and a willingness to proceed in spite of it. So I believe that systems built for artists to work within can reveal other aspects of their utility, and also point to deeper meanings or stir emotional reactions that I have to admit I seldom feel when experiencing work made by a casual user.

PT: From reading the overview, it seems that part of your compositional interest in InteraXis is not to overcome the network latency issue by using a faster network, but rather to incorporate the latency and network artifacts into the body of the improvisation. How did the visual score system, Ankhrasmation, work to incorporate these factors into the piece? (Also, do you have any examples of the visual score? Graphics and pictures would be greatly appreciated if you have them).

JG: Your impression is correct - we were definitely interested in the creative use of the latency between the two ensembles. Ultimately this is something that impacts the performers most, and is not generally perceived directly by an audience.

While I can't speak directly about Ankhrasmation as a system (it was developed by Wadada Leo Smith long before the interaXis project) I can tell you that the visual score was essentially synchronous in both locations, while the streams were delayed by the typical buffering schemes employed by the encoder/server/client architecture of that transmission. Unfortunately there is no current information on the website about the second performance of interaXis, in which the visual score was created by my colleague Carole Kim, a wonderfully creative video artist here in Los Angeles. This was a far more realized performance on many levels - it was presented at Engine27 in New York City, which has a fantastic array of speakers in a system built for audio spatialization at a very high level, and at CalArts as before.

The score was in this case projected as a live video mix in both sites, controlled via a networked application that I designed for Carole. Because control data was sent over very low latency UDP datagrams, the score was essentially synchronous in the two performance spaces, while the audio broadcasts were delayed substantially. This meant that sectional transitions took place over extended time periods, and that the local audio mix in each space became critical as a way to shift focus and move the audience through what I sometimes think of as a "folded" time structure. It also meant that Carole's materials were limited in the sense that the images had to be consistent enough in their general identity to be recognized by the performers as specific sectional cues, but also be variable enough to be interesting as aesthetic objects to the audience.
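To make the contrast with buffered audio streams concrete, the following minimal Python sketch sends one small control message as a single UDP datagram and reads it back on the same machine. It only illustrates the general idea of lightweight control data; the application described above was built in Max/MSP/Jitter, and the JSON message format, the `send_cue` and `receive_cue` helpers, and the cue name are all assumptions.

```python
import json
import socket


def send_cue(sock, address, section):
    """Send one sectional cue as a single small UDP datagram (no buffering, no retries)."""
    payload = json.dumps({"section": section}).encode("utf-8")
    sock.sendto(payload, address)


def receive_cue(sock):
    """Block until one datagram arrives and return the cue name it carries."""
    data, _sender = sock.recvfrom(1024)
    return json.loads(data.decode("utf-8"))["section"]


# Loopback demonstration: both "sites" live on this machine.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))      # port 0 lets the OS choose a free port
receiver.settimeout(2.0)             # avoid hanging if the datagram is lost

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_cue(sender, receiver.getsockname(), "duet")
cue = receive_cue(receiver)

sender.close()
receiver.close()
```

UDP offers no delivery guarantee, so a real system would probably resend or periodically restate the current cue; that part is left out here.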

    PT: You mention a "folded" time structure above, and earlier you mention low latency not really being an issue for you and also the creative use of latency in interaXis. There are clearly many aspects of temporality that one must deal with in creating a work that uses the internet or exists on the internet. In a sense, the irregularity that one can encounter with streaming audio encourages composers to think on different time scales and to focus on different aspects in their performance and music since perfect synchronization is not currently possible. How do you think the temporal (and temperamental?) nature of the net has influenced your compositional thought? Has your thinking about a work's form and structure changed over the years as you have become more familiar and/or comfortable with the temporal idiosyncrasies of streaming media?

    JG: There's definitely more of a comfort level that comes with having worked within this medium since 1997. I think that the biggest difference has to be the approach towards working with, rather than trying to overcome, the temporal dislocations that working on the net imposes. A piece like Conformed_Bits could not have been conceived without this kind of approach. And while I think that there are many working today with this positive approach I don't think it's been embraced or even really understood by the majority of artists who are working on the network now.

    An engagement with the idea of temporal dislocation has led me to think further about what I'd call the material conditions of the network, and the extent to which they shape all network based interactions. Part of this is, as I've said, thinking more clearly about how to present such work to an audience, and the possibilities around presentation, or the sonification of an otherwise invisible process. There are ways in which streaming media is dominated by older concepts around broadcast media (i.e. television or radio) and its success is generally judged by how closely it can replicate the user experience of these forms. My own thinking has grown through the recognition that streaming media can be used as part of a multi-faceted strategy to create real-time environments that create an experience that I think will be interesting to myself, the performers, and hopefully an audience as well. Moving beyond traditional models, but also recognizing that an audience needs familiar guideposts - this is the challenge of working in this field. So part of my practice is to think about how such work can be better understood, or at least how to create situations that audiences intuitively can take in without fully understanding the technical work that makes the piece possible. Early on we were all seduced by the technical possibilities of the medium - now I find myself much more interested in what the technology enables for live performers.

    I think also the idea of streaming media as a destructive force -- or, perhaps better put, the transformative impact of streaming technologies -- is an interesting area to explore. As digital technologies become ever more pervasive I think it's an issue we need to address. Streaming creates temporal dislocation because of its general and very flexible architecture - there is a tradeoff inherent in this approach, just as there are implications of traditional broadcast media (i.e. the geographic limitations of broadcast television). As a composer, working within this field allows one to create pieces that can encompass multiple moments in time and composite them into a present that is both synthetic and somehow quite natural to us now. Indeed, all network media contain elements of synchronous and asynchronous information, read by the end user as a composite. So perhaps the best way to say this is that by working with streaming structures we can perhaps create work that is linked more closely with the experience of the digital age - the live event becomes the interface through which the user engages this new reality, just as the graphical interface of a modern OS is the interface to the vast variety of data contained both within and without the local context. It's perhaps an obvious statement that streaming itself involves transformation and resynthesis, but as I've explored this I realize it's a key issue in asking questions about what it means to be interacting via this new mediated landscape. As our communication becomes increasingly networked, what is the role of the imagination in recontextualizing information? Where does technology leave off, and where does the mind fill in the gaps? These issues have become more central to my overall practice, and I'm sure that my thinking about them has been shaped extensively through my work with streaming.

PT: What software did you use for streaming the audio in InteraXis? How much of it was out-of-the-box and how much was custom written? Do you have particularly strong feelings with respect to using out-of-the-box software for such applications versus writing your own custom packages?

JG: The audio/video streaming was all done via Real. The video feed (distinct from the score discussed above) was a live camera feed of the performers, also projected in the remote site. We used this to give the performers a physical presence in the space beyond just their sounds, and to give the performers a way to make "eye contact" -- delayed significantly, of course.

There is another significant data stream at work in interaXis: control data used to position the remote audio stream in the performance space via a custom-designed spatial processing application. I have been working on such programs for years now, and extended this to include a network component that allowed me to mirror the spatial process in both performance spaces.

So, in the second interaXis performance both Carole and I were in New York, both using custom designed applications and sending control data to the Los Angeles site. In her case the generated control data performed live video compositing of pre-recorded media elements that were present at both locations. In my case the data was used to control spatial processing that would determine the placement of the remote audio stream (i.e. the sounds from LA as they appeared in New York) at any given moment. In this case the position of the audio channels was consistent and synchronous in the two sites, but the content of the audio streams differed.

    PT: This is interesting. If the content of the streams at the NY and LA sites was different, why was it important to have the spatial placement the same? This idea reminds me a bit of Cage's 'Imaginary Landscape No. IV', where the radios all have specific tunings and volumes as specified in the score, but the content of channels would vary by performance. Curious if you agree with that comparison? Did you have an underlying structure or score for the spatialization process?

    JG: Yes, I think there is a parallel. There was a general structure for the spatial processing that was developed through the rehearsals of the piece. As I said above, this was often designed in relation to the ensemble configurations at a given stage of the performance.

    I think that conceptually the reason behind wanting to mirror the spatial placement was to create a consistent physical landscape for the performers to engage in. The blend of synchronous (score, spatial position) with asynchronous or contrasting elements (audio and video content) also allows the performers a channel through which they can envision the remote site, and provides a kind of sympathetic structure that I believe has a lot of potential. The fact that all of the performers in this piece are fantastic improvisers certainly helps in this respect - they are well versed in reacting and responding on a non-verbal level to their environments. By giving some solidity to their perceptions through the use of synchronous and mirrored elements it was my hope to make the interaction a bit less alienating for them.

All of the custom software was designed using the program Max/MSP/Jitter. My feelings about custom development tend to be practical -- if I need to do something that commercial software cannot do then I enjoy being able to create what I need. If commercial programs suffice, which they rarely do, then I use them. In other words, I'm not dogmatically opposed to using them if they work... I am generally focused on network software as a means of enabling an interaction, and if everything works as it should then it becomes less prominent. Of course I do recognize the impact of the technologies that are used in shaping our notion of what's possible in a given piece, and so I usually do develop custom software in such projects. Often it is the only way to accomplish the specific vision of a piece, and it is a skill that I am glad to have.

PT: In Conformed_Bits, you describe a feedback process that degrades the video and audio. Could you elaborate on how the feedback system works with respect to the network? How do you think about the network in compositional terms (i.e., a filter, a data source, semi-chaotic number generator, artifact generator, etc.)?

JG: Conformed_Bits, also primarily developed with Carole Kim, employs a feedback structure in which we broadcast from and to ourselves. In other words, the input of the encoder is the output of the player that is receiving the stream that was sent out by the encoder at some point in the past. Usually this is a constant number of seconds, between 10 and 15.

Here we are using the network in a similar manner as analog tape loop compositions, with some significant differences. By exploiting the buffering structure of streaming we are essentially performing the digital equivalent of positioning the read and write heads of the tape machine. However, since all streaming formats employ some form of digital data compression, I tend to think of streaming feedback as an interplay between this variable time space (network) and repeated filtering and transformation. The agent in the latter is the modern codec, which is actually a complex set of algorithms created with a distinct aim: to achieve the best possible likeness of the original media at a fraction of its original bitrate. In real-time streaming this is a two-step process -- analog to digital conversion, and then compression and formatting for streaming transport. The compression is "lossy" - meaning that details of the original signal are lost. The precise details lost are selected according to the rules defined in the codec, and these can be quite complex depending on the source material.

The interesting aspect to me as a composer is that a codec represents a culmination of research in psycho-acoustics, particularly around the sensitivities of the human ear, and which frequencies are most important in establishing the essential character of a given sound. Through this a generic picture of the ear is developed, a profile of sorts that the codec uses to make decisions about what frequencies to discard and what to emphasize. The compression process, like any process of conversion for transmission, introduces characteristic noise into the signal that the broadcaster hopes is ignored by the listener, as the transmitted media is "close enough" to the original to be accepted as a reasonable facsimile.

The feedback structure of Conformed_Bits amplifies this noise over time, and also brings the codec to its limits by reinforcing the selected frequencies. Thus we can use the codec as a variable filter that "rings" or resonates in certain characteristic ways as the original material is introduced and ultimately destroyed by repeated processing. I find this interesting to experience over time. I tend to think of this as representing the breakdown of a way of thinking about the world that posits that anything can be represented in discrete digital form, or that human perception can be quantified as an "average" result that fits us all. No doubt there is much that is useful and productive about the move towards pervasive digital representation -- Conformed_Bits points out that this is an inherently destructive act that can, ultimately, be used for creative purposes.
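The cumulative effect of passing material through a lossy stage again and again can be sketched in a few lines of Python. The "codec" below is a deliberately crude stand-in (a three-point smoothing filter followed by coarse quantization), not Real, MP3, or any actual perceptual codec, and each call stands for one trip around the feedback loop. The function names, test tone, and step size are assumptions chosen only to show error accumulating over generations.

```python
import math


def toy_lossy_pass(samples, step=1 / 16):
    """One trip through a stand-in 'codec': smooth, then quantize coarsely."""
    n = len(samples)
    out = []
    for i in range(n):
        # Circular 3-point average: a crude low-pass filter (index -1 wraps around).
        avg = (samples[i - 1] + samples[i] + samples[(i + 1) % n]) / 3
        # Coarse quantization: detail smaller than `step` is thrown away.
        out.append(round(avg / step) * step)
    return out


def rms_error(a, b):
    """Root-mean-square difference between two equal-length signals."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# A short test tone: 5 cycles of a sine wave over 64 samples.
original = [math.sin(2 * math.pi * 5 * i / 64) for i in range(64)]

signal = original
errors = []
for generation in range(10):
    signal = toy_lossy_pass(signal)          # re-encode the previous output
    errors.append(rms_error(original, signal))
```

In this toy model the signal simply drifts further from the original with each generation; the characteristic "ringing" described above depends on the behavior of real codecs and hardware and is not reproduced here.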

    PT: Have you played with different codecs in the work to see if the feedback quality differs in significant ways, and if so, have you found some codecs more aesthetically pleasing than others for this process? I'm just using Ogg Vorbis in my net feedback/percussion piece, but I'd be curious to hear how other codecs sound over the long degradation process.

    JG: Yes, there are significant differences between the codecs that have to do with their particular algorithms and prioritization schemes. I've worked a lot with the Real codecs, also MP3 and some of the QuickTime codecs as well. While they vary inevitably there are strong characteristics that they share. I do, however, find some to be more "musical" than others, in a very abstract sense of course :) The particulars of the degradation signature, the kinds of ringing and blurring that can occur, vary with the content of the source material, the volume level that the encoder is set to, and the quality and structure of the audio devices that are used to form the feedback chain. I've found that actually the specific mixer I use has a huge impact on the clarity of the loop and its sonic character. Often this is due to the intensity of certain frequencies that are emphasized by the codec itself, and how these frequencies expose the stability of the filters being used in a mixer's EQ section - a feedback loop within a feedback structure.

    Interestingly, I've found extensive sites devoted to the analysis of video codecs and their characteristic tendencies when a specific image is compressed and recompressed over many generations. These sites use standardized input images that allow one to compare the codecs and how they deal with challenging graphical issues, such as hard edging and moire patterns. To my knowledge there is no site devoted to this kind of systemic study of audio codecs, though it certainly would be interesting to see/hear this.

    The inherent differences in video codecs are much more dramatic and probably also have to do with the combination of both factors inherent in the codec's design and the specific hardware being used to capture the video image in the first place. The quality of A/D conversion and the weaknesses of any intervening hardware will all be amplified and exposed fairly quickly. Also, the data thinning in video codecs is so much higher than with audio that it normally will degrade much faster and more obviously in the first few generations.

PT: Several interview respondents have talked about the fallibility of networks or the imperfections in networks as being a point of interest for them artistically. In a piece I'm currently working on, the degradation of feedback through audio streaming is also a focal point of the work. This seems to tie into the description on the Conformed_Bits site regarding the feedback process and the conversion and reconversion of analog to digital removing data and introducing noise. In effect, it's what happens at the edge of that transformation that is interesting: the artifacts. Is this a fair assessment? If so, why do you think there is such a great interest amongst electronic artists and artists working with networks to exploit imperfections, artifacts, and failures?

JG: I think that this is a fair assessment. Any analog to digital conversion will have a specific bit resolution, and therefore can be said to be imperfect on an absolute level. Many have pointed out, however, that at this point the imperfections are beyond our perceptive abilities when we're working at the highest resolutions (although this certainly is not true in net streaming).

When we work with networks I think that we are generally dealing with a medium that is rife with errors, and which works quite well in spite of them. I think that artists generally would be attracted to the errors in the system because they represent another view of the digital medium -- a vision that shows that the technologies are imperfect, and therefore, perhaps, a bit more human. Since most telecommunications is fundamentally concerned with human communication, and since we all have experienced the kinds of breakdowns that happen between people, I think that it's natural for artists to want to work at the edges as you say, where the fiction of the smooth and perfect digital world is exposed as a (perhaps necessary) fiction. I'm interested in how experience is constructed via technological design, and a good way to get access to this is to watch/listen to it as it malfunctions or decays over time.

    PT: I've also heard a few people just say flat out that failure is more interesting than success in terms of this type of work. I'm curious if you also feel that way?

    JG: Many times this is true - I think that they can be incredibly instructive and push one to think harder about what the problems may be in the approach being taken. Experimentation can lead to "failure" but also to a deeper understanding and perhaps even to innovative approaches in this field. In general, though, I prefer "success" :)

PT: Could you describe the adaptation of the Tim Perkis work in greater detail? What was the original implementation like and why have you chosen to do an adaptation of it?

JG: The original Perkis composition is entitled Wax Lips, and was created for the networked music group called The Hub. It defined a set of rules for reception, mapping, and transmission of messages on a network that would trigger musical activity. Each member of the ensemble created a synthesis instrument that performed two functions: receiving and playing the "note" that it received from another network node, and mapping that note to another note that is then sent on to a different node in the network after a specified delay. In other words, the piece consisted of a network of nodes that each had static maps of every possible incoming message to 1) a musical output; and 2) a new message to be passed on to another node. This structure is a closed loop -- once a message is "seeded" into the network it should bounce around indefinitely. When several seeds are introduced the resulting musical activity can be quite chaotic and unpredictable - essentially showing that even static mappings can produce complex behavior.
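The closed-loop structure described here can be simulated in a few lines. The Python sketch below is only an illustration under stated assumptions: the mapping formulas in `build_node` are invented, notes are reduced to twelve integers, and the specified delay is replaced by a simple queue in which one message is handled per step. It is not the Wax Lips specification or any member's implementation.

```python
from collections import deque


def build_node(index, node_count, note_count=12):
    """Static map for one node: incoming note -> (outgoing note, destination node).

    The formulas are arbitrary placeholders; any fixed table covering every
    possible incoming note would serve the same purpose.
    """
    table = {}
    for note in range(note_count):
        out_note = (note * 5 + index) % note_count
        # Offset is always 1..node_count-1, so a node never sends to itself.
        dest = (index + 1 + note % (node_count - 1)) % node_count
        table[note] = (out_note, dest)
    return table


def run(nodes, seeds, steps):
    """Process one pending message per step; return a log of (step, node, note played)."""
    pending = deque(seeds)
    log = []
    for step in range(steps):
        node, note = pending.popleft()
        log.append((step, node, note))          # the node "plays" the incoming note
        out_note, dest = nodes[node][note]      # static lookup, no randomness
        pending.append((dest, out_note))        # pass a new message onward
    return log


nodes = [build_node(i, node_count=3) for i in range(3)]
log = run(nodes, seeds=[(0, 0)], steps=20)      # one seed keeps circulating
```

Because every incoming note has an entry in every table, the queue never empties, which is the "bounce around indefinitely" property; adding more entries to `seeds` interleaves several such chains.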

I chose to adapt this project for use in a Java programming course for visual artists. In the original composition the output was specified in terms of MIDI notes (pitch, velocity) but the specific sound played was left to the original composer. In addition, there was no pre-ordained technical solution - each member of the group came up with their own implementation as long as it adhered to the original specification. This approach seemed well suited to a short term project-oriented class in which I was introducing concepts around networks and user interface programming.

Rather than work with musical messages I chose to use a core text (a quote from Francis Cook's translation of the Avatamsaka Sutra concerning Indra's net) that, when disassembled and broken into constituent parts, formed the universe of possible messages to be exchanged between the network nodes. Each node created a lookup table mapping a given word to another word in the text, and also to an implicit network destination. The students' work was to develop a "back end" module that accomplished this mapping, and also a "front end" GUI that displayed the incoming messages to the screen. The back end module had to conform to a rigid technical spec, while the GUI was entirely of their own design.
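The split between a fixed "back end" and a freely designed "front end" might look something like the following Python sketch (the course itself used Java). Everything specific here is hypothetical: the sample sentence is a placeholder rather than the passage used in class, and `build_table` and the two front-end functions are invented to show that different displays can sit on top of the same mapping.

```python
def build_table(words, node_index, node_count):
    """Back end: map each distinct word to (next word, destination node)."""
    vocabulary = sorted(set(words))
    table = {}
    for position, word in enumerate(vocabulary):
        # Arbitrary fixed rule for illustration; each node would define its own.
        next_word = vocabulary[(position + node_index + 1) % len(vocabulary)]
        destination = (node_index + 1 + position % (node_count - 1)) % node_count
        table[word] = (next_word, destination)
    return table


def plain_front_end(word):
    """One possible display: show the word unchanged."""
    return word


def shouting_front_end(word):
    """Another display of the same message, styled differently."""
    return word.upper() + "!"


# Placeholder text, NOT the quotation used in the course.
text = "each node reflects every other node"
table = build_table(text.split(), node_index=0, node_count=3)

incoming = "node"
next_word, destination = table[incoming]
displays = [front_end(incoming) for front_end in (plain_front_end, shouting_front_end)]
```

The lookup result is identical whichever front end is used, which is the sense in which disparate interfaces can present the same underlying process.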

The project was quite successful in teasing out issues of interface design, particularly the possibilities of disparate interfaces to the same process. I think we would have benefited from more time (the quarter was only 10 weeks long) to further develop the ideas...