Intro

Everything that follows will make a lot more sense if you first take a look at the video of our livestream, here.  You can look at any ten minutes out of the middle of it and get a pretty good idea of what the results looked like in practice.  This took a lot of work the first time — but now that the work is done, very little would need to be changed for future shows.  Doing a new “show reel” of motion video takes about an hour, assuming you start out knowing what you want.  Two new show cards would take about the same amount of time, probably less.  So everything described here is reusable.

And it certainly could be improved.

A word about complexity

The setup described below is complex.  A solo performer could get by with a lot less: most likely a single computer running OBS to handle a couple of USB cameras along with graphics and motion video played and generated on that same machine.  It would need a fairly heavy-duty CPU and a good chunk of RAM, but a single, capable modern machine could do it, and I'll describe what that might look like at the end of this post.

I couldn’t do that.  Why? Using JamKazam (JK) kicks over a whole row of dominos:

  • JK completely takes over the machine that’s running it, both in terms of the audio system and in terms of CPU usage, so
  • If you want to use JK’s video system to let people see the band — and you do, because the alternatives are all worse, especially for your non-technical bandmates — you have to send the video out of the machine to something that can take on the extra computing load. The only practical way to do that is to find something that can a) handle the streaming for you, and b) accept an HDMI video input from the computer running JK. You want your JK machine to see that device as a second display, so
  • Already you’re talking about some kind of capture-and-stream device, probably something made by Elgato or by Blackmagic Design (the ATEM line), and
  • You’re going to need a second computer to generate graphics, because no way will that coexist with JK on the same box, and
  • Now that you’re in this deep, you may as well go whole-hog and buy a little $35 device that will play movies off a USB stick so you can mix them into the video, although you could also just use VLC or another media player on the graphics-generation machine.

See what a slippery slope that is?  I went down it headlong, giggling all the way.  The results follow, divvied up by subsystem.

System components

The components of the system are:

  • A live-performance music rig of ridiculous complexity, ending in a stereo audio feed, analog in this case, although it could equally well be S/PDIF via coax or lightpipe.
  • An audio interface to convert that analog or digital audio and feed it into a computer that runs JamKazam. I use a MOTU MicroBook IIc, though a $30 Behringer UCA-202 would work just as well.
  • A computer to run JamKazam, in this case a Mac mini of considerable vintage.
  • A small headphone distribution amplifier to send audio around to different things, including your performance monitoring system, the streaming device, and the graphics-generation machine.  You can find these from $25 up.  I use one made by Pyle; there’s a Mackie for around $40 that looks nice.
  • A computer to run graphics generation, in this case an equally elderly Mac mini running Project MilkSyphon (graphics), G-Force (graphics), and OBS (video manipulation/streaming).
  • Optionally, an audio interface to feed the graphics generator, although I suspect you could just run the audio in through the built-in mic jack and get results good enough to make musically-responsive, pretty pictures with the graphics software.  If you go for this, use a cheapie Behringer like the UCA-202.
  • A little media player with HDMI output, like the AGPtEK HD Player Mini 1080, which costs 35 bucks.
  • A video switching and streaming device. I use the ATEM Mini Pro — a VERY capable device for the price.  We’ll encounter a few of its flaws later, but basically it offers incredible bang for the buck.  One caveat right up front: it won’t stream to restream.io [actually, it will — see a later post on this] — or at least I haven’t figured out how to do it yet.  Works brilliantly with Twitch, YouTube and Facebook.
  • Some way to record the show (both audio and video).  The ATEM has that built in, but you need a fast external hard drive and a USB-C adapter to use it.

You’ll notice a certain fondness for Mac minis. That’s because you can find them on the used market for very little money (typically around $250). Even older models perform well enough for our purposes here.

Audio overview

Audio for this just ain’t that hard.  You’ve got a pile of gear that makes funny noises, ending up with a stereo feed that has to get to JamKazam, and then you need to get the output of your JK session (because, y’know, you want everyone to hear your bandmates) to the audio-reactive graphics software, to the video stream, and also to your ears.  The latter is particularly important with JK because whoever’s hosting the session is really the only person who can control balance among players with any confidence that what they hear is also what the audience will hear.  We need not go into the reasons for that (they’re complex and somewhat shrouded in mystery by JK), but the point is that the guy who’s running the stream needs to have final say on what the “front of house” mix is going to sound like.  So you get to be in charge of the sound check, too!  Lucky you.

As a practical matter, you feed JK’s output to the headphone distribution amplifier and use that to set levels for your headphones, for the graphics-generation machine, and for the switcher.  The ATEM has an audio mixer with enough bells and whistles for podcast or TV-interview-style audio, but you should plan on doing any audio manipulation within each performer’s rig, and then balance levels in a sound check.

Now on to the trickier stuff.

Video overview

Let’s start by talking about what I was trying to achieve.

A year or more ago, Jim and I had done a show for the Cosmic Crossings series down near Princeton.  Ken Palmer did the visuals for it, and we were later given a sort of overview recording that had shots from all three cameras used for the shoot superimposed on each other.  It was weird, and we liked it. 

I wanted to try for a similar look, and I also wanted to manipulate the viewer’s focus in a way that would simulate the way your eyes might wander from the performers to the projections during a live show.  In practical terms, this boiled down to manipulating and layering three video sources:  

  • video of the band performing, which was supplied by JamKazam’s video subsystem
  • audio-responsive motion graphics generated by software, and
  • a motion-video “show reel” with material designed to somehow reflect the personality of the band and of the performance

I realized that with two exceptions (the start and end of the show) it wasn’t really necessary to synchronize any of these layers with each other.  At the beginning, I wanted to put up a “show card” inviting the audience to wait around for the start.  At the end, I wanted to put up a card thanking them for attending.  We also wanted to do a brief announcement introducing the band at the start.  To accomplish those, I’d need to operate the switcher manually.  Other than that, the whole show could run as layered, unsynchronized loops that would interact with each other in interesting ways.

Seeing the band

JamKazam allows each band member to run one or two USB cameras.  The JK video subsystem combines those into a tiled video window showing all of the band members in various combinations.  Each player can manually switch between cameras, or send both, and you can switch among different arrangements of performer tiles in JamKazam’s video window.

Bitter experience had taught us that using multiple cameras per performer resulted in very high CPU loads on the machines that were running the cameras.  In the case of the “session host” computer — the one providing audio to the livestream —  the spikes in CPU usage were bad enough to affect the audio quality of the outgoing stream.  So unfortunately multiple cameras per player were not really a possibility.  

Early on, I spent an hour or two trying to automate camera-switches to make the “band view” more interesting.  That was a complete waste of time, for two reasons.  First, any attempt to automate window switching using AppleScript caused JamKazam to crash.  Second, there wasn’t much to be gained by automation anyway: with only a single camera per performer, we weren’t offering different shots of each player, so there was little point in changing things up.  We contented ourselves with setting up an arrangement of video tiles we liked before the show started and leaving it there throughout.

To a Mac mini, an ATEM switcher just looks like a second display, so you can use the Mac’s display-arranging settings to get what you want, and then just shove the JK video window into the second display, maximized.

Graphics generation

For no good reason, I decided to use two different audio-responsive graphics-generation programs.  I thought that the interaction between the two might be interesting, and it was, but probably not enough to justify the trouble involved.  I combined the two into a single display using OBS, which also served to add titling that would show the band’s name from time to time for the benefit of people encountering the stream at random.

The two programs I used were G-Force and the macOS variant of ProjectM, called Project MilkSyphon.  G-Force excels in its responsiveness to changes in the audio but involves a moderately expensive license; ProjectM is free and has a rather different look and feel.  Both offer huge variety and a lot of configurability.  I found that I had to do some weeding of the ProjectM presets: some were far too dark to play well over a video stream, and some were so light that they washed out the images they were combined with or caused flickering in the stream display.  The weeding took an hour or so, but it was a one-time-only problem.  In reality, you could get away with ProjectM alone, and it runs on all platforms, whereas G-Force is only available for macOS and Windows.

OBS was used to switch back and forth between the two and to add titling.  OBS acquired the graphics by capturing the output windows of the two graphics generation programs; unfortunately, this means that they have to run in the same display and workspace as OBS, but it’s manageable.  You can create text-titling resources in OBS itself, and I just did one that put the name of the band in a simple font in the bottom-right corner of the screen.

Show-running was a matter of setting up four scenes: G-Force alone, G-Force with title text, ProjectM alone, and ProjectM with title text.  I used the Advanced Scene Switcher plugin to rotate them at 90-second intervals.  Links to all of the graphics-generation software, to OBS, and to various setup tutorials for it are at the end of this post.
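
The Advanced Scene Switcher did everything I needed, but if you’d rather script the rotation yourself, OBS has Python scripting built in (under Tools > Scripts).  Here’s a minimal sketch of what that could look like; the scene names are placeholders for whatever you called your four scenes, and 90 seconds is just the interval I happened to use.

    # Minimal OBS scene-rotation script (load it via Tools > Scripts).
    # Scene names below are placeholders -- rename them to match your own scenes.
    import obspython as obs

    SCENES = ["G-Force", "G-Force + title", "ProjectM", "ProjectM + title"]
    INTERVAL_MS = 90 * 1000   # rotate every 90 seconds
    _index = 0

    def rotate():
        """Switch OBS to the next scene in the list, wrapping around."""
        global _index
        source = obs.obs_get_source_by_name(SCENES[_index % len(SCENES)])
        if source is not None:
            obs.obs_frontend_set_current_scene(source)
            obs.obs_source_release(source)
        _index += 1

    def script_load(settings):
        obs.timer_add(rotate, INTERVAL_MS)

    def script_unload():
        obs.timer_remove(rotate)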

As with the JK setup, the Mac mini sees the ATEM as a second display, and OBS has the ability to “project” its output onto that second display, so problem solved. 

Creating motion video

The third visual layer was motion video.  I had discovered on earlier shows that it’s important that the video be “plotless”, that is, that it not divert the audience’s attention to the details of what they imagine is an ongoing story.  That argued for short video clips.  And I wanted them to be in black-and-white, because the color graphics would layer better that way.  In the end, the visuals were of two types: industrial videos showing manufacturing processes or training, and videos of various kinds of “trios”.  The latter could be anything from three women playing samisens to the Three Stooges, and in one case the score for a Ravel trio.

Resource gathering and transformation

All of the source material was found either on YouTube or in the Internet Archive.  The latter is easy: the Internet Archive provides downloads in multiple formats, at least some of them edit-friendly.  YouTube is a different matter.  For that, there’s a very useful Python script called youtube-dl that will download a video, given its URL.  (I’ve also created a variation on it that downloads the video and strips out the audio for use in voice clips and the like.)
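
I won’t reproduce my variation here, but a sketch of that sort of wrapper, using youtube-dl as a Python module rather than from the command line, might look something like this.  The output template and the choice of WAV for extracted audio are arbitrary, and audio extraction requires ffmpeg, which you’ll want anyway.

    # Sketch of a youtube-dl wrapper: grab a clip, or just its audio track.
    # (Requires the youtube-dl package; audio extraction also needs ffmpeg.)
    import youtube_dl

    def fetch(url, audio_only=False):
        opts = {"outtmpl": "%(title)s.%(ext)s"}   # name files after their titles
        if audio_only:
            opts["format"] = "bestaudio/best"
            opts["postprocessors"] = [{
                "key": "FFmpegExtractAudio",      # strip out the audio track
                "preferredcodec": "wav",
            }]
        else:
            opts["format"] = "mp4"
        with youtube_dl.YoutubeDL(opts) as ydl:
            ydl.download([url])

    # fetch(url) downloads the video; fetch(url, audio_only=True) keeps only the audio.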

Video material that was in color was dropped to black and white using the Swiss Army knife of video processing, ffmpeg.  It’s an astonishingly versatile command-line tool that allows pretty much any kind of transcoding, conversion, or manipulation you want, and does it VERY quickly.  I also used ffmpeg to break all of the motion video into segments about three minutes long, and then assembled the three-minute clips in random order using a video editor.  I used OpenShot, which is the most user-friendly of the Linux video editors, but you could use iMovie or anything else that will assemble clips end to end.  I suspect you could do it with ffmpeg, too ;).  The resulting file was placed on a stick drive and loaded into the video player.
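
For the curious, here’s roughly what the ffmpeg step looks like, wrapped in a little batch script.  The folder names are placeholders, and 180 seconds is just the clip length I settled on; the hue=s=0 filter does the black-and-white conversion by zeroing out the saturation.

    # Batch-convert downloaded clips to black and white and slice them into
    # roughly three-minute segments.  Folder names are placeholders.
    import subprocess
    from pathlib import Path

    SRC = Path("downloads")
    OUT = Path("bw_segments")
    OUT.mkdir(exist_ok=True)

    for clip in sorted(SRC.glob("*.mp4")):
        subprocess.run([
            "ffmpeg", "-i", str(clip),
            "-vf", "hue=s=0",              # drop the saturation to zero: instant B&W
            "-an",                         # the show reel doesn't need its own sound
            "-f", "segment",               # slice the output...
            "-segment_time", "180",        # ...into ~180-second pieces
            "-reset_timestamps", "1",
            str(OUT / f"{clip.stem}_%03d.mp4"),
        ], check=True)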

ATEM show-running

We had no video operator, so to make the show run while we were playing, we needed some automation. Fortunately, the ATEM Mini Pro gives you a way to do that using macros.  Just like word-processing macros, ATEM macros are recorded sequences of actions you perform on the video switcher.  Before going any further, I suggest you take a look at this video, which explains them rather thoroughly (a few more videos on ATEM macros are in the reference list at the end).

By the time I got done watching the videos, I’d learned two things.  One was that I could string together a bunch of small macros and play the assembled macros in a loop.  The advantage of that approach is that it’s a lot easier to record different small, simple sequences of actions and then string them together than it is to do a single recording of a long, complex sequence.  The second thing I learned was that if I was really going to get the whole thing looking smooth and seamless, I was going to have to edit the macros by hand, which is the subject of the next section.

The first step was to figure out what one pass through the “show loop” would look like.  My first list looked something like this:

  • Band is shown all by themselves
  • Generated graphics appear over the band
  • Band fades out leaving generated graphics
  • Band comes back in a PIP window
  • PIP window goes away
  • Generated graphics get motion video faded up, run combined for a while
  • Generated graphics only
  • Back to the beginning

I could record each of those steps as a separate macro, and I did.  I then made a “mega-macro” by turning on the macro recorder and running each individual macro in the proper order with a pause in between.  That gave me my show loop.
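
Recording the mega-macro worked fine, but since the macros end up as XML anyway (more on that in a moment), you could also splice the small ones together offline with a short script.  A sketch of that idea follows; the macro names are placeholders, and the MacroSleep op with its frames attribute is an assumption based on typical exported macro files, so check it against one saved from your own switcher.

    # Splice a set of small, individually recorded ATEM macros into one "show loop"
    # macro, with a pause between steps.  The element and attribute names here
    # (MacroPool, Macro, Op, MacroSleep, frames) are assumptions -- verify them
    # against a macro file exported from your own switcher.
    import copy
    import xml.etree.ElementTree as ET

    ORDER = ["Band only", "Graphics over band", "Band out", "Band PIP",
             "PIP away", "Add show reel", "Graphics only"]    # placeholder names
    PAUSE_FRAMES = "3000"    # ~50 seconds at 60 fps between steps

    tree = ET.parse("show_macros.xml")                 # exported from the switcher
    pool = tree.getroot().find("MacroPool")
    by_name = {m.get("name"): m for m in pool.findall("Macro")}

    loop = ET.SubElement(pool, "Macro", index=str(len(by_name)),
                         name="Show loop", description="Assembled by script")
    for step in ORDER:
        for op in by_name[step].findall("Op"):
            loop.append(copy.deepcopy(op))             # copy the step's actions
        ET.SubElement(loop, "Op", id="MacroSleep", frames=PAUSE_FRAMES)

    tree.write("show_macros_with_loop.xml")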

ATEM macro editing

The big lesson of the first ATEM macro video — you did watch it, didn’t you? — was that I was unlikely to get the timing I wanted for fades and other actions unless I edited the macros by hand, and in any case I needed to put in appropriate pauses in different places where I wanted things to remain the same for a while.   Fortunately, I’d had quite a bit of experience working on XML files in my day job, and it proved not to be all that difficult — the macros are really straightforward lists of actions taken by the switcher.   

I started out using the stock Mac TextEdit application.  It turned out to be a lot easier to use an editor that is smart about XML, and I suggest that you do that — the best freeware editor available now for this purpose is Atom, which will run on any platform you can think of.  It’s smart about the indentation and layout of XML files, and its syntax highlighting is a big help in cutting and pasting.  Believe me, installing it will end up saving you time, even though it has a small learning curve of its own. 
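
Hand-editing is fine for one-off tweaks, but for bulk changes, say doubling every pause or slowing every crossfade, a few lines of Python are faster than any text editor.  As before, the op names (MacroSleep, TransitionMixRate) and their attributes are assumptions drawn from typical exported macro files; verify them against your own before trusting the result.

    # Bulk-retime an exported ATEM macro file: double every pause and slow every
    # mix-rate fade.  Op ids and attribute names are assumptions -- check them
    # against a macro file exported from your own switcher.
    import xml.etree.ElementTree as ET

    tree = ET.parse("show_macros.xml")
    for op in tree.getroot().iter("Op"):
        if op.get("id") == "MacroSleep":
            op.set("frames", str(int(op.get("frames", "0")) * 2))   # pauses x2
        elif op.get("id") == "TransitionMixRate":
            op.set("rate", "75")                                    # slower crossfades
    tree.write("show_macros_retimed.xml")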

ATEM library-building

It turns out that a lot of the editing work you do with ATEM macros is repetitive — if you’ve seen one crossfade, you’ve seen them all — so it pays to put reusable XML snippets aside in a separate file.  I built up a small library of retimed fades and other stuff that I can use again and again, and eventually I’ll put it out where others can use it.

ATEM software controller

Blackmagic has a remote-control app for the ATEM Mini Pro (ATEM Software Control) that will control the switcher from anywhere on the same Ethernet subnet.  I was able to use it to control the switcher from the same laptop I use for audio routing and clip playback on my performance rig, which certainly helped with starting, ending, and running the show.  The software runs on Windows and macOS, and can be made to work on Linux.

Conclusion

That’s basically it.  I’ve left out details of cabling and other minutiae in the hope of maintaining some perspective while writing about something pretty complicated.   Questions are, of course, welcome in the comments, and I strongly encourage you to look at the references below if you’re thinking about doing this yourself.

What a single-machine version might look like

Others, especially my friend Jeremy dePrisco, have written extensively about getting livestreaming working for a solo performer.  Restream.io has also done a good beginner’s guide.  For my purposes here, it’s enough to point out that you really could make all of this work on a single machine with a fairly powerful CPU so long as you are NOT running JamKazam.  You can run ProjectM or another visualizer and use OBS to capture its output; you can do the same sort of thing with motion video and VLC.  It can be fussy and frustrating the first time through, but it’s not super-difficult.

References

Audio

Video mangling

  • ffmpeg, the king of video and audio manipulation tools, especially for batch processing

Graphics generators

OBS

ATEM and ATEM Macro tutorials

Tom’s scripts

  • ATEM macro file used for this show is available on request.  Just e-mail me – tom–dot–bruce–dot–trb–at–gmail–dot–com.