Core Concepts
Supriya’s documentation and code assume that you have some familiarity with the core concepts of working with SuperCollider, and with digital signal processing environments in general. Very little of what follows in the documentation will make sense until you develop a good grasp of these concepts.
Client and server
SuperCollider is a collection of tools working in concert. The most important of these are the client (sclang) and the server (scsynth and supernova).
The server produces audio as an output, potentially using audio as an input.
Like many other kinds of servers, scsynth
listens for messages from
clients, takes actions based on those messages, and potentially sends messages
back to the client (or clients).
The client orchestrates what messages are sent to the server towards some (usually musical) end. It controls which commands are sent and, importantly, when they are sent. The client may take action based on messages sent back by the server, but is not responsible for generating the audio itself.
SuperCollider’s client sclang is its own programming language with standard library affordances for communicating with the SuperCollider server. Similarly, Supriya, as a client, is a set of library affordances for communicating with the SuperCollider server via Python, and it leans on the rest of the Python ecosystem for the things sclang has to build from scratch (e.g. packaging, documentation tooling, testing, scientific computing, UIs, etc.).
There are many non-sclang clients beyond Supriya for a wide variety of languages, but all follow the very broad pattern of orchestrating communications with a running scsynth or supernova server. Most clients, including Supriya and sclang, provide a class (or the equivalent in that language) to model the concept of a server. In both Supriya and sclang this class is called, unsurprisingly, Server! However, this isn’t actually the server, just a proxy to it as a convenience for communications and process management.
Because of the communication and memory boundaries between the client and the server, it’s not always easy or even possible to know the exact state of the server.
Open sound control
How does the client communicate with the server? For historical reasons, the server listens for messages using the Open Sound Control (aka OSC) wire format.
Like WebSockets or stdin/stdout process pipes and unlike HTTP’s paired
request/response model, OSC communications are bidirectional. Messages go out
from the client to the server, and are not explicitly matched by responses from
the server back to the client. Both client and server may send each other
messages at any moment. By convention, scsynth
sends replies to many
requests, but this is an explicit design decision by the SuperCollider
developers and not a feature of OSC. Sending these response messages is not
enforced by OSC or the UDP or TCP protocols those OSC messages are sent over.
OSC allows for sending messages
and (potentially timestamped) bundles
of other messages or bundles. An OSC message
is basically a list of simple data types. By convention they start with a
string (the address) and any subsequent values must be integers, floats,
booleans, other strings, etc. You can think of it as a very, very strict subset
of JSON.
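For example, a message and a bundle might be constructed like this. This is a minimal sketch, assuming Supriya exposes OscMessage and OscBundle classes in its supriya.osc module; the constructor signatures shown are assumptions, not verified API:

```python
from supriya.osc import OscBundle, OscMessage

# A message: an address string followed by simple typed values.
# /s_new asks the server to instantiate a synth: synthdef name,
# node ID, add action, and target node ID.
message = OscMessage("/s_new", "default", 1000, 0, 1)

# A bundle: an optional timestamp plus any number of messages
# (or nested bundles) to be handled together at that time.
bundle = OscBundle(
    timestamp=0.5,
    contents=[
        OscMessage("/s_new", "default", 1000, 0, 1),
        OscMessage("/n_set", 1000, "frequency", 440.0),
    ],
)
```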
Some OSC messages to the server are considered synchronous. The command they encode is enacted immediately.
Other OSC messages to the server are considered asynchronous. The command
they encode may take a few moments to come to pass, typically because the
server must perform additional memory allocation or processing whose time
complexity is unknown. Most asynchronous messages to scsynth
may contain
a completion message: another OSC message or bundle to be handled once the
initial command has completed. You can think of these as a sort of callback:
“do Y once X completes, however long X takes.”
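A hedged sketch of that pattern, reusing the hypothetical OscMessage class from above: allocate a buffer with /b_alloc and attach a second message to be enacted only once the allocation has completed. The "player" synthdef and "buffer_id" parameter are hypothetical, and how a client encodes the nested completion message is an implementation detail; passing it as a plain argument here is an assumption.

```python
from supriya.osc import OscMessage

# "Once buffer 0 has been allocated, create a synth that uses it."
# ("player" and "buffer_id" are hypothetical names for illustration.)
completion = OscMessage("/s_new", "player", 1001, 0, 1, "buffer_id", 0)

# /b_alloc: buffer ID, frame count, channel count, optional completion message.
allocate = OscMessage("/b_alloc", 0, 44100, 1, completion)
```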
Supriya models all messages from the client to the server, and from the server
back to the client, explicitly as classes, and provides methods on its
Server
class for constructing and
sending these messages (or bundles thereof) transparently.
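In practice that means you rarely build raw messages yourself. A minimal sketch, assuming a local scsynth installation and recent Supriya conventions (the send() method name is an assumption):

```python
import supriya
from supriya.osc import OscMessage

server = supriya.Server().boot()    # start and connect to scsynth
server.send(OscMessage("/status"))  # send a raw OSC message by hand
server.quit()                       # shut the server process down
```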
Nodes, groups, and synths
What sorts of commands do OSC messages encode, anyways? And what is the server even doing when it processes audio?
The server contains a number of entities. Perhaps the most crucial of these are nodes, which are organized into a rooted acyclic digraph, a tree. There are two kinds of nodes: groups and synths. Group nodes contain other nodes (either other groups, or synths) while synths perform audio processing.
When processing audio, the server walks the node tree depth-first, starting from the root node, visiting each node in turn. The synth nodes process audio one after another until every node has been visited. Audio processing in this framework is relatively deterministic.
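A hedged sketch of building a small tree, one group containing two synths. The method names (add_group, add_synth, query_tree) and the bundled default synthdef follow recent Supriya conventions but should be treated as assumptions:

```python
import supriya

server = supriya.Server().boot()
group = server.add_group()                  # a container node
synth_a = group.add_synth(supriya.default)  # leaf nodes that do the
synth_b = group.add_synth(supriya.default)  # actual audio processing
print(server.query_tree())                  # inspect the resulting tree
server.quit()
```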
The alternative server supernova introduces parallelism for special groups called parallel groups, and with it indeterminism. Parallel groups process their immediate children in parallel, which means care must be taken over how audio data is read and written by parallelized synths.
Buses and buffers
But where do synths read audio data from and write audio data to?
Synths typically read from and write to buses: placeholders for signals. Buses come in two flavors: audio-rate and control-rate. Control-rate buses are conceptually equivalent to control voltage in modular synthesizer systems, while audio-rate buses are conceptually similar to channels on a gigantic mixer. Audio-rate buses allow us to process signals sample-by-sample, while control-rate buses only allow us to process signals once per sample block. For many applications this arrangement is perfectly fine, especially given that the default sample block size is 64 samples - typically barely more than a millisecond at most common sample rates.
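A hedged sketch of allocating one bus of each flavor; the add_bus method and its calculation_rate parameter are assumptions based on recent Supriya releases:

```python
import supriya

server = supriya.Server().boot()
audio_bus = server.add_bus(calculation_rate="audio")      # per-sample signal
control_bus = server.add_bus(calculation_rate="control")  # per-block signal

# At the default block size of 64 samples and a 44.1 kHz sample rate,
# a control-rate value can only change every 64 / 44100 = ~1.45 ms.
server.quit()
```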
Synths can also read from and write to buffers: fixed-size arrays of floating point values. Buffers are used for a variety of purposes: as wavetables, as envelopes, as samples for samplers, for delay lines, as large-scale parameter holders, etc. Unlike buses, which change constantly and may (in the case of audio-rate buses) be zeroed out every sample block unless touched, buffers maintain their data until changed.
Also unlike buses - which are instantiated en masse when the server boots and cannot be increased or decreased in number after booting - buffers must be explicitly allocated on the server, as they require allocating an unknown amount of additional memory. Buffer allocation is one of the asynchronous actions mentioned above.
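A hedged sketch of that handshake; the add_buffer method and its parameter names are assumptions:

```python
import supriya

server = supriya.Server().boot()
# One second of mono audio at 44.1 kHz. The server has to allocate this
# memory before the buffer is usable, which is why allocation is
# asynchronous rather than immediate.
buffer_ = server.add_buffer(channel_count=1, frame_count=44100)
server.quit()
```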
Synth definitions and unit generators
OK, but how do synths process audio?
Just like the node tree is a graph, synths are themselves graphs too. But instead of being graphs of nodes, they’re graphs of unit generators (colloquially called UGens). Unit generators perform discrete audio operations: generate a sine tone, process an input signal through a filter, multiply two inputs together, etc. Some unit generators can read from audio or control buses, some can write back to those buses. Some can read from buffers, and others can write back to those buffers.
Unit generators are composed together into graphs called synth
definitions
(colloquially called SynthDefs).
When a synth is visited during audio processing, each unit generator governed
by that synth is visited in turn, processing its inputs into its outputs.
When instantiating a synth into the node tree, we need to tell it what synth definition to use. These definitions are conceptually templates for synths. Unlike the node tree’s dynamic graph, graphs of unit generators are static. Once a synth definition has been defined, it is fixed. Need a different version with more channels? Make a new synth definition. Like buffer allocation, synth definition allocation is an asynchronous action. Synth definitions can be arbitrarily large, ranging from as simple as “add a sine tone to an audio bus” to as complex as “simulate an entire DX7 hardware synthesizer”.
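A hedged sketch of a very small synth definition, written with Supriya’s synthdef decorator and ugens module; the exact import paths and ugen signatures follow recent Supriya releases but are assumptions here:

```python
from supriya import synthdef
from supriya.ugens import Out, SinOsc

@synthdef()
def simple_sine(frequency=440, amplitude=0.1):
    # Three unit generators in a fixed graph: SinOsc generates a tone,
    # the multiplication scales it, and Out writes it to an audio bus.
    Out.ar(bus=0, source=SinOsc.ar(frequency=frequency) * amplitude)
```

Once built, the graph is fixed: a multichannel variant would be a separate synth definition.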
Realtime and non-realtime
Finally, when does audio processing take place?
The server actually has two time modes: realtime (the default) and non-realtime.
When discussing interacting with the server, we typically think in realtime terms: audio is playing live, right now. Audio inputs are going in live, audio outputs are going out live. The server is responding to our commands in realtime. We are literally performing live, now. In realtime, a server receives blocks of samples from your sound card, chops those blocks into sample blocks, and processes each sample block through the node tree, handing the result back to the sound card, which eventually bundles it into a hardware output block for playback on your speakers or headphones.
But the server can also perform offline in non-realtime mode (colloquially NRT or NRT mode). As with realtime, non-realtime performance requires OSC commands to tell the server what to do. Unlike realtime, those commands must be timestamped OSC bundles because a server in NRT mode has no concept of what “now” means, and those bundles must be passed to the server at startup as a file called a score.
Like sclang, Supriya has a class called Score for constructing this sequence of OSC bundles. Unlike sclang, Supriya attempts to make the Score interface as close as possible to the Server interface, to facilitate writing logic which is unaware of the current time mode.
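A hedged sketch of that parallel interface; the at() context manager, add_synth method, and render() signature shown here are assumptions modeled on the realtime Server interface described above:

```python
import supriya

score = supriya.Score()
with score.at(0.0):                        # bundle timestamped at 0.0 seconds
    synth = score.add_synth(supriya.default)
with score.at(2.0):                        # bundle timestamped at 2.0 seconds
    synth.free()
score.render("output.aiff", duration=3.0)  # run the server offline
```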
NRT servers will process audio as fast as possible, or as slow as necessary, to perform all the computations required to generate the desired output. This makes them ideal for audio analysis (e.g. finding the maximum amplitude in a soundfile) or for generating compositions too computationally complex to run in realtime without causing buffer under-runs (i.e. the server cannot process audio quickly enough to deliver it to the sound card for playback).