Advanced soundtracking

Getting the mp3 file using soundtrack endpoint

For more complex requests and full control of your soundtrack, use the /soundtrack endpoint. It accepts two parameters: a list (array) of regions and a normalization directive.

Generates and bounces a soundtrack

POST /soundtrack

Request Body

  • regions (array, required) – the list of region objects that make up the soundtrack

  • normalize (string, optional) – a normalization directive: "none", "auto", or "high"

// Sample response

{
    "apiVersion":"1.2.0",
    "mp3": "https://api.muzaic.ai/result/2024-01-31/st_3b12463ea3ecc474e22ebc302e70a0b3.mp3",
    "regions":[{"number":0,"hash":"6cc0b6861d5d369ec475d7236e9bfb63"},{"number":1,"hash":"9191593eb3992ba7c4307455c0e1b5ca"}],
    "status": "region 0 normalize: none | region 1 no params object | soundtrack normalized: auto",
    "tokensUsed": 65,
    "executionTime": 5.111389875412
}

Example requests

Good to know: For the tags and params fields, the same rules and limitations apply to regions as to single-file generation.

Request with two regions

Here's a typical request for a 48-second soundtrack, starting with a generated region using the keyframes feature, followed by a fixed jingle at the 20.2-second time mark.

{
    "regions": [
        {
            "time":0,
            "duration":20.2,
            "tags":[17],
            "params": {
                "intensity":[[0,1],[100,9]],
                "tempo":5,
                "rhythm":[[0,9],[100,1]],
                "tone":[[0,9],[100,1]],
                "variance":[[0,5],[100,5]]
            },
            "method":"adjust_end"
        },
        {
            "time":20.2,
            "duration":27.8,
            "sourceHash":"9191593eb3992ba7c4307455c0e1b5ca",
            "action":"copy",
            "method":"adjust_start"
        }    
    ]
}
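As a sketch, the region layout above can be sanity-checked before sending: regions should tile the soundtrack without gaps or overlaps. The helper below is hypothetical (not part of the Muzaic API); the field names follow the request format shown above.

```python
# Hypothetical helper: verify that regions tile the soundtrack without
# gaps or overlaps, and compute the total duration.
def total_duration(regions):
    regions = sorted(regions, key=lambda r: r["time"])
    end = 0.0
    for r in regions:
        if abs(r["time"] - end) > 1e-9:
            raise ValueError(f"gap or overlap at {r['time']}s")
        end = r["time"] + r["duration"]
    return end

# The two-region request from the example above (params omitted for brevity).
request = {
    "regions": [
        {"time": 0, "duration": 20.2, "tags": [17], "method": "adjust_end"},
        {"time": 20.2, "duration": 27.8,
         "sourceHash": "9191593eb3992ba7c4307455c0e1b5ca",
         "action": "copy", "method": "adjust_start"},
    ]
}

print(round(total_duration(request["regions"]), 3))  # 48.0
```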

Understanding regions

Regions are distinct segments of a soundtrack, each with its own defined start time and duration. The structure and logic of these regions are crucial for shaping the overall music effect. The Muzaic API relies on the provided request data to perform the generation process and won't make any changes to the logical structure of the music piece (except for some overall time adjustments when certain method values are set).

Good to know: Regions offer a wide range of possibilities. Need to make room for a speaking person? Define a low-intensity region. Want to introduce silence? Adjust the time field accordingly and push forward the next region in time.

Understanding keyframes arrays

Muzaic API allows you to shape the music the way you desire. If you wish for the music to have variations between the beginning and end of a piece, you should utilize the keyframes array. This array is specially designed to describe the changes in specific values over time.

Good to know: In the current Muzaic API version, providing keyframes array for tempo parameter is not supported.

Example keyframe array

"intensity":[[0,5],[100,9]]

Keyframe arrays consist of pairs, where the first number represents a relative percentage of the total duration of a music region or file, and the second number represents a parameter value. While you can include an unlimited number of keyframes, it is mandatory to have a starting keyframe with a first number of 0 and an ending keyframe with a first number of 100. The percentage can be an integer or a float, while the parameter value must be an integer between 1 and 9. In the example mentioned above, the intensity parameter will increase from 5 to 9 during the duration of a music fragment.
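The rules above can be expressed as a small validator. This is a hypothetical helper for checking a keyframe array client-side, not part of the API itself:

```python
# Hypothetical validator for keyframe arrays: pairs of [percent, value];
# must start at percent 0 and end at percent 100; percent may be an integer
# or a float, value must be an integer between 1 and 9.
def validate_keyframes(frames):
    if frames[0][0] != 0 or frames[-1][0] != 100:
        raise ValueError("keyframes must start at 0 and end at 100")
    for pct, val in frames:
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage {pct} out of range 0..100")
        if not (isinstance(val, int) and 1 <= val <= 9):
            raise ValueError(f"value {val} must be an integer between 1 and 9")
    return True

# The example from above: intensity rising from 5 to 9, with a float
# percentage allowed in the middle.
validate_keyframes([[0, 5], [50.5, 7], [100, 9]])  # passes
```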

Good to know: In the current version of the Muzaic API, keyframe arrays yield the best results for music files or regions longer than 60 seconds.

Hashes

Each Muzaic generation receives a unique hash which represents it in the system. Use these hashes to link your calls to previously generated audios. Hashes are needed when you want to insert a part of music into your soundtrack, regenerate some regions, or extend composition that you've liked.

Example of hash usage for extending a region

{
    "regions": [
        {
            "time":0,
            "duration":50,
            "sourceHash":"9191593eb3992ba7c4307455c0e1b5ca",
            "action":"extend"
        }    
    ]
}

Good to know: Hashes are returned with every response.

Using the action field

The action field controls the generation process, enabling you to maximize the use of previously generated music. Want to copy an entire passage? Create more music based on this fragment? Or maybe just regenerate with the same settings?

Good to know: For all actions except "generate", you need to pass a hash value in your request with sourceHash parameter.

Action field values

  • generate – create a new piece of music from the provided settings

  • copy – insert a previously generated fragment into your soundtrack

  • extend – create more music based on a previously generated fragment

  • regenerate – generate again with the same settings as the source fragment

Good to know: Remember to include the duration parameter in your request, even when using the "copy" or "regenerate" action. You can freely change the duration, even with these specific actions, to create a slightly longer or shorter file.
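A minimal region builder can enforce the rule above (every action other than "generate" requires a sourceHash). This is a hypothetical client-side helper, not part of the API:

```python
# Hypothetical region builder: enforces that every action other than
# "generate" is accompanied by a sourceHash, per the rule above.
def make_region(time, duration, action="generate", source_hash=None, **extra):
    if action != "generate" and source_hash is None:
        raise ValueError(f'action "{action}" requires a sourceHash')
    region = {"time": time, "duration": duration, "action": action, **extra}
    if source_hash is not None:
        region["sourceHash"] = source_hash
    return region

# Building the "extend" example from above; duration is always included.
region = make_region(0, 50, action="extend",
                     source_hash="9191593eb3992ba7c4307455c0e1b5ca")
```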

Using the method field

The method field is responsible for interpreting the structure of the soundtrack. Sometimes, there's a need for exactly 20 seconds of an audio file, and sometimes for 20 seconds of music. Where's the difference? Consider having a video that lasts for 60 seconds, but that doesn't mean the accompanying audio is also 60 seconds long. Often, there's a 'tail' that includes and conveys other messages. You may want the music to end at 60 seconds, but not abruptly. Naturally, you would say, 'with decay.' Muzaic understands this if you specify it.

This is also crucial when merging different regions – you can use the method values to achieve the desired effect.

Good to know: Method field is not required.

Method field values

  • adjust_start – adapt the beginning of the region, for example when merging with the preceding region

  • adjust_end – adapt the ending of the region, for example to close with a decay

  • strict – keep the region's timing exactly as specified

Good to know: When a region's time value is set very low, only the "adjust_end" and "strict" method values can be used. A forced change of method may occur for time values up to 4 seconds.

Normalizing your soundtrack

With the Muzaic API, you can normalize your generated soundtrack using the normalize field. Use one of three values:

  • none – This mode leaves the soundtrack as it was generated. This is the best option when there's an additional layer of audio, such as someone talking, original sound effects, or other background sounds.

  • auto – This mode will normalize the soundtrack according to the EBU R128 standard. It's best for videos or parts of videos where the music should be in the foreground.

  • high – This mode will normalize the soundtrack to a higher level of loudness.

Example request with normalization feature

{
    "normalize": "auto",
    "regions": [
        {
            "time":0,
            "duration":16,
            "tags":[17]
        },
        {
            "time":16,
            "duration":17.4,
            "tags":[2,11]
        }    
    ]
}

Response handling

If your request is valid, the response from the /soundtrack endpoint resembles that of a singleFile request. It includes a URL to the created MP3 file (mp3), as well as the status, tokensUsed, executionTime (in seconds), and apiVersion fields. Additionally, it contains a regions array for obtaining the hashes of the created music pieces.

Example response

{
    "apiVersion":"1.2.0",
    "mp3": "https://api.muzaic.ai/result/2024-01-31/st_3b12463ea3ecc474e22ebc302e70a0b3.mp3",
    "regions":[{"number":0,"hash":"6cc0b6861d5d369ec475d7236e9bfb63"},{"number":1,"hash":"9191593eb3992ba7c4307455c0e1b5ca"}],
    "status": "region 0 normalize: none | region 1 no params object | soundtrack normalized: auto",
    "tokensUsed": 65,
    "executionTime": 5.111389875412
}
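A typical follow-up is collecting the region hashes from the response so they can be reused as sourceHash values in later requests. A minimal sketch, parsing the sample response above:

```python
import json

# The sample /soundtrack response from above (status and timing fields
# omitted for brevity).
response = json.loads('''{
    "apiVersion": "1.2.0",
    "mp3": "https://api.muzaic.ai/result/2024-01-31/st_3b12463ea3ecc474e22ebc302e70a0b3.mp3",
    "regions": [{"number": 0, "hash": "6cc0b6861d5d369ec475d7236e9bfb63"},
                {"number": 1, "hash": "9191593eb3992ba7c4307455c0e1b5ca"}],
    "tokensUsed": 65
}''')

# Map region number -> hash for reuse as sourceHash in later requests.
hashes = {r["number"]: r["hash"] for r in response["regions"]}
print(hashes[1])  # 9191593eb3992ba7c4307455c0e1b5ca
```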

List of statuses

With a Muzaic API call, you will always receive a music response. However, depending on your request, you may receive different statuses, and these statuses can stack.
