Compare commits

10 commits: 7542cdbc1a ... 6bf69b2756

| SHA1 |
|---|
| 6bf69b2756 |
| 793acdee2a |
| 7d42a324e1 |
| 459b38b3b9 |
| 6ffd7a244e |
| 18d32660db |
| 535225172e |
| 5cef5eebbe |
| 58a2daa7fa |
| ef024ef8e8 |

README.md (30 changes)
@@ -83,11 +83,11 @@ In order to use the Time Stretch or Change Pitch tool, you'll need Rubber Band.

- Download the UVR dmg for MacOS via one of the links below:
  - Mac M1 (arm64) users:
    - [Main Download Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.5.0/Ultimate_Vocal_Remover_v5_5_MacOS_arm64.dmg)
    - [Main Download Link mirror](https://www.mediafire.com/file/n6gkjo2l4v51ro2/Ultimate_Vocal_Remover_v5_5_MacOS_arm64.dmg/file)
    - [Main Download Link mirror](https://www.mediafire.com/file_premium/o0tfneebhqw554e/Ultimate_Vocal_Remover_v5_5_MacOS_arm64.dmg/file)
  - Mac Intel (x86_64) users:
    - [Main Download Link](https://github.com/Anjok07/ultimatevocalremovergui/releases/download/v5.5.0/Ultimate_Vocal_Remover_v5_5_MacOS_x86_64.dmg)
    - [Main Download Link mirror](https://www.mediafire.com/file/bcyxhy9ygxy8ks5/Ultimate_Vocal_Remover_v5_5_MacOS_x86_64.dmg/file)
    - [Main Download Link mirror](https://www.mediafire.com/file_premium/m19wucslk9uzpcc/Ultimate_Vocal_Remover_v5_5_MacOS_x86_64.dmg/file)

<details id="CannotOpen">
<summary>MacOS Users: Having Trouble Opening UVR?</summary>
@@ -184,8 +184,8 @@ pip3 install -r requirements.txt

- Fixed Download Center model list issue.
- Fixed audio clip in ensemble mode.
- Fixed output model name issue in ensemble mode.
- Added "Batch Mode" for MDX-Net to increase performance.
- Batch Mode is more memory efficient.
- Batch Mode produces the best output, regardless of batch size.
- Added Batch Mode for VR Architecture.
- Added Mixer Mode for Demucs.
@@ -195,9 +195,9 @@ pip3 install -r requirements.txt

- The progress bar is now fully synced up with every process in the application.
- Drag-n-drop feature should now work every time.
- Users can now drop large batches of files and directories as inputs. When directories are dropped, the application will search for any file with an audio extension and add it to the list of inputs.
- Fixed low-resolution icon.
- Added the ability to download models manually if the application can't connect to the internet.
- Various bug fixes for the Download Center.
- Various design changes.
@@ -208,16 +208,16 @@ pip3 install -r requirements.txt

### New Options:

- "Select Saved Settings" option - Allows the user to save the current settings of the whole application. You can also load saved settings or reset them to the default.
- "Right-click" menu - Allows for quick access to important options.
- "Help Hints" option - When enabled, users can hover over options to see pop-up text that describes that option. The right-click menu also allows copying the "Help Hint" text.
- Secondary Model Mode - This option is an expanded version of the "Demucs Model" option previously available only for MDX-Net. It is now available in all three AI Networks and for any stem. Any model can now be Secondary, and the user can choose the amount of influence it has on the final result.
- Robust caching for ensemble mode, allowing for much faster processing times.
- Clicking the "Input" field will pop up a new window that allows the user to go through all of the selected audio inputs. Within this menu, users can:
  - Remove inputs.
  - Verify inputs.
  - Create samples of selected inputs.
- "Sample Mode" option - Allows the user to process only part of a track to sample settings or a model without running a complete conversion.
  - The number in the parentheses is the current number of seconds the generated sample will be.
  - You can choose the number of seconds to extract from the track in the "Additional Settings" menu.
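The Secondary Model influence described above can be pictured as a weighted mix of two model outputs. A minimal sketch — the `blend_secondary` name and `influence` knob are illustrative assumptions, not UVR's actual API:

```python
import numpy as np

def blend_secondary(primary, secondary, influence=0.5):
    # Hypothetical helper: mix a secondary model's output into the
    # primary result; influence in [0, 1] sets the secondary's weight.
    n = min(primary.shape[-1], secondary.shape[-1])  # align lengths
    return (1.0 - influence) * primary[..., :n] + influence * secondary[..., :n]

primary = np.ones((2, 8))     # stand-in for the primary model's stem
secondary = np.zeros((2, 8))  # stand-in for the secondary model's stem
out = blend_secondary(primary, secondary, influence=0.25)
```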
@@ -229,14 +229,14 @@ pip3 install -r requirements.txt

### MDX-NET:

- "Denoise Output" option produces cleaner results, but the processing time will be longer. This option has replaced Noise Reduction.
- "Spectral Inversion" option uses spectral inversion techniques for a cleaner secondary stem result. This option may slow down the audio export process.
- Secondary stem now has the same frequency cut-off as the main stem.

### Demucs:

- Demucs v4 models are now supported, including the 6-stem model.
- Ability to combine the remaining stems instead of inverting the selected stem with the mixture, applied only when a user does not select "All Stems".
- A "Pre-process" model that allows the user to run an inference through a robust vocal or instrumental model and separate the remaining stems from its generated instrumental mix. This option can significantly reduce vocal bleed in other Demucs-generated non-vocal stems.
  - The Pre-process model is intended for Demucs separations for all stems except vocals and instrumentals.
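Spectral inversion can be thought of as deriving the secondary stem by subtracting the primary stem from the mixture in the complex spectral domain rather than directly on samples. A toy numpy sketch, with plain FFTs standing in for the windowed STFTs a real separator would use:

```python
import numpy as np

t = np.arange(16, dtype=float)
primary = np.sin(0.5 * t)          # pretend primary stem estimate
residue = 0.1 * np.cos(0.3 * t)    # the "true" secondary stem
mixture = primary + residue

# Time-domain inversion: straight subtraction.
secondary_naive = mixture - primary

# Spectral inversion: subtract complex spectra, then transform back.
spec_mix = np.fft.rfft(mixture)
spec_primary = np.fft.rfft(primary)
secondary_spectral = np.fft.irfft(spec_mix - spec_primary, n=mixture.size)
```

With whole-signal FFTs the two paths agree exactly; with overlapping windowed frames the spectral route can behave better at frame boundaries.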
UVR.py (2 changes)

@@ -2821,7 +2821,7 @@ class MainWindow(TkinterDnD.Tk if is_dnd_compatible else tk.Tk):

        credit_label(place=16,
                     frame=credits_Frame,
                     text="Audio Separation and CC Karaoke & Friends Discord Communities",
                     message="Thank you for the support!")

        more_info_tab_Frame = Frame(tab2, highlightthicknes=30)
@@ -1,12 +1,23 @@

Change Log:

Most Recent Changes:

Fixes & Changes:

~ Fixed Download Center model list issue.
~ Fixed audio clip in ensemble mode.
~ Fixed output model name issue in ensemble mode.
~ Added "Batch Mode" for MDX-Net to increase performance.
~ Batch Mode is more memory efficient.
~ Batch Mode produces the best output, regardless of batch size.
~ Added Batch Mode for VR Architecture.
~ Added Mixer Mode for Demucs.
~ This option may improve separation for some 4-stem models.

Fixes & Changes going from UVR v5.4 to v5.5:

~ The progress bar is now fully synced up with every process in the application.
~ Fixed low-resolution icon.
~ Added the ability to download models manually if the application can't connect
to the internet.
~ Drag-n-drop is functional across all OS platforms.
~ Resolved mp3 tag issue in MacOS version.

Performance:
@@ -16,7 +27,7 @@ Performance:

MacOS M1 Notes:

~ The GPU Conversion checkbox will enable MPS for GPU acceleration. However,
only the VR Architecture models are currently compatible with it.

New Options:
@@ -24,7 +35,7 @@ New Options:

of the whole application. You can also load a saved setting or reset them to
the default.
~ Right-click menu - Allows for quick access to important options.
~ Help Hints option - When enabled, users can hover over options to see a pop-up
text that describes that option. The right-click menu also allows copying
the "Help Hint" text.
~ Secondary Model Mode - This option is an expanded version of the "Demucs Model"

@@ -32,8 +43,7 @@ New Options:

in all three AI Networks and for any stem. Any model can now be Secondary, and
the user can choose the amount of influence it has on the final result.
~ Robust caching for ensemble mode, allowing for much faster processing times.
~ Clicking the "Input" field will pop up a window allowing the user to review the
selected audio inputs. Within this menu, users can:
~ Remove inputs.
~ Verify inputs.
~ Create samples of chosen inputs.
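Creating a sample of an input reduces, at its core, to slicing the first N seconds of the decoded waveform. A minimal sketch; `make_sample` is an illustrative helper, not UVR's function:

```python
import numpy as np

def make_sample(wave, sr, seconds=30):
    # Keep only the first `seconds` of a (channels, frames) array.
    return wave[..., :sr * seconds]

sr = 100                        # toy sample rate
track = np.zeros((2, sr * 90))  # 90-second stereo track
sample = make_sample(track, sr, seconds=30)
```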
@@ -54,15 +64,15 @@ VR Architecture:

MDX-NET:

~ Denoise Output option results in cleaner results,
but the processing time will be longer. This option has replaced Noise Reduction.
~ Spectral Inversion option uses spectral inversion techniques for a
cleaner secondary stem result. This option may slow down the audio export process.
~ Secondary stem now has the same frequency cut-off as the main stem.

Demucs:

~ Demucs v4 models are now supported, including the 6-stem model.
~ Ability to combine remaining stems instead of inverting selected stem with the
mixture only when a user does not select "All Stems".
~ A Pre-process model that allows the user to run an inference through a robust
@@ -5,8 +5,9 @@ import math

import random
import math
import platform
import traceback

#cur
OPERATING_SYSTEM = platform.system()
SYSTEM_ARCH = platform.platform()
SYSTEM_PROC = platform.processor()

@@ -18,7 +19,7 @@ else:

    from . import pyrb

if OPERATING_SYSTEM == 'Darwin':
    wav_resolution = "polyphase" if SYSTEM_PROC == ARM or ARM in SYSTEM_ARCH else "sinc_fastest"
else:
    wav_resolution = "sinc_fastest"
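The resampling-filter selection above keys off the host platform. The `ARM` constant is defined elsewhere in the file, so the value used here ("arm") is an assumption; a self-contained sketch:

```python
import platform

ARM = "arm"  # assumed value; defined elsewhere in spec_utils.py

OPERATING_SYSTEM = platform.system()
SYSTEM_ARCH = platform.platform()
SYSTEM_PROC = platform.processor()

# Apple Silicon gets "polyphase" resampling; everything else "sinc_fastest".
if OPERATING_SYSTEM == 'Darwin':
    wav_resolution = "polyphase" if SYSTEM_PROC == ARM or ARM in SYSTEM_ARCH else "sinc_fastest"
else:
    wav_resolution = "sinc_fastest"
```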
@@ -35,8 +36,6 @@ def crop_center(h1, h2):

    elif h1_shape[3] < h2_shape[3]:
        raise ValueError('h1_shape[3] must be greater than h2_shape[3]')

    s_time = (h1_shape[3] - h2_shape[3]) // 2
    e_time = s_time + h2_shape[3]
    h1 = h1[:, :, :, s_time:e_time]
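crop_center trims the longer tensor's time axis symmetrically so two feature maps can be combined. The same logic works on numpy arrays; a quick demonstration:

```python
import numpy as np

def crop_center_np(h1, h2):
    # numpy re-statement of crop_center: center-crop h1's last axis
    # down to h2's length.
    if h1.shape[3] < h2.shape[3]:
        raise ValueError('h1.shape[3] must be greater than h2.shape[3]')
    s_time = (h1.shape[3] - h2.shape[3]) // 2
    e_time = s_time + h2.shape[3]
    return h1[:, :, :, s_time:e_time]

a = np.arange(10).reshape(1, 1, 1, 10)  # length-10 time axis
b = np.zeros((1, 1, 1, 4))              # target length 4
cropped = crop_center_np(a, b)          # keeps the middle 4 samples
```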
@@ -116,6 +115,8 @@ def normalize(wave, is_normalize=False):

        if is_normalize:
            print("The result was normalized.")
            wave /= maxv
        else:
            print("The result was not normalized.")
    else:
        print(f"\nNormalization Set {is_normalize}: Input not above threshold for clipping. Max:{maxv}")
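Pieced together, the normalization logic above amounts to: if the peak exceeds 1.0 and normalization is enabled, divide by the peak. A minimal reconstruction with the print statements omitted:

```python
import numpy as np

def normalize(wave, is_normalize=False):
    # Scale the waveform back into [-1, 1] only when it would clip.
    maxv = np.abs(wave).max()
    if maxv > 1.0 and is_normalize:
        wave = wave / maxv
    return wave

loud = np.array([0.5, -2.0, 1.0])      # peak of 2.0 would clip
out = normalize(loud, is_normalize=True)
```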
@@ -128,11 +129,14 @@ def normalize_two_stem(wave, mix, is_normalize=False):

    max_mix = np.abs(mix).max()

    if maxv > 1.0:
        print(f"\nNormalization Set {is_normalize}: Primary source above threshold for clipping. Max:{maxv}")
        print(f"\nNormalization Set {is_normalize}: Mixture above threshold for clipping. Max:{max_mix}")
        if is_normalize:
            print("The result was normalized.")
            wave /= maxv
            mix /= maxv
        else:
            print("The result was not normalized.")
    else:
        print(f"\nNormalization Set {is_normalize}: Input not above threshold for clipping. Max:{maxv}")
@@ -205,75 +209,51 @@ def reduce_vocal_aggressively(X, y, softmask):

    return y_mag * np.exp(1.j * np.angle(y))

def merge_artifacts(y_mask, thres=0.01, min_range=64, fade_size=32):
    mask = y_mask

    try:
        if min_range < fade_size * 2:
            raise ValueError('min_range must be >= fade_size * 2')

        idx = np.where(y_mask.min(axis=(0, 1)) > thres)[0]
        start_idx = np.insert(idx[np.where(np.diff(idx) != 1)[0] + 1], 0, idx[0])
        end_idx = np.append(idx[np.where(np.diff(idx) != 1)[0]], idx[-1])
        artifact_idx = np.where(end_idx - start_idx > min_range)[0]
        weight = np.zeros_like(y_mask)
        if len(artifact_idx) > 0:
            start_idx = start_idx[artifact_idx]
            end_idx = end_idx[artifact_idx]
            old_e = None
            for s, e in zip(start_idx, end_idx):
                if old_e is not None and s - old_e < fade_size:
                    s = old_e - fade_size * 2

                if s != 0:
                    weight[:, :, s:s + fade_size] = np.linspace(0, 1, fade_size)
                else:
                    s -= fade_size

                if e != y_mask.shape[2]:
                    weight[:, :, e - fade_size:e] = np.linspace(1, 0, fade_size)
                else:
                    e += fade_size

                weight[:, :, s + fade_size:e - fade_size] = 1
                old_e = e

            v_mask = 1 - y_mask
            y_mask += weight * v_mask

        mask = y_mask
    except Exception as e:
        error_name = type(e).__name__
        traceback_text = ''.join(traceback.format_tb(e.__traceback__))
        message = f'{error_name}: "{e}"\n{traceback_text}'
        print('Post Process Failed: ', message)

    return mask

def mask_silence(mag, ref, thres=0.1, min_range=64, fade_size=32):
    if min_range < fade_size * 2:
        raise ValueError('min_range must be >= fade_size * 2')

    mag = mag.copy()

    idx = np.where(ref.mean(axis=(0, 1)) < thres)[0]
    starts = np.insert(idx[np.where(np.diff(idx) != 1)[0] + 1], 0, idx[0])
    ends = np.append(idx[np.where(np.diff(idx) != 1)[0]], idx[-1])
    uninformative = np.where(ends - starts > min_range)[0]
    if len(uninformative) > 0:
        starts = starts[uninformative]
        ends = ends[uninformative]
        old_e = None
        for s, e in zip(starts, ends):
            if old_e is not None and s - old_e < fade_size:
                s = old_e - fade_size * 2

            if s != 0:
                weight = np.linspace(0, 1, fade_size)
                mag[:, :, s:s + fade_size] += weight * ref[:, :, s:s + fade_size]
            else:
                s -= fade_size

            if e != mag.shape[2]:
                weight = np.linspace(1, 0, fade_size)
                mag[:, :, e - fade_size:e] += weight * ref[:, :, e - fade_size:e]
            else:
                e += fade_size

            mag[:, :, s + fade_size:e - fade_size] += ref[:, :, s + fade_size:e - fade_size]
            old_e = e

    return mag
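The core of merge_artifacts is the fade-weight ramp: inside each detected artifact region the mask is pushed toward 1, with linear fades at both edges. A one-dimensional illustration of that weighting:

```python
import numpy as np

fade_size = 4
length = 16
s, e = 4, 12  # bounds of one detected artifact region

weight = np.zeros(length)
weight[s:s + fade_size] = np.linspace(0, 1, fade_size)  # fade in
weight[e - fade_size:e] = np.linspace(1, 0, fade_size)  # fade out
weight[s + fade_size:e - fade_size] = 1                 # fully merged middle

# Push the mask toward 1 inside the region, as merge_artifacts does.
y_mask = np.full(length, 0.2)
y_mask = y_mask + weight * (1 - y_mask)
```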
def align_wave_head_and_tail(a, b):
    l = min([a[0].size, b[0].size])

@@ -386,11 +366,11 @@ def mirroring(a, spec_m, input_high_end, mp):

    return np.where(np.abs(input_high_end) <= np.abs(mi), input_high_end, mi)

def adjust_aggr(mask, is_non_accom_stem, aggressiveness):
    aggr = aggressiveness['value']

    if aggr != 0:
        if is_non_accom_stem:
            aggr = 1 - aggr

        aggr = [aggr, aggr]

@@ -403,6 +383,9 @@ def adjust_aggr(mask, is_vocal_model, aggressiveness):

        mask[ch, :aggressiveness['split_bin']] = np.power(mask[ch, :aggressiveness['split_bin']], 1 + aggr[ch] / 3)
        mask[ch, aggressiveness['split_bin']:] = np.power(mask[ch, aggressiveness['split_bin']:], 1 + aggr[ch])

        # if is_non_accom_stem:
        #     mask = (1.0 - mask)

    return mask
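The aggressiveness adjustment raises a 0-to-1 soft mask to a power greater than one, which suppresses low-confidence bins harder than confident ones. In isolation:

```python
import numpy as np

mask = np.array([0.2, 0.5, 0.9])  # toy soft-mask values
aggr = 0.3                        # stand-in for aggressiveness['value']

# Exponent > 1 pulls mid/low values down while values near 1.0 stay put.
sharpened = np.power(mask, 1 + aggr)
```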
def stft(wave, nfft, hl):

@@ -442,36 +425,20 @@ def spec_effects(wave, algorithm='Default', value=None):

    return wave

def spectrogram_to_wave_bare(spec, hop_length=1024):
    spec_left = np.asfortranarray(spec[0])
    spec_right = np.asfortranarray(spec[1])
    wave_left = librosa.istft(spec_left, hop_length=hop_length)
    wave_right = librosa.istft(spec_right, hop_length=hop_length)
    wave = np.asfortranarray([wave_left, wave_right])

    return wave

def spectrogram_to_wave_no_mp(spec, n_fft=2048, hop_length=1024):
    wave = librosa.istft(spec, n_fft=n_fft, hop_length=hop_length)

    if wave.ndim == 1:
        wave = np.asfortranarray([wave, wave])

    return wave

def wave_to_spectrogram_no_mp(wave):
    spec = librosa.stft(wave, n_fft=2048, hop_length=1024)

    if spec.ndim == 1:
        spec = np.asfortranarray([spec, spec])

    return spec
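The pair of functions above is essentially an STFT round trip. Since librosa may not be on hand here, the same invariant can be shown with numpy's FFT, a perfect-reconstruction stand-in for wave_to_spectrogram_no_mp / spectrogram_to_wave_no_mp:

```python
import numpy as np

wave = np.random.default_rng(0).standard_normal((2, 64))   # stereo signal
spec = np.fft.rfft(wave, axis=-1)                          # wave -> spectrum
recovered = np.fft.irfft(spec, n=wave.shape[-1], axis=-1)  # spectrum -> wave
```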
@@ -519,6 +486,8 @@ def ensembling(a, specs):

    return spec

def ensemble_inputs(audio_input, algorithm, is_normalization, wav_type_set, save_path):

    wavs_ = []

    if algorithm == AVERAGE:
        output = average_audio(audio_input)

@@ -528,10 +497,15 @@ def ensemble_inputs(audio_input, algorithm, is_normalization, wav_type_set, save

        for i in range(len(audio_input)):
            wave, samplerate = librosa.load(audio_input[i], mono=False, sr=44100)
            wavs_.append(wave)
            spec = wave_to_spectrogram_no_mp(wave)
            specs.append(spec)

        wave_shapes = [w.shape[1] for w in wavs_]
        target_shape = wavs_[wave_shapes.index(max(wave_shapes))]

        output = spectrogram_to_wave_no_mp(ensembling(algorithm, specs))
        output = to_shape(output, target_shape.shape)

    sf.write(save_path, normalize(output.T, is_normalization), samplerate, subtype=wav_type_set)
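ensemble_inputs pads every output to the longest input's shape via to_shape. Assuming to_shape zero-pads at the end of each axis (a sketch of its likely behavior, not the verified implementation):

```python
import numpy as np

def to_shape(x, target_shape):
    # Zero-pad x at the end of each axis up to target_shape.
    padding = [(0, t - c) for c, t in zip(x.shape, target_shape)]
    return np.pad(x, padding, mode='constant')

short = np.ones((2, 5))
padded = to_shape(short, (2, 8))  # trailing samples become zeros
```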
@@ -555,7 +529,7 @@ def to_shape_minimize(x: np.ndarray, target_shape):

    return np.pad(x, tuple(padding_list), mode='constant')

def augment_audio(export_path, audio_file, rate, is_normalization, wav_type_set, save_format=None, is_pitch=False):
    print('Rate: ', rate)

    wav, sr = librosa.load(audio_file, sr=44100, mono=False)

    if wav.ndim == 1:
@@ -118,7 +118,7 @@ class CascadedASPPNet(nn.Module):

        self.offset = 128

    def forward(self, x):
        mix = x.detach()
        x = x.clone()

@@ -155,17 +155,12 @@ class CascadedASPPNet(nn.Module):

                mode='replicate')
            return mask * mix, aux1 * mix, aux2 * mix
        else:
            return mask  # * mix

    def predict_mask(self, x):
        mask = self.forward(x)

        if self.offset > 0:
            mask = mask[:, :, :, self.offset:-self.offset]
            assert mask.size()[3] > 0

        return mask
@@ -40,50 +40,32 @@ class BaseNet(nn.Module):

class CascadedNet(nn.Module):

    def __init__(self, n_fft, nn_arch_size, nout=32, nout_lstm=128):
        super(CascadedNet, self).__init__()

        self.max_bin = n_fft // 2
        self.output_bin = n_fft // 2 + 1
        self.nin_lstm = self.max_bin // 2
        self.offset = 64
        nout = 64 if nn_arch_size == 218409 else nout

        self.stg1_low_band_net = nn.Sequential(
            BaseNet(2, nout // 2, self.nin_lstm // 2, nout_lstm),
            layers.Conv2DBNActiv(nout // 2, nout // 4, 1, 1, 0)
        )
        self.stg1_high_band_net = BaseNet(2, nout // 4, self.nin_lstm // 2, nout_lstm // 2)

        self.stg2_low_band_net = nn.Sequential(
            BaseNet(nout // 4 + 2, nout, self.nin_lstm // 2, nout_lstm),
            layers.Conv2DBNActiv(nout, nout // 2, 1, 1, 0)
        )
        self.stg2_high_band_net = BaseNet(nout // 4 + 2, nout // 2, self.nin_lstm // 2, nout_lstm // 2)

        self.stg3_full_band_net = BaseNet(3 * nout // 4 + 2, nout, self.nin_lstm, nout_lstm)

        self.out = nn.Conv2d(nout, 2, 1, bias=False)
        self.aux_out = nn.Conv2d(3 * nout // 4, 2, 1, bias=False)

    def forward(self, x):
        x = x[:, :, :self.max_bin]
@@ -26,6 +26,7 @@ pydub==0.25.1

pyglet==1.5.23
pyperclip==1.8.2
pyrubberband==0.3.0
pytorch_lightning==2.0.0
PyYAML==6.0
resampy==0.2.2
scipy==1.9.3

@@ -38,4 +39,4 @@ wget==3.2

samplerate==0.1.0
screeninfo==0.8.1
PySoundFile==0.9.0.post1; sys_platform != 'windows'
SoundFile==0.9.0; sys_platform == 'windows'
@@ -392,7 +392,8 @@ class SeperateMDX(SeperateAttributes):

    def stft(self, x):
        x = x.reshape([-1, self.chunk_size])
        x = torch.stft(x, n_fft=self.n_fft, hop_length=self.hop, window=self.window, center=True, return_complex=True)
        x = torch.view_as_real(x)
        x = x.permute([0, 3, 1, 2])
        x = x.reshape([-1, 2, 2, self.n_bins, self.dim_t]).reshape([-1, self.dim_c, self.n_bins, self.dim_t])
        return x[:, :, :self.dim_f]

@@ -402,6 +403,8 @@ class SeperateMDX(SeperateAttributes):

        x = torch.cat([x, freq_pad], -2)
        x = x.reshape([-1, 2, 2, self.n_bins, self.dim_t]).reshape([-1, 2, self.n_bins, self.dim_t])
        x = x.permute([0, 2, 3, 1])
        x = x.contiguous()
        x = torch.view_as_complex(x)
        x = torch.istft(x, n_fft=self.n_fft, hop_length=self.hop, window=self.window, center=True)
        return x.reshape([-1, 2, self.chunk_size])
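The return_complex=True change reflects newer PyTorch, where torch.stft returns complex tensors and view_as_real / view_as_complex convert to and from a trailing (real, imag) pair. The same packing, shown with numpy so it runs anywhere:

```python
import numpy as np

spec = np.fft.rfft(np.arange(8.0))               # complex spectrum, 5 bins
as_real = np.stack([spec.real, spec.imag], -1)   # like torch.view_as_real
back = as_real[..., 0] + 1j * as_real[..., 1]    # like torch.view_as_complex
```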
@@ -936,4 +939,4 @@ def save_format(audio_path, save_format, mp3_bit_set):

    try:
        os.remove(audio_path)
    except Exception as e:
        print(e)