Audio-JEPA

Audio-JEPA is an adaptation of the Joint-Embedding Predictive Architecture (JEPA) for self-supervised audio representation learning. Built upon the I-JEPA paradigm, it uses a Vision Transformer (ViT) backbone to predict latent representations of masked spectrogram patches.
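The description above can be illustrated with a toy NumPy sketch of a JEPA-style objective on a spectrogram. This is not the Audio-JEPA implementation. Linear maps stand in for the ViT context encoder, target encoder and predictor. The patch size (16), masking ratio (0.5), EMA momentum (0.996) and every function name (`patchify`, `random_mask`, `jepa_loss`, `ema_update`) are hypothetical choices made for illustration only.

```python
import numpy as np


def patchify(spec, patch=16):
    """Split a (freq, time) spectrogram into flattened, non-overlapping square patches."""
    f, t = spec.shape
    f, t = f - f % patch, t - t % patch  # drop any ragged border that does not fill a patch
    grid = spec[:f, :t].reshape(f // patch, patch, t // patch, patch)
    # Reorder to (freq_block, time_block, patch_row, patch_col), then flatten each patch.
    return grid.transpose(0, 2, 1, 3).reshape(-1, patch * patch)


def random_mask(n, ratio, rng):
    """Boolean mask marking int(n * ratio) randomly chosen patches as prediction targets."""
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=int(n * ratio), replace=False)] = True
    return mask


def jepa_loss(patches, W_ctx, W_tgt, W_pred, pos, mask):
    """Mean squared error between predicted and target embeddings of the masked patches."""
    ctx = patches[~mask] @ W_ctx  # context encoder sees only the visible patches
    # Target encoder (hypothetically linear, so encoding only the masked patches
    # equals encoding all of them and then selecting); no gradients flow here in practice.
    tgt = patches[mask] @ W_tgt
    summary = ctx.mean(axis=0)  # pooled context embedding, shape (d,)
    pred = (summary + pos[mask]) @ W_pred  # predictor conditioned on target positions
    return float(np.mean((pred - tgt) ** 2))  # loss is computed in latent space


def ema_update(W_tgt, W_ctx, momentum=0.996):
    """Target-encoder weights track the context encoder as an exponential moving average."""
    return momentum * W_tgt + (1.0 - momentum) * W_ctx


# Usage on random data (hypothetical 128 frequency bins x 256 time frames).
rng = np.random.default_rng(0)
spec = rng.standard_normal((128, 256))
patches = patchify(spec)  # 8 x 16 grid -> 128 patches of 256 values each
d = 32  # hypothetical embedding width
W_ctx = rng.standard_normal((patches.shape[1], d)) * 0.02
W_tgt = W_ctx.copy()
W_pred = rng.standard_normal((d, d)) * 0.02
pos = rng.standard_normal((patches.shape[0], d)) * 0.02  # stand-in positional embeddings
mask = random_mask(patches.shape[0], 0.5, rng)
loss = jepa_loss(patches, W_ctx, W_tgt, W_pred, pos, mask)
# In training, W_ctx and W_pred would first be updated by gradient descent (not shown).
W_tgt = ema_update(W_tgt, W_ctx)
```

The point the sketch tries to show is that the loss compares embeddings rather than spectrogram values, so the model is not asked to reconstruct low-level spectral detail of the masked regions.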

MIT License

Updated Dec 22, 2025

Created May 20, 2025

39 stars

1 fork

2 watchers

1 open issue

Top Contributors

LudovicTuncay

11 commits