Research
Research Methodology and Findings
Related Projects Review
Our project, Jamboxx-Infinite, is an advanced iteration of the original Jamboxx project. As developers with no prior music experience, we drew significant inspiration and knowledge from Jamboxx, particularly in understanding music-related functionalities and user interactions. Below is a detailed review of the existing project and how it informed our development process.
Jamboxx: Overview and Key Features
- Project Name: Jamboxx
- Main Features:
- Play: Users can select and play a variety of virtual musical instruments.
- Learn: Interactive mini-games designed to teach basic music theory and instrument skills.
- Jam: A mode for freestyle music creation and experimentation.
Lessons Learned and Improvements in Jamboxx-Infinite
- Virtual Instruments (Play Mode)
- Original Implementation: Jamboxx offered a functional but simplistic interface for instrument selection and playback.
- Our Enhancements:
- Redesigned the interface with larger, more accessible icons to accommodate users with disabilities, particularly those relying on motion input.
- Introduced a more graphical and intuitive UI, making it easier for school-aged children to navigate and engage with the instruments.
- Educational Mini-Games (Learn Mode)
- Original Implementation: Jamboxx included basic games to teach music fundamentals.
- Our Enhancements:
- Expanded the variety of mini-games to cover a broader range of music concepts.
- Revamped the UI with cartoon-inspired visuals and a child-friendly color palette to increase appeal and engagement for younger users.
- Integrated progressive difficulty levels to cater to different age groups and skill levels.
By building on Jamboxx’s foundation, Jamboxx-Infinite not only preserves the core functionalities but also introduces significant accessibility and user experience improvements, making it more inclusive and engaging for its target audience.
Technology Review
Voice Cloning
- Model Selection
- TTS (Text-to-Speech) vs. SVC (Singing Voice Conversion) Models:
- TTS generates speech from text but lacks flexibility for voice conversion in existing audio.
- SVC transforms the voice in an audio file while preserving pitch and timing, making it ideal for music applications.
- Chosen Model: DDSP-SVC [1]
- Unlike TTS models, DDSP-SVC converts existing audio directly. It builds on differentiable digital signal processing (DDSP) to achieve high-quality voice conversion with minimal artifacts.
- Alternative Considered: So-VITS-SVC [2]
- So-VITS-SVC is user-friendly but requires more training data. DDSP-SVC was chosen for its real-time performance and better stability with limited datasets.
- Development Stack
- Backend: Python + FastAPI
- Why not Java/Spring Boot?
- Python has stronger ML library support (e.g., PyTorch, Transformers) and faster prototyping.
- Key Libraries:
- demucs: Audio source separation.
- librosa/pyworld: Pitch and audio analysis.
- torchcrepe: Pitch estimation.
- transformers: Integration with pre-trained models.
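To make the "preserving pitch" requirement more concrete, the sketch below shows what a pitch estimator conceptually does: it takes one short frame of audio and returns its fundamental frequency. This is a deliberately simplified, standard-library-only autocorrelation method, not the neural approach used by torchcrepe and not code from the project. The function names (`autocorrelation`, `estimate_f0`, `hz_to_midi`) and the parameter defaults are assumptions chosen for illustration.

```python
import math


def autocorrelation(frame, lag):
    """Mean product of the frame with a copy of itself shifted by `lag` samples."""
    n = len(frame) - lag
    return sum(frame[i] * frame[i + lag] for i in range(n)) / n


def estimate_f0(frame, sample_rate, fmin=60.0, fmax=1000.0, threshold=0.9):
    """Estimate the fundamental frequency (Hz) of one audio frame.

    Illustrative only: returns 0.0 when no clear periodicity is found.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 2)
    if max_lag <= min_lag:
        return 0.0

    scores = {lag: autocorrelation(frame, lag) for lag in range(min_lag, max_lag + 1)}
    best = max(scores.values())
    if best <= 0.0:
        return 0.0

    # Take the first local peak that is close to the global maximum.
    # Using the global maximum directly can land on a multiple of the
    # true period, which would report the pitch an octave too low.
    for lag in range(min_lag + 1, max_lag):
        score = scores[lag]
        if (score >= threshold * best
                and score >= scores[lag - 1]
                and score >= scores[lag + 1]):
            return sample_rate / lag
    return 0.0


def hz_to_midi(f0):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69 + 12 * math.log2(f0 / 440.0)


# Synthetic test tone: a 200 Hz sine wave sampled at 16 kHz.
sample_rate = 16000
frame = [math.sin(2 * math.pi * 200.0 * i / sample_rate) for i in range(1024)]
f0 = estimate_f0(frame, sample_rate)
```

In a voice conversion pipeline, a per-frame pitch curve like this would be extracted from the source vocal and passed to the conversion model, so that the output keeps the original melody and timing while the timbre changes. A production system would rely on a more robust estimator, such as the libraries listed above.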
AI Teacher
- Framework: llama.cpp
- Why not Hugging Face Transformers?
- llama.cpp optimizes offline inference and reduces hardware demands, critical for accessibility.
- Model: Mistral-7B [3]
- Outperforms similar-sized models (e.g., LLaMA-7B) in reasoning tasks and is released under the permissive Apache 2.0 license.
- Compilation: Nuitka
- Why compile Python?
- Improves startup speed and makes proprietary logic harder to inspect, since Nuitka translates the Python source to C and compiles it.
- Why not PyInstaller/Cython?
- Nuitka produces faster, smaller binaries and supports more Python features.
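As a rough illustration of the offline inference path chosen for the AI Teacher, the sketch below builds a command line for the llama.cpp `llama-cli` tool and runs it as a subprocess. It is a minimal sketch under stated assumptions, not the project's actual integration: the helper names (`build_llama_command`, `ask_teacher`), the prompt wording, and the default values are hypothetical, and the binary name and flag behavior can differ between llama.cpp versions.

```python
import subprocess


def build_llama_command(model_path, prompt, max_tokens=128, context_size=2048,
                        threads=4, binary="llama-cli"):
    """Build an argv list for a one-shot llama.cpp text completion.

    The defaults are illustrative assumptions, not tuned values.
    """
    return [
        binary,
        "-m", str(model_path),    # path to a local GGUF model file
        "-p", prompt,             # prompt text
        "-n", str(max_tokens),    # maximum number of tokens to generate
        "-c", str(context_size),  # context window size
        "-t", str(threads),       # CPU threads to use
    ]


def ask_teacher(model_path, question):
    """Run llama.cpp locally and return its raw text output.

    Requires the llama.cpp binary on PATH and a model file on disk;
    no network access is involved.
    """
    prompt = f"You are a friendly music teacher for children. {question}"
    cmd = build_llama_command(model_path, prompt)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the learner's question contains spaces or punctuation. A call might look like `ask_teacher("models/mistral-7b.gguf", "What is a quarter note?")`, where the model path is a placeholder for wherever the model file is stored.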
Summary of Technical Decisions
Component | Choice | Reason |
---|---|---|
Voice Cloning | DDSP-SVC | Real-time, stable, minimal artifacts |
Backend | Python + FastAPI | ML ecosystem, rapid development |
AI Teacher | Mistral-7B + llama.cpp | Offline support, performance |
Compilation | Nuitka | Efficiency, compatibility |
References
[1] DDSP-SVC GitHub Repository. (2024). Available: https://github.com/yxlllc/DDSP-SVC
[2] So-VITS-SVC GitHub Repository. (2024). Available: https://github.com/svc-develop-team/so-vits-svc
[3] Mistral 7B: A Language Model for Text Generation. (2023). Available: arXiv:2310.06825.