I need a visual voice pitch tracking and recording app.
This is for evaluation purposes; I just need a working prototype.
The app should do all of the following at the same time:
- play MP3/WAV (from an in-memory stream)
- smoothly scroll a long, wide image
- track pitch accurately (using one of the existing free libraries or ready-to-use code)
- draw the pitch on that image (always at a fixed position on screen)
- record into an in-memory stream object in WAV format, adjusting the delay so the recording is synced with the file being played
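To illustrate the last point, here is a minimal sketch of writing recorded audio into an in-memory WAV stream while compensating for input delay. It uses only Python's standard `wave` and `io` modules. The function name `recording_to_wav_stream` and the `latency_frames` parameter are hypothetical; how the actual delay is measured (for example from the audio driver's reported latency) is left open.

```python
import io
import wave


def recording_to_wav_stream(pcm, sample_rate=48000, channels=1,
                            sample_width=3, latency_frames=0):
    """Return an in-memory WAV stream built from raw interleaved PCM bytes.

    latency_frames (an assumed, externally measured value) is the number of
    frames by which the recording lags the playback; that many frames are
    dropped from the start so both streams line up.
    """
    frame_size = channels * sample_width      # bytes per frame
    aligned = pcm[latency_frames * frame_size:]

    stream = io.BytesIO()
    with wave.open(stream, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)     # 3 bytes = 24-bit samples
        writer.setframerate(sample_rate)
        writer.writeframes(aligned)
    stream.seek(0)                            # rewind so it can be read or saved
    return stream
```

Saving the result later is then just a matter of writing `stream.getvalue()` to a file.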
Several free libraries are available: crepe, google/REAPER (on GitHub), vadymmarkov/Beethoven (on GitHub).
Some brief research is needed to select the one most suitable for accurate real-time detection.
It should allow looping over a range (given in float seconds) with automatic restart: clear the generated image, clear the recorded stream, pause for a defined number of seconds (float), then start again from the loop start position. Stopping is done by a button; the recorded stream should contain only the last attempt from the loop start.
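The loop behaviour can be sketched as a small state holder driven by audio blocks rather than by a timer. This is only an illustration under assumptions: the class name `LoopSession`, the method `on_block`, and the idea of counting the pause in sample frames are all hypothetical, and clearing the generated image is left to the caller when `on_block` returns True.

```python
import io


class LoopSession:
    """Tracks loop position in sample frames and keeps only the last attempt."""

    def __init__(self, start_s, end_s, pause_s, sample_rate=48000):
        self.start = round(start_s * sample_rate)   # loop start, in frames
        self.end = round(end_s * sample_rate)       # loop end, in frames
        self.pause = round(pause_s * sample_rate)   # pause length, in frames
        self.position = self.start
        self.pause_left = 0
        self.recording = io.BytesIO()

    def on_block(self, recorded_bytes, frames):
        """Process one audio block; return True when the loop restarts."""
        if self.pause_left > 0:
            # During the pause nothing is recorded and playback does not advance.
            self.pause_left = max(0, self.pause_left - frames)
            return False
        self.recording.write(recorded_bytes)
        self.position += frames
        if self.position >= self.end:
            self.recording = io.BytesIO()   # clear the recorded stream
            self.position = self.start      # jump back to the loop start
            self.pause_left = self.pause    # wait before the next attempt
            return True                     # caller should clear the image too
        return False
```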
Scrolling and pitch tracking should look like they do in the vocaberry app on Android/iOS.
After playing/recording, it should:
- allow scrolling the generated image (original + tracking) with the mouse/touchpad
- save a WAV file
- allow restarting
In the future, or within this project if the budget allows:
- after recording, it should allow playback of the source MP3 mixed with the recorded audio. At the least, this should be taken into account while developing the code now.
- while recording, it should allow playback of two mixed MP3/WAV files at different volumes at the same time.
- use ASIO for playback and recording; settings should be configurable (for now, defined constants are fine: 24-bit, 48 kHz; the number of samples read at once should be taken from the ASIO device's default/current settings).
- use Python + Qt, or C, C++, or Delphi; not C#
- image scrolling should be smooth, so OpenGL, DirectX, or something similar is needed
- the scrollable image should be synced to the number of samples played (not to a timer)
- nothing should be very CPU-intensive
- a definable volume level below which pitch is not tracked (nothing is drawn on the image)
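Two of the points above (scrolling synced to the sample count, and the volume threshold) reduce to simple arithmetic, sketched below. The function names, the `pixels_per_second` parameter, the use of RMS level in dBFS as the volume measure, and the -40 dBFS default are all assumptions made for illustration; samples are assumed to be floats in the range -1 to 1.

```python
import math


def scroll_offset_px(frames_played, sample_rate, pixels_per_second):
    """Horizontal image offset derived from frames played, not from a timer."""
    return frames_played * pixels_per_second / sample_rate


def level_dbfs(samples):
    """RMS level of a block of float samples, in dB relative to full scale."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def should_track(samples, threshold_dbfs=-40.0):
    """True when the block is loud enough for its pitch to be drawn."""
    return level_dbfs(samples) >= threshold_dbfs
```

Because the offset depends only on the number of frames the audio device has actually played, the image cannot drift away from the sound the way a timer-driven scroll could.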
The image will have a header and a footer of definable size; the inner part is split equally among the notes of a definable range (frequency ranges). Notes are specified as [CDEFGAB][1-6][#]. So the image will come with an image definition file with these parameters:
- note range, for example e1-a2
- header size
- footer size
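A minimal sketch of how these parameters could map a detected frequency to a vertical position on the image. Several things here are assumptions not stated above: equal temperament with A4 = 440 Hz, scientific octave numbering (so a2 is 110 Hz), every semitone in the range getting its own equal-height row, the highest note at the top, and y measured downward from the top of the image. All function names are hypothetical.

```python
import math
import re

# Semitone offset of each natural note within an octave.
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_midi(name):
    """Parse a note such as 'e1', 'a2' or 'C4#' into a MIDI note number."""
    match = re.fullmatch(r"([A-Ga-g])([1-6])(#?)", name.strip())
    if match is None:
        raise ValueError(f"bad note name: {name!r}")
    letter, octave, sharp = match.groups()
    return 12 * (int(octave) + 1) + SEMITONES[letter.upper()] + (1 if sharp else 0)


def midi_to_freq(midi):
    """Frequency in Hz of a MIDI note number (assumes A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)


def freq_to_y(freq_hz, note_range, image_height, header, footer):
    """Vertical pixel position for a frequency; row centres are exact notes."""
    low_name, high_name = note_range.split("-")
    low, high = note_to_midi(low_name), note_to_midi(high_name)
    row_height = (image_height - header - footer) / (high - low + 1)
    midi = 69.0 + 12.0 * math.log2(freq_hz / 440.0)   # fractional note number
    return header + (high + 0.5 - midi) * row_height
```

For example, with the range e1-a2, an image height of 400 and header and footer of 20 each, there are 18 rows of 20 pixels, and a 110 Hz tone (a2) lands in the middle of the top row.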
For debugging purposes, ASIO4ALL may be used. You will need any cheap microphone connected to your sound card; a webcam mic is fine.
Also, before starting, please try vocaberry to understand how it should work.
I need tracking that is more precise, with delay minimized as much as possible. Also, vocaberry sometimes picks the wrong octave for a few milliseconds, so this issue needs a workaround (sometimes the frequency peak at +-1 octave has more power for a short time).
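One possible workaround for the octave issue, shown only as a sketch: if the detected pitch jumps by almost exactly one octave, fold it back into the previous octave unless the jump persists for several consecutive frames. The class name `OctaveJumpFilter`, the default of 3 frames, and the half-semitone tolerance are assumptions to be tuned; this is not what vocaberry or any of the listed libraries does. Note that a genuine octave leap by the singer is then shown with a delay of `hold_frames` analysis frames.

```python
import math


class OctaveJumpFilter:
    """Suppresses short-lived jumps of about one octave up or down."""

    def __init__(self, hold_frames=3, tolerance_semitones=0.5):
        self.hold_frames = hold_frames        # frames a jump must persist
        self.tolerance = tolerance_semitones  # how close to 12 semitones
        self.current = None                   # last accepted frequency, Hz
        self.pending = 0                      # consecutive suspicious frames

    def update(self, freq_hz):
        """Feed one detected frequency (or None for unvoiced); return corrected."""
        if freq_hz is None or freq_hz <= 0:
            self.current, self.pending = None, 0
            return None
        if self.current is None:
            self.current = freq_hz
            return freq_hz
        semitones = 12.0 * math.log2(freq_hz / self.current)
        if abs(abs(semitones) - 12.0) <= self.tolerance:
            self.pending += 1
            if self.pending < self.hold_frames:
                # Treat as an octave error: fold back into the previous octave.
                folded = freq_hz / 2.0 if semitones > 0 else freq_hz * 2.0
                self.current = folded
                return folded
        # Normal movement, or a jump that persisted long enough: accept it.
        self.pending = 0
        self.current = freq_hz
        return freq_hz
```

With the defaults, the input sequence 220, 220, 440, 220 Hz comes out as 220 Hz throughout, while 220, 440, 440, 440 Hz switches to 440 Hz on the last frame.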
In the future, if this succeeds (it will be tested on a sample of people), I will need to port it to Linux (jackd + Qt/GL), so the code should be ready to port. There is no need to make it cross-platform for now, but it is better if it uses Qt or GL.
I will pay only after the project is fully done and tested by me.