As part of my mini-project to fill up my in-house media server, I've been hunting down some older stuff I liked. One of the rarities I found on YouTube was that someone had uploaded some really low-quality VHS rips of The Mary Whitehouse Experience and Newman and Baddiel in Pieces. Neither has been repeated or released on DVD, and in all probability never will be.
The Mary Whitehouse Experience footage was 240p, and fairly badly compressed. The In Pieces footage was 360p and slightly better. I used one of the myriad YouTube downloaders to pull out video files of them...I could just chuck them on the server as-is, but I wondered, with the recent advances in AI upscaling, whether there was any way of improving the footage.
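As an aside, the download step itself is trivial. A minimal sketch, assuming yt-dlp as the downloader (the actual tool I used may differ, and the URL is a placeholder):

```python
# Sketch of the download step, using yt-dlp as an example downloader
# (placeholder URL; not necessarily the tool I actually used).
import subprocess

def download(url: str, out_dir: str = "downloads") -> None:
    """Grab the best available video+audio and save it named after the video title."""
    subprocess.run(
        [
            "yt-dlp",
            "-f", "bestvideo+bestaudio/best",    # best quality the upload offers
            "-o", f"{out_dir}/%(title)s.%(ext)s",
            url,
        ],
        check=True,
    )

if __name__ == "__main__":
    download("https://www.youtube.com/watch?v=PLACEHOLDER")
```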
A small amount of research led me to Topaz Labs, and specifically their Video Enhance AI product. There have been some examples of it being used to restore old footage, and in combination with DAIN (something co-developed with Google) to produce 4K60 footage from very old video...however, looking at the source material those projects used, it was cleaner than the 240p VHS rips that I had...nevertheless, I thought I'd give it a go. I had a suitable machine sitting around (a rather middle-ground i3 processor with a GTX 1660, which has the required CUDA cores).
My knowledge of AI processes is limited, based on a Udacity Intro to TensorFlow course...which taught me enough to understand that half the challenge of building a machine-learning model is structuring it, and the other half is providing suitable training data (and with my work hat on, I think most people have no idea how important the training data is).
The Topaz product has a 30-day trial, and currently offers two upscaling AI models...Gaia and Artemis. Each of these has then been trained on differing qualities of video (referred to as LQ, MQ and HQ, though exact resolutions etc. were not provided). The marketing spiel on their website clearly indicates that the primary use case is upscaling 1080p footage to 4K and beyond, so I was almost certainly working outside the training data. I did some test runs on 10-frame clips (a feature provided in the software), and quickly came to the conclusion that the Gaia model could not handle the extremely low quality of the footage...exceptionally bad artifacts around spectacles and mouths (and in a stand-up show where at least half the performers are in glasses, that was an issue). I eventually settled on the Artemis model with LQ training, and a target resolution of 720p...preserving the 4:3 ratio (so black bars), as zoom-to-fit was chopping off heads and lower-thirds text.
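For anyone wondering what that framing choice means in practice, it's roughly equivalent to the ffmpeg filter chain below. This is purely an illustration of the geometry, not what Topaz does internally (their upscaler is doing something far cleverer than a bicubic scale), and the filenames and 1280x720 pillarbox output are my assumptions:

```python
# Illustration only: the "keep 4:3, pillarbox to 720p" framing expressed as an
# ffmpeg filter chain. Filenames are placeholders.
import subprocess

def pillarbox_720p(src: str, dst: str) -> None:
    subprocess.run(
        [
            "ffmpeg", "-i", src,
            # Scale to 960x720 (4:3 at 720 lines), then pad to 1280x720 with
            # centred black bars, rather than zoom-to-fit cropping heads and
            # lower-thirds text.
            "-vf", "scale=960:720,pad=1280:720:(ow-iw)/2:(oh-ih)/2",
            dst,
        ],
        check=True,
    )

pillarbox_720p("tmwe_240p.mp4", "tmwe_720p_pillarbox.mp4")
```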
The entire process took about 3 days of constant processing, with just shy of 1,000,000 frames re-scaled. As I said, not exactly a powerhouse PC.
So, for In Pieces, that's a 200% upscale. For The Mary Whitehouse Experience, that's 300%.
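Back-of-the-envelope (taking the three days as a round figure, so the rate is only approximate):

```python
# Rough numbers behind the "3 days, ~1,000,000 frames" figure.
frames = 1_000_000
days = 3
throughput = frames / (days * 24 * 60 * 60)   # ~3.9 frames per second

scale_in_pieces = 720 / 360   # 2.0x, i.e. a 200% upscale
scale_tmwe = 720 / 240        # 3.0x, i.e. a 300% upscale

print(f"{throughput:.1f} frames/second processed")
print(f"In Pieces: {scale_in_pieces:.0%}, TMWE: {scale_tmwe:.0%}")
```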
So how did it go? I've picked similar clips from both programs (and one that is probably recognisable to anyone who watched TV in the early 90's).
For the 360p footage from Newman and Baddiel In Pieces, it actually came out pretty good. This is a side-by-side, with the original 360p footage on the left and the 720p on the right...both scaled to 720 in the video player.
Given the starting quality, I'm pretty impressed.
The 240p footage fared less well.
It immediately looks over-sharpened, and (to me) quite reminiscent of A Scanner Darkly, with a cel-shaded effect, or very bad green-screening. A close-up of a single individual shows that there simply was not enough information for the AI to infer any detail, so faces, hair etc. look smoothed and sand-blasted.
That said, this is a rapidly evolving area, and I can well believe that in a year or two there will be models out there that can handle this ultra-low-quality footage much better. There is another Topaz product specifically for pictures, and it could be that using that on individual frames may return better results...though it's less likely that I currently have the hardware to do that sort of thing.
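If I ever try the per-frame route, the plumbing would look something like this. It's only a sketch: ffmpeg bursts the video into stills, an image upscaler runs over the folder (that middle step is omitted here), and the frames get reassembled. The filenames and the 25 fps PAL framerate are my assumptions, and audio would need muxing back in separately:

```python
# Sketch of the per-frame route: burst to PNGs, upscale each still with an
# image tool (not shown), then reassemble at the original framerate.
import os
import subprocess

def extract_frames(src: str, frame_dir: str = "frames") -> None:
    os.makedirs(frame_dir, exist_ok=True)
    subprocess.run(["ffmpeg", "-i", src, f"{frame_dir}/%06d.png"], check=True)

def reassemble(frame_dir: str, dst: str, fps: int = 25) -> None:
    # Note: this produces video only; the original audio track would need to
    # be muxed back in as a separate step.
    subprocess.run(
        ["ffmpeg", "-framerate", str(fps), "-i", f"{frame_dir}/%06d.png", dst],
        check=True,
    )
```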
Comments
Interesting results. This sort of tech has been popping up a lot recently in all sorts of applications, and it seems almost like a real-world version of the magical "Enhance!" that films used to do, where they have some image from a satellite which is maybe 16 pixels, and they repeatedly zoom in, somehow getting better and better detail.
Definitely some hard limits there on what can be extracted from limited data. The last image does remind me of those art restorations gone wrong, like the monkey Jesus from a few years back. Still, it's an evolving field, and hopefully better and more efficient versions of this technology are to come.
Yeah, I think that in computer games this will be the norm pretty soon...it's significantly easier to generate data sets of high-res and low-res footage to train models on...I can even see a point where a game ships with the relevant upscaling AI model embedded in the code (rather than as part of the "game-ready" driver, which seems to be the current model). For the stuff I've tried here, I simply don't think there is enough training material, and (probably as important) enough of a market for it. The vast majority of stuff that has been squirrelled away on 80's VHS and Betamax tapes will have been re-released from the master tapes on DVD/streaming...stuff like TMWE is real edge-case stuff where, for various reasons (mainly Rob Newman), it's not been repeated/re-released...I actually can't think of many other programs (though obviously Wikipedia has a list of such items where the originals have been lost/destroyed). These are not lost, just buried in a legal stalemate.
I can definitely see this area of technology improving, so I can keep the 240p versions I have and maybe give it another go in a couple of years. In the meantime I'll get on with my current project of copying as much of Disney+ off onto Plex as I can within a single month's subscription :-) (Note: Disney+'s web player is the worst one I've met so far; it has a memory-leak issue that causes crashes after 3-4 hours of continuous use.)
This was a great read, thank you!
It does definitely give it a Photoshop-filter look because of the nearest-neighbour sharpening. I think future algorithms will do more to guess the content of an image from the definition of other images and then, as you say, it will get much, much better!
So I watched the first episode of TMWE last night.
I can say with some confidence that if you were not of a certain age in the 90's, most of it would go right over your head. My memory of it is of the "timeless" sketches (History Today, Jarvis, Ray the man who can only speak sarcastically).
The first episode covered the Channel Tunnel (not yet complete), the build-up to the Iraq War, and Blockbuster video stores. Top Cat (and Officer Dibble) were there, as was Jimmy Savile (who probably isn't allowed to be mentioned on the BBC anymore). It was remarkable just how topical to the time it was.
Quick resurrection of this.
Linus Tech Tips put up a video yesterday, basically trying exactly the same thing as I had done here, using the same software. They got pretty much the same results with low-quality 240p video, with the weird artifacts, excessive sharpness and smoothing. They also got a bit of an explanation of why the results were not as good as could be hoped. It's not rocket science; it's simply trying to infer too much data from a pretty small initial pixel count (with added compression artifacts that end up being a not-insignificant part of the image).
It's good to know I got about as good a result as could be hoped from the process. Since I tried this Topaz software, they've added another AI model to increase framerate...not really useful here, but I could see applications for low-framerate images (security footage, for example).