David Minnen

Email | CV | Scholar | Google Research | Photography

	Selected Publications (see my CV for a full list of publications)
	Channel-wise autoregressive entropy models for learned image compression David Minnen and Saurabh Singh Int. Conf. on Image Processing (ICIP) 2020 Better runtime and rate-distortion performance for learned image compression. We improve the entropy model with latent residual prediction and channel-wise conditioning instead of spatial context.
	Nonlinear Transform Coding Johannes Ballé, Philip A. Chou, David Minnen, Saurabh Singh, Nick Johnston, Eirikur Agustsson, Sung Jin Hwang, and George Toderici IEEE Journal of Selected Topics in Signal Processing (STSP 2020) (under review) A review of learned image compression framed as nonlinear transform coding (NTC). This paper analyzes rate-distortion performance using simple sources and natural images, introduces a novel variant of entropy-constrained vector quantization and a method for learning multi-rate models, and analyzes different forms of stochastic optimization techniques for compression models.
	Scale-space flow for end-to-end optimized video compression Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Ballé, Sung Jin Hwang, and George Toderici Computer Vision and Pattern Recognition (CVPR) 2020 Learns "compressible flow" within an end-to-end optimized model for video compression. Optical flow is typically a 2D vector field representing motion. We generalize this to a 3D representation that holds spatial offsets plus a scale-space parameter. Larger scale values lead to more blurring before warping. The model learns to predict a small scall coupled with accurate flow and a large scale when accurate flow is not possible (or is too expensive to code relative to the target bit rate).
	Integer networks for data compression with latent-variable models Johannes Ballé, Nick Johnston, and David Minnen Int. Conf. on Learning Representations (ICLR) 2019 Avoids floating point non-determinism for entropy model parameters predicted by deep networks. Non-determinism typically doesn't matter for deep networks, but it is catastrophic for entry coding.
	Joint autoregressive and hierarchical priors for learned image compression David Minnen, Johannes Ballé, and George Toderici Advances in Neural Information Processing Systems (NeurIPS) 2018 Combines a hyperprior with spatial context to improve entropy modeling for learned image compression. By mixing forward and backward-adaptation, we achieve a new state-of-the-art for rate-distortion performance with neural compression models.
	Image-dependent local entropy models for image compression with deep networks David Minnen, George Toderici, Saurabh Singh, Sung Jin Hwang, and Michele Covell Int. Conf. on Image Processing (ICIP) 2018 We learn a dictionary of entropy models and allow the encoder to select the best distribution for each channel and each spatial tile. If none of the distributions match the local data, the encoder transmits a custom histogram. This spatially local and image-dependent modeling improves rate-distortin peroformance over earlier models and avoids floating point non-determinism that can break entropy models predicted on the fly.
	Variational image compression with a scale hyperprior Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston Int. Conf. on Learning Representations (ICLR) 2018 This model is the first to introduce a hyperprior for end-to-end optimized image compression with deep networks. The model learns a non-linear transform from pixels to a quantized latent space, which is jointly optimized with a hyperprior that predicts the parameters of the entropy model used to code the latents.
	Improved lossy image compression with priming and spatially adaptive bit rates for recurrent networks Nick Johnston, Damien Vincent, David Minnen, Michele Covell, Saurabh Singh, Troy Chinen, Sung Jing Hwang, Joel Shor, and George Toderici Computer Vision and Pattern Recognition (CVPR) 2018 Our team's best compression network based on recurrent neural network. I developed the spatially adaptive bit rate (SABR) component that allowed the encoder to adapt the local bit rate to the image content, which improves overall rate-distortion performance.
	Spatially adaptive image compression using a tiled deep network David Minnen, George Toderici, Michele Covell, Troy Chinen, Nick Johnston, Joel Shor, Sung Jin Hwang, Damien Vincent, and Saurabh Singh Int. Conference on Image Processing (ICIP) 2017 Deep neural networks are used for image intra-prediction. Each tile is predicted from neighboring tiles in the causal context, and then the residual is coded separately. By using a progressive model based on recurrent networks, the encoder can spatially adapt the bit rate to improve the overall rate-distortion performance.
[site layout adapted from Jon Barron]