Speakers
Full program can be viewed here.
Day 1 - Monday, February 3rd
The ability to capture bursts of images in quick succession with varying camera settings, and to process them quickly on device, has revolutionized photography on cell phones, as well as significantly disrupting the entire camera industry. In this talk I'll describe 2 such techniques: capturing bursts of underexposed frames to achieve high dynamic range imaging ("HDR+"), and using cameras with dual pixels, as well as multiple cameras, to compute synthetic shallow depth-of-field images ("Portrait mode"). My colleagues Yael Pritch and Peyman Milanfar will talk about multi-frame super-resolution ("Super Res Zoom") and mobile photography in very low light ("Night Sight"). Increasingly, these techniques incorporate machine learning - for auto exposure (AE), auto white balancing (AWB), face detection (FD), segmentation of people, and other uses.
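As a rough illustration of the burst idea behind the first technique, the sketch below (Python/NumPy) averages several aligned, deliberately underexposed frames and then applies digital gain and a simple gamma tone map. This is only a minimal sketch under strong simplifying assumptions (global integer alignment via phase correlation, plain averaging, a fixed gamma curve); the actual HDR+ pipeline uses tile-based alignment, robust merging, and a far more sophisticated tone map.

```python
import numpy as np

def estimate_shift(ref, img):
    """Integer global shift of img relative to ref via phase correlation."""
    a = ref.mean(axis=-1) if ref.ndim == 3 else ref
    b = img.mean(axis=-1) if img.ndim == 3 else img
    cross = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    dy = dy - a.shape[0] if dy > a.shape[0] // 2 else dy   # wrap to signed shifts
    dx = dx - a.shape[1] if dx > a.shape[1] // 2 else dx
    return dy, dx

def merge_burst(frames, gain=4.0):
    """frames: list of HxW or HxWx3 linear, underexposed float images in [0, 1]."""
    ref = frames[0]
    aligned = [ref] + [np.roll(f, estimate_shift(ref, f), axis=(0, 1)) for f in frames[1:]]
    merged = np.mean(aligned, axis=0)            # averaging N frames cuts noise roughly by sqrt(N)
    boosted = np.clip(merged * gain, 0.0, 1.0)   # digital gain recovers brightness of the underexposure
    return boosted ** (1.0 / 2.2)                # simple gamma as a stand-in tone map
```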
Yael Pritch, Google - Handheld Mobile Photography in Very Low Light
Taking photographs in low light using a mobile phone is challenging and rarely produces pleasing results. Aside from the physical limits imposed by read noise and photon shot noise, these cameras are typically handheld, have small apertures and sensors, use mass-produced analog electronics that cannot easily be cooled, and are commonly used to photograph subjects that move, like children and pets. I will describe a system for capturing clean, sharp, colorful photographs in light as low as 0.3 lux, where human vision becomes monochromatic and indistinct.
Peyman Milanfar, Google - Super-resolution: A brief history and recent progress on mobile
The first camera phone was sold in 2000, when taking pictures with your phone was an oddity, and sharing pictures online was unheard-of. Today, barely twenty years later, we can run multi-frame (and single-frame) super resolution on device. How did this come about? I'll give a bit of the history of the field, and describe our latest progress with some of the key elements of this technology.
Michael Brown, York University - Robust Color Imaging and a Dual-Purpose Camera ISP
Current camera ISPs use a photo-centric design that renders sensor-RGB values to produce pleasing photos. When mistakes occur in the in-camera color rendering, e.g., due to parameter estimation errors or incorrect manual settings, it can be challenging to correct strong color casts in the rendered photo. In this talk, strategies that apply simple ISP modifications to allow robust color correction are described. Such ISP modifications are also useful for applications that desire to use the camera not as a photo-centric device, but for tasks targeting scientific applications. To this end, a dual-purpose ISP is proposed that provides both photographic and scientific imaging capabilities.
Paolo Favaro, University of Bern - Blind Deconvolution: A Journey from Model-Based to Deep Learning Methods
Blind deconvolution has enjoyed remarkable progress in the last few decades thanks to developments in optimization and machine learning. Today, several algorithms can recover a sharp image from a blurry one without additional knowledge about the blur. This is a remarkable achievement given the extreme ill-posedness of the problem. Very interesting steps forward have been made in the last decade, when fundamental inconsistencies in the formulation, such as priors favoring the blurry solution, were exposed. This has led to the study of novel formulations that favor sharp over blurry images and achieve state-of-the-art performance with robustness to noise in real images. More recently, developments in deep learning have led to a fundamentally different approach to this problem, where enough data can adequately represent a realistic blur model and allow a neural network to learn how to remove blur from images. Deep learning approaches have produced surprising results, removing rather complex blur artifacts effectively and efficiently. We give an account of the latest developments and show their strengths and weaknesses.
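For readers unfamiliar with the model-based side, the classic MAP-style formulation discussed here can be written (in generic notation, not any specific paper's) as

\[
\min_{x,\,k}\ \tfrac{1}{2}\,\| k \ast x - y \|_2^2 \;+\; \lambda\, \rho(x) \;+\; \gamma\, \phi(k)
\quad \text{s.t.}\quad k \ge 0,\ \ \textstyle\sum_i k_i = 1,
\]

where \(y\) is the blurry image, \(x\) the latent sharp image, \(k\) the unknown blur kernel, and \(\rho, \phi\) are image and kernel priors. The inconsistency mentioned above is that, under common sparse-gradient priors \(\rho\), the degenerate no-blur solution \(k = \delta,\ x = y\) can attain a lower cost than the true sharp image, which is what the newer formulations are designed to avoid.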
Felix Heide, Princeton University - Designing Cameras to Detect the “Invisible”: Towards Domain-Specific Computational Imaging
Imaging has become an essential part of how we communicate with each other, how autonomous agents sense the world and act independently, and how we research chemical reactions and biological processes. Today's imaging and computer vision systems, however, often fail for the "edge cases", for example in low light, fog, snow, or highly dynamic scenes. These edge cases are a result of ambiguity present in the scene or signal itself, and ambiguity introduced by imperfect capture systems. In this talk, I will present several examples of computational imaging methods that resolve this ambiguity by jointly designing sensing and computation for domain-specific applications. Instead of relying on intermediate image representations, which are often optimized for human viewing, these cameras are designed end-to-end for a domain-specific task. In particular, I will show how to co-design automotive HDR ISPs, detection and tracking (beating Tesla's latest OTA Model S Autopilot), how to optimize thin freeform lenses for wide field of view applications, and how to extract accurate dense depth from three gated images (beating scanning lidar, such as Velodyne's HDL64). Finally, I will present computational imaging systems that extract domain-specific information from faint measurement noise using domain-specific priors, allowing us to use conventional intensity cameras or conventional Doppler radar to image "hidden" objects outside the direct line of sight at ranges of more than 20m.
Vivek Goyal, Boston University - Don't Fear the Dead(time): High Dynamic-Range Imaging with Single-Photon Detectors
Single-photon lidar (SPL) systems form range and reflectivity images using detectors with single-photon sensitivity. Ideally, an SPL system can maintain accuracy over a large range of signal strengths. In automotive SPL, for example, a nearby retroreflective road sign can provide many orders of magnitude greater incident flux than a distant, diffusely reflective object. This talk will discuss data processing to handle both extremes, with mitigation of dead time effects central to eliminating bias in the high-flux case.
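As background for the dead-time discussion, the sketch below implements the classical Coates-style pile-up correction for a first-photon-per-cycle timing histogram. It is a textbook baseline, not the speaker's method (which targets the harder regime where dead time and high flux interact across cycles), and the variable names are illustrative.

```python
import numpy as np

def coates_correction(hist, n_cycles):
    """Classical pile-up correction for a first-photon TCSPC histogram.

    hist: photon counts per time bin accumulated over n_cycles laser cycles.
    Returns the estimated per-cycle detection probability for each bin,
    compensating for cycles 'used up' by detections in earlier bins."""
    hist = np.asarray(hist, dtype=float)
    detected_before = np.concatenate(([0.0], np.cumsum(hist)[:-1]))
    cycles_still_armed = np.maximum(n_cycles - detected_before, 1.0)
    return hist / cycles_still_armed

# Example: at high flux the raw histogram is skewed toward early bins;
# the corrected per-cycle probabilities remove that bias.
raw = np.array([5000, 3000, 1500, 400])
print(coates_correction(raw, n_cycles=10000))
```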
We describe a computational microscope that encodes 3D information into a single 2D sensor measurement, then exploits sparsity or low-rank priors to reconstruct the 3D scene with diffraction-limited resolution across a large volume. Our system uses simple hardware and scalable software for easy reproducibility and adoption. The inverse algorithm is based on large-scale nonlinear optimization combined with unrolled neural networks, in order to leverage the known physical model of the setup while learning unknown parameters. As an example of end-to-end design, we optimize the encoding mask for a given task-based imaging application and demonstrate whole-organism bioimaging and neural activity tracking in vivo.
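To make the "sparsity prior plus physical model" recipe concrete, here is a minimal sparsity-regularized reconstruction by proximal gradient (ISTA) for a generic linear forward model A. The matrix A, step size, and l1 prior are illustrative stand-ins; the system described in the talk replaces such hand-tuned iterations with large-scale solvers and unrolled, learned networks.

```python
import numpy as np

def ista_reconstruct(A, y, lam=0.1, n_iter=200):
    """Minimize 0.5*||A x - y||^2 + lam*||x||_1 with proximal gradient (ISTA).

    A: (m, n) forward model mapping the flattened 3D volume to the flattened
       2D sensor measurement; y: length-m measurement vector."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)     # 1/L, with L the gradient's Lipschitz constant
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)                 # gradient of the data-fidelity term
        z = x - step * grad
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-thresholding prox
    return x
```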
Jon Krause, Google - Deep Learning for Medical Imaging
Deep learning models can be used to diagnose melanoma, breast cancer lymph node metastases and diabetic retinopathy from medical images with accuracy comparable to that of human experts. This talk covers work in applying deep learning to imaging for diabetic retinopathy and cancer screening & diagnosis, including recent work in using different reference standards and techniques to improve explainability. It will also cover how deep learning can be leveraged to make novel predictions such as cardiovascular risk factors and disease progression.
Yoav Shechtman, Technion - Dense, volumetric and multicolor localization microscopy via Deep Learning
In localization microscopy, the positions of individual nanoscale point emitters (e.g. fluorescent molecules) are determined at high precision from their point-spread functions (PSFs). This enables highly precise single/multiple-particle-tracking, as well as super-resolution microscopy, namely single molecule localization microscopy (SMLM). In this talk I will describe how deep learning enables unprecedented capabilities in super-resolution localization microscopy; specific applications include dense emitter fitting, multicolor imaging from grayscale data, and volumetric multi-particle tracking/imaging. Notably, our use of neural nets is not limited to image processing; we use nets to design the optimal optical acquisition system, in a task-specific manner.
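For context, the conventional baseline that deep localization methods improve upon is fitting a parametric PSF to each isolated emitter; a minimal version (symmetric 2D Gaussian, SciPy's curve_fit, illustrative initial guesses) is sketched below. The methods in the talk go beyond this by handling dense, overlapping emitters, engineered 3D and multicolor PSFs, and learned acquisition.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian2d(coords, x0, y0, sigma, amp, bg):
    x, y = coords
    return (amp * np.exp(-((x - x0)**2 + (y - y0)**2) / (2 * sigma**2)) + bg).ravel()

def localize_emitter(roi):
    """Fit a symmetric 2D Gaussian PSF to a small camera ROI containing one emitter."""
    h, w = roi.shape
    yy, xx = np.mgrid[0:h, 0:w]
    p0 = (w / 2, h / 2, 1.5, roi.max() - roi.min(), roi.min())   # rough initial guess
    popt, _ = curve_fit(gaussian2d, (xx, yy), roi.ravel(), p0=p0)
    return popt[0], popt[1]   # sub-pixel (x, y) position of the emitter
```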
Jeff Fessler, University of Michigan - Medical image reconstruction using data-driven methods
This talk will focus on contemporary data-driven signal models and their use as regularizers for solving medical inverse problems. Applications illustrated will include MRI and CT image reconstruction. Joint work with Sai Ravishankar, Il Yong Chung, and Raj Nadakuditi, among others.
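In generic notation (not tied to any one of the talk's methods), the reconstructions discussed take the familiar regularized form

\[
\hat{x} \;=\; \arg\min_{x}\ \tfrac{1}{2}\,\| A x - y \|_2^2 \;+\; \beta\, R(x),
\]

where \(A\) is the MRI or CT forward operator, \(y\) the acquired measurements, and \(R\) a regularizer learned from data, for example a sparsifying transform or dictionary fit to training images rather than chosen by hand.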
Anat Levin, Technion - Rendering speckle statistics in scattering media and its applications in computational imaging
We present a Monte Carlo rendering framework for the physically-accurate simulation of speckle patterns arising from volumetric scattering of coherent waves. These noise-like patterns are characterized by strong statistical properties, such as the so-called memory effect. These properties are at the core of imaging techniques for applications as diverse as tissue imaging, motion tracking, and non-line-of-sight imaging. Our rendering framework can replicate these properties computationally, in a way that is orders of magnitude more efficient than alternatives based on directly solving the wave equations. At the core of our framework is a path-space formulation for the covariance of speckle patterns arising from a scattering volume, which we derive from first principles. We use this formulation to develop two Monte Carlo rendering algorithms, for computing speckle covariance as well as speckle fields directly. While approaches based on wave equation solvers require knowing the microscopic positions of wavelength-sized scatterers, our approach takes as input only bulk parameters describing the statistical distribution of these scatterers inside a volume. We validate the accuracy of our framework by comparing against speckle patterns simulated using wave equation solvers, use it to simulate memory effect observations that were previously only possible through lab measurements, and demonstrate its applicability for computational imaging tasks. In particular, we show an order of magnitude extension of the angular range over which one can use speckle correlations to see through a scattering volume.
Guillermo Sapiro, Duke University - Computational Behavioral Phenotyping
In this talk I will present a new angle in computational imaging, and present the needs, challenges, and societal contributions of developing computational imaging tools for automatic behavioral phenotyping. The technical contributions will be complemented with examples from the largest ever study of this kind in the field of developmental disorders, with tools deployed in numerous hospitals and hundreds to thousands of subjects participating or enrolled.
Time-of-flight imaging and LIDAR systems enable 3D scene acquisition at long range using active illumination. This is useful for autonomous driving, robotic vision, human-computer interaction and many other applications. The technological requirements on these imaging systems are extreme: individual photon events need to be recorded and time-stamped at a picosecond timescale, which is facilitated by emerging single-photon detectors. In this talk, we discuss a new class of computational cameras based on single-photon detectors. These enable efficient ways for non-line-of-sight imaging (i.e., looking around corners) and efficient depth sensing as well as other unprecedented imaging modalities.
Ren Ng, University of California, Berkeley - Oz Vision - A New Principle for Color Display
This talk will introduce the Oz Vision project, an early-stage project that seeks to build a new type of color display based on scanning a laser over the human retina, to create perceptions of new colors impossible to see in the real world, to treat color blindness, and to enable a person to perceive higher dimensional color, e.g. 5D with IR and UV over RGB.
Steve Seitz, Google - Slow Glass
Wouldn’t it be fascinating to be in the same room as Abraham Lincoln, visit Thomas Edison in his laboratory, or step onto the streets of New York a hundred years ago? We explore this thought experiment, by tracing ideas from science fiction through newly available data sources that may facilitate this goal.
Andrew Owens, University of Michigan - Learning Photo Forensics
Today's image forensics methods are seemingly plagued with a generalization problem: if we train them to detect today's fake images, will they detect tomorrow's as well? In this talk, I'll discuss two of our efforts to address this issue. First, I'll ask whether it is possible to create a "universal" detector for telling apart real images from those generated by a CNN, regardless of the architecture or dataset used. I'll show that with careful training, a standard image classifier trained on only one specific CNN generator is able to generalize surprisingly well to unseen architectures, datasets, and training methods -- a finding that suggests the intriguing possibility that today's CNN-generated images share some common systematic flaws, preventing them from achieving realistic image synthesis. Second, I'll present a forensics method based on anomaly detection. This method uses the automatically recorded photo EXIF metadata as a supervisory signal for training a model to determine whether an image is self-consistent -- that is, whether its content could have been produced by a single imaging pipeline. The model successfully learns to detect image splices, despite being trained entirely on real images.
Day 2 - Tuesday, February 4th
How do we choose a network architecture in deep-learning solutions? By copying existing networks or guessing new ones, and sometimes by applying various small modifications to them via trial and error. This inelegant, brute-force strategy has proven useful for a wide variety of imaging tasks. However, it comes with a painful cost: our networks tend to be quite heavy and cumbersome. Could we do better? In my talk I would like to propose a different point of view on this important question, by advocating the following two rules: (i) rather than "guessing" architectures, we should rely on classic signal and image processing concepts and algorithms, and turn these into networks to be learned in a supervised manner; and more specifically, (ii) sparse representation modeling is key in many (if not all) of the successful architectures that we are using. I will demonstrate these claims by presenting three recent image denoising networks that are light-weight and yet quite effective, as they follow the above guidelines.
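A minimal sketch of rule (i), under assumed architecture choices that are not the talk's actual networks: take the ISTA sparse-coding iteration over a convolutional dictionary, unroll a few steps, and make the analysis/synthesis filters and thresholds learnable, so the classic algorithm becomes a light-weight denoiser trained end-to-end on noisy/clean pairs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnrolledISTADenoiser(nn.Module):
    """ISTA over a learned convolutional dictionary, unrolled into a small network."""
    def __init__(self, n_iters=5, n_atoms=64, kernel=7):
        super().__init__()
        self.analysis = nn.Conv2d(1, n_atoms, kernel, padding=kernel // 2, bias=False)
        self.synthesis = nn.Conv2d(n_atoms, 1, kernel, padding=kernel // 2, bias=False)
        self.thresholds = nn.Parameter(torch.full((n_iters,), 0.05))
        self.n_iters = n_iters

    def forward(self, noisy):
        codes = torch.zeros_like(self.analysis(noisy))
        for k in range(self.n_iters):
            residual = self.synthesis(codes) - noisy                  # D z - y
            codes = codes - self.analysis(residual)                   # gradient step with learned D^T
            codes = torch.sign(codes) * F.relu(codes.abs() - self.thresholds[k])  # soft threshold
        return self.synthesis(codes)                                  # denoised estimate D z
```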
Michal Irani, Weizmann Institute - “Deep Internal Learning”: Deep Learning with Zero Examples
The strong recurrence of information inside a single natural image/video provides powerful internal examples, which suffice for self-supervision of deep networks, without any prior examples or training data. This new paradigm gives rise to true “Zero-Shot Learning”. This approach has been successfully applied to a variety of computer vision problems, including blind super-resolution, blind image dehazing, image segmentation, transparent layer separation, image retargeting, temporal super-resolution of video data, and more. In some of these problems, internal learning also yields state-of-the-art results. I will show the power of this approach on a variety of computer vision problems, as time permits.
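To illustrate the paradigm on one of these problems (single-image super-resolution, in the spirit of zero-shot SR), the sketch below trains a small CNN only on pairs generated from the test image itself by further downscaling it, then applies the net to the image's own upscaled version. The architecture, loss, and hyperparameters are illustrative assumptions; the actual methods additionally use random crops, augmentations, and gradual scale factors.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def zero_shot_sr(img, scale=2, steps=1000, lr=1e-3):
    """img: (1, C, H, W) tensor in [0, 1]. Returns an (H*scale, W*scale) estimate."""
    net = nn.Sequential(
        nn.Conv2d(img.shape[1], 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, img.shape[1], 3, padding=1),
    )
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    # internal training pair: (upscaled "child" image -> original image)
    child = F.interpolate(img, scale_factor=1.0 / scale, mode='bicubic', align_corners=False)
    inp = F.interpolate(child, size=img.shape[-2:], mode='bicubic', align_corners=False)
    for _ in range(steps):
        loss = F.l1_loss(net(inp), img)     # learn this image's own downscaling inverse
        opt.zero_grad()
        loss.backward()
        opt.step()
    up = F.interpolate(img, scale_factor=scale, mode='bicubic', align_corners=False)
    with torch.no_grad():
        return net(up)                      # apply the learned mapping at the target scale
```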
David Martin, Google - Computational Photography in Next Generation Street View Camera Systems
Google Street View has historically operated in the mode of literally scraping the physical world, hoovering up massive quantities of raw data, including pixels, around the globe, for post-processing. However, product needs require a new approach. Street View is now asked to collect data in more and more challenging environments---in particular the pedestrian-accessible world. These environments are challenging for cameras, with artificial lighting and low light levels, and we must collect while moving to make collection efficient and economical. Mechanical and optical solutions only reach so far. By sensing and processing an order of magnitude more data than we can afford to transmit back to Google, we explore custom camera system designs that incorporate computational photography in order to enhance the capabilities of our cameras in the axes of dynamic range, sensitivity, resolution, and depth of field.
Bill Freeman, Google - Feathers, Wings, and Future Directions in Vision
Borrowing just some aspects of primate and human vision systems (convolutional processing, layered structure) has led to a revolution in computer vision and image processing. Perhaps there are more aspects of human vision we should borrow? Doing so requires distinguishing which aspects of human vision are crucial for vision and which are not--distinguishing between wings and feathers, in the metaphor of flight. I'll lead a discussion about which aspects of human vision are "feathers" and which are "wings".
Ce Liu, Google - Deep Image Imagination: Image Uncrop and 3D Ken Burns
We will present the latest techniques that we have been developing for expanding images along the x, y and z axes. In the x-y plane, we will dive into image uncrop using GANs and demonstrate how images can be seamlessly extended across the image border. Along the z axis, we will show how we can use inferred depth information to create compelling parallax effects that bring still images to life.
Antonio Torralba, MIT - Dissecting neural nets
With the success of deep neural networks and access to image databases with millions of labeled examples, the state of the art in computer vision is advancing rapidly. Even when no labeled examples are available, Generative Adversarial Networks (GANs) have demonstrated a remarkable ability to learn from images and are able to create nearly photorealistic images. The performance achieved by convNets and GANs is remarkable and constitutes the state of the art on many tasks. But why do convNets work so well? What is the nature of the internal representation learned by a convNet in a classification task? How does a GAN represent our visual world internally? In this talk I will show that the internal representations of both convNets and GANs can be interpreted.
Photo Forensics From Rounding Artifacts
Many aspects of JPEG compression have been successfully used in the domain of photo forensics. I will describe a JPEG artifact that can arise depending upon seemingly innocuous implementation details in a JPEG encoder, and show how a generic JPEG encoder can be configured to explain the range of these artifacts found in commercial cameras. I will also describe an algorithm to simultaneously estimate the nature of these artifacts and localize inconsistencies that can arise from a range of image manipulations.
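As a hedged illustration of the kind of "innocuous implementation detail" at issue (a toy example, not the specific artifact or model from the talk), the snippet below quantizes the same DCT coefficients with three rounding conventions that different encoders might use; the resulting coefficients, and hence the compressed data, differ in a way a forensic analysis can pick up.

```python
import numpy as np

coeffs = np.array([12.6, -3.4, 7.5, -0.5])   # toy DCT coefficients for one block
q = 4.0                                       # quantization step from the quant table

variants = {
    "round half away from zero": np.sign(coeffs) * np.floor(np.abs(coeffs) / q + 0.5),
    "floor":                     np.floor(coeffs / q),
    "truncate toward zero":      np.trunc(coeffs / q),
}
for name, quantized in variants.items():
    print(f"{name:28s} -> dequantized {quantized * q}")
```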
Feng Yang, Google - Distortion Agnostic Deep Watermarking
Watermarking is the process of embedding information into an image such that it can survive distortions, while requiring the encoded image to have little or no perceptual difference from the original image. Recently, deep learning-based methods achieved impressive results in both visual quality and message payload under a wide variety of image distortions. However, these methods all require differentiable models for the image distortions at training time, and may generalize poorly to unknown distortions. This is undesirable since the types of distortions applied to watermarked images are usually unknown and non-differentiable. In this work, we propose a new framework for distortion-agnostic watermarking, where the image distortion is not explicitly modeled during training. Instead, the robustness of our system comes from two sources: adversarial training and channel coding. Compared to training on a fixed set of distortions and noise levels, our method achieves comparable or better results on distortions available during training, and better performance on unknown distortions.
Hossein Talebi, Google - Better Compression with Deep Pre-Editing
Could we compress images via standard methods while avoiding artifacts? The answer is obvious -- this is doable as long as the bit budget is generous enough. What if the allocated bit-rate for compression is insufficient? Then unfortunately, artifacts are a fact of life. Many attempts were made over the years to fight this phenomenon, with various degrees of success. In this work we aim to break the unholy connection between bit-rate and image quality, and propose a way to circumvent compression artifacts by pre-editing the incoming image and modifying its content to fit the given bits. We design this editing operation as a learned convolutional neural network, and formulate an optimization problem for its training. Our loss takes into account a proximity between the original image and the edited one, a bit-budget penalty over the proposed image, and a no-reference image quality measure for forcing the outcome to be visually pleasing. The proposed approach is demonstrated on the popular JPEG compression, showing savings in bits and/or improvements in visual quality, obtained with intricate editing effects.
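Mirroring the three terms described above (and using illustrative notation, since the abstract does not fix symbols), the training objective for the editing network \(T_\theta\) can be written as

\[
\mathcal{L}(\theta) \;=\; d\big(x,\, T_\theta(x)\big)
\;+\; \lambda\, B\big(\mathrm{JPEG}(T_\theta(x))\big)
\;+\; \mu\, Q\big(T_\theta(x)\big),
\]

where \(d\) measures proximity of the edited image to the original \(x\), \(B\) is a (differentiable surrogate of the) bit cost of JPEG-compressing the edited image, and \(Q\) is a no-reference quality penalty that keeps the outcome visually pleasing.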
This talk will present the methods and procedures used to produce the first image of a black hole from the Event Horizon Telescope, as well as future developments. It had been theorized for decades that a black hole would leave a "shadow" on a background of hot gas. Taking a picture of this black hole shadow would help to address a number of important scientific questions, both on the nature of black holes and the validity of general relativity. Unfortunately, due to its small size, traditional imaging approaches require an Earth-sized radio telescope. In this talk, I discuss techniques the Event Horizon Telescope Collaboration has developed to photograph a black hole using the Event Horizon Telescope, a network of telescopes scattered across the globe. Imaging a black hole’s structure with this computational telescope required us to reconstruct images from sparse measurements, heavily corrupted by atmospheric error. The talk will also discuss how we are developing machine learning methods to help design future telescope arrays.
Xiang Zhu, Google - ML based dehazing and super-resolution for remote sensing images
To remove atmospheric effects and recover fine detail from satellite images, we developed two ML-based algorithms: one for dehazing, and one for super-resolution. The haze effect on satellite images is spatially varying, so we developed an MLP network that estimates local haze thickness from selected multi-scale features. Although the training images are synthesized, the algorithm works well on real hazy images with varied ground content. For super-resolution we proposed a ResNet-based end-to-end network, trained with a Pseudo-Huber constraint and a perceptual loss to reduce over-smoothing. This network shows strong de-aliasing ability as well as resolution enhancement, and has so far been successfully applied to several satellite image products.
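For reference, the Pseudo-Huber term mentioned above is the standard robust penalty on the per-pixel residual \(r\),

\[
L_\delta(r) \;=\; \delta^{2}\left(\sqrt{1 + (r/\delta)^{2}} - 1\right),
\]

which behaves like \(\tfrac{1}{2}r^{2}\) for small residuals and like \(\delta\,|r|\) for large ones; how it is weighted against the perceptual loss in the actual training objective is not specified here.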
Yoav Schechner, Technion - Scattering as Tomography Key: from Medical Imaging to Spaceborne Cloud Sensing
The power of modern computing leads us to revisit major problems of scientific imaging. This leads to new forms of computed tomography (CT). The talk shows the significance of this approach through upcoming spaceborne missions for atmospheric science (mainly CloudCT) and a novel form of medical X-ray CT.
Charlie Bouman, Purdue University - Integrating AI and Physics Models in Scientific Imaging
The emerging methods of AI promise to bring the next wave of change and innovation to every corner of society. But how will it change the endeavors of scientific imaging and sensing in particular? This talk explores a number of important recent directions in the integration of AI with imaging problems, and also speculates on some directions these innovations might take in the future. First, we introduce methods for integrating AI, sensor, and physics models using multi-agent consensus equilibrium (MACE) based on popular plug-and-play (PnP) methods. MACE provides a flexible framework for integrating a wide variety of models formulated as agents to solve difficult inverse problems. We also present examples of how MACE can be used to supplement or replace traditional physics-based models with AI/ML methods to enable the solution of difficult nonlinear inverse problems. Throughout the talk, we present state-of-the-art examples using imaging modalities including computed tomography (CT), scanning transmission electron microscopy (STEM), synchrotron beam imaging, optical sensing, scanning electron microscopy (SEM), and ultrasound imaging. In each of these examples, we show how key advantages result from the integration of sensor, data, and physics models using emerging ML methods.
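A minimal sketch of the plug-and-play idea that MACE generalizes, under an assumed linear least-squares forward model: one agent enforces the sensor/physics model via a proximal step, the other injects the learned prior simply by calling a denoiser. Variable names and the direct matrix inverse are illustrative, not how a production implementation would do it.

```python
import numpy as np

def pnp_admm(A, y, denoiser, rho=1.0, n_iter=50):
    """Plug-and-play ADMM for  min_x 0.5*||A x - y||^2 + prior implied by `denoiser`."""
    n = A.shape[1]
    x = np.zeros(n); v = np.zeros(n); u = np.zeros(n)
    # proximal map of the data term: argmin_x 0.5*||A x - y||^2 + (rho/2)*||x - z||^2
    data_prox = np.linalg.inv(A.T @ A + rho * np.eye(n))
    for _ in range(n_iter):
        x = data_prox @ (A.T @ y + rho * (v - u))   # physics / sensor agent
        v = denoiser(x + u)                          # prior agent, e.g. a trained CNN denoiser
        u = u + x - v                                # consensus (dual) update
    return x
```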
Eric Miller, Tufts University - Wasserstein Regularized Sparse Coding for Space- and Time-Varying Materials Characterization
High-energy monochromatic X-ray diffraction data collected in situ during thermo-mechanical loading experiments permits probing of the crystalline microstructure of a sample, potentially as a function of both space and time. Elastoplastic deformation is associated with the development of heterogeneity in crystal orientation and lattice spacing, manifesting as azimuthal and radial broadening of diffraction peaks, respectively. Quantifying this spreading is challenging, especially when the sample has a granularity between that of a single crystal and a fine-grain or powder material. The approach developed in this talk begins by modeling the intensity signal in the vicinity of a Debye-Scherrer ring as a nonnegative superposition of Gaussian basis functions. Convolutional sparse coding (CSC) methods are employed to obtain a parsimonious model of the data as a function of time and space. The parameters of this model are used to define a feature, the AWMV, quantifying the radial and azimuthal development of the data, which effectively captures the internal state of the sample. To encourage an expected degree of smoothness in the CSC solutions across the space-time points where data are acquired, we propose an optimal transport regularizer constructed by interpreting the parameters of the basis functions selected by the CSC process as probability distributions. We discuss numerical methods for solving the resulting very large-scale inverse problem, and present results for a temporal and a spatial case. For the temporal study, X-ray diffraction time-series data were captured using the high-speed mixed-mode pixel array detector (MM-PAD); the time resolution permits observation of bursts of dislocation movement in a tensile Ti-7Al sample, which we quantify using the AWMV. For the spatial study, a series of X-ray data were collected on two Dexela CMOS area detectors; the spatial map is designed to capture the relative magnitude of plastic deformation ahead of a fatigue crack in 316L stainless steel, which we also quantify using the AWMV metric.
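In generic convolutional-sparse-coding notation (illustrative, not the exact formulation from the talk), the per-frame model and the optimal-transport coupling across neighboring space-time points take the form

\[
\min_{\{x_k \ge 0\}}\ \tfrac{1}{2}\Big\| y - \sum_{k} d_k \ast x_k \Big\|_2^2
\;+\; \lambda \sum_{k} \| x_k \|_1
\;+\; \gamma\, W\!\big(\pi(\{x_k\}),\, \pi(\{x_k^{\mathrm{prev}}\})\big),
\]

where \(y\) is the intensity around the Debye-Scherrer ring, the \(d_k\) are Gaussian basis functions of varying radial and azimuthal widths, the \(x_k\) are nonnegative sparse coefficient maps, \(\pi\) interprets the basis usage as a probability distribution, and \(W\) is the Wasserstein (optimal transport) distance to the solution at the neighboring time or spatial location.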
Beck Kamilov, Washington University (St. Louis) - Online Regularization by Denoising with Applications to Intensity Diffraction
Regularization by denoising (RED) is a powerful framework for solving imaging inverse problems. Most RED algorithms are iterative batch procedures, which limits their applicability to very large datasets. In this work, we address this limitation by introducing a novel online RED (On-RED) algorithm, which processes a small subset of the data at a time. We establish the theoretical convergence of On-RED in convex settings and empirically discuss its effectiveness in non-convex ones by illustrating its applicability to intensity diffraction tomography. Our results suggest that On-RED is an effective alternative to traditional RED algorithms when dealing with large or streaming datasets.
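Schematically (with illustrative notation; details are in the paper), the On-RED iteration replaces the full data-fidelity gradient of batch RED with a minibatch estimate:

\[
x_{t+1} \;=\; x_t \;-\; \gamma \Big( \nabla f_{B_t}(x_t) \;+\; \tau \big( x_t - \mathsf{D}(x_t) \big) \Big),
\]

where \(f_{B_t}\) is the data-fidelity term restricted to a small random subset \(B_t\) of the measurements, \(\mathsf{D}\) is an image denoiser, and \(\gamma, \tau\) are the step size and regularization strength.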