Bhishan Bhandari: Deep learning for style transfer

The purpose of this writeup is to demystify the style transfer method from the well-known paper Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization. It is an older paper, but a foundational one for style transfer. The writeup also covers how you can train a model with this method on a custom dataset.

Style transfer is an image synthesis problem: given an input content image Ix and an input style image Iy, the goal is to produce an output image Is that preserves the content of Ix and takes on the style of Iy.

A. The network

The method uses an encoder-decoder network with an AdaIN layer in between (see the network diagram in the paper). The input comes in pairs, i.e. a content image and a style image. Both images are passed through the encoder separately.

B. Encoder

The encoder consists of the first few layers of a pretrained VGG-19 network. The feature-space representations of the content image and the style image are then passed to the novel Adaptive Instance Normalization (AdaIN) layer.

C. Adaptive Instance Normalization

AdaIN is an extension of Instance Normalization that simply aligns the channel-wise mean and variance of the content features to match those of the style features. Its advantage over other normalization layers such as batch normalization, instance normalization, and conditional instance normalization is that it has no learnable affine parameters: the scale and shift are computed from the style input instead, which makes it adaptive to arbitrary styles. Now this is awesome. Concretely, it scales the normalized content features by the standard deviation of the style features and then shifts them by the mean of the style features: AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y), where x are the content features and y are the style features.
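The alignment of channel-wise statistics can be sketched in a few lines. This is a minimal NumPy illustration rather than the training code: the function names `channel_stats` and `adain`, the (C, H, W) layout, and the small `eps` added for numerical stability are all assumptions made for this example.

```python
import numpy as np


def channel_stats(feat, eps=1e-5):
    """Per-channel mean and std over the spatial dims of a (C, H, W) array."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)
    mean = flat.mean(axis=1).reshape(c, 1, 1)
    # eps (an assumption of this sketch) guards against division by zero.
    std = np.sqrt(flat.var(axis=1) + eps).reshape(c, 1, 1)
    return mean, std


def adain(content_feat, style_feat, eps=1e-5):
    """Normalize content features per channel, then apply style statistics."""
    c_mean, c_std = channel_stats(content_feat, eps)
    s_mean, s_std = channel_stats(style_feat, eps)
    normalized = (content_feat - c_mean) / c_std
    return normalized * s_std + s_mean


# Random stand-ins for encoder features; spatial sizes may differ.
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(4, 8, 8))
style = rng.normal(3.0, 2.0, size=(4, 6, 6))
out = adain(content, style)
```

The output keeps the spatial layout of the content features, while each channel now has (approximately, because of `eps`) the mean and standard deviation of the matching style channel. Nothing in the function is learned, which is why a new style needs no retraining.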

D. Decoder

Since Adaptive Instance Normalization is performed in feature space, a decoder is needed to map the result back to image space. The decoder is roughly a mirror of the encoder, and its output is an image that preserves the content and carries the style of the style image.

E. Important network decisions

The decoder does not use any form of normalization. Normalization layers tend to pull their outputs toward a single style, whereas the decoder has to generate images in very different styles.

Another design choice of the network is that it uses reflection padding. Reflection padding avoids border artifacts and the introduction of unwanted content at the image edges. It also preserves spatial continuity, as opposed to zero padding.
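The difference between the two padding schemes is easy to see on a single row of values. The sketch below uses NumPy's `np.pad` purely for illustration; the sample row is made up, and the `reflect` mode is, as far as I know, the same mirroring scheme that PyTorch's `nn.ReflectionPad2d` applies.

```python
import numpy as np

# One row of a feature map (made-up values for illustration).
row = np.array([1, 2, 3, 4])

# Zero padding inserts values that never occurred in the data.
zero_padded = np.pad(row, 2, mode="constant", constant_values=0)

# Reflection padding mirrors the neighbours of the border element.
reflect_padded = np.pad(row, 2, mode="reflect")

print(zero_padded)     # [0 0 1 2 3 4 0 0]
print(reflect_padded)  # [3 2 1 2 3 4 3 2]
```

With zero padding the border jumps abruptly from real values to 0, which a convolution can pick up as an edge. With reflection padding the padded values continue the local pattern, so the border looks like the rest of the image.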

F. Network Loss

The pretrained VGG-19 is also used to compute the loss that trains the decoder. The total loss is the content loss plus the style loss weighted by λ, i.e. L = Lc + λLs.

Content Loss

The content loss is the Euclidean distance between the target features and the features of the output image. The content target is the output of Adaptive Instance Normalization rather than the raw features of the content image.
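As a rough sketch, the content loss can be written as below. The function name `content_loss` and the toy arrays are assumptions for this example. In training, `output_feat` would be the encoder features of the generated image and `target_feat` the AdaIN output. Many implementations use mean squared error here, which differs only in scaling.

```python
import numpy as np


def content_loss(output_feat, target_feat):
    """Euclidean distance between output-image features and the AdaIN target."""
    return float(np.linalg.norm(output_feat - target_feat))


# Toy example: every one of the 8 entries is off by 0.5.
target = np.ones((2, 2, 2))
output = target + 0.5
loss = content_loss(output, target)
print(loss)  # sqrt(8 * 0.25) = sqrt(2), about 1.414
```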

Style Loss

Since only the mean and standard deviation of the style features are transferred, only those statistics are matched in the style loss. Here the features of the style image are used as the target. The loss is the Euclidean distance between the means plus the Euclidean distance between the standard deviations.
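A minimal sketch of the style loss follows, assuming (C, H, W) feature maps. The paper sums this term over several VGG layers, so the function takes lists of feature maps. The names `mean_std`, `style_loss`, `total_loss`, and `style_weight` (standing in for λ) are made up for this example.

```python
import numpy as np


def mean_std(feat, eps=1e-5):
    """Per-channel mean and std of a (C, H, W) feature map, as 1-D vectors."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)
    return flat.mean(axis=1), np.sqrt(flat.var(axis=1) + eps)


def style_loss(output_feats, style_feats):
    """Sum over layers of the distances between means and between stds."""
    total = 0.0
    for out_f, sty_f in zip(output_feats, style_feats):
        out_mean, out_std = mean_std(out_f)
        sty_mean, sty_std = mean_std(sty_f)
        total += np.linalg.norm(out_mean - sty_mean)
        total += np.linalg.norm(out_std - sty_std)
    return float(total)


def total_loss(content_term, style_term, style_weight):
    """Content loss plus the style loss weighted by lambda."""
    return content_term + style_weight * style_term


# Toy check: shifting every value by 2 changes the means but not the stds.
rng = np.random.default_rng(0)
style_feat = rng.normal(size=(3, 4, 4))
shifted = style_feat + 2.0
same = style_loss([style_feat], [style_feat])
moved = style_loss([shifted], [style_feat])
```

In the toy check, `same` is 0 because the statistics are identical, and `moved` is about 2·√3 because each of the three channel means differs by 2 while the standard deviations match.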

G. Training with a custom dataset

Training the network with a custom dataset is straightforward, since all we need is a flat-folder image dataset, i.e. a single folder of images with no class subfolders. Two such datasets are required: one of content images and another of style images. Following is an implementation of a dataset class for a flat folder of images.

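from pathlib import Path

from PIL import Image
from torch.utils import data


class FlatFolderDataset(data.Dataset):
    """Loads every file in a single folder as an RGB image."""

    def __init__(self, root, transform):
        super().__init__()
        self.root = root
        # Sort for a deterministic order and skip subdirectories.
        self.paths = sorted(p for p in Path(self.root).glob('*') if p.is_file())
        self.transform = transform

    def __getitem__(self, index):
        path = self.paths[index]
        # Force 3 channels so grayscale and RGBA images share one shape.
        img = Image.open(str(path)).convert('RGB')
        img = self.transform(img)
        return img

    def __len__(self):
        return len(self.paths)

    def name(self):
        return 'FlatFolderDataset'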

Links

https://openaccess.thecvf.com/content_ICCV_2017/papers/Huang_Arbitrary_Style_Transfer_ICCV_2017_paper.pdf

https://colab.research.google.com/drive/1HiS92cRnBkJQ2dWS55iNMa-5Gx6o60oR?usp=sharing&fbclid=IwAR0dT0RINPSD_koHa7sBE1EJowvz3cRBOTb0CZyV97hSne_ppvEPNmOYNqQ#scrollTo=a4ht3WMBOpGG&line=2&uniqifier=1
