This work shows how the early stages of perturbation growth in a viscosity-stratified flow are different from those in a constant-viscosity flow, and how nonlinearity is a crucial ingredient. We derive the viscosity-varying adjoint Navier–Stokes equations, where gradients in viscosity force both the adjoint momentum and the adjoint scalar. By the technique of direct-adjoint looping, we obtain the nonlinear optimal perturbation which maximises the perturbation kinetic energy of the nonlinear system. While we study three-dimensional plane Poiseuille (channel) flow with the walls at different temperatures, and a temperature-dependent viscosity, our findings are general for any flow with viscosity variations near walls. The Orr and modified lift-up mechanisms are in operation at low and high perturbation amplitudes, respectively, at our subcritical Reynolds number. The nonlinear optimal perturbation contains more energy on the hot (less-viscous) side, with a stronger initial lift-up. However, as the flow evolves, the important dynamics shifts to the cold (more-viscous) side, where wide high-speed streaks of low viscosity grow and persist, and strengthen the inflectional quality of the velocity profile. We provide a physical description of this process and show that the evolution of the linear optimal perturbation misses most of the physics. The Prandtl number does not qualitatively affect the findings at these times. The study of nonlinear optimal perturbations is still in its infancy, and viscosity variations are ubiquitous. We hope that this first work on nonlinear optimal perturbation with viscosity variations will lead to wider studies on transition to turbulence in these flows.