Weak gravitational lensing is one of the most promising cosmological probes of the late universe. Several large ongoing (DES, KiDS, HSC) and planned (LSST, Euclid, WFIRST) astronomical surveys attempt to collect even deeper and larger scale data on weak lensing. Due to gravitational collapse, the distribution of dark matter is non-Gaussian on small scales. However, observations are typically evaluated through the two-point correlation function of galaxy shear, which does not capture non-Gaussian features of the lensing maps. Previous studies attempted to extract non-Gaussian information from weak lensing observations through several higher order statistics such as the three-point correlation function, peak counts, or Minkowski functionals. Deep convolutional neural networks (CNN) emerged in the field of computer vision with tremendous success, and they offer a new and very promising framework to extract information from 2D or 3D astronomical data sets, confirmed by recent studies on weak lensing. We show that a CNN is able to yield significantly stricter constraints of (σ8, Ωm) cosmological parameters than the power spectrum using convergence maps generated by full N-body simulations and ray-tracing, at angular scales and shape noise levels relevant for future observations. In a scenario mimicking LSST or Euclid, the CNN yields 2.4–2.8 times smaller credible contours than the power spectrum, and 3.5–4.2 times smaller at noise levels corresponding to a deep space survey such as WFIRST. We also show that at shape noise levels achievable in future space surveys the CNN yields 1.4–2.1 times smaller contours than peak counts, a higher order statistic capable of extracting non-Gaussian information from weak lensing maps.