Influenza epidemics are monitored using influenza-like illness (ILI) data reported by health-care professionals. Timely detection of the onset of epidemics is often performed by applying a statistical method to weekly ILI incidence estimates, and a wide range of methods is used worldwide. However, performance evaluation and comparison of these algorithms is hindered by: (1) the absence of a gold standard regarding influenza epidemic periods and (2) the absence of consensual evaluation criteria. As of now, performance evaluation metrics are based only on sensitivity, specificity and timeliness of detection, since definitions are not clear for time-repeated measurements such as weekly epidemic detection. We aimed to evaluate several epidemic detection methods by comparing their alerts to a gold standard determined by international expert consensus. We introduced new performance metrics that meet an important objective of influenza surveillance in temperate countries: accurately detecting the start of the single epidemic period each year. Evaluations are presented using ILI incidence in France between 1995 and 2011. We found that the two performance metrics we defined discriminated between epidemic detection methods. For evaluating detection performance, metrics other than the commonly used standard ones could better meet the needs of real-time influenza surveillance.