There is widespread interest in using student achievement data to evaluate the performance of individual schools and teachers for accountability purposes. These evaluation systems can involve simultaneous decisions for thousands of individual educators. Although standard statistical procedures can control the probability of a type I error for each individual, it is the simultaneous error rate of the whole evaluation system, across all individuals involved, that should be of primary concern to stakeholders. In this paper, we adopt the concept of the local false discovery rate (Efron et al., 2001), estimated from a two-component mixture model for p-values. We further note that the inference problem in the educational context has several distinct features: the proportion of true null hypotheses varies greatly across evaluation purposes and is not necessarily close to 1; the hypothesis tests are usually one-sided, which results in distinct shape restrictions on the empirical null distribution of p-values; and the sample sizes are highly unbalanced across tests, although the power of each test is usually moderately large. We discuss a parametric and a nonparametric shape-restricted density estimator, both based on the local fdr framework, that account for these characteristics of educational applications. Finally, we apply the proposed methods to evaluate gains in the math proficiency rates of public schools in Pennsylvania.
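
For reference, the two-component mixture model behind the local false discovery rate can be sketched as follows. The symbols $\pi_0$ (proportion of true nulls), $f_0$ (null density of p-values), and $f_1$ (alternative density) are notation assumed here for illustration; they do not appear in the text above, and $f_0$ is deliberately left general because one-sided tests need not yield a uniform null.

```latex
% Marginal density of a p-value under the two-component mixture
% (notation assumed for illustration):
f(p) = \pi_0 f_0(p) + (1 - \pi_0) f_1(p), \qquad 0 \le p \le 1,

% Local false discovery rate: posterior probability that a test
% with observed p-value p comes from the null component.
\mathrm{fdr}(p) = \frac{\pi_0 f_0(p)}{f(p)}.
```

Under this formulation, an individual school or teacher would be flagged when $\mathrm{fdr}(p)$ falls below a chosen threshold. Any particular threshold is a design choice and is not specified here.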