
Microsoft Foundry Blog
7 MIN READ

Evaluating AI Agents: Techniques to Reduce Variance and Boost Alignment for LLM Judges

jamesasher
Microsoft
Mar 04, 2026

In this blog post, we continue our discussion on LLM-as-a-Judge evaluators, focusing on how their performance can be improved and offering practical suggestions for teams considering incorporating such evaluators into their deployment cycle.

In an ideal world, an LLM judge would behave like an experienced Subject Matter Expert (SME). To achieve this, we must align the judge with SME preferences and minimize any systematic biases it exhibits. (See our previous article for a detailed overview of common bias types in LLM judges.) We begin with techniques to improve alignment.

Pre-calibration of LLM judges to human preferences

Choosing the right models to calibrate

Choosing the model, models, or model family that drives your LLM judge comes down to a trade-off between cost and capability. Larger models tend to be more effective than smaller ones [1]; they also tend to be more expensive and slower. When deploying a suite of judges (as most projects require), decide carefully which areas of your system most need tight alignment and consider deploying more expensive models there, while using smaller, cheaper models in less critical areas. Another useful tactic is to randomly sample data points and route them intermittently to the larger judges. Fundamentally, model choice is a design choice, and systematic testing is needed to decide which models to use where.
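As a rough sketch of that routing tactic, the snippet below always sends critical areas to the larger judge and audits a random fraction of the rest with it. The model names and audit rate are illustrative placeholders, not a real Foundry API.

```python
import random

# Hypothetical routing sketch: model names and the audit rate are
# illustrative placeholders, not a real Foundry API.
def route_to_judge(sample_id: str, critical: bool, audit_rate: float = 0.1,
                   rng: random.Random = random.Random(42)) -> str:
    """Pick which judge model scores a sample.

    Critical areas always get the larger (more aligned, more expensive)
    judge; elsewhere, a small random fraction is audited by the larger
    judge so the cheaper judge's scores can be checked over time.
    """
    if critical or rng.random() < audit_rate:
        return "large-judge-model"
    return "small-judge-model"

assignments = [route_to_judge(f"sample-{i}", critical=False) for i in range(1000)]
large_share = assignments.count("large-judge-model") / len(assignments)
print(round(large_share, 2))  # roughly 0.1 with the default audit rate
```

Comparing the audited subset's scores against the cheap judge's over time gives an ongoing check that the cheaper model stays aligned.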

Calibrating models to align with human preferences

LLM judges are delicate and highly responsive to their system prompts, so consistency is paramount. Once you have a system prompt that has been shown to align with human preferences, stick with it for the duration of your evaluation. Adjust the system prompt in advance, and only in advance, of evaluation; otherwise you risk moving the goalposts to make sure your shot goes in.

Therefore, the goal of calibration is to adjust the system prompt of a model to best align with responses that would be given by an SME. Unfortunately, this does mean that we need to collect and label SME responses such that we can evaluate the alignment.

Consider the example below where we wish to train an LLM judge to score a response from 1-5. This judge could be used to score a plethora of AI applications. Here’s how we can successfully align the judge.

  1. Create a stratified sample of diverse responses.
    • Ensure the full range of potential values is covered (e.g. 1–5 or 1–10).
    • Include edge cases and ambiguous samples.
    • Ensure diversity across content length, quality, tone, and so on.
    • Hold out validation and test sets as standard.

An example:

| Response | Tone | Length |
| --- | --- | --- |
| ‘This is response A…’ | Clear and concise | 300 |
| ‘This is response Z…’ | Unclear and directionless | 150 |

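The stratified split in step 1 can be sketched in a few lines of Python. The `bucket` field and the response data below are placeholders for an initial triage of your own samples, not part of any Foundry API.

```python
import random
from collections import defaultdict

# Illustrative data: 90 responses with a coarse quality bucket assigned
# during an initial triage pass (the field names are placeholders).
responses = [
    {"id": i, "bucket": bucket, "text": f"response {i}"}
    for i, bucket in enumerate(["low", "mid", "high"] * 30)
]

def stratified_split(items, key, val_frac=0.2, test_frac=0.2, seed=7):
    """Split items into calibration/validation/test sets, stratified by
    `key` so every bucket is represented in every split."""
    rng = random.Random(seed)
    by_bucket = defaultdict(list)
    for item in items:
        by_bucket[item[key]].append(item)
    cal, val, test = [], [], []
    for bucket_items in by_bucket.values():
        rng.shuffle(bucket_items)
        n_val = int(len(bucket_items) * val_frac)
        n_test = int(len(bucket_items) * test_frac)
        val.extend(bucket_items[:n_val])
        test.extend(bucket_items[n_val:n_val + n_test])
        cal.extend(bucket_items[n_val + n_test:])
    return cal, val, test

cal, val, test = stratified_split(responses, key="bucket")
print(len(cal), len(val), len(test))  # 54 18 18
```

Stratifying before splitting guards against a validation or test set that accidentally misses an entire quality band.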
2. Have human labelers annotate or score each response.

    • Agree on clear and consistent scoring criteria, ideally as a group.
    • Have SMEs score the responses independently.
    • Calculate inter-annotator agreement using either Cohen’s Kappa (for two annotators) [2] or Fleiss’ Kappa (for three or more) [3]. Use weighted kappa calculations if required; you may want to penalize large disagreements more severely.
    • Target κ > 0.6; if agreement falls well short of this, a joint discussion or adjudication round may be needed, as the questions or scoring criteria may be severely ambiguous.
    • The SMEs should score responses blind to accompanying information such as tone and length.

Example extended:

| Response | Tone | Length | SME 1 score | SME 2 score | SME 3 score |
| --- | --- | --- | --- | --- | --- |
| ‘This is response A…’ | Clear and concise | 300 | 3 | 2 | 4 |
| ‘This is response Z…’ | Unclear and directionless | 150 | 4 | 3 | 4 |

 
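For the agreement check in step 2, Fleiss’ Kappa can be computed directly in a few lines (libraries such as statsmodels also provide it). The scores below are illustrative, not taken from the table above.

```python
# Minimal Fleiss' kappa for the three-SME setting.
def fleiss_kappa(ratings, categories):
    """ratings: list of per-item lists of category labels, one per rater."""
    n = len(ratings[0])            # raters per item (must be constant)
    N = len(ratings)               # number of items
    # n_ij: count of raters assigning item i to category j
    counts = [[row.count(c) for c in categories] for row in ratings]
    # per-item agreement P_i
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # marginal category proportions p_j and chance agreement P_e
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(len(categories))]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)

# Illustrative data: five responses, each scored 1-5 by three SMEs.
scores = [[3, 2, 3], [4, 4, 4], [1, 1, 2], [5, 5, 5], [2, 3, 2]]
kappa = fleiss_kappa(scores, categories=[1, 2, 3, 4, 5])
print(round(kappa, 3))  # 0.494 -> below the 0.6 target, so adjudicate
```

Note that this unweighted kappa treats a 1-vs-5 disagreement the same as a 3-vs-4 one; for ordinal scales a weighted variant is often preferable, as mentioned above.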

3. Create a baseline system prompt and compare with human scores.

    • Create an initial system prompt for the judge.
    • Pass the validation-set responses to the judging LLM and retrieve its scores.
    • Compare the LLM judge’s scores to the human-annotated scores with correlation metrics such as Spearman’s or Pearson’s coefficient.
    • Compare the agreement rate between the SME consensus score (rounded mean, median, or mode) and the LLM judge’s scores, again using the appropriate kappa measure.
    • Conduct line-by-line error analysis of discrepancies between human and LLM judges, and explore whether any systematic bias exists.
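A minimal, dependency-free sketch of the correlation check in step 3, using Spearman’s rank correlation on illustrative consensus and judge scores:

```python
# Spearman's rank correlation, implemented from scratch for illustration
# (scipy.stats.spearmanr does the same job).
def rank(xs):
    """Assign average ranks, handling ties."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(xs):
        j = i
        while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    return pearson(rank(xs), rank(ys))

sme_consensus = [3, 4, 1, 5, 2]        # e.g. rounded mean of SME scores
judge_scores = [3, 5, 2, 5, 2]         # illustrative LLM judge output
print(round(spearman(sme_consensus, judge_scores), 3))  # 0.949
```

A high rank correlation only says the judge orders responses like the SMEs do; the kappa-based agreement check is still needed to confirm it assigns the same absolute scores.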

4. Iterate and improve.

    • Repeat the process, adjusting the system prompt based on the error analysis and the observed metrics.

5. Final validation.

    • Once the improvement is significant and has plateaued, evaluate on the initially withheld test set to confirm that the gains hold.

6. Set and forget.

    • After the final system prompts are set for the LLM judges, leave them constant throughout experimentation to avoid any bias in the evaluation pipeline.

Post-calibration to mitigate bias

Further analysis of alignment

Stress testing alignment

Stress testing serves to rigorously assess whether the alignment between LLM judges and human evaluators remains robust under varying conditions and across different subpopulations. For instance, while an LLM judge may closely match human scores for short responses, it might consistently misjudge longer ones. If the dataset is dominated by short responses, this can artificially inflate overall correlation metrics and obscure critical weaknesses in the evaluator.

  • Stratified agreement analysis: Evaluate human–LLM agreement separately for distinct categories, such as short versus long responses, simple versus complex queries, different tones or writing styles, and diverse content domains. This helps to pinpoint where alignment may falter.
  • Counterfactual perturbations: Introduce minor modifications to the inputs—such as shuffling candidate order, shortening answers, or substituting synonyms—to observe whether the LLM judge’s scores change in a meaningful way. Such tests uncover sensitivity and potential bias in the evaluation process.
  • Permutation tests: Randomly permute answer labels or scoring assignments to ensure that observed patterns of alignment are not artifacts of dataset structure or chance.
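A permutation test along these lines can be sketched as follows, with illustrative score data: shuffle the judge’s scores many times and ask how often chance pairing matches the humans as well as the real pairing does.

```python
import random

# Permutation-test sketch on illustrative data: does the observed
# human-LLM agreement rate beat what random pairing would produce?
human = [3, 4, 1, 5, 2, 4, 3, 5, 2, 1]
llm   = [3, 4, 2, 5, 2, 4, 3, 4, 2, 1]

def agreement(a, b):
    return sum(x == y for x, y in zip(a, b)) / len(a)

observed = agreement(human, llm)
rng = random.Random(0)
null = []
for _ in range(10_000):
    shuffled = llm[:]
    rng.shuffle(shuffled)
    null.append(agreement(human, shuffled))

# One-sided p-value: fraction of permutations at least as strong as observed
p_value = sum(a >= observed for a in null) / len(null)
print(observed, p_value < 0.05)
```

A small p-value here indicates the agreement is a real property of the pairing rather than an artifact of how scores happen to be distributed.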

When stress testing exposes deficiencies in the current evaluator, the next step is to iteratively refine the LLM judge to achieve stronger and more consistent alignment with human judgment.

Statistical validation of improvements

It is essential to confirm that any observed improvements are substantive and not merely statistical artifacts. This is where robust statistical validation is critical:

  • Paired significance tests: Use methods such as paired t-tests or Wilcoxon signed-rank tests to compare human–LLM deviations before and after calibration, ensuring that improvements are statistically supported.
  • Multiple testing corrections: Apply procedures like the Benjamini–Hochberg correction when evaluating numerous metrics or subgroups, reducing the risk of false positives.
  • Confidence intervals: Report confidence intervals for agreement estimates to quantify uncertainty and avoid overinterpreting marginal differences.
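As one hedged sketch of such validation, a paired bootstrap (a simpler cousin of the Wilcoxon test named above) can put a confidence interval on the reduction in human–judge deviation. The deviation values here are made up for illustration.

```python
import random

# Paired bootstrap sketch: |human score - judge score| for the same ten
# responses before and after recalibrating the system prompt (illustrative).
before = [1.2, 0.8, 1.5, 0.9, 1.1, 1.4, 0.7, 1.3, 1.0, 1.6]
after  = [0.6, 0.5, 0.9, 0.4, 0.8, 0.7, 0.3, 0.9, 0.5, 1.0]
diffs = [b - a for b, a in zip(before, after)]   # positive = improvement

rng = random.Random(1)
boot_means = []
for _ in range(10_000):
    resample = [rng.choice(diffs) for _ in diffs]
    boot_means.append(sum(resample) / len(resample))
boot_means.sort()
lo, hi = boot_means[249], boot_means[9749]       # 95% percentile interval
print(round(lo, 2), round(hi, 2))  # interval above 0: improvement is supported
```

If the interval straddled zero, the apparent improvement would not be distinguishable from noise and further calibration rounds would be warranted.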

By systematically applying these practices, stakeholders gain clearer, data-driven assurances regarding the reliability of the LLM judge and the durability of any measured improvements. This approach ensures that alignment is not only statistically sound but also meaningful and stable across all relevant scenarios.

Regression to test for bias

Once the LLM judge is calibrated to align with human preferences and its performance meets our evaluation criteria, the final step is to quantify the presence and magnitude of residual biases that may persist.

Just like humans, LLM judges can exhibit systematic biases. Prior research has shown that LLM judges often display consistent patterns of favoritism across many examples. Three well-documented types of bias are:

  • Positional bias – Preferring the first or second option in a comparison, independent of quality.
  • Verbosity bias – Favoring longer answers over shorter, more concise ones.
  • Self‑bias – Giving higher scores to responses generated by the same model family as the judge.

Understanding these systematic effects allows us to diagnose limitations in the evaluator and iteratively reduce bias during further tuning. While aligning an LLM judge with human preferences is essential, achieving lower bias makes the evaluator even more reliable.

A straightforward yet powerful method to measure bias is regression modelling. Consider verbosity bias as an example.

Take the example of building an agent whose ability to answer specific questions we want to evaluate against a standard out-of-the-box LLM. We would like to know whether our agent produces answers of more substance. However, we know in advance that LLM judges tend to favor longer answers. So how do we ensure that our LLM judge isn't biased toward our agent simply because it gives longer answers? We can attempt to control for that confounding influence with linear regression.

Score = β₀ + β₁ · Agent + β₂ · LengthNormalized + ε

This formulation lets us isolate the separate effects of who generated the answer and how long the answer is, while holding other factors fixed. Other bias types can be modelled in a similar fashion, provided the standard assumptions of linear regression hold.

The sign, magnitude, and statistical significance of the regression coefficients quantify the extent of each bias. For instance, a large positive and statistically significant β₂ in this specification would indicate strong verbosity bias.
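On synthetic data with a known effect the regression recovers the verbosity bias; everything here (sample size, coefficients, noise level) is illustrative rather than drawn from a real evaluation.

```python
import numpy as np

# Synthetic check of the regression: scores are generated with no true
# quality gap between agent and baseline (beta1 = 0) but a strong length
# effect (beta2 = 0.8); the fit should recover roughly those values.
rng = np.random.default_rng(0)
n = 500
agent = rng.integers(0, 2, n)             # 1 = our agent, 0 = baseline LLM
length_norm = rng.normal(0, 1, n)         # z-scored answer length
score = 3.0 + 0.0 * agent + 0.8 * length_norm + rng.normal(0, 0.3, n)

X = np.column_stack([np.ones(n), agent, length_norm])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
print(np.round(beta, 2))  # close to [3.0, 0.0, 0.8]: length, not the agent, drives scores
```

In a real evaluation you would also want standard errors and p-values for the coefficients (e.g. via statsmodels OLS) rather than point estimates alone.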

Bringing statistical rigor into practice with Microsoft Foundry

Much of this workflow can be reproduced easily using Microsoft Foundry and the open-source package judgesync.

In practice, statistical validation is most valuable when it is tightly integrated into the evaluation workflow. Microsoft Foundry natively supports paired statistical testing, enabling developers to directly quantify pre‑ and post‑calibration improvements.

Especially useful is Microsoft Foundry’s evaluation cluster analysis feature (currently in public preview). It helps you understand and compare evaluation results by grouping samples with similar patterns, making it easier to surface alignment gaps where LLM judges diverge from human evaluators across response lengths, complexity, and content styles, issues that aggregate metrics often hide.


References

[1] Zheng, L., Chiang, W.-L., Sheng, Y., Zhuang, S., Wu, Z., Zhuang, Y., Lin, Z., Li, Z., Li, D., Xing, E. P., Zhang, H., Gonzalez, J. E., & Stoica, I. (2023). Judging LLM-as-a-judge with MT-Bench and Chatbot Arena [arXiv preprint]. arXiv. https://doi.org/10.48550/arXiv.2306.05685

[2] Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104

[3] Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33(3), 613–619. https://doi.org/10.1177/001316447303300309

Updated Mar 04, 2026
Version 1.0