As noted earlier, every JTAG action takes many hundreds of system clock cycles. This is because JTAG is a serial protocol that shifts data one bit per JTAG clock (TCK) cycle, and the JTAG clock typically runs ten times slower than the system clock.
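To see where "many hundreds" comes from, the arithmetic can be sketched for a single data-register (DR) scan. This is a minimal sketch that assumes the TAP controller starts and ends in Run-Test/Idle, follows the IEEE 1149.1 state machine, and that the system clock runs exactly ten times faster than TCK. The function names are illustrative only.

```python
# Sketch: estimate the system clock cycles consumed by one JTAG DR scan.
# Assumptions (not from the original text): the TAP starts and ends in
# Run-Test/Idle, and the system clock runs 10x faster than TCK.

SYS_CLOCKS_PER_TCK = 10  # assumed clock ratio

# TCK cycles spent walking the IEEE 1149.1 TAP state machine around the shift:
# Run-Test/Idle -> Select-DR-Scan -> Capture-DR -> Shift-DR (3 cycles); the last
# shift clock also moves to Exit1-DR; then Exit1-DR -> Update-DR ->
# Run-Test/Idle (2 cycles).
DR_SCAN_OVERHEAD_TCK = 5


def dr_scan_tck_cycles(nbits: int) -> int:
    """TCK cycles for one DR scan that shifts nbits serially, one bit per TCK."""
    return nbits + DR_SCAN_OVERHEAD_TCK


def dr_scan_system_cycles(nbits: int, ratio: int = SYS_CLOCKS_PER_TCK) -> int:
    """System clock cycles the model must simulate for that scan."""
    return dr_scan_tck_cycles(nbits) * ratio


if __name__ == "__main__":
    # A single 32-bit register transfer: (32 + 5) * 10 = 370 system cycles.
    print(dr_scan_system_cycles(32))
```

Even one 32-bit transfer therefore costs several hundred simulated system cycles, and a real debug action usually needs more than one scan.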
There are thus two ways to improve performance:
1. Minimize the number of JTAG actions used.
2. Maximize the performance of the underlying cycle-accurate model.
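The effect of both levers can be sketched numerically. The debug protocol in this sketch is hypothetical: each memory access is modelled as a 64-bit command DR scan followed by a data DR scan, and a "burst" shares one command scan across many words. The 10:1 clock ratio and the 5-cycle TAP overhead per DR scan are the same assumptions as before. Batching reduces the number of simulated cycles, while a faster model reduces the host time needed to simulate each cycle.

```python
# Sketch comparing the two levers. All figures are hypothetical: an access is
# one command DR scan of CMD_BITS followed by a data DR scan, and the system
# clock runs 10x faster than TCK.

SYS_CLOCKS_PER_TCK = 10   # assumed clock ratio
DR_SCAN_OVERHEAD_TCK = 5  # TAP state-machine cycles around each DR scan
CMD_BITS = 64             # hypothetical command width
WORD_BITS = 32


def scan_cycles(nbits: int) -> int:
    """System cycles for one DR scan of nbits (overhead included)."""
    return (nbits + DR_SCAN_OVERHEAD_TCK) * SYS_CLOCKS_PER_TCK


def read_words_individually(n: int) -> int:
    """Lever 1 not applied: a command scan plus a data scan for every word."""
    return n * (scan_cycles(CMD_BITS) + scan_cycles(WORD_BITS))


def read_words_burst(n: int) -> int:
    """Lever 1 applied: one command scan, then all n words in one long scan."""
    return scan_cycles(CMD_BITS) + scan_cycles(n * WORD_BITS)


def wall_clock_seconds(system_cycles: int, model_hz: float) -> float:
    """Lever 2: host time to simulate the cycles at a given model speed."""
    return system_cycles / model_hz


if __name__ == "__main__":
    for hz in (1e5, 1e6):  # two illustrative model speeds, in cycles/second
        print(hz,
              wall_clock_seconds(read_words_individually(16), hz),
              wall_clock_seconds(read_words_burst(16), hz))
```

Under these assumptions, reading 16 words one at a time costs 16,960 simulated cycles, while a burst costs 5,860. The two levers multiply, so halving the number of scans and doubling model speed together cut host time by roughly a factor of four.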