I saw the heatmaps in Figure 10 of this paper and in Figure 6 of the RMT paper, and I have some questions:

- Many LLMs are causal, yet the heatmap in the RMT paper's Figure 6 is presented as bidirectional attention. A bidirectional attention matrix is generally symmetric, but Figure 6 clearly shows a causal (lower-triangular) pattern. Could you clarify which it is?

- What information did you use to produce that heatmap, and how did you process it? If there is any code or an implementation outline, I would be very grateful if you could share it.
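For reference, here is a minimal sketch of how I imagine such a heatmap is usually produced, assuming standard scaled dot-product attention with a causal mask (this is only my guess, not the paper's actual pipeline; the function name `causal_attention_weights` is mine). It also illustrates why a causal heatmap is lower-triangular rather than symmetric:

```python
import numpy as np

def causal_attention_weights(q, k):
    """Row-wise softmax of masked scaled dot-product scores."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # causal mask: position i may only attend to positions j <= i
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # numerically stable softmax over each row
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
T, d = 6, 8
q = rng.standard_normal((T, d))
k = rng.standard_normal((T, d))
A = causal_attention_weights(q, k)

# The upper triangle is exactly zero, so a heatmap of A is
# lower-triangular (causal), not a symmetric bidirectional matrix.
assert np.allclose(np.triu(A, k=1), 0.0)
# Each row is a probability distribution over visible positions.
assert np.allclose(A.sum(axis=-1), 1.0)
```

If the figure was instead generated from a model's returned attention probabilities (e.g. per-head weights collected during a forward pass) and then averaged over heads or layers, I would appreciate knowing those details too.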