In this reading group meeting, Dr. Jonathan Dodge covers a very interesting paper on the presence of “super weights” in LLMs. These weights are characterized by high magnitude and high activations, both extreme enough to be outliers. The paper is The Super Weight in Large Language Models, by Yu et al. Unfortunately, reviewers rejected this version of the paper. However, it is still quite interesting, and I strongly suggest checking out the directory of where these weights lie in various open-source models, since that information will let you play with these concepts.
Presentation Link: https://psu.mediaspace.kaltura.com/media/Group+Meeting/1_und8nbc7