Anoop Gupta, Wolf-Dietrich Weber: Cache Invalidation Patterns in Shared-Memory Multiprocessors. IEEE Trans. Computers 41(7): 794-810 (1992).
Conclusion : Directory-based schemes with 3-4 pointers per entry should work well for executing well-designed parallel programs.
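A minimal sketch of why few pointers suffice, assuming a limited-pointer directory that falls back to broadcast on overflow (the `DirectoryEntry` class and its parameters are hypothetical, not from the paper): as long as sharing sets stay small, writes send precise, small invalidations; only when more processors share a line than the entry can track does a write degrade to a broadcast.

```python
# Toy limited-pointer directory entry (illustrative sketch, not the
# paper's hardware): tracks up to num_pointers sharers per line; on
# overflow, a write must broadcast invalidations to all processors.
class DirectoryEntry:
    def __init__(self, num_pointers=4, num_procs=16):
        self.num_pointers = num_pointers
        self.num_procs = num_procs
        self.sharers = set()
        self.overflow = False

    def read(self, proc):
        if self.overflow:
            return  # already imprecise; nothing more to track
        if proc in self.sharers or len(self.sharers) < self.num_pointers:
            self.sharers.add(proc)
        else:
            self.overflow = True  # too many sharers for the pointers

    def write(self, proc):
        # returns the number of invalidation messages this write sends
        if self.overflow:
            invalidations = self.num_procs - 1  # broadcast to everyone
        else:
            invalidations = len(self.sharers - {proc})
        self.sharers = {proc}
        self.overflow = False
        return invalidations

entry = DirectoryEntry()
for p in (0, 1, 2):           # three readers: fits in 4 pointers
    entry.read(p)
small = entry.write(0)        # precise: 2 invalidation messages

for p in range(8):            # eight readers: overflows the pointers
    entry.read(p)
large = entry.write(0)        # broadcast: 15 messages on 16 procs
```

With the small sharing sets the paper measures for well-designed programs, the precise (non-broadcast) path dominates, which is the intuition behind the 3-4 pointer recommendation.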
Cache line sizes
as line size increases > larger invalidations
> data traffic goes up
> coherence traffic comes down
> overall traffic is minimized at a line size of 32 bytes.
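The trade-off above can be sketched with a toy traffic model (the numbers and the `locality` parameter are illustrative assumptions, not the paper's measurements): per-miss data traffic grows linearly with line size, while the number of fixed-size coherence messages falls only as long as larger lines capture real spatial locality, so total traffic bottoms out at an intermediate size.

```python
# Toy model of traffic vs. cache line size (illustrative only):
# misses per reference fall until the line size exceeds the program's
# spatial locality (assumed here to be 32 bytes), after which bigger
# lines just move more unused data.
def total_traffic(line_size, header_bytes=8, locality=32):
    miss_rate = 1.0 / min(line_size, locality)
    data = miss_rate * line_size           # bytes of data moved
    coherence = miss_rate * header_bytes   # fixed-size control messages
    return data + coherence

sizes = [4, 8, 16, 32, 64, 128, 256]
best = min(sizes, key=total_traffic)       # minimum at 32 in this model
```

The minimum lands at 32 bytes here only because the assumed locality is 32 bytes; the real curve's shape (data traffic up, coherence traffic down, minimum in between) is the point.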
Classification of data objects
Code and read-only data
Migratory data : high proportion of single invalidations
Mostly-read data : small invalidations
Frequently read/written objects : large invalidations, e.g. the count of processors waiting in a global queue
Sync objects : locks and barriers
low-contention sync objects : distributed locks, easy to implement, optimal for directory-based schemes
high-contention sync objects
distinguish : large invalidation > a single write to a line cached by many processors ; frequent invalidation > a line whose copies get invalidated at a high rate, even if each invalidation is small
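A small simulation (illustrative, not the paper's traces; the helper and workloads are made up) contrasts two of the classes above under write-invalidate: migratory data, where ownership passes from processor to processor, yields mostly single invalidations, while mostly-read data accumulates many readers between writes, so the occasional write invalidates many copies at once.

```python
# Sketch of invalidation-size behavior under write-invalidate:
# accesses is a list of (processor, "r" | "w"); each write invalidates
# every other cached copy and leaves the writer as sole holder.
def invalidation_sizes(accesses):
    sharers, sizes = set(), []
    for proc, kind in accesses:
        if kind == "r":
            sharers.add(proc)
        else:
            sizes.append(len(sharers - {proc}))  # copies invalidated
            sharers = {proc}
    return sizes

# migratory: each processor in turn reads then writes the object
migratory = [op for p in range(4) for op in [(p, "r"), (p, "w")]]
# mostly-read: eight readers accumulate, then one processor writes
mostly_read = [(p, "r") for p in range(8)] + [(0, "w")]

mig_sizes = invalidation_sizes(migratory)     # [0, 1, 1, 1]
mr_sizes = invalidation_sizes(mostly_read)    # [7]
```

After the first hand-off, every migratory write invalidates exactly one other copy, matching the "high proportion of single invalidations" note; the mostly-read write produces one large invalidation.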
Effect of Cache line size
large size :
> better hardware efficiency, prefetching, increase in message traffic (raises the minimum communication granularity between processors).
> parallel programs exhibit less spatial locality than sequential programs.
> false sharing becomes significant.
> more processors end up sharing a cache line (false sharing) > increase in the size of invalidations.
> spatial locality depends on the class of object.
> fewer messages of each type (control/data), but the size of each data message increases.
Of course : the best case is a cache line size equal to the size of the object that is shared.
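The false-sharing point can be made concrete with a simulation (a sketch of the effect, not a real memory system; the mapping and workload are invented): two processors each update their own variable, and when both variables map to one cache line, every write invalidates the other processor's copy, while placing each variable on its own line eliminates those invalidations.

```python
# Sketch of false sharing under write-invalidate: line_of maps each
# variable to a cache line; writes is a sequence of (processor, var).
# Each write invalidates every other processor's copy of that line.
def count_invalidations(line_of, writes):
    holders = {}          # cache line -> procs with a valid copy
    invalidations = 0
    for proc, var in writes:
        line = line_of[var]
        invalidations += len(holders.get(line, set()) - {proc})
        holders[line] = {proc}
    return invalidations

# proc 0 writes x, proc 1 writes y, strictly alternating
writes = [(0, "x"), (1, "y")] * 100

shared = count_invalidations({"x": 0, "y": 0}, writes)  # same line
padded = count_invalidations({"x": 0, "y": 1}, writes)  # own lines
```

With the two unrelated variables on one line, nearly every write triggers an invalidation; split onto separate lines (the software analogue of padding, or of matching line size to object size), the coherence traffic vanishes.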