BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20181221T160728Z
LOCATION:D175
DTSTART;TZID=America/Chicago:20181112T140000
DTEND;TZID=America/Chicago:20181112T143000
UID:submissions.supercomputing.org_SC18_sess176_ws_llvmf105@linklings.com
SUMMARY:Function/Kernel Vectorization via Loop Vectorizer
DESCRIPTION:Workshop\nProgram Transformation, Programming Systems, Worksho
 p Reg Pass\n\nFunction/Kernel Vectorization via Loop Vectorizer\n\nMasten,
  Tyurin, Mitropoulou, Saito, Garcia\n\nCurrently, there are three vectoriz
 ers in the LLVM trunk: Loop Vectorizer, SLP Vectorizer, and Load-Store Vec
 torizer. There is a need for vectorizing functions/kernels: 1) Function ca
 lls are an integral part of programming real-world application code and we
  cannot always rely on fully inlining them. When a function call is made f
 rom a vectorized context such as a vectorized loop or vectorized function,
  if there are no vectorized callees available, the call has to be made to
  a scalar callee, one vector element at a time. At the programming model
  leve
 l, OpenMP declare simd is a standardized syntax to address this problem. L
 LVM needs a vectorizer to properly vectorize OpenMP declare simd functions
 . 2) Also, in the GPGPU programming model, such as OpenCL, work-item (thre
 ad) parallelism is not expressed with a loop; it is implicit in the execut
 ion of the kernels. In order to exploit SIMD parallelism at this top level
  (thread level), we need to start by vectorizing the kernel.\n\nOne of t
 he obvious ways to vectorize functions/kernels is to add a fourth vectoriz
 er that specifically deals with function vectorization. In this paper, we 
 argue that such a naive approach will lead us to sub-optimal performance a
 nd/or higher maintenance burden. Instead, we present a technique to take a
 dvantage of the current functionality and future improvements of Loop Vect
 orizer in order to vectorize functions and kernels.
URL:https://sc18.supercomputing.org/presentation/?id=ws_llvmf105&sess=sess
 176
END:VEVENT
END:VCALENDAR