@vchuravy
Member

@vchuravy vchuravy commented Mar 9, 2018

Viral and Jeff cornered me and asked me how hard it would be to switch to LLVM 6.0.0.

This is intended as a smoke test to see what CI and, more importantly, Nanosoldier say.

  • Forward port LLVM patches:
    • ~ llvm-5.0.0_threads ~ (might no longer be necessary)
    • llvm-3.9.0_D27296-libssp (upstream PR went stale so we should test without first.)
    • llvm-5.0-NVPTX-addrspaces
  • FreeBSD
  • Add D42260
  • [LLVM6] Assertion triggered in FastISel #26416
  • ARM testing
    • AArch64
    • armv7
  • PPC testing
  • GPU testing
  • Windows link issue
  • Nanosoldier

6.0.0

Base  ────────────────── 76.464808 seconds                 
Base64  ────────────────  3.127560 seconds                 
CRC32c  ────────────────  0.051004 seconds                 
SHA  ───────────────────  0.365380 seconds                 
FileWatching  ──────────  0.268810 seconds                 
Unicode  ───────────────  0.025183 seconds                 
Mmap  ──────────────────  0.183732 seconds                 
Serialization  ─────────  1.652668 seconds                 
Libdl  ─────────────────  0.160410 seconds                 
Markdown  ──────────────  3.160268 seconds                 
LibGit2  ─────────────── 14.052500 seconds                 
Logging  ───────────────  3.085846 seconds                 
Sockets  ───────────────  3.448210 seconds                 
Printf  ────────────────  1.850275 seconds                 
Profile  ───────────────  0.693780 seconds                 
Dates  ─────────────────  5.775388 seconds                 
DelimitedFiles  ────────  3.969369 seconds                 
Random  ──────────────── 10.706404 seconds                 
UUIDs  ─────────────────  1.889006 seconds                 
Future  ────────────────  0.022085 seconds                 
Pkg  ───────────────────  8.691972 seconds                 
LinearAlgebra  ───────── 39.905994 seconds                 
IterativeEigensolvers  ─  3.755822 seconds                 
SparseArrays  ────────── 14.649278 seconds                 
SuiteSparse  ───────────  3.585028 seconds                 
SharedArrays  ────────── 12.941465 seconds                 
Distributed  ───────────  1.774925 seconds                 
Test  ──────────────────  6.042602 seconds                 
REPL  ────────────────── 11.350251 seconds                 
Pkg3  ────────────────── 18.842404 seconds                 
Stdlibs total  ─────────177.743514 seconds                 
WARNING: SparseArrays.blkdiag is deprecated, use blockdiag instead.                                                   
  likely near sysimg.jl:715                                
/home/vchuravy/julia/base/precompile.jl                    
Sysimage built. Summary:                                   
Total ─────── 279.479692 seconds                           
Base: ───────  76.464808 seconds 27.3597%                  
Stdlibs: ──── 177.743514 seconds 63.598%                   
Precompile: ─  25.267449 seconds 9.04089%                  

time -v:

        User time (seconds): 531.63
        System time (seconds): 5.29
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 8:55.20
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2292672
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 2233141
        Voluntary context switches: 42
        Involuntary context switches: 594
        Swaps: 0
        File system inputs: 0
        File system outputs: 340656
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

3.9.1

Base  ────────────────── 72.322026 seconds
Base64  ────────────────  2.729186 seconds
CRC32c  ────────────────  0.034614 seconds
SHA  ───────────────────  0.363587 seconds
FileWatching  ──────────  0.267717 seconds
Unicode  ───────────────  0.025211 seconds
Mmap  ──────────────────  0.180100 seconds
Serialization  ─────────  1.660259 seconds
Libdl  ─────────────────  0.148583 seconds
Markdown  ──────────────  2.894347 seconds
LibGit2  ─────────────── 12.889084 seconds
Logging  ───────────────  2.859578 seconds
Sockets  ───────────────  3.200110 seconds
Printf  ────────────────  1.714964 seconds
Profile  ───────────────  0.658392 seconds
Dates  ─────────────────  5.509282 seconds
DelimitedFiles  ────────  3.658211 seconds
Random  ──────────────── 10.228097 seconds
UUIDs  ─────────────────  1.660729 seconds
Future  ────────────────  0.022349 seconds
Pkg  ───────────────────  8.261382 seconds
LinearAlgebra  ───────── 39.721122 seconds
IterativeEigensolvers  ─  3.532457 seconds
SparseArrays  ────────── 14.157760 seconds
SuiteSparse  ───────────  3.436138 seconds
SharedArrays  ────────── 12.632652 seconds
Distributed  ───────────  1.634865 seconds
Test  ──────────────────  5.761900 seconds
REPL  ────────────────── 10.515702 seconds
Pkg3  ────────────────── 17.505437 seconds
Stdlibs total  ─────────169.579216 seconds
/home/vchuravy/julia/base/precompile.jl
Sysimage built. Summary:
Total ─────── 267.523895 seconds 
Base: ───────  72.322026 seconds 27.0339%
Stdlibs: ──── 169.579216 seconds 63.3884%
Precompile: ─  25.618957 seconds 9.57632%
time -v:
        User time (seconds): 528.65
        System time (seconds): 5.23
        Percent of CPU this job got: 100%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 8:52.16
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2225288
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 2340496
        Voluntary context switches: 42
        Involuntary context switches: 567
        Swaps: 0
        File system inputs: 0
        File system outputs: 331216
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

Edit: GPU testing added to the list above by @ViralBShah

@ViralBShah
Member

Looks like it is about 5% slower, which is quite decent and perhaps even quickly recoverable, should we decide to do something like this.
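
For reference, the "about 5%" figure can be checked against the sysimage summaries quoted above. A minimal Python sketch, with the timings copied from the two logs; the helper name `slowdown_pct` is only for illustration:

```python
# Sysimage build timings (seconds) copied from the two summaries above.
llvm_391 = {"Base": 72.322026, "Stdlibs": 169.579216,
            "Precompile": 25.618957, "Total": 267.523895}
llvm_600 = {"Base": 76.464808, "Stdlibs": 177.743514,
            "Precompile": 25.267449, "Total": 279.479692}


def slowdown_pct(old, new):
    """Relative change from old to new in percent (positive means slower)."""
    return (new - old) / old * 100.0


# Print the relative change per phase, 3.9.1 -> 6.0.0.
for phase in llvm_391:
    change = slowdown_pct(llvm_391[phase], llvm_600[phase])
    print(f"{phase:<11}{change:+6.2f}%")
```

On these numbers the total comes out between 4% and 5% slower, with Base showing the largest relative increase and precompilation coming out slightly faster.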

@JeffBezanson
Member

👍 I think we should try to upgrade LLVM before 1.0, so that the 0.7-1.0 timeframe is the least stable period, and we can steadily increase reliability and performance over the 1.0.x and 1.x series.

@ViralBShah
Member

@vchuravy Just wondering if you have a comparison of the total time for running tests.

@ararslan ararslan added the external dependencies Involves LLVM, OpenBLAS, or other linked libraries label Mar 9, 2018
deps/llvm.mk Outdated
$(eval $(call LLVM_PATCH,llvm-PPC-addrspaces)) # PPC
$(eval $(call LLVM_PATCH,llvm-PR36292-5.0)) # PPC fixes #26249, remove for 6.0
else ifeq ($(LLVM_VER_SHORT),6.0)
#$(eval $(call LLVM_PATCH,llvm-5.0.0_threads))
Contributor

Just FYI, the comment on this one ("Cygwin and openSUSE still use win32-threads mingw") hasn't been true for over a year; see https://build.opensuse.org/request/show/444383 and https://sourceware.org/ml/cygwin/2016-11/msg00113.html. It should be safe to drop this patch nowadays.

Member Author

Cheers! That is good news.

@ViralBShah
Member

@maleadt Is LLVM 6.0 good from the GPU perspective? Just pinging you here in case you haven't seen it.

@maleadt
Member

maleadt commented Mar 10, 2018

Yeah, I was waiting for the AS patch before testing this (if Valentin doesn't beat me to it, I'll do it next week).

@vchuravy
Member Author

Already ported, just not pushed yet: vchuravy@2c19817

@vchuravy
Member Author

On Nanosoldier4 (with -j8):

6.0.0

    SUCCESS
        Command being timed: "make -j8 testall"
        User time (seconds): 6719.85
        System time (seconds): 96.32
        Percent of CPU this job got: 573%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 19:49.26
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2258056
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 21
        Minor (reclaiming a frame) page faults: 12547419
        Voluntary context switches: 48900
        Involuntary context switches: 55660107
        Swaps: 0
        File system inputs: 768
        File system outputs: 1013232
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

3.9.1

        Command being timed: "make -j8 testall"
        User time (seconds): 6809.86
        System time (seconds): 89.12
        Percent of CPU this job got: 575%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 19:58.05
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2247944
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 4
        Minor (reclaiming a frame) page faults: 11719777
        Voluntary context switches: 48016
        Involuntary context switches: 57011270
        Swaps: 0
        File system inputs: 456
        File system outputs: 997504
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
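
To compare the two `make -j8 testall` runs above, the elapsed field from `time -v` first has to be converted to seconds. A small Python sketch, assuming the `h:mm:ss` or `m:ss` format shown in the output; `parse_elapsed` is a hypothetical helper, not part of any tool used in this PR:

```python
def parse_elapsed(text):
    """Convert a 'h:mm:ss' or 'm:ss.ff' elapsed string to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        # Each colon-separated field is worth 60x the one after it.
        seconds = seconds * 60.0 + float(part)
    return seconds


# Values copied from the two Nanosoldier4 runs above.
wall_600 = parse_elapsed("19:49.26")
wall_391 = parse_elapsed("19:58.05")
user_600 = 6719.85
user_391 = 6809.86

wall_change = (wall_600 - wall_391) / wall_391 * 100.0
user_change = (user_600 - user_391) / user_391 * 100.0

print(f"wall clock: {wall_600:.2f} s vs {wall_391:.2f} s ({wall_change:+.2f}%)")
print(f"user time:  {user_600:.2f} s vs {user_391:.2f} s ({user_change:+.2f}%)")
```

Both differences are on the order of one percent in favour of 6.0.0, which for a single run on each side is plausibly within run-to-run noise.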

Builds

@vchuravy vchuravy changed the base branch from vc/llvm6fixes to master March 10, 2018 20:17
@vchuravy
Member Author

@nanosoldier runbenchmarks(ALL, vs=":master")

@KristofferC
Member

KristofferC commented Mar 10, 2018

It seems nanosoldier lately only listens to its true master @ararslan

@ararslan
Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@ararslan
Member

He still ignores me too, like a toddler acting out. @nanosoldier runbenchmarks(ALL, vs=":master")

@ararslan
Member

Could be confused by the branch target changing, as it seems both Travis and AppVeyor are. I'd restart the server but there's another build running.

@vchuravy vchuravy closed this Mar 10, 2018
@vchuravy vchuravy reopened this Mar 10, 2018
@vchuravy
Member Author

@nanosoldier runbenchmarks(ALL, vs=":master")

@vchuravy
Member Author

CircleCI is encountering a codegen issue:

julia-debug: /home/circleci/project/deps/srccache/llvm-6.0.0/lib/Target/X86/X86FastISel.cpp:1793: bool {anonymous}::X86FastISel::X86SelectShift(const llvm::Instruction*): Assertion `!I->getType()->isIntegerTy(8) && "i8 shifts should be handled by autogenerated table"' failed.

signal (6): Aborted
in expression starting at mpfr.jl:927
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7fed84874265)
__assert_fail at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_ZN12_GLOBAL__N_111X86FastISel21fastSelectInstructionEPKN4llvm11InstructionE at /home/circleci/project/usr/bin/../lib/libLLVM-6.0.so (unknown line)
_ZN4llvm8FastISel17selectInstructionEPKNS_11InstructionE at /home/circleci/project/usr/bin/../lib/libLLVM-6.0.so (unknown line)
_ZN4llvm16SelectionDAGISel20SelectAllBasicBlocksERKNS_8FunctionE at /home/circleci/project/usr/bin/../lib/libLLVM-6.0.so (unknown line)
_ZN4llvm16SelectionDAGISel20runOnMachineFunctionERNS_15MachineFunctionE.part.1097 at /home/circleci/project/usr/bin/../lib/libLLVM-6.0.so (unknown line)
_ZN12_GLOBAL__N_115X86DAGToDAGISel20runOnMachineFunctionERN4llvm15MachineFunctionE at /home/circleci/project/usr/bin/../lib/libLLVM-6.0.so (unknown line)
_ZN4llvm19MachineFunctionPass13runOnFunctionERNS_8FunctionE at /home/circleci/project/usr/bin/../lib/libLLVM-6.0.so (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /home/circleci/project/usr/bin/../lib/libLLVM-6.0.so (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /home/circleci/project/usr/bin/../lib/libLLVM-6.0.so (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /home/circleci/project/usr/bin/../lib/libLLVM-6.0.so (unknown line)
operator() at /home/circleci/project/src/jitlayers.cpp:466
addModule at /home/circleci/project/usr/include/llvm/ExecutionEngine/Orc/IRCompileLayer.h:57
addModule at /home/circleci/project/src/jitlayers.cpp:593
jl_add_to_ee at /home/circleci/project/src/jitlayers.cpp:831
jl_finalize_function at /home/circleci/project/src/jitlayers.cpp:839
getAddressForFunction at /home/circleci/project/src/codegen.cpp:1283
jl_generate_fptr at /home/circleci/project/src/codegen.cpp:1386
jl_compile_method_internal at /home/circleci/project/src/julia_internal.h:370
jl_call_method_internal at /home/circleci/project/src/julia_internal.h:398
jl_apply_generic at /home/circleci/project/src/gf.c:2098
jl_apply at /home/circleci/project/src/julia.h:1528
jl_invoke at /home/circleci/project/src/gf.c:55

Win32 is failing while building LLVM:

/usr/bin/ar: ../libLLVMSupport.a: Error reading CMakeFiles/LLVMSupport.dir/Watchdog.cpp.o: File truncated
make[8]: *** [lib/Support/CMakeFiles/LLVMSupport.dir/build.make:2878: lib/libLLVMSupport.a] Error 1
make[8]: *** Deleting file 'lib/libLLVMSupport.a'
make[7]: *** [CMakeFiles/Makefile2:619: lib/Support/CMakeFiles/LLVMSupport.dir/all] Error 2
make[6]: *** [CMakeFiles/Makefile2:10680: tools/llvm-config/CMakeFiles/llvm-config.dir/rule] Error 2
make[5]: *** [Makefile:3356: llvm-config] Error 2

Win64 is having a link-time issue:

/usr/bin/x86_64-w64-mingw32-g++.exe  -m64    -D__USING_SJLJ_EXCEPTIONS__ -D__CRT__NO_INLINE -Wa,-mbig-obj -Werror=date-time -std=gnu++11 -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wno-missing-field-initializers -pedantic -Wno-long-long -Wno-maybe-uninitialized -Wdelete-non-virtual-dtor -Wno-comment  -O2 -DNDEBUG    -shared -o ../../bin/LLVM.dll -Wl,--out-implib,../../lib/libLLVM.dll.a -Wl,--major-image-version,0,--minor-image-version,0 -Wl,--whole-archive CMakeFiles/LLVM.dir/objects.a -Wl,--no-whole-archive @CMakeFiles/LLVM.dir/linklibs.rsp
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo22insertOutlinerPrologueERNS_17MachineBasicBlockERNS_15MachineFunctionERKNS0_19MachineOutlinerInfoE+0x0): multiple definition of `llvm::TargetInstrInfo::insertOutlinerPrologue(llvm::MachineBasicBlock&, llvm::MachineFunction&, llvm::TargetInstrInfo::MachineOutlinerInfo const&) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo22insertOutlinerPrologueERNS_17MachineBasicBlockERNS_15MachineFunctionERKNS0_19MachineOutlinerInfoE+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo18insertOutlinedCallERNS_6ModuleERNS_17MachineBasicBlockERNS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEERNS_15MachineFunctionERKNS0_19MachineOutlinerInfoE+0x0): multiple definition of `llvm::TargetInstrInfo::insertOutlinedCall(llvm::Module&, llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&, llvm::MachineFunction&, llvm::TargetInstrInfo::MachineOutlinerInfo const&) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo18insertOutlinedCallERNS_6ModuleERNS_17MachineBasicBlockERNS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEERNS_15MachineFunctionERKNS0_19MachineOutlinerInfoE+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo22insertOutlinerEpilogueERNS_17MachineBasicBlockERNS_15MachineFunctionERKNS0_19MachineOutlinerInfoE+0x0): multiple definition of `llvm::TargetInstrInfo::insertOutlinerEpilogue(llvm::MachineBasicBlock&, llvm::MachineFunction&, llvm::TargetInstrInfo::MachineOutlinerInfo const&) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo22insertOutlinerEpilogueERNS_17MachineBasicBlockERNS_15MachineFunctionERKNS0_19MachineOutlinerInfoE+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo16getOutliningTypeERNS_12MachineInstrE+0x0): multiple definition of `llvm::TargetInstrInfo::getOutliningType(llvm::MachineInstr&) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo16getOutliningTypeERNS_12MachineInstrE+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo27getOutlininingCandidateInfoERSt6vectorISt4pairINS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEES5_ESaIS6_EE+0x0): multiple definition of `llvm::TargetInstrInfo::getOutlininingCandidateInfo(std::vector<std::pair<llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false> >, std::allocator<std::pair<llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false> > > >&) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo27getOutlininingCandidateInfoERSt6vectorISt4pairINS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEES5_ESaIS6_EE+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo25replaceBranchWithTailCallERNS_17MachineBasicBlockERNS_15SmallVectorImplINS_14MachineOperandEEERKNS_12MachineInstrE+0x0): multiple definition of `llvm::TargetInstrInfo::replaceBranchWithTailCall(llvm::MachineBasicBlock&, llvm::SmallVectorImpl<llvm::MachineOperand>&, llvm::MachineInstr const&) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo25replaceBranchWithTailCallERNS_17MachineBasicBlockERNS_15SmallVectorImplINS_14MachineOperandEEERKNS_12MachineInstrE+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo19shouldClusterMemOpsERNS_12MachineInstrEjS2_jj+0x0): multiple definition of `llvm::TargetInstrInfo::shouldClusterMemOps(llvm::MachineInstr&, unsigned int, llvm::MachineInstr&, unsigned int, unsigned int) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo19shouldClusterMemOpsERNS_12MachineInstrEjS2_jj+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo20loadRegFromStackSlotERNS_17MachineBasicBlockENS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEEjiPKNS_19TargetRegisterClassEPKNS_18TargetRegisterInfoE+0x0): multiple definition of `llvm::TargetInstrInfo::loadRegFromStackSlot(llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, unsigned int, int, llvm::TargetRegisterClass const*, llvm::TargetRegisterInfo const*) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo20loadRegFromStackSlotERNS_17MachineBasicBlockENS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEEjiPKNS_19TargetRegisterClassEPKNS_18TargetRegisterInfoE+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo19storeRegToStackSlotERNS_17MachineBasicBlockENS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEEjbiPKNS_19TargetRegisterClassEPKNS_18TargetRegisterInfoE+0x0): multiple definition of `llvm::TargetInstrInfo::storeRegToStackSlot(llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, unsigned int, bool, int, llvm::TargetRegisterClass const*, llvm::TargetRegisterInfo const*) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo19storeRegToStackSlotERNS_17MachineBasicBlockENS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEEjbiPKNS_19TargetRegisterClassEPKNS_18TargetRegisterInfoE+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo11copyPhysRegERNS_17MachineBasicBlockENS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEERKNS_8DebugLocEjjb+0x0): multiple definition of `llvm::TargetInstrInfo::copyPhysReg(llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::DebugLoc const&, unsigned int, unsigned int, bool) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo11copyPhysRegERNS_17MachineBasicBlockENS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEERKNS_8DebugLocEjjb+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo14optimizeSelectERNS_12MachineInstrERNS_15SmallPtrSetImplIPS1_EEb+0x0): multiple definition of `llvm::TargetInstrInfo::optimizeSelect(llvm::MachineInstr&, llvm::SmallPtrSetImpl<llvm::MachineInstr*>&, bool) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo14optimizeSelectERNS_12MachineInstrERNS_15SmallPtrSetImplIPS1_EEb+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo12insertSelectERNS_17MachineBasicBlockENS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEERKNS_8DebugLocEjNS_8ArrayRefINS_14MachineOperandEEEjj+0x0): multiple definition of `llvm::TargetInstrInfo::insertSelect(llvm::MachineBasicBlock&, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::DebugLoc const&, unsigned int, llvm::ArrayRef<llvm::MachineOperand>, unsigned int, unsigned int) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo12insertSelectERNS_17MachineBasicBlockENS_26MachineInstrBundleIteratorINS_12MachineInstrELb0EEERKNS_8DebugLocEjNS_8ArrayRefINS_14MachineOperandEEEjj+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo15reduceLoopCountERNS_17MachineBasicBlockEPNS_12MachineInstrERS3_RNS_15SmallVectorImplINS_14MachineOperandEEERNS6_IS4_EEjj+0x0): multiple definition of `llvm::TargetInstrInfo::reduceLoopCount(llvm::MachineBasicBlock&, llvm::MachineInstr*, llvm::MachineInstr&, llvm::SmallVectorImpl<llvm::MachineOperand>&, llvm::SmallVectorImpl<llvm::MachineInstr*>&, unsigned int, unsigned int) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo15reduceLoopCountERNS_17MachineBasicBlockEPNS_12MachineInstrERS3_RNS_15SmallVectorImplINS_14MachineOperandEEERNS6_IS4_EEjj+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo12insertBranchERNS_17MachineBasicBlockEPS1_S3_NS_8ArrayRefINS_14MachineOperandEEERKNS_8DebugLocEPi+0x0): multiple definition of `llvm::TargetInstrInfo::insertBranch(llvm::MachineBasicBlock&, llvm::MachineBasicBlock*, llvm::MachineBasicBlock*, llvm::ArrayRef<llvm::MachineOperand>, llvm::DebugLoc const&, int*) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo12insertBranchERNS_17MachineBasicBlockEPS1_S3_NS_8ArrayRefINS_14MachineOperandEEERKNS_8DebugLocEPi+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo12removeBranchERNS_17MachineBasicBlockEPi+0x0): multiple definition of `llvm::TargetInstrInfo::removeBranch(llvm::MachineBasicBlock&, int*) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo12removeBranchERNS_17MachineBasicBlockEPi+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo20insertIndirectBranchERNS_17MachineBasicBlockES2_RKNS_8DebugLocExPNS_12RegScavengerE+0x0): multiple definition of `llvm::TargetInstrInfo::insertIndirectBranch(llvm::MachineBasicBlock&, llvm::MachineBasicBlock&, llvm::DebugLoc const&, long long, llvm::RegScavenger*) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo20insertIndirectBranchERNS_17MachineBasicBlockES2_RKNS_8DebugLocExPNS_12RegScavengerE+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo18getBranchDestBlockERKNS_12MachineInstrE+0x0): multiple definition of `llvm::TargetInstrInfo::getBranchDestBlock(llvm::MachineInstr const&) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo18getBranchDestBlockERKNS_12MachineInstrE+0x0): first defined here
../../lib/libLLVMNVPTXCodeGen.a(NVPTXInstrInfo.cpp.obj):NVPTXInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo21isBranchOffsetInRangeEjx+0x0): multiple definition of `llvm::TargetInstrInfo::isBranchOffsetInRange(unsigned int, long long) const'
../../lib/libLLVMCodeGen.a(TargetInstrInfo.cpp.obj):TargetInstrInfo.cpp:(.text$_ZNK4llvm15TargetInstrInfo21isBranchOffsetInRangeEjx+0x0): first defined here
collect2: error: ld returned 1 exit status

@Keno
Member

Keno commented Mar 10, 2018

Please add https://reviews.llvm.org/D42260 to the patch list.

@ararslan
Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@vchuravy
Member Author

Sure. You haven't landed it yet, right?

@Keno
Member

Keno commented Mar 10, 2018

No, I should though.

@vchuravy
Member Author

Quick update on the codegen failure: it is not triggered by the debug build as such, but by -O0 and -O1, so it might be related to pass order.

@maleadt
Member

maleadt commented Mar 11, 2018

Could you include D44140? With it, LLVM.jl and CUDAnative.jl test without issues.

@vchuravy
Member Author

@nanosoldier runbenchmarks(ALL, vs=":master")

@ararslan
Member

Man, Nanosoldier really hates this PR. @nanosoldier runbenchmarks(ALL, vs=":master")

@ararslan
Member

@nanosoldier runbenchmarks(ALL, vs=":master")

@vchuravy
Member Author

Good news. After updating a copy of the buildbots from mingw 5.4 to 6.2 and including the last three patches, I got Julia to build and pass tests on Win64!
So while the commit history needs to be cleaned up and 4436b90 needs to be submitted upstream, this is now at a point where I would encourage other people to take this for a spin.

@staticfloat
Member

@vchuravy What do you need in order to check off the armv7l and ppc64le boxes on your checklist?

@vchuravy
Member Author

vchuravy commented Mar 17, 2018 via email

@tshort
Contributor

tshort commented Mar 18, 2018

+1 from me. It compiled fine on Arch Linux, and tests for the LLVM.jl package ran fine. I also tried recompiling with the WebAssembly backend included, and that worked fine, too. (Once this is in, I'll probably submit a PR to include the wasm backend.)

@vchuravy
Member Author

vchuravy commented Mar 19, 2018 via email

@vchuravy
Member Author

Once I replace 4436b90 with https://reviews.llvm.org/D44650 this should be good to go, but I will give upstream a couple of days to respond to that. There is also movement on the ssp patch for mingw.

@tkelman
Contributor

tkelman commented Mar 19, 2018

Should delete the old checksums. And Mac and Windows CI are not actually testing LLVM 6 yet: https://travis-ci.org/JuliaLang/julia/jobs/354081449#L6185, https://ci.appveyor.com/project/JuliaLang/julia/build/1.0.25526/job/pdt7a2i4xvn8nvj7#L2596

@vchuravy
Member Author

vchuravy commented Mar 19, 2018 via email

@KristofferC
Member

KristofferC commented Mar 23, 2018

Some (encouraging?) benchmarks. I defined a fixed-size array type with methods for unrolled matmat and matvec multiplication, as in https://gist.github.com/KristofferC/07814a1dc4e6dfc07697ca2452b309da, and ran the following benchmark with the SLP vectorizer disabled and enabled:

n = 4
v4 = FixedVector((rand(n)...,))
m4x4 = FixedMatrix{n, n, Float64, n*n}((rand(n*n)...,))

n = 8
v8_32 = FixedVector((rand(Float32, n)...,))
m8x8_32 = FixedMatrix{n, n, Float32, n*n}((rand(Float32, n*n)...,))

using BenchmarkTools

@btime $m4x4 * $m4x4;
@btime $m4x4 * $v4;

@btime $m8x8_32 * $m8x8_32;
@btime $m8x8_32 * $v8_32;

I get:

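|                   | 0.6-O3    | 0.7-O3     | 0.7 LLVM 6-O3 | 0.6        | 0.7        | 0.7 LLVM 6 |
|-------------------|-----------|------------|---------------|------------|------------|------------|
| m4x4 * m4x4       | 24.753 ns | 26.277 ns  | 8.398 ns      | 26.861 ns  | 25.693 ns  | 25.986 ns  |
| m4x4 * v4         | 4.715 ns  | 4.604 ns   | 2.732 ns      | 10.938 ns  | 4.593 ns   | 4.615 ns   |
| m8x8_32 * m8x8_32 | 27.090 ns | 180.194 ns | 26.131 ns     | 180.476 ns | 191.392 ns | 181.375 ns |
| m8x8_32 * v8_32   | 5.396 ns  | 20.582 ns  | 5.723 ns      | 21.074 ns  | 20.588 ns  | 20.759 ns  |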

LLVM 6 with SLP enabled does a very good job. 0.7 with current LLVM seems to fail to vectorize in all cases even with SLP.
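
The effect of LLVM 6 on the -O3 numbers above can be expressed as speedup ratios. A minimal Python sketch using the timings from the table (0.7 at -O3 versus 0.7 with LLVM 6 at -O3); the dictionary layout is only for illustration:

```python
# Timings in nanoseconds from the table above:
# (0.7 at -O3, 0.7 with LLVM 6 at -O3).
timings_ns = {
    "m4x4 * m4x4": (26.277, 8.398),
    "m4x4 * v4": (4.604, 2.732),
    "m8x8_32 * m8x8_32": (180.194, 26.131),
    "m8x8_32 * v8_32": (20.582, 5.723),
}

# Speedup = time with the current LLVM / time with LLVM 6.
speedups = {name: old / new for name, (old, new) in timings_ns.items()}

for name, ratio in speedups.items():
    print(f"{name:<18} {ratio:5.2f}x faster with LLVM 6 at -O3")
```

Every ratio is above 1.5x, and the 8x8 Float32 matmat case improves the most.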

@tkoolen
Contributor

tkoolen commented Mar 23, 2018

Hmm, on my machine with 0.6 -O3:

julia> using StaticArrays, BenchmarkTools

julia> @btime a * b setup = (a = rand(SMatrix{4, 4}); b = rand(SMatrix{4, 4}))
  9.261 ns (0 allocations: 0 bytes)

and with your gist:

@btime $m4x4 * $m4x4;
  20.611 ns 

StaticArrays doesn't do any explicit SIMD stuff. Why are these baseline results so different?

@KristofferC
Member

I used the definitions in https://github.com/JuliaCI/BaseBenchmarks.jl/blob/master/src/tuple/TupleBenchmarks.jl. These were originally taken from StaticArrays but perhaps they have been updated?

@tkoolen
Contributor

tkoolen commented Mar 23, 2018

With an added Base.@_inline_meta in the expression returned from the generated function on 0.6 with -O3:

julia> @btime $m4x4 * $m4x4;
  9.242 ns (0 allocations: 0 bytes)

- forward-ports the NVPTX address space changes
- drops the Windows specific threading changes
- add patch for D42260
- add patch for D44140
- backport patch for FastISel bug
- final version of ssp patch for Windows
- fix CMake issue for mingw32 compilers
@vchuravy
Member Author

OK, does anybody have any other comments on this? I would like to merge this PR this week, and from my side it is ready (modulo the infrastructure work).
I don't think ppc64le should hold this up, since we currently can't claim to support Power on 0.7 anyway. I pledge to continue my work on restoring support for ppc64le and hopefully have that all figured out for 1.0 or a patch release.

@tkelman
Contributor

tkelman commented Mar 26, 2018

Don't think LLVM_VER should be changed for source builds until CI is actually testing it.

@vchuravy
Member Author

vchuravy commented Mar 26, 2018 via email

@tkelman
Contributor

tkelman commented Mar 26, 2018

> Switch builds servers to Mingw32 6.2

Why? That's not what CI will use, or anyone following the README.windows.md instructions.

@vchuravy
Member Author

On the current version of mingw that the buildbots are using, I get a linker failure that makes little sense to me (#26398 (comment)); on 6.2 that failure goes away. I managed to fix a similar linker failure in https://reviews.llvm.org/D44650, but that one was present on both 5.4 and 6.2.

@vchuravy
Member Author

Closed in favour of #26925

@vchuravy vchuravy closed this Apr 30, 2018
@vchuravy vchuravy deleted the vc/llvm6 branch April 30, 2018 14:28