A 4096-core RISC-V Chiplet Architecture for Ultra-efficient Floating-point Computing