rebase: bump the github-dependencies group across 1 directory with 4 updates

Bumps the github-dependencies group with 4 updates in the / directory: [github.com/aws/aws-sdk-go-v2/service/sts](https://github.com/aws/aws-sdk-go-v2), [github.com/kubernetes-csi/csi-lib-utils](https://github.com/kubernetes-csi/csi-lib-utils), [github.com/onsi/ginkgo/v2](https://github.com/onsi/ginkgo) and [github.com/prometheus/client_golang](https://github.com/prometheus/client_golang).


Updates `github.com/aws/aws-sdk-go-v2/service/sts` from 1.30.3 to 1.30.4
- [Release notes](https://github.com/aws/aws-sdk-go-v2/releases)
- [Commits](https://github.com/aws/aws-sdk-go-v2/compare/v1.30.3...v1.30.4)

Updates `github.com/kubernetes-csi/csi-lib-utils` from 0.18.1 to 0.19.0
- [Release notes](https://github.com/kubernetes-csi/csi-lib-utils/releases)
- [Commits](https://github.com/kubernetes-csi/csi-lib-utils/compare/v0.18.1...v0.19.0)

Updates `github.com/onsi/ginkgo/v2` from 2.19.1 to 2.20.0
- [Release notes](https://github.com/onsi/ginkgo/releases)
- [Changelog](https://github.com/onsi/ginkgo/blob/master/CHANGELOG.md)
- [Commits](https://github.com/onsi/ginkgo/compare/v2.19.1...v2.20.0)

Updates `github.com/prometheus/client_golang` from 1.19.1 to 1.20.1
- [Release notes](https://github.com/prometheus/client_golang/releases)
- [Changelog](https://github.com/prometheus/client_golang/blob/v1.20.1/CHANGELOG.md)
- [Commits](https://github.com/prometheus/client_golang/compare/v1.19.1...v1.20.1)

---
updated-dependencies:
- dependency-name: github.com/aws/aws-sdk-go-v2/service/sts
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: github-dependencies
- dependency-name: github.com/kubernetes-csi/csi-lib-utils
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-dependencies
- dependency-name: github.com/onsi/ginkgo/v2
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-dependencies
- dependency-name: github.com/prometheus/client_golang
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: github-dependencies
...

Signed-off-by: dependabot[bot] <support@github.com>
This commit is contained in:
dependabot[bot]
2024-08-20 13:34:46 +00:00
committed by mergify[bot]
parent 4da71363b5
commit b044363a31
127 changed files with 27677 additions and 245 deletions

View File

@ -0,0 +1 @@
/huff0-fuzz.zip

89
vendor/github.com/klauspost/compress/huff0/README.md generated vendored Normal file
View File

@ -0,0 +1,89 @@
# Huff0 entropy compression
This package provides Huff0 encoding and decoding as used in zstd.
[Huff0](https://github.com/Cyan4973/FiniteStateEntropy#new-generation-entropy-coders),
a Huffman codec designed for modern CPU, featuring OoO (Out of Order) operations on multiple ALU
(Arithmetic Logic Unit), achieving extremely fast compression and decompression speeds.
This can be used for compressing input with a lot of similar input values to the smallest number of bytes.
This does not perform any multi-byte [dictionary coding](https://en.wikipedia.org/wiki/Dictionary_coder) as LZ coders,
but it can be used as a secondary step to compressors (like Snappy) that does not do entropy encoding.
* [Godoc documentation](https://godoc.org/github.com/klauspost/compress/huff0)
## News
This is used as part of the [zstandard](https://github.com/klauspost/compress/tree/master/zstd#zstd) compression and decompression package.
This ensures that most functionality is well tested.
# Usage
This package provides a low level interface that allows to compress single independent blocks.
Each block is separate, and there is no built in integrity checks.
This means that the caller should keep track of block sizes and also do checksums if needed.
Compressing a block is done via the [`Compress1X`](https://godoc.org/github.com/klauspost/compress/huff0#Compress1X) and
[`Compress4X`](https://godoc.org/github.com/klauspost/compress/huff0#Compress4X) functions.
You must provide input and will receive the output and maybe an error.
These error values can be returned:
| Error | Description |
|---------------------|-----------------------------------------------------------------------------|
| `<nil>` | Everything ok, output is returned |
| `ErrIncompressible` | Returned when input is judged to be too hard to compress |
| `ErrUseRLE` | Returned from the compressor when the input is a single byte value repeated |
| `ErrTooBig` | Returned if the input block exceeds the maximum allowed size (128 Kib) |
| `(error)` | An internal error occurred. |
As can be seen above some of there are errors that will be returned even under normal operation so it is important to handle these.
To reduce allocations you can provide a [`Scratch`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch) object
that can be re-used for successive calls. Both compression and decompression accepts a `Scratch` object, and the same
object can be used for both.
Be aware, that when re-using a `Scratch` object that the *output* buffer is also re-used, so if you are still using this
you must set the `Out` field in the scratch to nil. The same buffer is used for compression and decompression output.
The `Scratch` object will retain state that allows to re-use previous tables for encoding and decoding.
## Tables and re-use
Huff0 allows for reusing tables from the previous block to save space if that is expected to give better/faster results.
The Scratch object allows you to set a [`ReusePolicy`](https://godoc.org/github.com/klauspost/compress/huff0#ReusePolicy)
that controls this behaviour. See the documentation for details. This can be altered between each block.
Do however note that this information is *not* stored in the output block and it is up to the users of the package to
record whether [`ReadTable`](https://godoc.org/github.com/klauspost/compress/huff0#ReadTable) should be called,
based on the boolean reported back from the CompressXX call.
If you want to store the table separate from the data, you can access them as `OutData` and `OutTable` on the
[`Scratch`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch) object.
## Decompressing
The first part of decoding is to initialize the decoding table through [`ReadTable`](https://godoc.org/github.com/klauspost/compress/huff0#ReadTable).
This will initialize the decoding tables.
You can supply the complete block to `ReadTable` and it will return the data part of the block
which can be given to the decompressor.
Decompressing is done by calling the [`Decompress1X`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch.Decompress1X)
or [`Decompress4X`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch.Decompress4X) function.
For concurrently decompressing content with a fixed table a stateless [`Decoder`](https://godoc.org/github.com/klauspost/compress/huff0#Decoder) can be requested which will remain correct as long as the scratch is unchanged. The capacity of the provided slice indicates the expected output size.
You must provide the output from the compression stage, at exactly the size you got back. If you receive an error back
your input was likely corrupted.
It is important to note that a successful decoding does *not* mean your output matches your original input.
There are no integrity checks, so relying on errors from the decompressor does not assure your data is valid.
# Contributing
Contributions are always welcome. Be aware that adding public functions will require good justification and breaking
changes will likely not be accepted. If in doubt open an issue before writing the PR.

229
vendor/github.com/klauspost/compress/huff0/bitreader.go generated vendored Normal file
View File

@ -0,0 +1,229 @@
// Copyright 2018 Klaus Post. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
package huff0
import (
"encoding/binary"
"errors"
"fmt"
"io"
)
// bitReader reads a bitstream in reverse.
// The last set bit indicates the start of the stream and is used
// for aligning the input.
type bitReaderBytes struct {
in []byte
off uint // next byte to read is at in[off - 1]
value uint64
bitsRead uint8
}
// init initializes and resets the bit reader.
func (b *bitReaderBytes) init(in []byte) error {
if len(in) < 1 {
return errors.New("corrupt stream: too short")
}
b.in = in
b.off = uint(len(in))
// The highest bit of the last byte indicates where to start
v := in[len(in)-1]
if v == 0 {
return errors.New("corrupt stream, did not find end of stream")
}
b.bitsRead = 64
b.value = 0
if len(in) >= 8 {
b.fillFastStart()
} else {
b.fill()
b.fill()
}
b.advance(8 - uint8(highBit32(uint32(v))))
return nil
}
// peekBitsFast requires that at least one bit is requested every time.
// There are no checks if the buffer is filled.
func (b *bitReaderBytes) peekByteFast() uint8 {
got := uint8(b.value >> 56)
return got
}
func (b *bitReaderBytes) advance(n uint8) {
b.bitsRead += n
b.value <<= n & 63
}
// fillFast() will make sure at least 32 bits are available.
// There must be at least 4 bytes available.
func (b *bitReaderBytes) fillFast() {
if b.bitsRead < 32 {
return
}
// 2 bounds checks.
v := b.in[b.off-4 : b.off]
low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
b.value |= uint64(low) << (b.bitsRead - 32)
b.bitsRead -= 32
b.off -= 4
}
// fillFastStart() assumes the bitReaderBytes is empty and there is at least 8 bytes to read.
func (b *bitReaderBytes) fillFastStart() {
// Do single re-slice to avoid bounds checks.
b.value = binary.LittleEndian.Uint64(b.in[b.off-8:])
b.bitsRead = 0
b.off -= 8
}
// fill() will make sure at least 32 bits are available.
func (b *bitReaderBytes) fill() {
if b.bitsRead < 32 {
return
}
if b.off > 4 {
v := b.in[b.off-4 : b.off]
low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
b.value |= uint64(low) << (b.bitsRead - 32)
b.bitsRead -= 32
b.off -= 4
return
}
for b.off > 0 {
b.value |= uint64(b.in[b.off-1]) << (b.bitsRead - 8)
b.bitsRead -= 8
b.off--
}
}
// finished returns true if all bits have been read from the bit stream.
func (b *bitReaderBytes) finished() bool {
return b.off == 0 && b.bitsRead >= 64
}
func (b *bitReaderBytes) remaining() uint {
return b.off*8 + uint(64-b.bitsRead)
}
// close the bitstream and returns an error if out-of-buffer reads occurred.
func (b *bitReaderBytes) close() error {
// Release reference.
b.in = nil
if b.remaining() > 0 {
return fmt.Errorf("corrupt input: %d bits remain on stream", b.remaining())
}
if b.bitsRead > 64 {
return io.ErrUnexpectedEOF
}
return nil
}
// bitReaderShifted reads a bitstream in reverse.
// The last set bit indicates the start of the stream and is used
// for aligning the input.
type bitReaderShifted struct {
in []byte
off uint // next byte to read is at in[off - 1]
value uint64
bitsRead uint8
}
// init initializes and resets the bit reader.
func (b *bitReaderShifted) init(in []byte) error {
if len(in) < 1 {
return errors.New("corrupt stream: too short")
}
b.in = in
b.off = uint(len(in))
// The highest bit of the last byte indicates where to start
v := in[len(in)-1]
if v == 0 {
return errors.New("corrupt stream, did not find end of stream")
}
b.bitsRead = 64
b.value = 0
if len(in) >= 8 {
b.fillFastStart()
} else {
b.fill()
b.fill()
}
b.advance(8 - uint8(highBit32(uint32(v))))
return nil
}
// peekBitsFast requires that at least one bit is requested every time.
// There are no checks if the buffer is filled.
func (b *bitReaderShifted) peekBitsFast(n uint8) uint16 {
return uint16(b.value >> ((64 - n) & 63))
}
func (b *bitReaderShifted) advance(n uint8) {
b.bitsRead += n
b.value <<= n & 63
}
// fillFast() will make sure at least 32 bits are available.
// There must be at least 4 bytes available.
func (b *bitReaderShifted) fillFast() {
if b.bitsRead < 32 {
return
}
// 2 bounds checks.
v := b.in[b.off-4 : b.off]
low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
b.value |= uint64(low) << ((b.bitsRead - 32) & 63)
b.bitsRead -= 32
b.off -= 4
}
// fillFastStart() assumes the bitReaderShifted is empty and there is at least 8 bytes to read.
func (b *bitReaderShifted) fillFastStart() {
// Do single re-slice to avoid bounds checks.
b.value = binary.LittleEndian.Uint64(b.in[b.off-8:])
b.bitsRead = 0
b.off -= 8
}
// fill() will make sure at least 32 bits are available.
func (b *bitReaderShifted) fill() {
if b.bitsRead < 32 {
return
}
if b.off > 4 {
v := b.in[b.off-4 : b.off]
low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
b.value |= uint64(low) << ((b.bitsRead - 32) & 63)
b.bitsRead -= 32
b.off -= 4
return
}
for b.off > 0 {
b.value |= uint64(b.in[b.off-1]) << ((b.bitsRead - 8) & 63)
b.bitsRead -= 8
b.off--
}
}
func (b *bitReaderShifted) remaining() uint {
return b.off*8 + uint(64-b.bitsRead)
}
// close the bitstream and returns an error if out-of-buffer reads occurred.
func (b *bitReaderShifted) close() error {
// Release reference.
b.in = nil
if b.remaining() > 0 {
return fmt.Errorf("corrupt input: %d bits remain on stream", b.remaining())
}
if b.bitsRead > 64 {
return io.ErrUnexpectedEOF
}
return nil
}

102
vendor/github.com/klauspost/compress/huff0/bitwriter.go generated vendored Normal file
View File

@ -0,0 +1,102 @@
// Copyright 2018 Klaus Post. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Based on work Copyright (c) 2013, Yann Collet, released under BSD License.
package huff0
// bitWriter will write bits.
// First bit will be LSB of the first byte of output.
type bitWriter struct {
bitContainer uint64
nBits uint8
out []byte
}
// addBits16Clean will add up to 16 bits. value may not contain more set bits than indicated.
// It will not check if there is space for them, so the caller must ensure that it has flushed recently.
func (b *bitWriter) addBits16Clean(value uint16, bits uint8) {
b.bitContainer |= uint64(value) << (b.nBits & 63)
b.nBits += bits
}
// encSymbol will add up to 16 bits. value may not contain more set bits than indicated.
// It will not check if there is space for them, so the caller must ensure that it has flushed recently.
func (b *bitWriter) encSymbol(ct cTable, symbol byte) {
enc := ct[symbol]
b.bitContainer |= uint64(enc.val) << (b.nBits & 63)
if false {
if enc.nBits == 0 {
panic("nbits 0")
}
}
b.nBits += enc.nBits
}
// encTwoSymbols will add up to 32 bits. value may not contain more set bits than indicated.
// It will not check if there is space for them, so the caller must ensure that it has flushed recently.
func (b *bitWriter) encTwoSymbols(ct cTable, av, bv byte) {
encA := ct[av]
encB := ct[bv]
sh := b.nBits & 63
combined := uint64(encA.val) | (uint64(encB.val) << (encA.nBits & 63))
b.bitContainer |= combined << sh
if false {
if encA.nBits == 0 {
panic("nbitsA 0")
}
if encB.nBits == 0 {
panic("nbitsB 0")
}
}
b.nBits += encA.nBits + encB.nBits
}
// encFourSymbols adds up to 32 bits from four symbols.
// It will not check if there is space for them,
// so the caller must ensure that b has been flushed recently.
func (b *bitWriter) encFourSymbols(encA, encB, encC, encD cTableEntry) {
bitsA := encA.nBits
bitsB := bitsA + encB.nBits
bitsC := bitsB + encC.nBits
bitsD := bitsC + encD.nBits
combined := uint64(encA.val) |
(uint64(encB.val) << (bitsA & 63)) |
(uint64(encC.val) << (bitsB & 63)) |
(uint64(encD.val) << (bitsC & 63))
b.bitContainer |= combined << (b.nBits & 63)
b.nBits += bitsD
}
// flush32 will flush out, so there are at least 32 bits available for writing.
func (b *bitWriter) flush32() {
if b.nBits < 32 {
return
}
b.out = append(b.out,
byte(b.bitContainer),
byte(b.bitContainer>>8),
byte(b.bitContainer>>16),
byte(b.bitContainer>>24))
b.nBits -= 32
b.bitContainer >>= 32
}
// flushAlign will flush remaining full bytes and align to next byte boundary.
func (b *bitWriter) flushAlign() {
nbBytes := (b.nBits + 7) >> 3
for i := uint8(0); i < nbBytes; i++ {
b.out = append(b.out, byte(b.bitContainer>>(i*8)))
}
b.nBits = 0
b.bitContainer = 0
}
// close will write the alignment bit and write the final byte(s)
// to the output.
func (b *bitWriter) close() {
// End mark
b.addBits16Clean(1, 1)
// flush until next byte.
b.flushAlign()
}

742
vendor/github.com/klauspost/compress/huff0/compress.go generated vendored Normal file
View File

@ -0,0 +1,742 @@
package huff0
import (
"fmt"
"math"
"runtime"
"sync"
)
// Compress1X will compress the input.
// The output can be decoded using Decompress1X.
// Supply a Scratch object. The scratch object contains state about re-use,
// So when sharing across independent encodes, be sure to set the re-use policy.
func Compress1X(in []byte, s *Scratch) (out []byte, reUsed bool, err error) {
s, err = s.prepare(in)
if err != nil {
return nil, false, err
}
return compress(in, s, s.compress1X)
}
// Compress4X will compress the input. The input is split into 4 independent blocks
// and compressed similar to Compress1X.
// The output can be decoded using Decompress4X.
// Supply a Scratch object. The scratch object contains state about re-use,
// So when sharing across independent encodes, be sure to set the re-use policy.
func Compress4X(in []byte, s *Scratch) (out []byte, reUsed bool, err error) {
s, err = s.prepare(in)
if err != nil {
return nil, false, err
}
if false {
// TODO: compress4Xp only slightly faster.
const parallelThreshold = 8 << 10
if len(in) < parallelThreshold || runtime.GOMAXPROCS(0) == 1 {
return compress(in, s, s.compress4X)
}
return compress(in, s, s.compress4Xp)
}
return compress(in, s, s.compress4X)
}
func compress(in []byte, s *Scratch, compressor func(src []byte) ([]byte, error)) (out []byte, reUsed bool, err error) {
// Nuke previous table if we cannot reuse anyway.
if s.Reuse == ReusePolicyNone {
s.prevTable = s.prevTable[:0]
}
// Create histogram, if none was provided.
maxCount := s.maxCount
var canReuse = false
if maxCount == 0 {
maxCount, canReuse = s.countSimple(in)
} else {
canReuse = s.canUseTable(s.prevTable)
}
// We want the output size to be less than this:
wantSize := len(in)
if s.WantLogLess > 0 {
wantSize -= wantSize >> s.WantLogLess
}
// Reset for next run.
s.clearCount = true
s.maxCount = 0
if maxCount >= len(in) {
if maxCount > len(in) {
return nil, false, fmt.Errorf("maxCount (%d) > length (%d)", maxCount, len(in))
}
if len(in) == 1 {
return nil, false, ErrIncompressible
}
// One symbol, use RLE
return nil, false, ErrUseRLE
}
if maxCount == 1 || maxCount < (len(in)>>7) {
// Each symbol present maximum once or too well distributed.
return nil, false, ErrIncompressible
}
if s.Reuse == ReusePolicyMust && !canReuse {
// We must reuse, but we can't.
return nil, false, ErrIncompressible
}
if (s.Reuse == ReusePolicyPrefer || s.Reuse == ReusePolicyMust) && canReuse {
keepTable := s.cTable
keepTL := s.actualTableLog
s.cTable = s.prevTable
s.actualTableLog = s.prevTableLog
s.Out, err = compressor(in)
s.cTable = keepTable
s.actualTableLog = keepTL
if err == nil && len(s.Out) < wantSize {
s.OutData = s.Out
return s.Out, true, nil
}
if s.Reuse == ReusePolicyMust {
return nil, false, ErrIncompressible
}
// Do not attempt to re-use later.
s.prevTable = s.prevTable[:0]
}
// Calculate new table.
err = s.buildCTable()
if err != nil {
return nil, false, err
}
if false && !s.canUseTable(s.cTable) {
panic("invalid table generated")
}
if s.Reuse == ReusePolicyAllow && canReuse {
hSize := len(s.Out)
oldSize := s.prevTable.estimateSize(s.count[:s.symbolLen])
newSize := s.cTable.estimateSize(s.count[:s.symbolLen])
if oldSize <= hSize+newSize || hSize+12 >= wantSize {
// Retain cTable even if we re-use.
keepTable := s.cTable
keepTL := s.actualTableLog
s.cTable = s.prevTable
s.actualTableLog = s.prevTableLog
s.Out, err = compressor(in)
// Restore ctable.
s.cTable = keepTable
s.actualTableLog = keepTL
if err != nil {
return nil, false, err
}
if len(s.Out) >= wantSize {
return nil, false, ErrIncompressible
}
s.OutData = s.Out
return s.Out, true, nil
}
}
// Use new table
err = s.cTable.write(s)
if err != nil {
s.OutTable = nil
return nil, false, err
}
s.OutTable = s.Out
// Compress using new table
s.Out, err = compressor(in)
if err != nil {
s.OutTable = nil
return nil, false, err
}
if len(s.Out) >= wantSize {
s.OutTable = nil
return nil, false, ErrIncompressible
}
// Move current table into previous.
s.prevTable, s.prevTableLog, s.cTable = s.cTable, s.actualTableLog, s.prevTable[:0]
s.OutData = s.Out[len(s.OutTable):]
return s.Out, false, nil
}
// EstimateSizes will estimate the data sizes
func EstimateSizes(in []byte, s *Scratch) (tableSz, dataSz, reuseSz int, err error) {
s, err = s.prepare(in)
if err != nil {
return 0, 0, 0, err
}
// Create histogram, if none was provided.
tableSz, dataSz, reuseSz = -1, -1, -1
maxCount := s.maxCount
var canReuse = false
if maxCount == 0 {
maxCount, canReuse = s.countSimple(in)
} else {
canReuse = s.canUseTable(s.prevTable)
}
// We want the output size to be less than this:
wantSize := len(in)
if s.WantLogLess > 0 {
wantSize -= wantSize >> s.WantLogLess
}
// Reset for next run.
s.clearCount = true
s.maxCount = 0
if maxCount >= len(in) {
if maxCount > len(in) {
return 0, 0, 0, fmt.Errorf("maxCount (%d) > length (%d)", maxCount, len(in))
}
if len(in) == 1 {
return 0, 0, 0, ErrIncompressible
}
// One symbol, use RLE
return 0, 0, 0, ErrUseRLE
}
if maxCount == 1 || maxCount < (len(in)>>7) {
// Each symbol present maximum once or too well distributed.
return 0, 0, 0, ErrIncompressible
}
// Calculate new table.
err = s.buildCTable()
if err != nil {
return 0, 0, 0, err
}
if false && !s.canUseTable(s.cTable) {
panic("invalid table generated")
}
tableSz, err = s.cTable.estTableSize(s)
if err != nil {
return 0, 0, 0, err
}
if canReuse {
reuseSz = s.prevTable.estimateSize(s.count[:s.symbolLen])
}
dataSz = s.cTable.estimateSize(s.count[:s.symbolLen])
// Restore
return tableSz, dataSz, reuseSz, nil
}
func (s *Scratch) compress1X(src []byte) ([]byte, error) {
return s.compress1xDo(s.Out, src), nil
}
func (s *Scratch) compress1xDo(dst, src []byte) []byte {
var bw = bitWriter{out: dst}
// N is length divisible by 4.
n := len(src)
n -= n & 3
cTable := s.cTable[:256]
// Encode last bytes.
for i := len(src) & 3; i > 0; i-- {
bw.encSymbol(cTable, src[n+i-1])
}
n -= 4
if s.actualTableLog <= 8 {
for ; n >= 0; n -= 4 {
tmp := src[n : n+4]
// tmp should be len 4
bw.flush32()
bw.encFourSymbols(cTable[tmp[3]], cTable[tmp[2]], cTable[tmp[1]], cTable[tmp[0]])
}
} else {
for ; n >= 0; n -= 4 {
tmp := src[n : n+4]
// tmp should be len 4
bw.flush32()
bw.encTwoSymbols(cTable, tmp[3], tmp[2])
bw.flush32()
bw.encTwoSymbols(cTable, tmp[1], tmp[0])
}
}
bw.close()
return bw.out
}
var sixZeros [6]byte
func (s *Scratch) compress4X(src []byte) ([]byte, error) {
if len(src) < 12 {
return nil, ErrIncompressible
}
segmentSize := (len(src) + 3) / 4
// Add placeholder for output length
offsetIdx := len(s.Out)
s.Out = append(s.Out, sixZeros[:]...)
for i := 0; i < 4; i++ {
toDo := src
if len(toDo) > segmentSize {
toDo = toDo[:segmentSize]
}
src = src[len(toDo):]
idx := len(s.Out)
s.Out = s.compress1xDo(s.Out, toDo)
if len(s.Out)-idx > math.MaxUint16 {
// We cannot store the size in the jump table
return nil, ErrIncompressible
}
// Write compressed length as little endian before block.
if i < 3 {
// Last length is not written.
length := len(s.Out) - idx
s.Out[i*2+offsetIdx] = byte(length)
s.Out[i*2+offsetIdx+1] = byte(length >> 8)
}
}
return s.Out, nil
}
// compress4Xp will compress 4 streams using separate goroutines.
func (s *Scratch) compress4Xp(src []byte) ([]byte, error) {
if len(src) < 12 {
return nil, ErrIncompressible
}
// Add placeholder for output length
s.Out = s.Out[:6]
segmentSize := (len(src) + 3) / 4
var wg sync.WaitGroup
wg.Add(4)
for i := 0; i < 4; i++ {
toDo := src
if len(toDo) > segmentSize {
toDo = toDo[:segmentSize]
}
src = src[len(toDo):]
// Separate goroutine for each block.
go func(i int) {
s.tmpOut[i] = s.compress1xDo(s.tmpOut[i][:0], toDo)
wg.Done()
}(i)
}
wg.Wait()
for i := 0; i < 4; i++ {
o := s.tmpOut[i]
if len(o) > math.MaxUint16 {
// We cannot store the size in the jump table
return nil, ErrIncompressible
}
// Write compressed length as little endian before block.
if i < 3 {
// Last length is not written.
s.Out[i*2] = byte(len(o))
s.Out[i*2+1] = byte(len(o) >> 8)
}
// Write output.
s.Out = append(s.Out, o...)
}
return s.Out, nil
}
// countSimple will create a simple histogram in s.count.
// Returns the biggest count.
// Does not update s.clearCount.
func (s *Scratch) countSimple(in []byte) (max int, reuse bool) {
reuse = true
_ = s.count // Assert that s != nil to speed up the following loop.
for _, v := range in {
s.count[v]++
}
m := uint32(0)
if len(s.prevTable) > 0 {
for i, v := range s.count[:] {
if v == 0 {
continue
}
if v > m {
m = v
}
s.symbolLen = uint16(i) + 1
if i >= len(s.prevTable) {
reuse = false
} else if s.prevTable[i].nBits == 0 {
reuse = false
}
}
return int(m), reuse
}
for i, v := range s.count[:] {
if v == 0 {
continue
}
if v > m {
m = v
}
s.symbolLen = uint16(i) + 1
}
return int(m), false
}
func (s *Scratch) canUseTable(c cTable) bool {
if len(c) < int(s.symbolLen) {
return false
}
for i, v := range s.count[:s.symbolLen] {
if v != 0 && c[i].nBits == 0 {
return false
}
}
return true
}
//lint:ignore U1000 used for debugging
func (s *Scratch) validateTable(c cTable) bool {
if len(c) < int(s.symbolLen) {
return false
}
for i, v := range s.count[:s.symbolLen] {
if v != 0 {
if c[i].nBits == 0 {
return false
}
if c[i].nBits > s.actualTableLog {
return false
}
}
}
return true
}
// minTableLog provides the minimum logSize to safely represent a distribution.
func (s *Scratch) minTableLog() uint8 {
minBitsSrc := highBit32(uint32(s.srcLen)) + 1
minBitsSymbols := highBit32(uint32(s.symbolLen-1)) + 2
if minBitsSrc < minBitsSymbols {
return uint8(minBitsSrc)
}
return uint8(minBitsSymbols)
}
// optimalTableLog calculates and sets the optimal tableLog in s.actualTableLog
func (s *Scratch) optimalTableLog() {
tableLog := s.TableLog
minBits := s.minTableLog()
maxBitsSrc := uint8(highBit32(uint32(s.srcLen-1))) - 1
if maxBitsSrc < tableLog {
// Accuracy can be reduced
tableLog = maxBitsSrc
}
if minBits > tableLog {
tableLog = minBits
}
// Need a minimum to safely represent all symbol values
if tableLog < minTablelog {
tableLog = minTablelog
}
if tableLog > tableLogMax {
tableLog = tableLogMax
}
s.actualTableLog = tableLog
}
type cTableEntry struct {
val uint16
nBits uint8
// We have 8 bits extra
}
const huffNodesMask = huffNodesLen - 1
func (s *Scratch) buildCTable() error {
s.optimalTableLog()
s.huffSort()
if cap(s.cTable) < maxSymbolValue+1 {
s.cTable = make([]cTableEntry, s.symbolLen, maxSymbolValue+1)
} else {
s.cTable = s.cTable[:s.symbolLen]
for i := range s.cTable {
s.cTable[i] = cTableEntry{}
}
}
var startNode = int16(s.symbolLen)
nonNullRank := s.symbolLen - 1
nodeNb := startNode
huffNode := s.nodes[1 : huffNodesLen+1]
// This overlays the slice above, but allows "-1" index lookups.
// Different from reference implementation.
huffNode0 := s.nodes[0 : huffNodesLen+1]
for huffNode[nonNullRank].count() == 0 {
nonNullRank--
}
lowS := int16(nonNullRank)
nodeRoot := nodeNb + lowS - 1
lowN := nodeNb
huffNode[nodeNb].setCount(huffNode[lowS].count() + huffNode[lowS-1].count())
huffNode[lowS].setParent(nodeNb)
huffNode[lowS-1].setParent(nodeNb)
nodeNb++
lowS -= 2
for n := nodeNb; n <= nodeRoot; n++ {
huffNode[n].setCount(1 << 30)
}
// fake entry, strong barrier
huffNode0[0].setCount(1 << 31)
// create parents
for nodeNb <= nodeRoot {
var n1, n2 int16
if huffNode0[lowS+1].count() < huffNode0[lowN+1].count() {
n1 = lowS
lowS--
} else {
n1 = lowN
lowN++
}
if huffNode0[lowS+1].count() < huffNode0[lowN+1].count() {
n2 = lowS
lowS--
} else {
n2 = lowN
lowN++
}
huffNode[nodeNb].setCount(huffNode0[n1+1].count() + huffNode0[n2+1].count())
huffNode0[n1+1].setParent(nodeNb)
huffNode0[n2+1].setParent(nodeNb)
nodeNb++
}
// distribute weights (unlimited tree height)
huffNode[nodeRoot].setNbBits(0)
for n := nodeRoot - 1; n >= startNode; n-- {
huffNode[n].setNbBits(huffNode[huffNode[n].parent()].nbBits() + 1)
}
for n := uint16(0); n <= nonNullRank; n++ {
huffNode[n].setNbBits(huffNode[huffNode[n].parent()].nbBits() + 1)
}
s.actualTableLog = s.setMaxHeight(int(nonNullRank))
maxNbBits := s.actualTableLog
// fill result into tree (val, nbBits)
if maxNbBits > tableLogMax {
return fmt.Errorf("internal error: maxNbBits (%d) > tableLogMax (%d)", maxNbBits, tableLogMax)
}
var nbPerRank [tableLogMax + 1]uint16
var valPerRank [16]uint16
for _, v := range huffNode[:nonNullRank+1] {
nbPerRank[v.nbBits()]++
}
// determine stating value per rank
{
min := uint16(0)
for n := maxNbBits; n > 0; n-- {
// get starting value within each rank
valPerRank[n] = min
min += nbPerRank[n]
min >>= 1
}
}
// push nbBits per symbol, symbol order
for _, v := range huffNode[:nonNullRank+1] {
s.cTable[v.symbol()].nBits = v.nbBits()
}
// assign value within rank, symbol order
t := s.cTable[:s.symbolLen]
for n, val := range t {
nbits := val.nBits & 15
v := valPerRank[nbits]
t[n].val = v
valPerRank[nbits] = v + 1
}
return nil
}
// huffSort will sort symbols, decreasing order.
func (s *Scratch) huffSort() {
type rankPos struct {
base uint32
current uint32
}
// Clear nodes
nodes := s.nodes[:huffNodesLen+1]
s.nodes = nodes
nodes = nodes[1 : huffNodesLen+1]
// Sort into buckets based on length of symbol count.
var rank [32]rankPos
for _, v := range s.count[:s.symbolLen] {
r := highBit32(v+1) & 31
rank[r].base++
}
// maxBitLength is log2(BlockSizeMax) + 1
const maxBitLength = 18 + 1
for n := maxBitLength; n > 0; n-- {
rank[n-1].base += rank[n].base
}
for n := range rank[:maxBitLength] {
rank[n].current = rank[n].base
}
for n, c := range s.count[:s.symbolLen] {
r := (highBit32(c+1) + 1) & 31
pos := rank[r].current
rank[r].current++
prev := nodes[(pos-1)&huffNodesMask]
for pos > rank[r].base && c > prev.count() {
nodes[pos&huffNodesMask] = prev
pos--
prev = nodes[(pos-1)&huffNodesMask]
}
nodes[pos&huffNodesMask] = makeNodeElt(c, byte(n))
}
}
func (s *Scratch) setMaxHeight(lastNonNull int) uint8 {
maxNbBits := s.actualTableLog
huffNode := s.nodes[1 : huffNodesLen+1]
//huffNode = huffNode[: huffNodesLen]
largestBits := huffNode[lastNonNull].nbBits()
// early exit : no elt > maxNbBits
if largestBits <= maxNbBits {
return largestBits
}
totalCost := int(0)
baseCost := int(1) << (largestBits - maxNbBits)
n := uint32(lastNonNull)
for huffNode[n].nbBits() > maxNbBits {
totalCost += baseCost - (1 << (largestBits - huffNode[n].nbBits()))
huffNode[n].setNbBits(maxNbBits)
n--
}
// n stops at huffNode[n].nbBits <= maxNbBits
for huffNode[n].nbBits() == maxNbBits {
n--
}
// n end at index of smallest symbol using < maxNbBits
// renorm totalCost
totalCost >>= largestBits - maxNbBits /* note : totalCost is necessarily a multiple of baseCost */
// repay normalized cost
{
const noSymbol = 0xF0F0F0F0
var rankLast [tableLogMax + 2]uint32
for i := range rankLast[:] {
rankLast[i] = noSymbol
}
// Get pos of last (smallest) symbol per rank
{
currentNbBits := maxNbBits
for pos := int(n); pos >= 0; pos-- {
if huffNode[pos].nbBits() >= currentNbBits {
continue
}
currentNbBits = huffNode[pos].nbBits() // < maxNbBits
rankLast[maxNbBits-currentNbBits] = uint32(pos)
}
}
for totalCost > 0 {
nBitsToDecrease := uint8(highBit32(uint32(totalCost))) + 1
for ; nBitsToDecrease > 1; nBitsToDecrease-- {
highPos := rankLast[nBitsToDecrease]
lowPos := rankLast[nBitsToDecrease-1]
if highPos == noSymbol {
continue
}
if lowPos == noSymbol {
break
}
highTotal := huffNode[highPos].count()
lowTotal := 2 * huffNode[lowPos].count()
if highTotal <= lowTotal {
break
}
}
// only triggered when no more rank 1 symbol left => find closest one (note : there is necessarily at least one !)
// HUF_MAX_TABLELOG test just to please gcc 5+; but it should not be necessary
// FIXME: try to remove
for (nBitsToDecrease <= tableLogMax) && (rankLast[nBitsToDecrease] == noSymbol) {
nBitsToDecrease++
}
totalCost -= 1 << (nBitsToDecrease - 1)
if rankLast[nBitsToDecrease-1] == noSymbol {
// this rank is no longer empty
rankLast[nBitsToDecrease-1] = rankLast[nBitsToDecrease]
}
huffNode[rankLast[nBitsToDecrease]].setNbBits(1 +
huffNode[rankLast[nBitsToDecrease]].nbBits())
if rankLast[nBitsToDecrease] == 0 {
/* special case, reached largest symbol */
rankLast[nBitsToDecrease] = noSymbol
} else {
rankLast[nBitsToDecrease]--
if huffNode[rankLast[nBitsToDecrease]].nbBits() != maxNbBits-nBitsToDecrease {
rankLast[nBitsToDecrease] = noSymbol /* this rank is now empty */
}
}
}
for totalCost < 0 { /* Sometimes, cost correction overshoot */
if rankLast[1] == noSymbol { /* special case : no rank 1 symbol (using maxNbBits-1); let's create one from largest rank 0 (using maxNbBits) */
for huffNode[n].nbBits() == maxNbBits {
n--
}
huffNode[n+1].setNbBits(huffNode[n+1].nbBits() - 1)
rankLast[1] = n + 1
totalCost++
continue
}
huffNode[rankLast[1]+1].setNbBits(huffNode[rankLast[1]+1].nbBits() - 1)
rankLast[1]++
totalCost++
}
}
return maxNbBits
}
// A nodeElt is the fields
//
// count uint32
// parent uint16
// symbol byte
// nbBits uint8
//
// in some order, all squashed into an integer so that the compiler
// always loads and stores entire nodeElts instead of separate fields.
type nodeElt uint64
func makeNodeElt(count uint32, symbol byte) nodeElt {
return nodeElt(count) | nodeElt(symbol)<<48
}
func (e *nodeElt) count() uint32 { return uint32(*e) }
func (e *nodeElt) parent() uint16 { return uint16(*e >> 32) }
func (e *nodeElt) symbol() byte { return byte(*e >> 48) }
func (e *nodeElt) nbBits() uint8 { return uint8(*e >> 56) }
func (e *nodeElt) setCount(c uint32) { *e = (*e)&0xffffffff00000000 | nodeElt(c) }
func (e *nodeElt) setParent(p int16) { *e = (*e)&0xffff0000ffffffff | nodeElt(uint16(p))<<32 }
func (e *nodeElt) setNbBits(n uint8) { *e = (*e)&0x00ffffffffffffff | nodeElt(n)<<56 }

1167
vendor/github.com/klauspost/compress/huff0/decompress.go generated vendored Normal file

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,226 @@
//go:build amd64 && !appengine && !noasm && gc
// +build amd64,!appengine,!noasm,gc
// This file contains the specialisation of Decoder.Decompress4X
// and Decoder.Decompress1X that use an asm implementation of thir main loops.
package huff0
import (
"errors"
"fmt"
"github.com/klauspost/compress/internal/cpuinfo"
)
// decompress4x_main_loop_x86 is an x86 assembler implementation
// of Decompress4X when tablelog > 8.
//
//go:noescape
func decompress4x_main_loop_amd64(ctx *decompress4xContext)
// decompress4x_8b_loop_x86 is an x86 assembler implementation
// of Decompress4X when tablelog <= 8 which decodes 4 entries
// per loop.
//
//go:noescape
func decompress4x_8b_main_loop_amd64(ctx *decompress4xContext)
// fallback8BitSize is the size where using Go version is faster.
const fallback8BitSize = 800
type decompress4xContext struct {
pbr *[4]bitReaderShifted
peekBits uint8
out *byte
dstEvery int
tbl *dEntrySingle
decoded int
limit *byte
}
// Decompress4X will decompress a 4X encoded stream.
// The length of the supplied input must match the end of a block exactly.
// The *capacity* of the dst slice must match the destination size of
// the uncompressed data exactly.
func (d *Decoder) Decompress4X(dst, src []byte) ([]byte, error) {
if len(d.dt.single) == 0 {
return nil, errors.New("no table loaded")
}
if len(src) < 6+(4*1) {
return nil, errors.New("input too small")
}
use8BitTables := d.actualTableLog <= 8
if cap(dst) < fallback8BitSize && use8BitTables {
return d.decompress4X8bit(dst, src)
}
var br [4]bitReaderShifted
// Decode "jump table"
start := 6
for i := 0; i < 3; i++ {
length := int(src[i*2]) | (int(src[i*2+1]) << 8)
if start+length >= len(src) {
return nil, errors.New("truncated input (or invalid offset)")
}
err := br[i].init(src[start : start+length])
if err != nil {
return nil, err
}
start += length
}
err := br[3].init(src[start:])
if err != nil {
return nil, err
}
// destination, offset to match first output
dstSize := cap(dst)
dst = dst[:dstSize]
out := dst
dstEvery := (dstSize + 3) / 4
const tlSize = 1 << tableLogMax
const tlMask = tlSize - 1
single := d.dt.single[:tlSize]
var decoded int
if len(out) > 4*4 && !(br[0].off < 4 || br[1].off < 4 || br[2].off < 4 || br[3].off < 4) {
ctx := decompress4xContext{
pbr: &br,
peekBits: uint8((64 - d.actualTableLog) & 63), // see: bitReaderShifted.peekBitsFast()
out: &out[0],
dstEvery: dstEvery,
tbl: &single[0],
limit: &out[dstEvery-4], // Always stop decoding when first buffer gets here to avoid writing OOB on last.
}
if use8BitTables {
decompress4x_8b_main_loop_amd64(&ctx)
} else {
decompress4x_main_loop_amd64(&ctx)
}
decoded = ctx.decoded
out = out[decoded/4:]
}
// Decode remaining.
remainBytes := dstEvery - (decoded / 4)
for i := range br {
offset := dstEvery * i
endsAt := offset + remainBytes
if endsAt > len(out) {
endsAt = len(out)
}
br := &br[i]
bitsLeft := br.remaining()
for bitsLeft > 0 {
br.fill()
if offset >= endsAt {
return nil, errors.New("corruption detected: stream overrun 4")
}
// Read value and increment offset.
val := br.peekBitsFast(d.actualTableLog)
v := single[val&tlMask].entry
nBits := uint8(v)
br.advance(nBits)
bitsLeft -= uint(nBits)
out[offset] = uint8(v >> 8)
offset++
}
if offset != endsAt {
return nil, fmt.Errorf("corruption detected: short output block %d, end %d != %d", i, offset, endsAt)
}
decoded += offset - dstEvery*i
err = br.close()
if err != nil {
return nil, err
}
}
if dstSize != decoded {
return nil, errors.New("corruption detected: short output block")
}
return dst, nil
}
// decompress4x_main_loop_x86 is an x86 assembler implementation
// of Decompress1X when tablelog > 8.
//
//go:noescape
func decompress1x_main_loop_amd64(ctx *decompress1xContext)
// decompress4x_main_loop_x86 is an x86 with BMI2 assembler implementation
// of Decompress1X when tablelog > 8.
//
//go:noescape
func decompress1x_main_loop_bmi2(ctx *decompress1xContext)
type decompress1xContext struct {
pbr *bitReaderShifted
peekBits uint8
out *byte
outCap int
tbl *dEntrySingle
decoded int
}
// Error reported by asm implementations
const error_max_decoded_size_exeeded = -1
// Decompress1X will decompress a 1X encoded stream.
// The cap of the output buffer will be the maximum decompressed size.
// The length of the supplied input must match the end of a block exactly.
func (d *Decoder) Decompress1X(dst, src []byte) ([]byte, error) {
if len(d.dt.single) == 0 {
return nil, errors.New("no table loaded")
}
var br bitReaderShifted
err := br.init(src)
if err != nil {
return dst, err
}
maxDecodedSize := cap(dst)
dst = dst[:maxDecodedSize]
const tlSize = 1 << tableLogMax
const tlMask = tlSize - 1
if maxDecodedSize >= 4 {
ctx := decompress1xContext{
pbr: &br,
out: &dst[0],
outCap: maxDecodedSize,
peekBits: uint8((64 - d.actualTableLog) & 63), // see: bitReaderShifted.peekBitsFast()
tbl: &d.dt.single[0],
}
if cpuinfo.HasBMI2() {
decompress1x_main_loop_bmi2(&ctx)
} else {
decompress1x_main_loop_amd64(&ctx)
}
if ctx.decoded == error_max_decoded_size_exeeded {
return nil, ErrMaxDecodedSizeExceeded
}
dst = dst[:ctx.decoded]
}
// br < 8, so uint8 is fine
bitsLeft := uint8(br.off)*8 + 64 - br.bitsRead
for bitsLeft > 0 {
br.fill()
if len(dst) >= maxDecodedSize {
br.close()
return nil, ErrMaxDecodedSizeExceeded
}
v := d.dt.single[br.peekBitsFast(d.actualTableLog)&tlMask]
nBits := uint8(v.entry)
br.advance(nBits)
bitsLeft -= nBits
dst = append(dst, uint8(v.entry>>8))
}
return dst, br.close()
}

View File

@ -0,0 +1,830 @@
// Code generated by command: go run gen.go -out ../decompress_amd64.s -pkg=huff0. DO NOT EDIT.
//go:build amd64 && !appengine && !noasm && gc
// func decompress4x_main_loop_amd64(ctx *decompress4xContext)
TEXT ·decompress4x_main_loop_amd64(SB), $0-8
// Preload values
MOVQ ctx+0(FP), AX
MOVBQZX 8(AX), DI
MOVQ 16(AX), BX
MOVQ 48(AX), SI
MOVQ 24(AX), R8
MOVQ 32(AX), R9
MOVQ (AX), R10
// Main loop
main_loop:
XORL DX, DX
CMPQ BX, SI
SETGE DL
// br0.fillFast32()
MOVQ 32(R10), R11
MOVBQZX 40(R10), R12
CMPQ R12, $0x20
JBE skip_fill0
MOVQ 24(R10), AX
SUBQ $0x20, R12
SUBQ $0x04, AX
MOVQ (R10), R13
// b.value |= uint64(low) << (b.bitsRead & 63)
MOVL (AX)(R13*1), R13
MOVQ R12, CX
SHLQ CL, R13
MOVQ AX, 24(R10)
ORQ R13, R11
// exhausted += (br0.off < 4)
CMPQ AX, $0x04
ADCB $+0, DL
skip_fill0:
// val0 := br0.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v0 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br0.advance(uint8(v0.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
// val1 := br0.peekTopBits(peekBits)
MOVQ DI, CX
MOVQ R11, R13
SHRQ CL, R13
// v1 := table[val1&mask]
MOVW (R9)(R13*2), CX
// br0.advance(uint8(v1.entry))
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
// these two writes get coalesced
// out[id * dstEvery + 0] = uint8(v0.entry >> 8)
// out[id * dstEvery + 1] = uint8(v1.entry >> 8)
MOVW AX, (BX)
// update the bitreader structure
MOVQ R11, 32(R10)
MOVB R12, 40(R10)
// br1.fillFast32()
MOVQ 80(R10), R11
MOVBQZX 88(R10), R12
CMPQ R12, $0x20
JBE skip_fill1
MOVQ 72(R10), AX
SUBQ $0x20, R12
SUBQ $0x04, AX
MOVQ 48(R10), R13
// b.value |= uint64(low) << (b.bitsRead & 63)
MOVL (AX)(R13*1), R13
MOVQ R12, CX
SHLQ CL, R13
MOVQ AX, 72(R10)
ORQ R13, R11
// exhausted += (br1.off < 4)
CMPQ AX, $0x04
ADCB $+0, DL
skip_fill1:
// val0 := br1.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v0 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br1.advance(uint8(v0.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
// val1 := br1.peekTopBits(peekBits)
MOVQ DI, CX
MOVQ R11, R13
SHRQ CL, R13
// v1 := table[val1&mask]
MOVW (R9)(R13*2), CX
// br1.advance(uint8(v1.entry))
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
// these two writes get coalesced
// out[id * dstEvery + 0] = uint8(v0.entry >> 8)
// out[id * dstEvery + 1] = uint8(v1.entry >> 8)
MOVW AX, (BX)(R8*1)
// update the bitreader structure
MOVQ R11, 80(R10)
MOVB R12, 88(R10)
// br2.fillFast32()
MOVQ 128(R10), R11
MOVBQZX 136(R10), R12
CMPQ R12, $0x20
JBE skip_fill2
MOVQ 120(R10), AX
SUBQ $0x20, R12
SUBQ $0x04, AX
MOVQ 96(R10), R13
// b.value |= uint64(low) << (b.bitsRead & 63)
MOVL (AX)(R13*1), R13
MOVQ R12, CX
SHLQ CL, R13
MOVQ AX, 120(R10)
ORQ R13, R11
// exhausted += (br2.off < 4)
CMPQ AX, $0x04
ADCB $+0, DL
skip_fill2:
// val0 := br2.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v0 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br2.advance(uint8(v0.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
// val1 := br2.peekTopBits(peekBits)
MOVQ DI, CX
MOVQ R11, R13
SHRQ CL, R13
// v1 := table[val1&mask]
MOVW (R9)(R13*2), CX
// br2.advance(uint8(v1.entry))
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
// these two writes get coalesced
// out[id * dstEvery + 0] = uint8(v0.entry >> 8)
// out[id * dstEvery + 1] = uint8(v1.entry >> 8)
MOVW AX, (BX)(R8*2)
// update the bitreader structure
MOVQ R11, 128(R10)
MOVB R12, 136(R10)
// br3.fillFast32()
MOVQ 176(R10), R11
MOVBQZX 184(R10), R12
CMPQ R12, $0x20
JBE skip_fill3
MOVQ 168(R10), AX
SUBQ $0x20, R12
SUBQ $0x04, AX
MOVQ 144(R10), R13
// b.value |= uint64(low) << (b.bitsRead & 63)
MOVL (AX)(R13*1), R13
MOVQ R12, CX
SHLQ CL, R13
MOVQ AX, 168(R10)
ORQ R13, R11
// exhausted += (br3.off < 4)
CMPQ AX, $0x04
ADCB $+0, DL
skip_fill3:
// val0 := br3.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v0 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br3.advance(uint8(v0.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
// val1 := br3.peekTopBits(peekBits)
MOVQ DI, CX
MOVQ R11, R13
SHRQ CL, R13
// v1 := table[val1&mask]
MOVW (R9)(R13*2), CX
// br3.advance(uint8(v1.entry))
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
// these two writes get coalesced
// out[id * dstEvery + 0] = uint8(v0.entry >> 8)
// out[id * dstEvery + 1] = uint8(v1.entry >> 8)
LEAQ (R8)(R8*2), CX
MOVW AX, (BX)(CX*1)
// update the bitreader structure
MOVQ R11, 176(R10)
MOVB R12, 184(R10)
ADDQ $0x02, BX
TESTB DL, DL
JZ main_loop
MOVQ ctx+0(FP), AX
SUBQ 16(AX), BX
SHLQ $0x02, BX
MOVQ BX, 40(AX)
RET
// func decompress4x_8b_main_loop_amd64(ctx *decompress4xContext)
TEXT ·decompress4x_8b_main_loop_amd64(SB), $0-8
// Preload values
MOVQ ctx+0(FP), CX
MOVBQZX 8(CX), DI
MOVQ 16(CX), BX
MOVQ 48(CX), SI
MOVQ 24(CX), R8
MOVQ 32(CX), R9
MOVQ (CX), R10
// Main loop
main_loop:
XORL DX, DX
CMPQ BX, SI
SETGE DL
// br0.fillFast32()
MOVQ 32(R10), R11
MOVBQZX 40(R10), R12
CMPQ R12, $0x20
JBE skip_fill0
MOVQ 24(R10), R13
SUBQ $0x20, R12
SUBQ $0x04, R13
MOVQ (R10), R14
// b.value |= uint64(low) << (b.bitsRead & 63)
MOVL (R13)(R14*1), R14
MOVQ R12, CX
SHLQ CL, R14
MOVQ R13, 24(R10)
ORQ R14, R11
// exhausted += (br0.off < 4)
CMPQ R13, $0x04
ADCB $+0, DL
skip_fill0:
// val0 := br0.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v0 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br0.advance(uint8(v0.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
// val1 := br0.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v1 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br0.advance(uint8(v1.entry)
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
BSWAPL AX
// val2 := br0.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v2 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br0.advance(uint8(v2.entry)
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
// val3 := br0.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v3 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br0.advance(uint8(v3.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
BSWAPL AX
// these four writes get coalesced
// out[id * dstEvery + 0] = uint8(v0.entry >> 8)
// out[id * dstEvery + 1] = uint8(v1.entry >> 8)
// out[id * dstEvery + 3] = uint8(v2.entry >> 8)
// out[id * dstEvery + 4] = uint8(v3.entry >> 8)
MOVL AX, (BX)
// update the bitreader structure
MOVQ R11, 32(R10)
MOVB R12, 40(R10)
// br1.fillFast32()
MOVQ 80(R10), R11
MOVBQZX 88(R10), R12
CMPQ R12, $0x20
JBE skip_fill1
MOVQ 72(R10), R13
SUBQ $0x20, R12
SUBQ $0x04, R13
MOVQ 48(R10), R14
// b.value |= uint64(low) << (b.bitsRead & 63)
MOVL (R13)(R14*1), R14
MOVQ R12, CX
SHLQ CL, R14
MOVQ R13, 72(R10)
ORQ R14, R11
// exhausted += (br1.off < 4)
CMPQ R13, $0x04
ADCB $+0, DL
skip_fill1:
// val0 := br1.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v0 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br1.advance(uint8(v0.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
// val1 := br1.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v1 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br1.advance(uint8(v1.entry)
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
BSWAPL AX
// val2 := br1.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v2 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br1.advance(uint8(v2.entry)
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
// val3 := br1.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v3 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br1.advance(uint8(v3.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
BSWAPL AX
// these four writes get coalesced
// out[id * dstEvery + 0] = uint8(v0.entry >> 8)
// out[id * dstEvery + 1] = uint8(v1.entry >> 8)
// out[id * dstEvery + 3] = uint8(v2.entry >> 8)
// out[id * dstEvery + 4] = uint8(v3.entry >> 8)
MOVL AX, (BX)(R8*1)
// update the bitreader structure
MOVQ R11, 80(R10)
MOVB R12, 88(R10)
// br2.fillFast32()
MOVQ 128(R10), R11
MOVBQZX 136(R10), R12
CMPQ R12, $0x20
JBE skip_fill2
MOVQ 120(R10), R13
SUBQ $0x20, R12
SUBQ $0x04, R13
MOVQ 96(R10), R14
// b.value |= uint64(low) << (b.bitsRead & 63)
MOVL (R13)(R14*1), R14
MOVQ R12, CX
SHLQ CL, R14
MOVQ R13, 120(R10)
ORQ R14, R11
// exhausted += (br2.off < 4)
CMPQ R13, $0x04
ADCB $+0, DL
skip_fill2:
// val0 := br2.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v0 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br2.advance(uint8(v0.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
// val1 := br2.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v1 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br2.advance(uint8(v1.entry)
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
BSWAPL AX
// val2 := br2.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v2 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br2.advance(uint8(v2.entry)
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
// val3 := br2.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v3 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br2.advance(uint8(v3.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
BSWAPL AX
// these four writes get coalesced
// out[id * dstEvery + 0] = uint8(v0.entry >> 8)
// out[id * dstEvery + 1] = uint8(v1.entry >> 8)
// out[id * dstEvery + 3] = uint8(v2.entry >> 8)
// out[id * dstEvery + 4] = uint8(v3.entry >> 8)
MOVL AX, (BX)(R8*2)
// update the bitreader structure
MOVQ R11, 128(R10)
MOVB R12, 136(R10)
// br3.fillFast32()
MOVQ 176(R10), R11
MOVBQZX 184(R10), R12
CMPQ R12, $0x20
JBE skip_fill3
MOVQ 168(R10), R13
SUBQ $0x20, R12
SUBQ $0x04, R13
MOVQ 144(R10), R14
// b.value |= uint64(low) << (b.bitsRead & 63)
MOVL (R13)(R14*1), R14
MOVQ R12, CX
SHLQ CL, R14
MOVQ R13, 168(R10)
ORQ R14, R11
// exhausted += (br3.off < 4)
CMPQ R13, $0x04
ADCB $+0, DL
skip_fill3:
// val0 := br3.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v0 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br3.advance(uint8(v0.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
// val1 := br3.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v1 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br3.advance(uint8(v1.entry)
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
BSWAPL AX
// val2 := br3.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v2 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br3.advance(uint8(v2.entry)
MOVB CH, AH
SHLQ CL, R11
ADDB CL, R12
// val3 := br3.peekTopBits(peekBits)
MOVQ R11, R13
MOVQ DI, CX
SHRQ CL, R13
// v3 := table[val0&mask]
MOVW (R9)(R13*2), CX
// br3.advance(uint8(v3.entry)
MOVB CH, AL
SHLQ CL, R11
ADDB CL, R12
BSWAPL AX
// these four writes get coalesced
// out[id * dstEvery + 0] = uint8(v0.entry >> 8)
// out[id * dstEvery + 1] = uint8(v1.entry >> 8)
// out[id * dstEvery + 3] = uint8(v2.entry >> 8)
// out[id * dstEvery + 4] = uint8(v3.entry >> 8)
LEAQ (R8)(R8*2), CX
MOVL AX, (BX)(CX*1)
// update the bitreader structure
MOVQ R11, 176(R10)
MOVB R12, 184(R10)
ADDQ $0x04, BX
TESTB DL, DL
JZ main_loop
MOVQ ctx+0(FP), AX
SUBQ 16(AX), BX
SHLQ $0x02, BX
MOVQ BX, 40(AX)
RET
// func decompress1x_main_loop_amd64(ctx *decompress1xContext)
TEXT ·decompress1x_main_loop_amd64(SB), $0-8
MOVQ ctx+0(FP), CX
MOVQ 16(CX), DX
MOVQ 24(CX), BX
CMPQ BX, $0x04
JB error_max_decoded_size_exceeded
LEAQ (DX)(BX*1), BX
MOVQ (CX), SI
MOVQ (SI), R8
MOVQ 24(SI), R9
MOVQ 32(SI), R10
MOVBQZX 40(SI), R11
MOVQ 32(CX), SI
MOVBQZX 8(CX), DI
JMP loop_condition
main_loop:
// Check if we have room for 4 bytes in the output buffer
LEAQ 4(DX), CX
CMPQ CX, BX
JGE error_max_decoded_size_exceeded
// Decode 4 values
CMPQ R11, $0x20
JL bitReader_fillFast_1_end
SUBQ $0x20, R11
SUBQ $0x04, R9
MOVL (R8)(R9*1), R12
MOVQ R11, CX
SHLQ CL, R12
ORQ R12, R10
bitReader_fillFast_1_end:
MOVQ DI, CX
MOVQ R10, R12
SHRQ CL, R12
MOVW (SI)(R12*2), CX
MOVB CH, AL
MOVBQZX CL, CX
ADDQ CX, R11
SHLQ CL, R10
MOVQ DI, CX
MOVQ R10, R12
SHRQ CL, R12
MOVW (SI)(R12*2), CX
MOVB CH, AH
MOVBQZX CL, CX
ADDQ CX, R11
SHLQ CL, R10
BSWAPL AX
CMPQ R11, $0x20
JL bitReader_fillFast_2_end
SUBQ $0x20, R11
SUBQ $0x04, R9
MOVL (R8)(R9*1), R12
MOVQ R11, CX
SHLQ CL, R12
ORQ R12, R10
bitReader_fillFast_2_end:
MOVQ DI, CX
MOVQ R10, R12
SHRQ CL, R12
MOVW (SI)(R12*2), CX
MOVB CH, AH
MOVBQZX CL, CX
ADDQ CX, R11
SHLQ CL, R10
MOVQ DI, CX
MOVQ R10, R12
SHRQ CL, R12
MOVW (SI)(R12*2), CX
MOVB CH, AL
MOVBQZX CL, CX
ADDQ CX, R11
SHLQ CL, R10
BSWAPL AX
// Store the decoded values
MOVL AX, (DX)
ADDQ $0x04, DX
loop_condition:
CMPQ R9, $0x08
JGE main_loop
// Update ctx structure
MOVQ ctx+0(FP), AX
SUBQ 16(AX), DX
MOVQ DX, 40(AX)
MOVQ (AX), AX
MOVQ R9, 24(AX)
MOVQ R10, 32(AX)
MOVB R11, 40(AX)
RET
// Report error
error_max_decoded_size_exceeded:
MOVQ ctx+0(FP), AX
MOVQ $-1, CX
MOVQ CX, 40(AX)
RET
// func decompress1x_main_loop_bmi2(ctx *decompress1xContext)
// Requires: BMI2
TEXT ·decompress1x_main_loop_bmi2(SB), $0-8
MOVQ ctx+0(FP), CX
MOVQ 16(CX), DX
MOVQ 24(CX), BX
CMPQ BX, $0x04
JB error_max_decoded_size_exceeded
LEAQ (DX)(BX*1), BX
MOVQ (CX), SI
MOVQ (SI), R8
MOVQ 24(SI), R9
MOVQ 32(SI), R10
MOVBQZX 40(SI), R11
MOVQ 32(CX), SI
MOVBQZX 8(CX), DI
JMP loop_condition
main_loop:
// Check if we have room for 4 bytes in the output buffer
LEAQ 4(DX), CX
CMPQ CX, BX
JGE error_max_decoded_size_exceeded
// Decode 4 values
CMPQ R11, $0x20
JL bitReader_fillFast_1_end
SUBQ $0x20, R11
SUBQ $0x04, R9
MOVL (R8)(R9*1), CX
SHLXQ R11, CX, CX
ORQ CX, R10
bitReader_fillFast_1_end:
SHRXQ DI, R10, CX
MOVW (SI)(CX*2), CX
MOVB CH, AL
MOVBQZX CL, CX
ADDQ CX, R11
SHLXQ CX, R10, R10
SHRXQ DI, R10, CX
MOVW (SI)(CX*2), CX
MOVB CH, AH
MOVBQZX CL, CX
ADDQ CX, R11
SHLXQ CX, R10, R10
BSWAPL AX
CMPQ R11, $0x20
JL bitReader_fillFast_2_end
SUBQ $0x20, R11
SUBQ $0x04, R9
MOVL (R8)(R9*1), CX
SHLXQ R11, CX, CX
ORQ CX, R10
bitReader_fillFast_2_end:
SHRXQ DI, R10, CX
MOVW (SI)(CX*2), CX
MOVB CH, AH
MOVBQZX CL, CX
ADDQ CX, R11
SHLXQ CX, R10, R10
SHRXQ DI, R10, CX
MOVW (SI)(CX*2), CX
MOVB CH, AL
MOVBQZX CL, CX
ADDQ CX, R11
SHLXQ CX, R10, R10
BSWAPL AX
// Store the decoded values
MOVL AX, (DX)
ADDQ $0x04, DX
loop_condition:
CMPQ R9, $0x08
JGE main_loop
// Update ctx structure
MOVQ ctx+0(FP), AX
SUBQ 16(AX), DX
MOVQ DX, 40(AX)
MOVQ (AX), AX
MOVQ R9, 24(AX)
MOVQ R10, 32(AX)
MOVB R11, 40(AX)
RET
// Report error
error_max_decoded_size_exceeded:
MOVQ ctx+0(FP), AX
MOVQ $-1, CX
MOVQ CX, 40(AX)
RET

View File

@ -0,0 +1,299 @@
//go:build !amd64 || appengine || !gc || noasm
// +build !amd64 appengine !gc noasm
// This file contains a generic implementation of Decoder.Decompress4X.
package huff0
import (
"errors"
"fmt"
)
// Decompress4X will decompress a 4X encoded stream.
// The length of the supplied input must match the end of a block exactly.
// The *capacity* of the dst slice must match the destination size of
// the uncompressed data exactly.
func (d *Decoder) Decompress4X(dst, src []byte) ([]byte, error) {
if len(d.dt.single) == 0 {
return nil, errors.New("no table loaded")
}
if len(src) < 6+(4*1) {
return nil, errors.New("input too small")
}
if use8BitTables && d.actualTableLog <= 8 {
return d.decompress4X8bit(dst, src)
}
var br [4]bitReaderShifted
// Decode "jump table"
start := 6
for i := 0; i < 3; i++ {
length := int(src[i*2]) | (int(src[i*2+1]) << 8)
if start+length >= len(src) {
return nil, errors.New("truncated input (or invalid offset)")
}
err := br[i].init(src[start : start+length])
if err != nil {
return nil, err
}
start += length
}
err := br[3].init(src[start:])
if err != nil {
return nil, err
}
// destination, offset to match first output
dstSize := cap(dst)
dst = dst[:dstSize]
out := dst
dstEvery := (dstSize + 3) / 4
const tlSize = 1 << tableLogMax
const tlMask = tlSize - 1
single := d.dt.single[:tlSize]
// Use temp table to avoid bound checks/append penalty.
buf := d.buffer()
var off uint8
var decoded int
// Decode 2 values from each decoder/loop.
const bufoff = 256
for {
if br[0].off < 4 || br[1].off < 4 || br[2].off < 4 || br[3].off < 4 {
break
}
{
const stream = 0
const stream2 = 1
br[stream].fillFast()
br[stream2].fillFast()
val := br[stream].peekBitsFast(d.actualTableLog)
val2 := br[stream2].peekBitsFast(d.actualTableLog)
v := single[val&tlMask]
v2 := single[val2&tlMask]
br[stream].advance(uint8(v.entry))
br[stream2].advance(uint8(v2.entry))
buf[stream][off] = uint8(v.entry >> 8)
buf[stream2][off] = uint8(v2.entry >> 8)
val = br[stream].peekBitsFast(d.actualTableLog)
val2 = br[stream2].peekBitsFast(d.actualTableLog)
v = single[val&tlMask]
v2 = single[val2&tlMask]
br[stream].advance(uint8(v.entry))
br[stream2].advance(uint8(v2.entry))
buf[stream][off+1] = uint8(v.entry >> 8)
buf[stream2][off+1] = uint8(v2.entry >> 8)
}
{
const stream = 2
const stream2 = 3
br[stream].fillFast()
br[stream2].fillFast()
val := br[stream].peekBitsFast(d.actualTableLog)
val2 := br[stream2].peekBitsFast(d.actualTableLog)
v := single[val&tlMask]
v2 := single[val2&tlMask]
br[stream].advance(uint8(v.entry))
br[stream2].advance(uint8(v2.entry))
buf[stream][off] = uint8(v.entry >> 8)
buf[stream2][off] = uint8(v2.entry >> 8)
val = br[stream].peekBitsFast(d.actualTableLog)
val2 = br[stream2].peekBitsFast(d.actualTableLog)
v = single[val&tlMask]
v2 = single[val2&tlMask]
br[stream].advance(uint8(v.entry))
br[stream2].advance(uint8(v2.entry))
buf[stream][off+1] = uint8(v.entry >> 8)
buf[stream2][off+1] = uint8(v2.entry >> 8)
}
off += 2
if off == 0 {
if bufoff > dstEvery {
d.bufs.Put(buf)
return nil, errors.New("corruption detected: stream overrun 1")
}
// There must at least be 3 buffers left.
if len(out)-bufoff < dstEvery*3 {
d.bufs.Put(buf)
return nil, errors.New("corruption detected: stream overrun 2")
}
//copy(out, buf[0][:])
//copy(out[dstEvery:], buf[1][:])
//copy(out[dstEvery*2:], buf[2][:])
//copy(out[dstEvery*3:], buf[3][:])
*(*[bufoff]byte)(out) = buf[0]
*(*[bufoff]byte)(out[dstEvery:]) = buf[1]
*(*[bufoff]byte)(out[dstEvery*2:]) = buf[2]
*(*[bufoff]byte)(out[dstEvery*3:]) = buf[3]
out = out[bufoff:]
decoded += bufoff * 4
}
}
if off > 0 {
ioff := int(off)
if len(out) < dstEvery*3+ioff {
d.bufs.Put(buf)
return nil, errors.New("corruption detected: stream overrun 3")
}
copy(out, buf[0][:off])
copy(out[dstEvery:], buf[1][:off])
copy(out[dstEvery*2:], buf[2][:off])
copy(out[dstEvery*3:], buf[3][:off])
decoded += int(off) * 4
out = out[off:]
}
// Decode remaining.
remainBytes := dstEvery - (decoded / 4)
for i := range br {
offset := dstEvery * i
endsAt := offset + remainBytes
if endsAt > len(out) {
endsAt = len(out)
}
br := &br[i]
bitsLeft := br.remaining()
for bitsLeft > 0 {
br.fill()
if offset >= endsAt {
d.bufs.Put(buf)
return nil, errors.New("corruption detected: stream overrun 4")
}
// Read value and increment offset.
val := br.peekBitsFast(d.actualTableLog)
v := single[val&tlMask].entry
nBits := uint8(v)
br.advance(nBits)
bitsLeft -= uint(nBits)
out[offset] = uint8(v >> 8)
offset++
}
if offset != endsAt {
d.bufs.Put(buf)
return nil, fmt.Errorf("corruption detected: short output block %d, end %d != %d", i, offset, endsAt)
}
decoded += offset - dstEvery*i
err = br.close()
if err != nil {
return nil, err
}
}
d.bufs.Put(buf)
if dstSize != decoded {
return nil, errors.New("corruption detected: short output block")
}
return dst, nil
}
// Decompress1X will decompress a 1X encoded stream.
// The cap of the output buffer will be the maximum decompressed size.
// The length of the supplied input must match the end of a block exactly.
func (d *Decoder) Decompress1X(dst, src []byte) ([]byte, error) {
if len(d.dt.single) == 0 {
return nil, errors.New("no table loaded")
}
if use8BitTables && d.actualTableLog <= 8 {
return d.decompress1X8Bit(dst, src)
}
var br bitReaderShifted
err := br.init(src)
if err != nil {
return dst, err
}
maxDecodedSize := cap(dst)
dst = dst[:0]
// Avoid bounds check by always having full sized table.
const tlSize = 1 << tableLogMax
const tlMask = tlSize - 1
dt := d.dt.single[:tlSize]
// Use temp table to avoid bound checks/append penalty.
bufs := d.buffer()
buf := &bufs[0]
var off uint8
for br.off >= 8 {
br.fillFast()
v := dt[br.peekBitsFast(d.actualTableLog)&tlMask]
br.advance(uint8(v.entry))
buf[off+0] = uint8(v.entry >> 8)
v = dt[br.peekBitsFast(d.actualTableLog)&tlMask]
br.advance(uint8(v.entry))
buf[off+1] = uint8(v.entry >> 8)
// Refill
br.fillFast()
v = dt[br.peekBitsFast(d.actualTableLog)&tlMask]
br.advance(uint8(v.entry))
buf[off+2] = uint8(v.entry >> 8)
v = dt[br.peekBitsFast(d.actualTableLog)&tlMask]
br.advance(uint8(v.entry))
buf[off+3] = uint8(v.entry >> 8)
off += 4
if off == 0 {
if len(dst)+256 > maxDecodedSize {
br.close()
d.bufs.Put(bufs)
return nil, ErrMaxDecodedSizeExceeded
}
dst = append(dst, buf[:]...)
}
}
if len(dst)+int(off) > maxDecodedSize {
d.bufs.Put(bufs)
br.close()
return nil, ErrMaxDecodedSizeExceeded
}
dst = append(dst, buf[:off]...)
// br < 8, so uint8 is fine
bitsLeft := uint8(br.off)*8 + 64 - br.bitsRead
for bitsLeft > 0 {
br.fill()
if false && br.bitsRead >= 32 {
if br.off >= 4 {
v := br.in[br.off-4:]
v = v[:4]
low := (uint32(v[0])) | (uint32(v[1]) << 8) | (uint32(v[2]) << 16) | (uint32(v[3]) << 24)
br.value = (br.value << 32) | uint64(low)
br.bitsRead -= 32
br.off -= 4
} else {
for br.off > 0 {
br.value = (br.value << 8) | uint64(br.in[br.off-1])
br.bitsRead -= 8
br.off--
}
}
}
if len(dst) >= maxDecodedSize {
d.bufs.Put(bufs)
br.close()
return nil, ErrMaxDecodedSizeExceeded
}
v := d.dt.single[br.peekBitsFast(d.actualTableLog)&tlMask]
nBits := uint8(v.entry)
br.advance(nBits)
bitsLeft -= nBits
dst = append(dst, uint8(v.entry>>8))
}
d.bufs.Put(bufs)
return dst, br.close()
}

337
vendor/github.com/klauspost/compress/huff0/huff0.go generated vendored Normal file
View File

@ -0,0 +1,337 @@
// Package huff0 provides fast huffman encoding as used in zstd.
//
// See README.md at https://github.com/klauspost/compress/tree/master/huff0 for details.
package huff0
import (
"errors"
"fmt"
"math"
"math/bits"
"sync"
"github.com/klauspost/compress/fse"
)
const (
maxSymbolValue = 255
// zstandard limits tablelog to 11, see:
// https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#huffman-tree-description
tableLogMax = 11
tableLogDefault = 11
minTablelog = 5
huffNodesLen = 512
// BlockSizeMax is maximum input size for a single block uncompressed.
BlockSizeMax = 1<<18 - 1
)
var (
// ErrIncompressible is returned when input is judged to be too hard to compress.
ErrIncompressible = errors.New("input is not compressible")
// ErrUseRLE is returned from the compressor when the input is a single byte value repeated.
ErrUseRLE = errors.New("input is single value repeated")
// ErrTooBig is return if input is too large for a single block.
ErrTooBig = errors.New("input too big")
// ErrMaxDecodedSizeExceeded is return if input is too large for a single block.
ErrMaxDecodedSizeExceeded = errors.New("maximum output size exceeded")
)
type ReusePolicy uint8
const (
// ReusePolicyAllow will allow reuse if it produces smaller output.
ReusePolicyAllow ReusePolicy = iota
// ReusePolicyPrefer will re-use aggressively if possible.
// This will not check if a new table will produce smaller output,
// except if the current table is impossible to use or
// compressed output is bigger than input.
ReusePolicyPrefer
// ReusePolicyNone will disable re-use of tables.
// This is slightly faster than ReusePolicyAllow but may produce larger output.
ReusePolicyNone
// ReusePolicyMust must allow reuse and produce smaller output.
ReusePolicyMust
)
type Scratch struct {
count [maxSymbolValue + 1]uint32
// Per block parameters.
// These can be used to override compression parameters of the block.
// Do not touch, unless you know what you are doing.
// Out is output buffer.
// If the scratch is re-used before the caller is done processing the output,
// set this field to nil.
// Otherwise the output buffer will be re-used for next Compression/Decompression step
// and allocation will be avoided.
Out []byte
// OutTable will contain the table data only, if a new table has been generated.
// Slice of the returned data.
OutTable []byte
// OutData will contain the compressed data.
// Slice of the returned data.
OutData []byte
// MaxDecodedSize will set the maximum allowed output size.
// This value will automatically be set to BlockSizeMax if not set.
// Decoders will return ErrMaxDecodedSizeExceeded is this limit is exceeded.
MaxDecodedSize int
srcLen int
// MaxSymbolValue will override the maximum symbol value of the next block.
MaxSymbolValue uint8
// TableLog will attempt to override the tablelog for the next block.
// Must be <= 11 and >= 5.
TableLog uint8
// Reuse will specify the reuse policy
Reuse ReusePolicy
// WantLogLess allows to specify a log 2 reduction that should at least be achieved,
// otherwise the block will be returned as incompressible.
// The reduction should then at least be (input size >> WantLogLess)
// If WantLogLess == 0 any improvement will do.
WantLogLess uint8
symbolLen uint16 // Length of active part of the symbol table.
maxCount int // count of the most probable symbol
clearCount bool // clear count
actualTableLog uint8 // Selected tablelog.
prevTableLog uint8 // Tablelog for previous table
prevTable cTable // Table used for previous compression.
cTable cTable // compression table
dt dTable // decompression table
nodes []nodeElt
tmpOut [4][]byte
fse *fse.Scratch
decPool sync.Pool // *[4][256]byte buffers.
huffWeight [maxSymbolValue + 1]byte
}
// TransferCTable will transfer the previously used compression table.
func (s *Scratch) TransferCTable(src *Scratch) {
if cap(s.prevTable) < len(src.prevTable) {
s.prevTable = make(cTable, 0, maxSymbolValue+1)
}
s.prevTable = s.prevTable[:len(src.prevTable)]
copy(s.prevTable, src.prevTable)
s.prevTableLog = src.prevTableLog
}
func (s *Scratch) prepare(in []byte) (*Scratch, error) {
if len(in) > BlockSizeMax {
return nil, ErrTooBig
}
if s == nil {
s = &Scratch{}
}
if s.MaxSymbolValue == 0 {
s.MaxSymbolValue = maxSymbolValue
}
if s.TableLog == 0 {
s.TableLog = tableLogDefault
}
if s.TableLog > tableLogMax || s.TableLog < minTablelog {
return nil, fmt.Errorf(" invalid tableLog %d (%d -> %d)", s.TableLog, minTablelog, tableLogMax)
}
if s.MaxDecodedSize <= 0 || s.MaxDecodedSize > BlockSizeMax {
s.MaxDecodedSize = BlockSizeMax
}
if s.clearCount && s.maxCount == 0 {
for i := range s.count {
s.count[i] = 0
}
s.clearCount = false
}
if cap(s.Out) == 0 {
s.Out = make([]byte, 0, len(in))
}
s.Out = s.Out[:0]
s.OutTable = nil
s.OutData = nil
if cap(s.nodes) < huffNodesLen+1 {
s.nodes = make([]nodeElt, 0, huffNodesLen+1)
}
s.nodes = s.nodes[:0]
if s.fse == nil {
s.fse = &fse.Scratch{}
}
s.srcLen = len(in)
return s, nil
}
type cTable []cTableEntry
func (c cTable) write(s *Scratch) error {
var (
// precomputed conversion table
bitsToWeight [tableLogMax + 1]byte
huffLog = s.actualTableLog
// last weight is not saved.
maxSymbolValue = uint8(s.symbolLen - 1)
huffWeight = s.huffWeight[:256]
)
const (
maxFSETableLog = 6
)
// convert to weight
bitsToWeight[0] = 0
for n := uint8(1); n < huffLog+1; n++ {
bitsToWeight[n] = huffLog + 1 - n
}
// Acquire histogram for FSE.
hist := s.fse.Histogram()
hist = hist[:256]
for i := range hist[:16] {
hist[i] = 0
}
for n := uint8(0); n < maxSymbolValue; n++ {
v := bitsToWeight[c[n].nBits] & 15
huffWeight[n] = v
hist[v]++
}
// FSE compress if feasible.
if maxSymbolValue >= 2 {
huffMaxCnt := uint32(0)
huffMax := uint8(0)
for i, v := range hist[:16] {
if v == 0 {
continue
}
huffMax = byte(i)
if v > huffMaxCnt {
huffMaxCnt = v
}
}
s.fse.HistogramFinished(huffMax, int(huffMaxCnt))
s.fse.TableLog = maxFSETableLog
b, err := fse.Compress(huffWeight[:maxSymbolValue], s.fse)
if err == nil && len(b) < int(s.symbolLen>>1) {
s.Out = append(s.Out, uint8(len(b)))
s.Out = append(s.Out, b...)
return nil
}
// Unable to compress (RLE/uncompressible)
}
// write raw values as 4-bits (max : 15)
if maxSymbolValue > (256 - 128) {
// should not happen : likely means source cannot be compressed
return ErrIncompressible
}
op := s.Out
// special case, pack weights 4 bits/weight.
op = append(op, 128|(maxSymbolValue-1))
// be sure it doesn't cause msan issue in final combination
huffWeight[maxSymbolValue] = 0
for n := uint16(0); n < uint16(maxSymbolValue); n += 2 {
op = append(op, (huffWeight[n]<<4)|huffWeight[n+1])
}
s.Out = op
return nil
}
func (c cTable) estTableSize(s *Scratch) (sz int, err error) {
var (
// precomputed conversion table
bitsToWeight [tableLogMax + 1]byte
huffLog = s.actualTableLog
// last weight is not saved.
maxSymbolValue = uint8(s.symbolLen - 1)
huffWeight = s.huffWeight[:256]
)
const (
maxFSETableLog = 6
)
// convert to weight
bitsToWeight[0] = 0
for n := uint8(1); n < huffLog+1; n++ {
bitsToWeight[n] = huffLog + 1 - n
}
// Acquire histogram for FSE.
hist := s.fse.Histogram()
hist = hist[:256]
for i := range hist[:16] {
hist[i] = 0
}
for n := uint8(0); n < maxSymbolValue; n++ {
v := bitsToWeight[c[n].nBits] & 15
huffWeight[n] = v
hist[v]++
}
// FSE compress if feasible.
if maxSymbolValue >= 2 {
huffMaxCnt := uint32(0)
huffMax := uint8(0)
for i, v := range hist[:16] {
if v == 0 {
continue
}
huffMax = byte(i)
if v > huffMaxCnt {
huffMaxCnt = v
}
}
s.fse.HistogramFinished(huffMax, int(huffMaxCnt))
s.fse.TableLog = maxFSETableLog
b, err := fse.Compress(huffWeight[:maxSymbolValue], s.fse)
if err == nil && len(b) < int(s.symbolLen>>1) {
sz += 1 + len(b)
return sz, nil
}
// Unable to compress (RLE/uncompressible)
}
// write raw values as 4-bits (max : 15)
if maxSymbolValue > (256 - 128) {
// should not happen : likely means source cannot be compressed
return 0, ErrIncompressible
}
// special case, pack weights 4 bits/weight.
sz += 1 + int(maxSymbolValue/2)
return sz, nil
}
// estimateSize returns the estimated size in bytes of the input represented in the
// histogram supplied.
func (c cTable) estimateSize(hist []uint32) int {
nbBits := uint32(7)
for i, v := range c[:len(hist)] {
nbBits += uint32(v.nBits) * hist[i]
}
return int(nbBits >> 3)
}
// minSize returns the minimum possible size considering the shannon limit.
func (s *Scratch) minSize(total int) int {
nbBits := float64(7)
fTotal := float64(total)
for _, v := range s.count[:s.symbolLen] {
n := float64(v)
if n > 0 {
nbBits += math.Log2(fTotal/n) * n
}
}
return int(nbBits) >> 3
}
func highBit32(val uint32) (n uint32) {
return uint32(bits.Len32(val) - 1)
}