Make FLINT memory-efficient

The main constraint for solving large problems often isn't time but memory. For example, on my laptop with 32 GB memory, the largest ``nmod_poly`` multiplication with 64-bit modulus that I can do is with inputs of length around $N = 10^8$. This takes just 25 seconds on a single core. Trying the same multiplication with 8 cores or with $N$ increased to $2 \cdot 10^8$ kills the process. Note that ``4 * (10^8) * (64 bits) = 3.2 GB``, so the inputs and outputs use just 10% of RAM; the rest of the memory is consumed by temporary buffers in ``fft_small``. With a more memory-efficient multiplication algorithm, I should be able to solve a 10x larger problem entirely in RAM. If FLINT were able to use file storage, I should be able to do a multiplication 100x as large.

## Memory limits

A first idea is to add an interface along the following lines:

``void flint_set_local_memory_limit(slong bytes)``

``slong flint_get_local_memory_limit()``

The idea is that when FLINT selects between algorithms, e.g. for polynomial multiplication, it could try to estimate in advance the size of temporary allocations. If this exceeds ``flint_get_local_memory_limit()`` for the fastest algorithm (e.g. an FFT), we switch to a potentially slower, but less memory-hungry algorithm. Note that there are already various hardcoded cutoffs in FLINT along these lines. An example is the hardcoded 4 GB sieve limit for the Riemann zeta function: https://github.com/flintlib/flint/blob/d29dc3cf119a3a56a6a842f6a8f16a838b55456a/src/acb_poly/zeta_em_sum.c#L33 Another example is the following condition in ``arb_mat_mul``: https://github.com/flintlib/flint/blob/d29dc3cf119a3a56a6a842f6a8f16a838b55456a/src/arb_mat/mul_block.c#L327

Such hardcoded limits could be replaced by a global, runtime-selectable tuning parameter.

Like ``flint_get/set_num_threads()``, I think this is a parameter that needs to be set manually by the user; doing something automatic based on the total amount of memory available on the machine is probably going to be fragile, and the optimal setting is going to depend on what the user plans to do anyway.

What are some optimizations that we need to implement to allow economizing memory?

One of the main memory hogs as noted above is FFT multiplication for huge polynomials and integers. An easy general-purpose solution is switch to Toom-Cook for extremely large multiplications, using FFT for the "pointwise" multiplications (or even doing another round of Toom-Cook recursively if the operands are truly enormous). Toom-Cook with $k$ interpolation points can be done with $O(N/k)$ scratch space, so with appropriate parameters the memory overhead of multiplication could be something like 10% or 20% instead of the current 500% or so. The time efficiency loss should be a factor 2-3 or so.

The existing FFT code could also be made more efficient (e.g. not storing tables of roots of unity; processing primes one at a time when using CRT) but that would be harder to implement. Even harder would be to implement some in-place FFT variant.

Matrix multiplication is analogous: instead of using Strassen or evaluation-interpolation type algorithms for the largest sizes, do blockwise classical multiplication and rely on the faster algorithms for sufficiently small blocks.

For various other operations, we'd want to replace divide and conquer algorithms that recurse to size $N/2$ with iterative algorithms that recurse to size $N/k$. That includes Newton division, LU factorization of matrices, etc.

## File-based operands

A second idea is to support file storage when even the operands are too large to fit in RAM.

It would be pretty neat if ``fmpz`` and ``nmod_poly`` (etc.) could switch to file storage automatically, but probably dedicated types with file storage is more realistic.

For composite types this could be done quite generically: if ``gr`` rings provide a method to serialize/deserialize vectors of elements to vectors of bytes, they could be wrapped by some kind of ``gr_file_vec``, ``gr_file_poly``, ``gr_file_mat`` types that stream ``flint_get_local_memory_limit()``-sized chunks in and out of memory.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make FLINT memory-efficient #2535

Memory limits

File-based operands

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Make FLINT memory-efficient #2535

Description

Memory limits

File-based operands

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions