Make FLINT memory-efficient #2535

@fredrik-johansson

The main constraint for solving large problems often isn't time but memory. For example, on my laptop with 32 GB memory, the largest nmod_poly multiplication with 64-bit modulus that I can do is with inputs of length around $N = 10^8$. This takes just 25 seconds on a single core. Trying the same multiplication with 8 cores or with $N$ increased to $2 \cdot 10^8$ kills the process. Note that 4 * (10^8) * (64 bits) = 3.2 GB, so the inputs and outputs use just 10% of RAM; the rest of the memory is consumed by temporary buffers in fft_small. With a more memory-efficient multiplication algorithm, I should be able to solve a 10x larger problem entirely in RAM. If FLINT were able to use file storage, I should be able to do a multiplication 100x as large.

Memory limits

A first idea is to add an interface along the following lines:

void flint_set_local_memory_limit(slong bytes)

slong flint_get_local_memory_limit()
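
For concreteness, here is a minimal sketch of how such a pair of functions could be implemented, by analogy with flint_get/set_num_threads(). This is not existing FLINT code; the thread-local variable and the convention that WORD_MAX means "no limit" are assumptions.

```c
#include "flint.h"

/* Hypothetical per-thread memory budget; WORD_MAX means "no limit". */
static FLINT_TLS_PREFIX slong _flint_local_memory_limit = WORD_MAX;

void flint_set_local_memory_limit(slong bytes)
{
    _flint_local_memory_limit = bytes;
}

slong flint_get_local_memory_limit(void)
{
    return _flint_local_memory_limit;
}
```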

The idea is that when FLINT selects between algorithms, e.g. for polynomial multiplication, it could try to estimate in advance the size of temporary allocations. If this exceeds flint_get_local_memory_limit() for the fastest algorithm (e.g. an FFT), we switch to a potentially slower, but less memory-hungry algorithm. Note that there are already various hardcoded cutoffs in FLINT along these lines. An example is the hardcoded 4 GB sieve limit for the Riemann zeta function:

#define SIEVE_ALLOC_LIMIT 4e9 /* 4 GB */

Another example is the following condition in arb_mat_mul:

if (A_max_bits > 8000 || B_max_bits > 8000 || ...

Such hardcoded limits could be replaced by a global, runtime-selectable tuning parameter.
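
As a sketch of what memory-aware selection might look like inside a multiplication routine (the helper routines and the scratch estimate below are purely hypothetical; only flint_get_local_memory_limit() refers to the interface proposed above):

```c
/* Sketch only: memory-aware algorithm selection for polynomial multiplication.
   Both helper routines and the scratch estimate are hypothetical. */
void _nmod_poly_mul_memlimited(ulong * res, const ulong * a, slong alen,
                               const ulong * b, slong blen, nmod_t mod)
{
    /* Assumed scratch usage of the FFT-based product: a small constant
       multiple of the output size, in bytes. */
    slong fft_scratch = 5 * (alen + blen) * (slong) sizeof(ulong);

    if (fft_scratch > flint_get_local_memory_limit())
        _nmod_poly_mul_lowmem(res, a, alen, b, blen, mod);  /* e.g. Toom-Cook over FFT */
    else
        _nmod_poly_mul_fft(res, a, alen, b, blen, mod);     /* fastest, large scratch */
}
```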

Like flint_get/set_num_threads(), I think this is a parameter that needs to be set manually by the user; doing something automatic based on the total amount of memory available on the machine is probably going to be fragile, and the optimal setting is going to depend on what the user plans to do anyway.
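
Usage would then look much like the existing thread control. In the snippet below, flint_set_num_threads() is the existing API, while flint_set_local_memory_limit() is the proposed addition and the 8 GB figure is arbitrary:

```c
#include <flint/flint.h>
#include <flint/nmod_poly.h>

int main(void)
{
    nmod_poly_t f, g, h;

    flint_set_num_threads(8);                    /* existing API */
    flint_set_local_memory_limit((slong) 8e9);   /* proposed: cap scratch at ~8 GB */

    nmod_poly_init(f, UWORD(2305843009213693951));   /* modulus 2^61 - 1 (prime) */
    nmod_poly_init(g, UWORD(2305843009213693951));
    nmod_poly_init(h, UWORD(2305843009213693951));

    /* ... fill f and g with coefficients, e.g. length ~10^8 ... */

    nmod_poly_mul(h, f, g);   /* would fall back to a low-memory algorithm if needed */

    nmod_poly_clear(f);
    nmod_poly_clear(g);
    nmod_poly_clear(h);
    return 0;
}
```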

What optimizations would we need to implement to economize on memory?

One of the main memory hogs, as noted above, is FFT multiplication for huge polynomials and integers. An easy general-purpose solution is to switch to Toom-Cook for extremely large multiplications, using FFT for the "pointwise" multiplications (or even doing another round of Toom-Cook recursively if the operands are truly enormous). Toom-Cook with $k$ interpolation points can be done with $O(N/k)$ scratch space, so with appropriate parameters the memory overhead of multiplication could be something like 10% or 20% instead of the current 500% or so. The loss in time efficiency should be a factor of 2-3 or so.
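
A sketch of the kind of parameter selection this enables: pick the number of pieces $k$ so that the $O(N/k)$ scratch buffers fit under the limit. The constant factor 4 and the cap of 16 pieces below are placeholders, not tuned values.

```c
/* Hypothetical: choose the number of Toom-Cook pieces k so that the
   O(N/k)-sized scratch buffers fit within the local memory limit. */
slong _mul_choose_num_pieces(slong N /* output length in limbs */)
{
    slong limit = flint_get_local_memory_limit();
    slong k = 1;

    /* Assume scratch ~ 4 * (N / k) limbs; 4 is a placeholder constant. */
    while (4 * (N / k) * (slong) sizeof(ulong) > limit && k < 16)
        k++;

    return k;   /* k = 1 means no splitting is necessary */
}
```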

The existing FFT code could also be made more memory-efficient (e.g. by not storing tables of roots of unity, or by processing primes one at a time when using CRT), but that would be harder to implement. Even harder would be to implement some in-place FFT variant.

Matrix multiplication is analogous: instead of using Strassen or evaluation-interpolation type algorithms for the largest sizes, do blockwise classical multiplication and rely on the faster algorithms for sufficiently small blocks.
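
To illustrate, here is a sketch of blockwise classical multiplication for nmod_mat built on existing window and multiplication functions. The function itself does not exist in FLINT; it assumes square matrices whose dimension is divisible by the block count, and it makes no attempt at tuning.

```c
#include "flint.h"
#include "nmod_mat.h"

/* Sketch: classical multiplication over large blocks, so that the fast
   algorithm only ever sees block-sized temporaries. */
void nmod_mat_mul_blockwise(nmod_mat_t C, const nmod_mat_t A, const nmod_mat_t B,
                            slong nblocks)
{
    slong n = A->r, bs = n / nblocks;   /* assumes nblocks divides n */
    slong i, j, l;
    nmod_mat_t T;

    nmod_mat_init(T, bs, bs, A->mod.n);
    nmod_mat_zero(C);

    for (i = 0; i < nblocks; i++)
        for (j = 0; j < nblocks; j++)
            for (l = 0; l < nblocks; l++)
            {
                nmod_mat_t Ci, Ai, Bi;

                nmod_mat_window_init(Ci, C, i * bs, j * bs, (i + 1) * bs, (j + 1) * bs);
                nmod_mat_window_init(Ai, A, i * bs, l * bs, (i + 1) * bs, (l + 1) * bs);
                nmod_mat_window_init(Bi, B, l * bs, j * bs, (l + 1) * bs, (j + 1) * bs);

                nmod_mat_mul(T, Ai, Bi);    /* fast algorithm on one block product */
                nmod_mat_add(Ci, Ci, T);    /* accumulate into the output window */

                nmod_mat_window_clear(Ci);
                nmod_mat_window_clear(Ai);
                nmod_mat_window_clear(Bi);
            }

    nmod_mat_clear(T);
}
```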

For various other operations, we'd want to replace divide-and-conquer algorithms that recurse to size $N/2$ with iterative algorithms that work on blocks of size $N/k$. That includes Newton division, LU factorization of matrices, etc.

File-based operands

A second idea is to support file storage when even the operands are too large to fit in RAM.

It would be pretty neat if fmpz and nmod_poly (etc.) could switch to file storage automatically, but dedicated types with file storage are probably more realistic.

For composite types this could be done quite generically: if gr rings provide a method to serialize/deserialize vectors of elements to vectors of bytes, they could be wrapped by some kind of gr_file_vec, gr_file_poly, gr_file_mat types that stream flint_get_local_memory_limit()-sized chunks in and out of memory.
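
A very rough sketch of what such a wrapper could look like; everything below is hypothetical (gr_ctx_struct and slong are real FLINT types, but the file-backed type, the serialization layout and the chunking helper are inventions for illustration).

```c
#include <stdio.h>
#include <flint/flint.h>
#include <flint/gr.h>

/* Hypothetical file-backed vector over a gr ring: elements live on disk in
   serialized form and are streamed in and out in limited-size chunks. */
typedef struct
{
    FILE * storage;        /* backing file holding serialized elements */
    slong length;          /* number of elements */
    slong elem_size;       /* serialized size of one element, in bytes */
    gr_ctx_struct * ctx;   /* the underlying ring */
}
gr_file_vec_struct;

typedef gr_file_vec_struct gr_file_vec_t[1];

/* Process the vector chunk by chunk, never holding more than
   flint_get_local_memory_limit() bytes of elements in RAM at once. */
void gr_file_vec_stream_chunks(gr_file_vec_t vec)
{
    slong chunk_len = FLINT_MAX(WORD(1),
                          flint_get_local_memory_limit() / vec->elem_size);
    slong i;

    for (i = 0; i < vec->length; i += chunk_len)
    {
        slong len = FLINT_MIN(chunk_len, vec->length - i);

        /* (hypothetical) deserialize elements [i, i + len) from vec->storage
           into an ordinary in-memory gr vector, operate on them in RAM,
           then serialize them back out */
        (void) len;
    }
}
```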
