Edited by: Jean-Luc Bouchot, Beijing Institute of Technology, China

Reviewed by: Valeriya Naumova, Simula Research Laboratory, Norway; Xin Guo, Hong Kong Polytechnic University, Hong Kong

This article was submitted to Mathematics of Computation and Data Science, a section of the journal Frontiers in Applied Mathematics and Statistics

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

In compressed sensing one uses known structure of otherwise unknown signals to recover them from as few linear observations as possible. The structure comes in the form of some notion of compressibility, including different notions of sparsity and low-rankness. In many cases, convex relaxations allow one to efficiently solve the inverse problems using standard convex solvers at almost-optimal sampling rates. A standard practice for accounting for multiple simultaneous structures in convex optimization is to add further regularizers or constraints. From the compressed sensing perspective, one then hopes to also improve the sampling rate. Unfortunately, when taking simple combinations of regularizers, this seems not to be automatically the case, as has been shown for several examples in recent works. Here, we give an overview of ideas for combining multiple structures in convex programs by taking weighted sums and weighted maxima. We discuss explicitly cases where optimal weights are used, reflecting an optimal tuning of the reconstruction. In particular, we extend known lower bounds on the number of required measurements to the optimally weighted maximum by using geometric arguments. As examples, we discuss simultaneously low-rank and sparse matrices and combinations of matrix norms (in the “square deal” sense) for regularizing for tensor products. We state an SDP formulation for numerically estimating the statistical dimensions and find a tensor case where the lower bound is roughly met up to a factor of two.

The recovery of an unknown signal from a limited number of observations can be made more efficient by exploiting compressibility and a priori known structure of the signal. This compressed sensing methodology has been applied to many fields in physics, applied mathematics, and engineering. In the most common setting, the structure is given as sparsity in a known basis. To mention some more recent directions: block-, group-, and hierarchical sparsity, low-rankness in matrix or tensor recovery problems, and also the generic concept of atomic decompositions are important structures present in many estimation problems.

In most of these cases, convex relaxations render the inverse problem itself amenable to standard solvers at almost-optimal sampling rates and also show tractability from a theoretical perspective. For instance, the ℓ_1-norm can be used to regularize for sparsity and the nuclear norm for low-rankness of matrices.

Let us describe the compressed sensing setting in more detail. We consider a linear measurement map acting on signals in ℝ^d, defined by its output components

Throughout this work we assume the measurements to be Gaussian, i.e., the measurement vectors have i.i.d. standard Gaussian coefficients with respect to an orthonormal basis {e_j}_{j∈[d]} of the signal space, which contains the signal x_0.

We wish to reconstruct x_0 from the measurements using a norm ∥·∥_{reg} as regularizer. Specifically, we consider an outcome of the following convex optimization problem

as a candidate for a reconstruction of x_0, where we will choose ∥·∥_{reg} to be either the weighted sum or the weighted maximum of other norms (18). If computations related to ∥·∥_{reg} are also computationally tractable, Equation (3) can be solved efficiently. We wish the minimizer of (3) to coincide with x_0; indeed, this is the case when the number of measurements is sufficiently large.
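To make the setting concrete, the following sketch (our own illustration, not part of the original text) casts the simplest instance of such a program, ℓ_1-minimization under Gaussian measurements, as a linear program; all dimensions and the use of SciPy are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
d, m, s = 40, 25, 3                       # ambient dimension, measurements, sparsity

x0 = np.zeros(d)                          # s-sparse ground truth
x0[rng.choice(d, size=s, replace=False)] = rng.standard_normal(s)

A = rng.standard_normal((m, d))           # Gaussian measurement map
y = A @ x0

# Basis pursuit  min ||x||_1  s.t.  A x = y,  as an LP in the variables (x, t):
#   minimize sum(t)   subject to  -t <= x <= t  and  A x = y.
c = np.concatenate([np.zeros(d), np.ones(d)])
A_ub = np.block([[np.eye(d), -np.eye(d)],
                 [-np.eye(d), -np.eye(d)]])
b_ub = np.zeros(2 * d)
A_eq = np.hstack([A, np.zeros((m, d))])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
              bounds=[(None, None)] * (2 * d))
x_hat = res.x[:d]
```

Since (x_0, |x_0|) is itself feasible, the optimal value never exceeds ∥x_0∥_{ℓ1}; with sufficiently many Gaussian measurements the minimizer coincides with x_0 with high probability.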

However, when the signal is structured, significantly fewer measurements can suffice, depending on the choice of ∥·∥_{reg}. For the case of Gaussian measurements, a quantity called the statistical dimension of the descent cone determines the precise number of measurements needed.

Now we consider the case where the signal has several structures simultaneously. Two important examples are (i) simultaneously low-rank and sparse matrices and (ii) tensors with several low-rank matricizations. In such cases one often knows good regularizers for the individual structures. In this work, we address the question of how well one can use convex combinations of the individual regularizers to regularize for the combined structure.

It is a standard practice to account for multiple simultaneous structures in convex optimization by combining different regularizers or constraints. The hope is to improve the sampling rate, i.e., to recover x_0 from fewer observations.

For the prominent example of entrywise sparse and low-rank matrices, Oymak et al. showed that combining the ℓ_1- and nuclear norm cannot improve the scaling of the sampling rate compared to the better of the two individual regularizers.

In particular, this analysis also covers certain approaches to tensor recovery. It should be noted that low-rank tensor reconstruction from few measurements is notoriously difficult. Initially, it was suggested to use linear combinations of nuclear norms of matricizations as a regularizer.

If one gives up on having a convex reconstruction algorithm, non-convex quasi-norms can be minimized, which leads to an almost-optimal sampling rate.

Further approaches also follow the idea of atomic norm decompositions.

We discuss further the idea of taking convex combinations of regularizers from a convex analysis viewpoint. Moreover, we focus on the scenario where the weights in a maximum, and also in a sum, of norms can depend on the unknown object x_0 itself, which reflects an optimal tuning of the convex program.

Based on tools established in Mu et al., we extend their lower bounds on the required number of measurements to the optimally weighted maximum of norms.

In section 3 we discuss the prominent case of simultaneously sparse and low-rank matrices. Here, our contributions are two-fold. We first show that the measurements satisfy a restricted isometry property (RIP) at a sampling rate that is essentially optimal for low-rank matrices, which implies injectivity of the measurement map on this signal class.

Then, in section 4 we discuss the regularization for rank-1 tensors using combinations of matrix norms (extending the “square deal” approach).

It is our hope that this work will contribute to a more comprehensive understanding of convex approaches to simultaneous structures in important recovery problems.

In this section we review the convex-analysis arguments used by Mu et al.

For a positive integer k we set [k] := {1, …, k}. The ℓ_p-norm of a vector is denoted by ∥·∥_{ℓp} and the Schatten p-norm of a matrix (the ℓ_p-norm of its singular values) by ∥·∥_p. For p = 2 both coincide with the Euclidean norm ∥·∥_{ℓ2} of the vector of entries. For a cone K ⊂ ℝ^d we denote by

K° := {y : 〈y, x〉 ≤ 0 for all x ∈ K}     (5)

its polar cone.

For a set S ⊂ ℝ^d we denote by cone(S) := {t x : t ≥ 0, x ∈ conv(S)} the convex cone generated by S. For two sets S_1 and S_2 one has

cone(conv(S_1 ∪ S_2)) = cone(S_1) + cone(S_2),

where “⊂” follows trivially from the definitions. In order to see “⊃” we write an element of cone(S_1) + cone(S_2) as a non-negative multiple of a convex combination of elements of S_1 and S_2. Here, the Minkowski sum of S_1 and S_2 is denoted by S_1 + S_2 ≔ {x_1 + x_2 : x_1 ∈ S_1, x_2 ∈ S_2}. It holds that

cone(S_1 + S_2) ⊂ cone(S_1) + cone(S_2).

The descent cone of a proper convex function f at a point x is the cone generated by the directions in which f does not increase,

D(f, x) := cone{v : f(x + v) ≤ f(x)},

where ∥·∥_{reg} will play the role of f later on. The subdifferential of f at x is

∂f(x) := {y : f(v) ≥ f(x) + 〈y, v − x〉 for all v},

and it holds that [Rockafellar]

D(f, x)° = cl cone(∂f(x)),

where cl denotes the closure. Let {f_i}_{i∈[k]} be proper convex functions such that the relative interiors of their domains have at least a point in common. Then

∂(f_1 + ⋯ + f_k)(x) = ∂f_1(x) + ⋯ + ∂f_k(x).

A function given as a pointwise maximum f = max_{i∈[m]} f_i of convex functions {f_i}_{i∈[m]} has the subdifferential [Rockafellar]

∂f(x) = conv( ∪ {∂f_i(x) : f_i(x) = f(x)} ).     (12)

Hence, the generated cone is the Minkowski sum

cone(∂f(x)) = Σ_{i : f_i(x) = f(x)} cone(∂f_i(x)).     (13)
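As a small numeric sanity check of the maximum rule (our own illustration, with an arbitrarily chosen point and the ℓ_1- and ℓ_2-norms as the two functions): at a point where both functions are active, every convex combination of their subgradients satisfies the subgradient inequality for the maximum.

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = np.array([3.0, -1.0, 2.0, 0.5])      # a point with no zero entries

# Weights chosen so that both terms of the maximum are active at x0.
mu1 = 1 / np.abs(x0).sum()                # weight for the l1-norm term
mu2 = 1 / np.linalg.norm(x0)              # weight for the l2-norm term
f = lambda x: max(mu1 * np.abs(x).sum(), mu2 * np.linalg.norm(x))

g1 = mu1 * np.sign(x0)                    # gradient of mu1*||.||_l1 at x0
g2 = mu2 * x0 / np.linalg.norm(x0)        # gradient of mu2*||.||_l2 at x0

# Every convex combination of the active (sub)gradients is a subgradient of f,
# i.e., f(y) >= f(x0) + <g, y - x0> for all y.
ok = True
for theta in np.linspace(0, 1, 11):
    g = theta * g1 + (1 - theta) * g2
    for _ in range(200):
        y = 3 * rng.standard_normal(x0.size)
        ok &= f(y) >= f(x0) + g @ (y - x0) - 1e-12
```

With these weights both terms evaluate to one at x_0, so the subdifferential of the maximum is the full convex hull of both gradients rather than a single one.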

The Lipschitz constant of a function f with respect to a norm ∥·∥ is the smallest constant L such that

|f(x) − f(y)| ≤ L ∥x − y∥     (14)

for all x, y. Usually, we take the ℓ_2-norm, which also fits the Euclidean geometry of the circular cones defined below.

The statistical dimension of a closed convex cone C ⊂ ℝ^d is defined as

δ(C) := E[ ∥Π_C(g)∥²_{ℓ2} ],

where g is a standard Gaussian vector and Π_C denotes the Euclidean projection onto C. By the Moreau decomposition, E[dist²(g, C)] = δ(C°) and

δ(C) + δ(C°) = d

for any cone C ⊂ ℝ^d. Now, let us consider a compressed sensing problem where we wish to recover a signal x_0 from m Gaussian measurements. The convex program (3) succeeds with high probability if m is larger than the statistical dimension δ(D(∥·∥_{reg}, x_0)) of the descent cone at x_0, and fails with high probability otherwise [Amelunxen et al.]. Combining the statements above, this statistical dimension can be written as the Gaussian squared distance to the cone generated by the subdifferential,

δ(D(f, x_0)) = E[ dist²(g, cone(∂f(x_0))) ].     (17)

By ≳ and ≲ we denote the usual greater-or-equal and less-or-equal relations up to a positive constant prefactor, and we write ∝ if both hold simultaneously. Then we can summarize that the convex reconstruction (3) requires exactly a number of measurements m ∝ δ(D(∥·∥_{reg}, x_0)), which can be calculated via the last equation.

We consider the convex combination and the weighted maximum of norms ∥·∥_{(i)}, where i ∈ [k]:

∥x∥_{λ,sum} := Σ_{i∈[k]} λ_i ∥x∥_{(i)}   and   ∥x∥_{μ,max} := max_{i∈[k]} μ_i ∥x∥_{(i)},     (18)

where λ = (λ_1, …, λ_k) ≥ 0 and μ = (μ_1, …, μ_k) ≥ 0 are to be chosen later. Here, we assume that we have access to the single norms of our original signal ∥x_0∥_{(i)}, so that we can fine-tune the parameters λ and μ accordingly.
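For concreteness, the two combined regularizers can be evaluated as follows for a sparse rank-one matrix, using signal-adapted weights that make all terms of the maximum coincide (the matrix and the two base norms are our own illustrative choices):

```python
import numpy as np

# A rank-one, sparse 6x6 matrix as the simultaneously structured signal X0.
u = np.array([1.0, 2.0, 0.0, 0.0, 0.0, 0.0])
v = np.array([0.0, 0.0, 3.0, 1.0, 0.0, 0.0])
X0 = np.outer(u, v)

norms = {
    "l1":      lambda X: np.abs(X).sum(),           # entrywise l1-norm
    "nuclear": lambda X: np.linalg.norm(X, "nuc"),  # nuclear (Schatten-1) norm
}

# Signal-adapted weights mu_i = 1/||X0||_(i): every term mu_i*||X0||_(i) equals 1.
mu = {k: 1 / f(X0) for k, f in norms.items()}

def max_norm(X):
    return max(mu[k] * f(X) for k, f in norms.items())

def sum_norm(X):
    return sum(mu[k] * f(X) for k, f in norms.items())
```

At X_0 itself the maximum evaluates to 1 and the sum to 2, and all terms of the maximum are active, which is exactly the situation in which the maximum has the largest subdifferential.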

In Oymak et al., lower bounds were derived for weights that do not depend on the signal x_0. For the sum-norm ∥·∥_{λ,sum} this case is covered by the lower bounds in Mu et al.; we extend these arguments to the max-norm ∥·∥_{μ,max} with μ being optimally chosen for a given signal.

A description of the norms dual to those given by (18) is provided in section 2.4.1.

The statistical dimension of the descent cone is obtained as the Gaussian squared distance from the cone generated by the subdifferential (17). Hence, it can be lower bounded by showing (i) that the subdifferential is contained in a small cone and (ii) by bounding the statistical dimension of that cone from above.

We use the following statements from Amelunxen et al. We say that the angle between z, a ∈ ℝ^d is at most θ if 〈z, a〉 ≥ cos(θ) ∥z∥_{ℓ2} ∥a∥_{ℓ2}, and write in that case ∠(z, a) ≤ θ. The circular cone with axis a and angle θ is circ(a, θ) := {z ∈ ℝ^d : ∠(z, a) ≤ θ}.

Its statistical dimension has a simple upper bound in terms of the ambient dimension: for all a ∈ ℝ^d and θ ∈ [0, π/2] [Mu et al.],

δ(circ(a, θ)) ≤ d sin²(θ) + 2.     (20)

By the following argument this bound can be turned into a lower bound on the statistical dimension of descent cones and, hence, to the necessary number of measurements for signal reconstructions. The following two propositions summarize arguments made by Mu et al. [

Proposition 1. Let f be a proper convex function with ∂f(x_0) ⊂ circ(x_0, θ) for some θ ∈ [0, π/2]. Then δ(D(f, x_0)) ≥ d cos²(θ) − 2.

This statement is already implicitly contained in Mu et al.

Hence, with the bound on the statistical dimension of the circular cone (20),

δ(D(f, x_0)) = E[dist²(g, cone(∂f(x_0)))] ≥ E[dist²(g, circ(x_0, θ))] = d − δ(circ(x_0, θ)) ≥ d cos²(θ) − 2.

□

Recall from (14) that a norm f has Lipschitz constant L with respect to the ℓ_2-norm on a (sub-)space if

f(x) ≤ L ∥x∥_{ℓ2}

for all x in that space; equivalently, every subgradient y ∈ ∂f(x) satisfies

∥y∥_{ℓ2} ≤ L

for all x in that space.

Proposition 2. Let f be a norm on ℝ^d with ℓ_2-Lipschitz constant L. Then

∂f(x_0) ⊂ circ(x_0, θ)

with

cos(θ) = f(x_0) / (L ∥x_0∥_{ℓ2}).

For the sake of self-containedness we summarize the proof of Mu et al.

Let y ∈ ∂f(x_0) ⊂ ℝ^d. Since f is a norm with dual norm f°, we have 〈y, x_0〉 = f(x_0) and f°(y) ≤ 1.

Then, thanks to Hölder's inequality and the Lipschitz bound ∥y∥_{ℓ2} ≤ L,

cos ∠(y, x_0) = 〈y, x_0〉 / (∥y∥_{ℓ2} ∥x_0∥_{ℓ2}) ≥ f(x_0) / (L ∥x_0∥_{ℓ2}).

This bound directly implies the claim. □

So together, Propositions 1 and 2 imply

δ(D(f, x_0)) ≥ d ( f(x_0) / (L ∥x_0∥_{ℓ2}) )² − 2 = (d / L²) κ − 2,

where

κ := ( f(x_0) / ∥x_0∥_{ℓ2} )²

is the compressibility measure of x_0 (e.g., the effective sparsity for f = ∥·∥_{ℓ1} and the effective matrix rank for the nuclear norm f = ∥·∥_1).

A larger subdifferential leads to a smaller statistical dimension of the descent cone and, hence, to a reconstruction with a potentially smaller number of measurements in the optimization problems

minimize ∥x∥_{μ,max} subject to A(x) = A(x_0)     (32)

and

minimize ∥x∥_{λ,sum} subject to A(x) = A(x_0)     (33)

with the norms (18). Having the statistical dimension (17) in mind, we set

m_max(μ) := δ(D(∥·∥_{μ,max}, x_0))   and   m_sum(λ) := δ(D(∥·∥_{λ,sum}, x_0)),

which give the optimal number of measurements in a precise sense [Amelunxen et al.]. Note that m_sum(λ) is continuous in λ.

Now we fix x_0 and consider adjusting the weights λ_i and μ_i depending on x_0. We will see from the geometric arguments of Mu et al. that the max-norm ∥·∥_{μ,max} with weights μ_i chosen as

μ_i = 1 / ∥x_0∥_{(i)}     (36)

has a better recovery performance than all other convex combinations of norms. With this choice the terms in the maximum (18) defining ∥·∥_{μ,max} are all equal. Hence, from the subdifferential of a maximum of functions (12) it follows that this choice of weights is indeed optimal. Moreover, the optimally weighted max-norm has a larger subdifferential at x_0 than all sum-norms ∥·∥_{λ,sum} and, hence, indeed leads to a better reconstruction performance:

Proposition 3. Let μ be given by (36). Then, for all weights λ ≥ 0,

cone(∂∥x_0∥_{λ,sum}) ⊂ cone(∂∥x_0∥_{μ,max})

and, hence, m_max(μ) ≤ m_sum(λ).

Proof: Since with the weights (36) all terms μ_i ∥x_0∥_{(i)} are active, the subdifferential of the maximum (12) is the convex hull of the union of the sets μ_i ∂∥x_0∥_{(i)}, so that with (11)

cone(∂∥x_0∥_{μ,max}) = Σ_{i∈[k]} cone(∂∥x_0∥_{(i)}) ⊃ Σ_{i∈[k]} λ_i ∂∥x_0∥_{(i)} ⊃ ∂∥x_0∥_{λ,sum}.

Then (7) concludes the proof. □

Proposition 3 implies that if the max-norm minimization (32) with optimal weights (36) does not recover x_0, then the sum-norm minimization (33) cannot recover x_0 for any weight λ either.

Now we will see that the lower bounds on the number of measurements from Mu et al. extend to the optimally weighted max-norm.

Suppose that each norm satisfies ∥x∥_{(i)} ≤ L_i ∥x∥_{ℓ2} for all x and i ∈ [k]. Then the optimally weighted max-norm (36) has Lipschitz constant at most max_{i∈[k]} L_i / ∥x_0∥_{(i)} with respect to the ℓ_2-norm.

In conjunction with Proposition 1 this yields the bound

δ(D(∥·∥_{μ,max}, x_0)) ≥ d min_{i∈[k]} ( ∥x_0∥_{(i)} / (L_i ∥x_0∥_{ℓ2}) )² − 2.

Hence, upper bounds on the Lipschitz constants of the single norms yield a circular cone containing the subdifferential of the maximum, where the smaller the largest upper bound, the smaller the circular cone. In terms of the compressibility measures κ_i := (∥x_0∥_{(i)}/∥x_0∥_{ℓ2})², the bound reads δ(D(∥·∥_{μ,max}, x_0)) ≥ d min_{i∈[k]} κ_i / L_i² − 2.

Now, Mu et al. proved a lower bound of this type for fixed weights; by the argument above it extends to the optimally tuned max-norm and, by Proposition 3, to all weighted sums.

Theorem 5. Suppose that ∥x∥_{(i)} ≤ L_i ∥x∥_{ℓ2} for all x and i ∈ [k], and let μ be the optimal weights (36). Whenever

m ≲ d min_{i∈[k]} ( ∥x_0∥_{(i)} / (L_i ∥x_0∥_{ℓ2}) )²,

the convex programs (32) and (33) fail to recover x_0 with high probability.

We will specify the results in more detail for the sparse and low-rank case in Corollary 8 and for an example in the tensor case in Equation (100).

Often one can characterize the subdifferential of the regularizer by a semidefinite program (SDP). In this case, one can estimate the statistical dimension via sampling and solving such SDPs.

In more detail, in order to estimate the statistical dimension (17) for a norm f, one samples standard Gaussian vectors, computes the squared distance to cone(∂f(x_0)), and takes the sample average in the end. In order to do so, we wish to also have an SDP characterization of the dual norm and of the subdifferential ∂f(x_0).

In the case when the regularizer is ∥·∥_{μ,max} or ∥·∥_{λ,sum}, where the single norms ∥·∥_{(i)} have simple dual norms, we can indeed obtain such an SDP characterization of the dual norm.

It will be convenient to have explicit formulae for the norms dual to the combined regularizers defined in section 2.2.

Lemma. Let ∥·∥_{(i)}, i ∈ [k], be norms with dual norms ∥·∥°_{(i)}. Then the norms dual to ∥·∥_{μ,max} and ∥·∥_{λ,sum} are given by

∥y∥°_{μ,max} = inf{ Σ_{i∈[k]} (1/μ_i) ∥y_i∥°_{(i)} : Σ_{i∈[k]} y_i = y },     (47)

∥y∥°_{λ,sum} = inf{ max_{i∈[k]} (1/λ_i) ∥y_i∥°_{(i)} : Σ_{i∈[k]} y_i = y }.     (48)

For the proof of the lemma we use the notation from the convex analysis book by Rockafellar.

Let C ⊂ ℝ^d be a convex set. Then its support function is defined by

h_C(y) := sup{〈y, x〉 : x ∈ C}

and its indicator function by

ι_C(x) := 0 for x ∈ C and ι_C(x) := +∞ otherwise.

If C contains the origin, its gauge function is

γ_C(x) := inf{t ≥ 0 : x ∈ t C}

and the polar set of C is

C° := {y : 〈y, x〉 ≤ 1 for all x ∈ C}.

[Note that while formally different, this definition is consistent with the polar of a cone introduced in (5).] The polar of a gauge function γ is denoted by γ° and coincides with the gauge of the polar body; the conjugate of a convex function f is denoted by f^*. The infimal convolution of convex functions f and g is (f □ g)(x) := inf{f(x_1) + g(x_2) : x_1 + x_2 = x}.

There are all sorts of rules relating these notions [Rockafellar]. For closed convex sets C_1, C_2 containing the origin and closed proper convex functions f and g one has f^{**} = cl f, C^{°°} = cl C, and (f □ g)^* = f^* + g^*.

In order to establish the statement of the lemma we will use the fact that norms are the gauge functions of their unit balls, i.e., ∥·∥_{(i)} = γ_{B_i} with B_i := {x : ∥x∥_{(i)} ≤ 1}. For the proofs, however, we let B_i be arbitrary closed convex sets containing the origin (they do not need to satisfy B_i = −B_i, as is the case for norms).

We begin with the polar of the sum (48). Note that the conjugate of a gauge is the indicator function of the polar body, γ_B^* = ι_{B°}. Using the conjugation rules above,

(γ_{B_1} + γ_{B_2})^* = γ_{B_1}^* □ γ_{B_2}^* = ι_{B_1°} □ ι_{B_2°} = ι_{B_1° + B_2°},

so that

(γ_{B_1} + γ_{B_2})° = γ_{B_1° + B_2°},

which implies the statement (48).

In order to establish the polar of the maximum (47) we start with the identity

max{γ_{B_1}, γ_{B_2}} = γ_{B_1 ∩ B_2}

to obtain

(max{γ_{B_1}, γ_{B_2}})° = γ_{(B_1 ∩ B_2)°},

so that

(B_1 ∩ B_2)° = cl conv(B_1° ∪ B_2°)

yields the dual norm as the gauge of the convex hull of the union of the dual unit balls, which proves the identity (47). □

We were aiming to estimate the statistical dimension (17) by sampling over Gaussian vectors and computing their distance to the cone generated by the subdifferential.

For _{λ,sum} we use its polar (48) to obtain

where one needs to note that an optimal feasible point of (60) also yields an optimal feasible point of (61) and vice versa. But this implies that

For the maximum-of-norms regularizer we again choose the optimal weights (36) to ensure that all norms are active, i.e., that all terms μ_i ∥x_0∥_{(i)} in the maximum coincide.

Then we use the subdifferential expression (13) for a point-wise maximum of functions to obtain

In the case that the single norms are ℓ_1-, ℓ_∞-, or spectral norms, these programs can be written as SDPs and be solved by standard SDP solvers.
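For the special case of the ℓ_1-norm no SDP is needed, since the subdifferential has an explicit description: cone(∂∥x_0∥_{ℓ1}) consists of the vectors that equal t·sign(x_0) on the support and are bounded by t in modulus off the support, for some t ≥ 0. The following Monte Carlo sketch (our own illustration, not from the original text) estimates the statistical dimension of the ℓ_1 descent cone by sampling the squared distance to this cone:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
d, s = 100, 5
# Only the support and sign pattern of x0 matter for the descent cone.
sgn = np.ones(s)                # signs of x0 on its support

def dist2(g):
    """Squared distance of g to cone(subdiff of ||x0||_1), i.e., to
    {z : z = t*sgn on the support, |z_i| <= t off the support, t >= 0}."""
    def obj(t):
        on = ((g[:s] - t * sgn) ** 2).sum()
        off = (np.maximum(np.abs(g[s:]) - t, 0.0) ** 2).sum()
        return on + off
    return minimize_scalar(obj, bounds=(0.0, 20.0), method="bounded").fun

est = np.mean([dist2(rng.standard_normal(d)) for _ in range(300)])
```

The average approximates δ(D(∥·∥_{ℓ1}, x_0)) via (17); for s = 5 and d = 100 it comes out far below the ambient dimension d, consistent with the reduced sampling rate for sparse vectors.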

An important class of structured signals in many applications consists of matrices that are simultaneously sparse and of low rank. Such matrices occur, for example, in sparse phase retrieval^{1}.

We consider the n_1 · n_2–dimensional vector space of n_1 × n_2-matrices with components in the field 𝕂, being either ℝ or ℂ. The space is endowed with the Frobenius (Hilbert–Schmidt) inner product.

Our core problem is to recover structured matrices from m linear measurements.

It is clear that, without further a priori assumptions on the unknown matrix, one needs n_1 · n_2 measurements to be able to successfully reconstruct it.

As additional structure, we consider subsets of matrices admitting a decomposition

X = Σ_{i∈[r]} u_i v_i^*,

where the vectors {u_i} and {v_i} can be required to be orthogonal, respectively. This characterization gives rise to the nuclear norm ∥·∥_1 as the corresponding atomic norm, see Chandrasekaran et al. Omitting the orthogonality of {u_i} and {v_i} yields alternative formulations of the rank. In the case of sparsity, one can formally ask for a description in terms of (s_1, s_2)-sparse atoms, i.e., additionally require that

∥u_i∥_{ℓ0} ≤ s_1 and ∥v_i∥_{ℓ0} ≤ s_2,

where ∥u∥_{ℓ0} denotes the number of non-zero elements of a vector u.

We say that a matrix is (s_1, s_2)-sparse if it admits a decomposition as above.

Our model differs from the joint-sparse setting as used in Lee et al.

By supp_1(X) and supp_2(X) we denote the row support and the column support of X, respectively.

Obviously, the matrices above are s_1-column-sparse and s_2-row-sparse (sometimes also called jointly sparse).

Such matrices are (s̃_1, s̃_2)-sparse^{2} and there are at most s̃_1 · s̃_2 non-zero components. Note that a joint (s̃_1, s̃_2)-row- and column-sparse matrix has only s̃_1 · s̃_2 non-zero entries. Hence, considering this only from the perspective of sparse vectors, we expect that up to logarithmic terms recovery can be achieved from O(s̃_1 · s̃_2) measurements. On the other hand, solely from the viewpoint of low-rankness, O(r(n_1 + n_2)) measurements also determine an n_1 × n_2-matrix of rank r, so that combining both viewpoints suggests a rate of O(min(s̃_1 s̃_2, r(n_1 + n_2))).

On the other hand, these matrices are determined by at most r(s_1 + s_2) non-zero numbers. Optimistically, we therefore hope that already O(r(s_1 + s_2)) sufficiently diverse observations are enough to infer the matrix. In the next part we will discuss that this intuitive parameter-counting argument is indeed true in the low-rank regime, i.e., the matrix class can be embedded into ℝ^m for m = O(r(s_1 + s_2)) via a Gaussian map.
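The parameter counting above can be made concrete with a few lines (the numbers are our own illustrative choices):

```python
# Degrees-of-freedom counting for an (s1, s2)-sparse rank-r matrix of size
# n1 x n2 (illustrative numbers), comparing the three viewpoints above.
n1 = n2 = 30
s1 = s2 = 10
r = 2

sparse_view   = s1 * s2          # nonzeros of a jointly sparse matrix
low_rank_view = r * (n1 + n2)    # parameters of a rank-r factorization
combined_view = r * (s1 + s2)    # rank-r factors that are themselves sparse
```

The simultaneous structure has far fewer free parameters (40) than either single structure alone (100 and 120), which is what motivates the hope for a lower sampling rate.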

Intuitively, (s_1, s_2)-sparse rank-r matrices can be linearly embedded into ℝ^m with small m, i.e., distances between different matrices are preserved up to a small error during the measurement process. Note that we have the inclusion

We say that the measurement map A has the (r, s_1, s_2)-RIP with constant δ if

| ∥A(X)∥²_{ℓ2} / ∥X∥²_2 − 1 | ≤ δ

holds, where the supremum is taken over all non-zero (s_1, s_2)-sparse matrices X of rank at most r.

A generic result, based on the ideas of Baraniuk et al., yields such an embedding, with c_2 being an absolute constant. Another RIP perspective has been considered in Fornasier et al.

We now provide a condensed version of the generic statement in Jung and Walk.

Theorem. Let A be a standard Gaussian measurement map and assume the low-rank regime r ≲ min(s_1, s_2). If m ≳ δ^{−2} r (s_1 + s_2) log(max(n_1, n_2)), then A satisfies the (r, s_1, s_2)-RIP with constant δ with high probability.

Clearly, standard Gaussian measurement maps satisfy the required concentration inequality.

The proof proceeds via an ε-net argument with respect to ∥·∥_2. Since the matrices under consideration are s̃_1 := r s_1 row-sparse and s̃_2 ≔ r s_2 column-sparse, there are

binom(n_1, s̃_1) · binom(n_2, s̃_2)

different combinations for the row support S_1 ⊂ [n_1] and column support S_2 ⊂ [n_2] with |S_1| = s̃_1 and |S_2| = s̃_2. For each of these canonical matrix subspaces supported on S_1 × S_2, we consider an ε-net for the s̃_1 × s̃_2 matrices of rank at most r, whose cardinality is exponential in r(s̃_1 + s̃_2).

In other words, up to logarithmic factors, this quantity also reflects the intuitive parameter counting. The net construction also ensures that for each net element X we have | supp_1(X)| ≤ s̃_1 and | supp_2(X)| ≤ s̃_2. However, note that in non-trivial cases the rank of a sum X_1 + X_2 with 〈X_1, X_2〉 = 0 can exceed the individual ranks for matrices X_1, X_2 under consideration.

For some

Now we choose

Requiring that the right-hand side is bounded by δ and solving this inequality for m,

Using the assumption that the concentration inequality holds with probability at least 1 − e^{−cδ²m} and the union bound,

Thus, if we want to have RIP satisfied with probability

it is sufficient to impose that m ≳ δ^{−2} log |N|, where N denotes the constructed net.

In essence, the theorem shows that the intrinsic geometry of sparse and low-rank matrices is preserved in low-dimensional embeddings when choosing the dimension above a threshold. It states that, in the low-rank regime, a number of measurements m = O(r(s_1 + s_2)) suffices up to logarithmic factors. The additional low-rank restriction is a technical artifact due to a suboptimal combination of covering number estimates. Indeed, upon revising the manuscript we found that the statement above may be improved to yield m = O(r(s_1 + s_2)) without restrictions on the rank; in any case, a scaling of m = O(r(s_1 + s_2)) is sufficient for all ranks up to logarithmic factors.

Intuitively, one should therefore be able to reconstruct an unknown (s_1, s_2)-sparse n_1 × n_2 matrix of low rank from m = O(r(s_1 + s_2)) generic random measurements. This would indeed reflect the intuitive parameter-counting argument. Unfortunately, as will be discussed next, so far no algorithm is known that can achieve such a reconstruction for generic matrices.

It is well-known that sufficiently small RIP constants imply exact recovery guarantees for convex programs such as

which uses a weighted sum as a regularizer, where the weights may depend on X_0. Related approaches have been used also for applications including sparse phase retrieval and sparse blind deconvolution. Obviously, the corresponding measurement map is then different and depends on the particular application. The practical relevance of this convex formulation is that it always allows the use of generic solvers, and there is a rich theory available to analyze the performance for certain types of measurement maps in terms of norms of the recovery error. Intuitively, one might think that this amounts only to characterizing the probability that the matrix X_0 is identifiable from the measurements, i.e., fulfills RIP or similar conditions. However, this is not enough, as observed and worked out in Oymak et al.: the required sampling rate cannot improve over the one achieved by regularizing for low-rankness (μ_{ℓ1} = 0) or sparsity (μ_1 = 0) separately. In other words, for any pair (μ_{ℓ1}, μ_1) the required sampling rate cannot be better than the minimal one with a single regularizer norm (either μ_{ℓ1} = 0 or μ_1 = 0). A difficult point in this discussion is what happens if the program is optimally tuned, i.e., if μ_{ℓ1} = 1/‖X_0‖_{ℓ1} and μ_1 = 1/‖X_0‖_1. We have based our generic investigations in section 2.3 on the considerably simpler technique of Mu et al.

Due to the limitations of tractable convex programs, non-convex recovery approaches have been investigated intensively in recent years. A prominent example is the alternating and thresholding-based algorithm “sparse power factorization”, as presented in Lee et al.

An interesting point has been discussed in Foucart et al. for the symmetric case n_1 = n_2 and s_1 = s_2: there, a two-step procedure is considered in which standard Gaussian matrices first compress the signal and the structured matrix is subsequently reconstructed from the raw measurements, allowing one to infer X_0 at a favorable sampling rate.

In the following section we will further strengthen this “no-go” result for convex recovery. As already mentioned above, an issue that has not been discussed in sufficient depth is what can be said about optimally tuned convex programs and about going beyond simple convex combinations of multiple regularizers. For our simultaneously sparse and low-rank matrices, Theorem 5 yields the following.

Corollary 8. Let X_0 be an (s_1, s_2)-sparse n_1 × n_2 matrix of rank r. Then the convex reconstruction of X_0 using the optimally weighted maximum (or any weighted sum) of ℓ_1- and nuclear norm fails with high probability whenever the number of Gaussian measurements is noticeably smaller than

min( ∥X_0∥²_{ℓ1}/∥X_0∥²_2 , max(n_1, n_2) · ∥X_0∥²_1/∥X_0∥²_2 ),

which is at most min(s_1 s_2, r · max(n_1, n_2)).

In words, even when optimally tuning the convex algorithms and using the intuitively best regularizer with the largest subdifferential, the required sampling rate still scales multiplicatively in the sparsities, i.e., it shows the same no-go behavior as the other (suboptimal) regularizers.

The Lipschitz constants of the ℓ_1-norm and the nuclear norm (w.r.t. the Frobenius norm) are

L_{ℓ1} = √(n_1 n_2)   and   L_1 = √(min(n_1, n_2)),

respectively. Using that the matrix X_0 has at most s_1 s_2 non-zero entries yields

(n_1 n_2 / L²_{ℓ1}) (∥X_0∥_{ℓ1} / ∥X_0∥_2)² ≤ s_1 s_2,

which is the expression in the minimum in (45) corresponding to the index “ℓ_1” used for the ℓ_1-norm. Using that X_0 has rank at most r yields, for the nuclear norm,

(n_1 n_2 / L²_1) (∥X_0∥_1 / ∥X_0∥_2)² ≤ max(n_1, n_2) · r,

and Theorem 5 establishes the corollary. □

We have numerically estimated the statistical dimension of the descent cones and have performed the actual reconstruction of simultaneously low-rank and sparse matrices.

In section 2.4.2 we showed that the Gaussian distance can be estimated numerically by sampling over (in this case) semidefinite programs (SDPs) according to (62) and (64). When empirically averaging these results according to (17), one obtains an estimate of the statistical dimension and therefore of the phase-transition point for successful convex recovery. For the case of sparse and low-rank matrices with the weighted sum of nuclear norm and ℓ_1-norm as regularizer, the distance (62) becomes

A similar SDP can be obtained for the case of the maximum of these two regularizers. We solve both SDPs using the CVX toolbox in MATLAB (with SDPT3 as solver) for many realizations of a Gaussian matrix.

Statistical dimension from (17) for the simultaneously sparse and low-rank case.

These results indeed suggest that the statistical dimension for the optimally weighted maximum of regularizers is smaller, i.e., better, than for the sum of regularizers.

We numerically find the phase transition for the convex recovery of complex sparse and low-rank matrices using the sum and the maximum of optimally weighted ℓ_1- and nuclear norm. We also compare to the results obtained by convex recovery using only either the ℓ_1-norm or the nuclear norm as regularizer and also provide an example of a non-convex algorithm.

The dimensions of the matrices are n_1 = n_2 = 30, the sparsity range is s_1 = s_2 = 5, …, 20, and the rank is fixed. A reconstruction is declared successful if the relative reconstruction error is small, i.e., if

holds. Each data point is averaged over several independent realizations of the signal and the measurement map.

The results are shown in the corresponding figures for the ℓ_1-norm and the nuclear norm as regularizer, respectively. The lower bound from Theorem 5 on the required number of measurements yields for those cases the sparsity s² of X_0 and the rank-based bound, respectively. For small s² there is a clear advantage of ℓ_1-regularization compared to the nuclear norm. The actual recovery rates scale roughly as 2s² ln(n²/s²) in the sparse case.

Phase transitions for convex recovery using the ℓ_1-norm, the nuclear norm, and their combinations with X_0-dependent weights as regularizers. Furthermore, the result of guessing the weights in a greedy fashion for the sum-norm is shown.

However, combining both regularizers with optimal weights improves the phase transition, as shown in the corresponding figure, compared to the single ℓ_1- and nuclear norms. For sufficiently small sparsity the ℓ_1-norm is more dominant, while for higher sparsity the nuclear norm determines the behavior of the phase transition. But only for the maximum of the regularizers does the sampling rate saturate for rank(X_0) = 1.

We also mention that the maximum of regularizers improves only if it is optimally tuned, which is already indicated by the subdifferential of a maximum (12), where only the largest terms contribute. In contrast, the reconstruction behavior of the sum of norms seems to be more stable. This observation has also been mentioned in Jalali.

To sketch a greedy approach for guessing the weights we consider the following strategy. Ideally, we would like to choose λ_1 = 1/∥X_0∥_1 and λ_{ℓ1} = 1/∥X_0∥_{ℓ1} in the minimization of the objective function λ_1 ∥X∥_1 + λ_{ℓ1} ∥X∥_{ℓ1}. Since, for Frobenius-norm-normalized X, one has ∥X∥_1 ≥ ∥X∥_∞ (and similarly for the ℓ_∞-norm), we choose as initialization

we update

Finally, there is indeed strong evidence that in many problems with simultaneous structures non-convex algorithms perform considerably better and faster than convex ones. Although we have focused in this work on a better understanding of convex recovery, let us also comment on other settings. For the sparse and low-rank setting there exist several very efficient and powerful algorithms; as examples we mention here sparse power factorization (SPF) and approaches based on the mixed norm ℓ_{2,1}, where in certain settings ℓ_{2,1} seems to be a better choice.

Tensor recovery is an important and notoriously difficult problem, which can be seen as a generalization of low-rank matrix recovery. However, for tensors there are several notions of rank and corresponding tensor decompositions.

Gandy et al. suggested using sums of nuclear norms of matricizations as a regularizer for tensor recovery.

We consider tensor product spaces of ν real vector spaces with local dimensions n_i, i.e., tensors with entries indexed by [n_1] × ⋯ × [n_ν].

An index bipartition (b, b^c) splits the tensor factors into a subset b ⊂ [ν] and its complement b^c = [ν] \ b.

We call the matricization of a tensor w.r.t. the bipartition (b, b^c) the matrix with n_b = ∏_{i∈b} n_i rows obtained by merging the indices in b into the row index and the indices in b^c into the column index. It is performed by a linear map M_b(·).
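A matricization is a plain reordering and reshaping of the tensor entries; the following sketch (our own illustration; the axis convention is an assumption) also checks that every matricization of a product tensor has matrix rank one:

```python
import numpy as np

def matricize(T, b):
    """Matricization M_b(T): merge the axes listed in b into the row index
    and the remaining axes into the column index."""
    bc = [i for i in range(T.ndim) if i not in b]
    rows = int(np.prod([T.shape[i] for i in b]))
    return T.transpose(list(b) + bc).reshape(rows, -1)

# A product (rank-one) 4-way tensor: every matricization has matrix rank one.
rng = np.random.default_rng(3)
vecs = [rng.standard_normal(3) for _ in range(4)]
T = np.einsum("i,j,k,l->ijkl", *vecs)

M = matricize(T, [0, 1])    # "square" bipartition {1, 2} vs {3, 4}
```

For non-product tensors, different bipartitions can yield very different matrix ranks, which is exactly what the rank tuples below capture.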

Now, we consider ranks based on a set of index bipartitions B.

The corresponding (formal) rank is the tuple of matricization ranks,

rank_B(X) := ( rank(M_b(X)) )_{b∈B}.

Similarly, given a signal x_0, we use the nuclear norms of its matricizations ∥M_b(x_0)∥_1, b ∈ B, as the individual norms in (18).

Note that for the case that x_0 is a product tensor, all matricizations have rank one, i.e.,

rank(M_b(x_0)) = 1

for all b ∈ B.

Let us give more explicit examples for the set of bipartitions:

One interesting remark might be that there are several measures of entanglement in quantum physics, which quantify the non-productness of quantum state vectors. One example for a state vector x (normalized as ∥x∥_{ℓ2} = 1),

N_b(x) := ( ∥ (x x^†)^{T_b} ∥_1 − 1 ) / 2,

is the negativity, where (·)^{T_b} denotes the partial transposition w.r.t. the bipartition (b, b^c).
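For a pure state the negativity can be computed from the singular values (Schmidt coefficients) σ of the matricization M_b(x): a standard calculation gives ∥(x x^†)^{T_b}∥_1 = (Σ_i σ_i)², so N_b(x) = ((Σ_i σ_i)² − 1)/2. A small sketch (the two-qubit examples are our own illustration):

```python
import numpy as np

def negativity(x, dims, b=(0,)):
    """Negativity of a unit vector x on a tensor product with local
    dimensions dims, w.r.t. the bipartition (b, complement), via
    N = ((sum of Schmidt coefficients)**2 - 1) / 2."""
    T = np.asarray(x).reshape(dims)
    bc = [i for i in range(len(dims)) if i not in b]
    rows = int(np.prod([dims[i] for i in b]))
    M = T.transpose(list(b) + bc).reshape(rows, -1)
    s = np.linalg.svd(M, compute_uv=False)   # Schmidt coefficients of x
    return (s.sum() ** 2 - 1.0) / 2.0

product = np.kron([1.0, 0.0], [1.0, 0.0])             # |00>, a product state
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)  # (|00> + |11>)/sqrt(2)
```

A product state has a single Schmidt coefficient equal to one and hence zero negativity, while the Bell state has negativity 1/2; note that (Σ_i σ_i)² is exactly the squared nuclear norm of the matricization, linking this entanglement measure to the regularizers above.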

Theorem 5 applies to tensor recovery with the regularizer (97). We illustrate the lower bound for the special case of 4-way tensors (ν = 4) with equal local dimensions n_i = n.

If x_0 is a tensor product, this becomes κ ≈ n².

A similar statement to Theorem 7 has been proved for the HOSVD and TT rank in the case of sub-Gaussian measurements by Rauhut et al.

It is unclear how RIP results could be extended to the “ranks” without an associated tensor decomposition and probably these ranks need to be better understood first.

We sample the statistical dimension (17) numerically for a product signal x_0; see the corresponding figure. The observed statistical dimension is roughly twice the lower bound of n². The missing factor of two might be due to the following mismatch: in the argument with the circular cones we only considered tensors of unit norm.

Observed average of the statistical dimension (17) for a product signal.

A similar experiment can be done for the related sum-norm from (18). This leads to similar statistical dimensions except for the tensor train bipartition, where the statistical dimension is significantly larger (~ 25%) for the sum-norm.

We have investigated the problem of convex recovery of simultaneously structured objects from few random observations. We have revisited the idea of taking convex combinations of regularizers and have focused on the best among them, which is given by an optimally weighted maximum of the individual regularizers. We have extended the lower bounds on the required number of measurements by Mu et al. to this optimally weighted maximum.

For these settings, we have compared the lower bounds to numerical experiments. In those experiments we have (i) demonstrated the actual recovery and (ii) estimated the statistical dimension that gives the actual value of the phase transition of the recovery rate. The latter can be achieved by sampling over certain SDPs. For tensors, we have observed that the lower bound can be quite tight up to a factor of 2.

The main question, whether or not one can derive strong rigorous recovery guarantees for efficient reconstruction algorithms in the case of simultaneous structures remains largely open. However, there are a few smaller questions that we would like to point out.

Numerically, we have observed that weights deviating from the optimal ones have a relatively small effect for the sum of norms as compared to the maximum of norms. Indeed, small deviations from the optimal weights change the subdifferential of ∥·∥_{μ,max} discontinuously due to (13). Of course, it would be good to have tight upper bounds for both regularizers. Perhaps one can also find a useful interpolation between ∥·∥_{μ,sum} and ∥·∥_{μ,max} by using an ℓ_p-norm of the vector containing the single norms, recovering ∥·∥_{μ,max} for p = ∞ and ∥·∥_{μ,sum} for p = 1. Finally, perhaps one can modify an iterative non-convex procedure for solving the optimization problem used for the reconstructions such that one obtains recovery from fewer measurements.

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

We thank Michał Horodecki, Omer Sakarya, David Gross, Ingo Roth, Dominik Stoeger, and Željka Stojanak for fruitful discussions.

ℓ_1-Norm

^{1}See the cited works for further references on the classical non-sparse phase retrieval problem.

^{2}Assuming that s̃_1 ≔ r s_1 ≤ n_1 and s̃_2 ≔ r s_2 ≤ n_2.