Powered by Starship v1.3
Funniest DFS ever
Dec 27 2024
11:41 AM

Continuation of my previous post about pattern-matching.

Previously, I explained how to use a depth-first-search of a trie to compute pattern-matchings in a set of strings. However, explicit trie used too much memory for the specific task I was trying to apply it on. (64 megabyte limit)

So I took some advice from Alternet and made the trie implicit - rather than even storing intermediate nodes, the entropy of the strings being stored in the trie was low enough that storing each query individually and sorting it took up less memory than a trie. Then this sorted array could be "dfsed" with divide-and-conquer.

#include <algorithm>
#include <iostream>
int *a[21], d[1<<21];
using ll = long long; const ll X = (1<<22) - 1;
int res[1000000]; ll b[1000000];
void dfs(int l, int r, int d, int s){ if(!s) for(; l<r; ++l) res[b[l] & X] = a[d][0];
else{ int s1 = r, s2 = r;
for(int i=r-1; i>=l; --i){ if(((b[i] >> (60-d*2)) & 3) == 2) s2 = s1 = i; if(((b[i] >> (60-d*2)) & 3) == 1) s1 = i; }
if(l != s1){ for(int i=0; i<s; ++i) a[d+1][i] = a[d][i]; dfs(l, s1, d+1, s/2); }
if(s1 != s2){ for(int i=0; i<s; ++i) a[d+1][i] = a[d][i+s]; dfs(s1, s2, d+1, s/2); }
if(s2 != r){ for(int i=0; i<s; ++i) a[d+1][i] = a[d][i] + a[d][i+s]; dfs(s2, r, d+1, s/2); } } }
int main(){ std::cin.tie(0) -> sync_with_stdio(0);
int l, q; std::cin >> l >> q;
a[0] = d; for(int s=1<<l, p=1; s; ++p, s/=2) a[p] = a[p-1]+s;
for(int i=0; i<1<<l; ++i){ char c; std::cin >> c; a[0][i] = c-'0'; }
for(int i=0; i<q; ++i){ b[i] |= i; for(int j=0; j<l; ++j){ char c; std::cin >> c; ll m = c == '?' ? 2 : c - '0'; b[i] |= m << (60-j*2); } }
std::sort(b, b+q);
dfs(0, q, 0, 1<<(l-1));
for(int i=0; i<q; ++i) std::cout << res[i] << '\n'; }

This entire implementation is incredibly messy due to my attempts to memory-optimize my old trie code. Here's a simplified version of dfs:

dfs(left boundary, right boundary, depth, segment size) ->
    if we are at a leaf:
        the smallest 'leaf array' now is of size 1, and its only element
            is the sum of all matching leaves so far.
for every query between left and right boundary, set answer to the value of this one element.
else: find the indices s1, s2 that partition the range l..r into the three ranges l..s1, s1..s2, s2..r whose corresponding sequences start with 0, 1, and ?.
if the range l..s1 is not empty: populate the next 'leaf array' with the first half of the current one
dfs(l, s1, depth+1, segment size/2)
if the range s1..s2 is not empty: do the same as above, but the take the second half of the leaf array
if the range s2..r is not empty: do the same, but add together the two halves

So this DFS is not really a DFS. It's a divide-and-conquer traversal of an implicit trie, which handles the bottom-up traversal of a full binary tree.. good stuff.

tags: programming