Max Coverage of Segments
Problem Description
Given a total of N elements, find the largest coverage we can get after removing one segment.
Example:
Input: [[5,9], [1,4], [3,7]]
Output: 7
Explanation: By removing [3,7], the remaining segments covers a total length of 7
Input format: the first line contains the length of the array; all following lines are the start and end of segments.
The example above corresponds to:
3
5 9
1 4
3 7
System Design
The Segment
Class
The Segment
class represents a segment (as a line in the input file), with a start
and an end
.
This class is designed to be a hashable and immutable model class (although the immutability usually cannot be enforced in Python).
In addition to hash
and equals
, it provides several useful functions:
Function | Description |
---|---|
size() | Returns the size of the segment |
covers(other) | Tells if the current segment covers another segment |
overlaps(other) | Tells if the current segment overlaps with another segment |
intersection(other) | Return the intersection segment of the current and another segment |
The FileEngine
Class
Each FileEngine
object is in charge of a pair of input/output files.
An instance can be created by FileEngine(file_id)
, and it provides two functions:
Function | Description |
---|---|
read() | Read the raw input file and return a list of Segment objects |
write(int) | Write an integer to the output file |
Algorithm
Pre-processing
Sort
First, we sort the Segment
s array according to their start time. This can be achieved in $O(n log(n))$ time.
Remove Sub-segments
Then, for each Segment
object in the sorted array, we remove all following segments that it covers. This can be achieved in $O(n)$ time.
Find Redundancy
If there exists an segments[i]
such that segments[i-1]
overlaps with segments[i+1]
, we call segments[i]
“redundant”, because the overall coverage will NOT be affected after it is removed.
If a redundant segment is found, we simply calculate and return the coverage of the remaining segments.
Properties of Segment Array
After pre-processing, we have a segment array with some special properties.
Consider any three consecutive segments in the array, segments[i-1]
, segments[i]
and segments[i+1]
, the following properties can be ensured:
Property | Explanation |
---|---|
segments[i-1].start <= segments[i].start <= segments[i+1].start | Sorted |
segments[i-1].end < segments[i].end < segments[i+1].end | NO sub-segments |
segments[i-1].end < segments[i+1].start | NO redundancy (in other words, segments[i-1] and segments[i+1] are disjoint) |
With such properties, the coverage of remaining segments after removing a segments[i]
can be calculated by an addition of:
- Total coverage from
segments[0]
tosegments[i-1]
- Total coverage from
segments[i+1]
tosegments[-1]
Two-pass DP
The core algorithm uses a two-pass dynamic programming technique, once forward and another backward.
There are two arrays: forward
and backward
.
forward[i]
stores the total coverage from segments[0]
to segments[i]
, and backward[i]
stores the total coverage from segments[i]
to segments[-1]
(both ends included).
Hence, the first element in forward
should be the size()
of the first segment.
Then, from the second segment, for each segments[i]
, we compute the combined total coverage up until that element, forward[i]
, using three values we already have:
segments[i-1]
segments[i]
forward[i-1]
If segments[i-1]
and segments[i]
does NOT overlap, the current total coverage forward[i]
can be calculated by forward[i-1] + segments[i].size()
.
Otherwise, we need to calculate the intersection
by calling segments[i].intersection(segments[i-1])
, and then forward[i] = forward[i-1] + segments[i].size() + intersection.size()
After we fill up the forward
array, we do the same in the reverse order to generate the backward
array.
Once we have both the forward
and backward
arrays, finding the maximum coverage is super easy. For any element in between (not the first/last), the total coverage without it is simply forward[i-1] + backward[i+1]
.
Finally, we compare all coverages above and forward[-2]
, backward[1]
to get the maximum.