
[packer] Changing List of Feasible Candidates to Priority Queue #2994

Open: wants to merge 2 commits into base: master

131 changes: 45 additions & 86 deletions vpr/src/pack/greedy_candidate_selector.cpp

Large diffs are not rendered by default.

35 changes: 24 additions & 11 deletions vpr/src/pack/greedy_candidate_selector.h
@@ -23,6 +23,7 @@
#include "vtr_vector.h"
#include "vtr_random.h"
#include "vtr_vector_map.h"
#include "lazy_pop_unique_priority_queue.h"

// Forward declarations
class AtomNetlist;
@@ -97,13 +98,6 @@ struct ClusterGainStats {
/// with the cluster.
AttractGroupId attraction_grp_id;

/// @brief Array of feasible blocks to select from [0..max_array_size-1]
///
/// Sorted in ascending gain order so that the last cluster_ctx.blocks is
/// the most desirable (this makes it easy to pop blocks off the list.
std::vector<PackMoleculeId> feasible_blocks;
int num_feasible_blocks;

/// @brief The flat placement location of this cluster.
///
/// This is some function of the positions of the molecules which have been
@@ -126,6 +120,25 @@ struct ClusterGainStats {
/// set when the stats are created based on the primitive pb type
/// of the seed.
bool is_memory = false;

/// @brief Feasible blocks paired with their gains.
/// The pairs are kept in a heap with the highest-gain block
/// at the front.
LazyPopUniquePriorityQueue<PackMoleculeId, float> feasible_blocks;

/// @brief Indicator for the initial search for feasible blocks.
bool initial_search_for_feasible_blocks;

/// @brief Limit on the number of candidates proposed at each stage.
unsigned candidates_propose_limit;

/// @brief Counter for the number of candidates proposed at each stage.
unsigned num_candidates_proposed;

/// @brief Check if the current stage's limit on proposed candidates has been reached.
bool current_stage_candidates_proposed_limit_reached() const {
return num_candidates_proposed >= candidates_propose_limit;
}
};

/**
@@ -444,7 +457,7 @@ class GreedyCandidateSelector {
// Cluster Candidate Selection
// ===================================================================== //

/*
/**
* @brief Add molecules with strong connectedness to the current cluster to
* the list of feasible blocks.
*/
@@ -471,7 +484,7 @@ class GreedyCandidateSelector {
LegalizationClusterId legalization_cluster_id,
const ClusterLegalizer& cluster_legalizer);

/*
/**
* @brief Add molecules based on transitive connections (eg. 2 hops away)
* with current cluster.
*/
@@ -481,7 +494,7 @@ class GreedyCandidateSelector {
const ClusterLegalizer& cluster_legalizer,
AttractionInfo& attraction_groups);

/*
/**
* @brief Add molecules based on weak connectedness (connected by high
* fanout nets) with current cluster.
*/
@@ -491,7 +504,7 @@ class GreedyCandidateSelector {
const ClusterLegalizer& cluster_legalizer,
AttractionInfo& attraction_groups);

/*
/**
* @brief If the current cluster being packed has an attraction group
* associated with it (i.e. there are atoms in it that belong to an
* attraction group), this routine adds molecules from the associated
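
Reviewer's sketch: the .cpp diff is not rendered on this page, so the way the selector uses the new counter fields is not visible here. The block below is one plausible reading only, assuming the counter gates how many candidates are proposed in a stage. `StageCounter` and `propose_up_to_limit` are hypothetical names introduced for illustration; only the three member names are taken from the diff above.

```cpp
#include <vector>

// Hypothetical stand-in for the counter fields added to ClusterGainStats.
struct StageCounter {
    unsigned candidates_propose_limit = 0;
    unsigned num_candidates_proposed = 0;

    bool current_stage_candidates_proposed_limit_reached() const {
        return num_candidates_proposed >= candidates_propose_limit;
    }
};

// Assumed usage (not confirmed by the PR): walk a list of candidate ids and
// stop proposing once the per-stage limit is hit. Returns the proposed ids.
std::vector<int> propose_up_to_limit(StageCounter& stats,
                                     const std::vector<int>& candidates) {
    std::vector<int> proposed;
    for (int id : candidates) {
        if (stats.current_stage_candidates_proposed_limit_reached()) break;
        proposed.push_back(id);
        stats.num_candidates_proposed++;
    }
    return proposed;
}
```

With a limit of 2 and three candidates, only the first two are proposed and the limit check then reports true until the counter is reset for the next stage.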
216 changes: 216 additions & 0 deletions vpr/src/util/lazy_pop_unique_priority_queue.h
@@ -0,0 +1,216 @@
/**
* @file
* @author Rongbo Zhang
* @date 2025-04-23
* @brief This file contains the definition of the LazyPopUniquePriorityQueue class.
*
* LazyPopUniquePriorityQueue is a priority queue with unique keys that supports lazy deletion of elements.
* Each element is a pair of a key and a sort value. The key uniquely identifies the item, and the sort value determines its position in the queue.
* It is implemented using a vector and two sets: one set tracks the keys currently in the queue, and the other tracks the keys that are pending deletion,
* so that they can be discarded from the queue when they are popped.
*
* Currently, the class supports the following functions:
* LazyPopUniquePriorityQueue::push(): Pushes a key-sort-value (K-SV) pair into the priority queue and adds the key to the tracking set. Does nothing if the key is already in the queue.
* LazyPopUniquePriorityQueue::pop(): Removes and returns the K-SV pair with the highest SV whose key is not pending deletion.
* LazyPopUniquePriorityQueue::remove(): Removes an element from the priority queue immediately.
* LazyPopUniquePriorityQueue::remove_at_pop_time(): Marks an element so that it is discarded from the priority queue when it is popped.
* LazyPopUniquePriorityQueue::empty(): Returns whether the queue has no elements other than those pending deletion.
* LazyPopUniquePriorityQueue::clear(): Clears the priority queue vector and the tracking sets.
* LazyPopUniquePriorityQueue::size(): Returns the number of elements in the queue that are not pending deletion.
* LazyPopUniquePriorityQueue::contains(): Returns true if the key is in the queue, false otherwise.
*/

#pragma once

#include <unordered_set>
#include <vector>
#include <algorithm>
#include <cstddef>
#include <utility>

/**
* @brief Lazy Pop Unique Priority Queue
*
* This is a priority queue that is used to sort items which are identified by the key
* and sorted by the sort value.
*
* It uses a vector to store the key and sort value pair.
* It uses a set to store the keys that are in the vector for uniqueness checking
* and a set to store the delete pending keys which will be removed at pop time.
*/

template<typename T_key, typename T_sort>
class LazyPopUniquePriorityQueue {
public:
/** @brief The custom comparison struct for sorting the items in the priority queue.
* A less-than comparison puts the item with the highest sort value at the front of the queue.
* A greater-than comparison puts the item with the lowest sort value at the front of the queue.
*/
struct LazyPopUniquePriorityQueueCompare {
bool operator()(const std::pair<T_key, T_sort>& a,
const std::pair<T_key, T_sort>& b) const {
return a.second < b.second;
}
};

/// @brief The vector maintained as heap to store the key and sort value pair.
std::vector<std::pair<T_key, T_sort>> heap;

/// @brief The set to store the keys that are in the queue. This is used to ensure uniqueness
std::unordered_set<T_key> content_set;

/// @brief The set of keys whose items are pending deletion from the queue.
std::unordered_set<T_key> delete_pending_set;

/**
* @brief Push the key and the sort value as a pair into the priority queue.
*
* @param key
* The unique key for the item that will be pushed onto the queue.
* @param value
* The sort value used for sorting the item.
*/
void push(T_key key, T_sort value) {
// Keys must be unique: if the key is already in the queue, do nothing.
if (content_set.find(key) != content_set.end()) {
return;
}
// Insert the key and sort value pair into the heap and track the key.
// The new item is appended to the vector, and then std::push_heap is called
// to move it to its correct position in the heap structure.
heap.emplace_back(key, value);
std::push_heap(heap.begin(), heap.end(), LazyPopUniquePriorityQueueCompare());
content_set.insert(key);
}

/**
* @brief Pop the top item from the priority queue.
*
* @return The key and sort value pair.
*/
std::pair<T_key, T_sort> pop() {
std::pair<T_key, T_sort> top_pair;
while (heap.size() > 0) {
top_pair = heap.front();
// Remove the key from the heap and the tracking set.
// The pop_heap function will move the top item in the heap structure to the end of the vector container.
// Then the pop_back function will remove the last item.
std::pop_heap(heap.begin(), heap.end(), LazyPopUniquePriorityQueueCompare());
heap.pop_back();
content_set.erase(top_pair.first);

// Check whether the key with the highest sort value is in the delete pending set.
// If it is, discard the current top item, remove its key from the delete pending set, and try the next top item.
// Otherwise, the top item has been found, so break out of the loop.
if (delete_pending_set.find(top_pair.first) != delete_pending_set.end()) {
delete_pending_set.erase(top_pair.first);
top_pair = std::pair<T_key, T_sort>();
} else {
break;
}
}

// If no non-pending-delete items remain, clear the queue.
if (empty()) {
clear();
}

return top_pair;
}

/**
* @brief Remove the item with the matching key from the priority queue.
* This will immediately remove the item and re-heapify the queue.
*
* This function is expensive: it searches the vector linearly for the key, erases the item from
* the middle of the vector, and rebuilds the heap with std::make_heap.
* Each of these steps is O(n), where n is the size of the queue.
* It is recommended to use remove_at_pop_time() instead.
* @param key
* The key of the item to be deleted from the queue.
*/
void remove(T_key key) {
// If the key is in the priority queue, remove it from the heap and reheapify.
// Otherwise, do nothing.
if (content_set.find(key) != content_set.end()) {
content_set.erase(key);
delete_pending_set.erase(key);
for (size_t i = 0; i < heap.size(); i++) {
if (heap[i].first == key) {
heap.erase(heap.begin() + i);
break;
}
}

// If this deletion left the queue with no non-pending-delete items, clear the queue.
if (empty()) {
clear();
// Otherwise re-heapify the queue
} else {
std::make_heap(heap.begin(), heap.end(), LazyPopUniquePriorityQueueCompare());
}
}
}

/**
* @brief Remove the item with the matching key from the priority queue at pop time.
* The key is added to the delete pending set for tracking,
* and the item is discarded when it is popped.
*
* This function does not immediately delete the item from the underlying
* heap. It is deleted when it is popped, so do not
* expect the heap vector to shrink immediately.
* @param key
* The key of the item to be deleted from the queue at pop time.
*/
void remove_at_pop_time(T_key key) {
// If the key is in the list, start tracking it in the delete pending list.
// Otherwise, do nothing.
if (content_set.find(key) != content_set.end()) {
delete_pending_set.insert(key);

// If this marks the last non-pending-delete item as to-be-deleted, clear the queue.
if (empty()) {
clear();
}
}
}

/**
* @brief Check if the priority queue is empty, i.e. it has no non-pending-delete items.
*
* @return True if the priority queue is empty, false otherwise.
*/
bool empty() {
return size() == 0;
}

/**
* @brief Clears the priority queue and the tracking sets.
*
* @return None
*/
void clear() {
heap.clear();
content_set.clear();
delete_pending_set.clear();
}

/**
* @brief Get the number of non-pending-delete items in the priority queue.
*
* @return The number of non-pending-delete items in the priority queue.
*/
size_t size() const {
return heap.size() - delete_pending_set.size();
}

/**
* @brief Check if the item referred to by the key is in the priority queue.
*
* @param key
* The key of the item.
* @return True if the key is in the priority queue, false otherwise.
*/
bool contains(T_key key) const {
return content_set.find(key) != content_set.end();
}
};
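
Reviewer's sketch: a condensed, self-contained illustration of the lazy-deletion behaviour described in the header above, assuming `int` keys and `float` sort values. `MiniLazyQueue` and `demo_pop_order` are hypothetical names and are not part of the PR. The `{-1, 0.0f}` sentinel returned by `pop()` on an exhausted queue is a choice made for this sketch, where the PR returns a default-constructed pair.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical, condensed stand-in for LazyPopUniquePriorityQueue<int, float>.
class MiniLazyQueue {
  public:
    void push(int key, float value) {
        if (!content_.insert(key).second) return; // duplicate key: ignore
        heap_.emplace_back(key, value);
        std::push_heap(heap_.begin(), heap_.end(), by_value);
    }

    // Mark a key as deleted; its heap entry is dropped only when it reaches the top.
    void remove_at_pop_time(int key) {
        if (content_.count(key)) pending_.insert(key);
    }

    // Pop the highest-value entry that is not pending deletion.
    // Returns {-1, 0.0f} when no live entry remains (sentinel for this sketch).
    std::pair<int, float> pop() {
        while (!heap_.empty()) {
            std::pop_heap(heap_.begin(), heap_.end(), by_value);
            std::pair<int, float> top = heap_.back();
            heap_.pop_back();
            content_.erase(top.first);
            if (pending_.erase(top.first) == 0) return top; // live entry
        }
        return {-1, 0.0f};
    }

    // Number of live (not pending-delete) entries.
    std::size_t size() const { return heap_.size() - pending_.size(); }

  private:
    // Less-than on the sort value gives a max-heap: highest value at the front.
    static bool by_value(const std::pair<int, float>& a,
                         const std::pair<int, float>& b) {
        return a.second < b.second;
    }
    std::vector<std::pair<int, float>> heap_;
    std::unordered_set<int> content_;
    std::unordered_set<int> pending_;
};

// Pushes three keys, lazily removes the best one, and records the pop order.
std::vector<int> demo_pop_order() {
    MiniLazyQueue q;
    q.push(1, 0.5f);
    q.push(2, 2.0f);
    q.push(3, 1.0f);
    q.push(2, 9.0f);         // ignored: key 2 is already queued
    q.remove_at_pop_time(2); // key 2 stays in the heap but no longer counts
    assert(q.size() == 2);
    std::vector<int> order;
    while (q.size() > 0) order.push_back(q.pop().first);
    return order;
}
```

In this sketch key 2 has the highest value, but it is skipped at pop time because it was marked for deletion, so the keys come out as 3 and then 1. Marking costs one set insertion, and the stale heap entry is paid for only when it surfaces.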
@@ -1,5 +1,5 @@
arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time
k4_N10_memSize16384_memData64.xml ch_intrinsics.v common 1.71 vpr 62.29 MiB -1 -1 0.45 18372 3 0.09 -1 -1 33140 -1 -1 71 99 1 0 success v8.0.0-11920-g63becbef4-dirty release IPO VTR_ASSERT_LEVEL=2 GNU 9.4.0 on Linux-4.15.0-213-generic x86_64 2024-12-04T15:29:41 betzgrp-wintermute.eecg.utoronto.ca /home/elgamma8/research/release/vtr-verilog-to-routing 63780 99 130 353 483 1 222 301 13 13 169 clb auto 22.7 MiB 0.06 730 30541 5185 13290 12066 62.3 MiB 0.05 0.00 28 1583 11 3.33e+06 2.25e+06 384474. 2275.00 0.18
k4_N10_memSize16384_memData64.xml diffeq1.v common 3.90 vpr 66.30 MiB -1 -1 0.72 23492 23 0.30 -1 -1 34028 -1 -1 77 162 0 5 success v8.0.0-11920-g63becbef4-dirty release IPO VTR_ASSERT_LEVEL=2 GNU 9.4.0 on Linux-4.15.0-213-generic x86_64 2024-12-04T15:29:41 betzgrp-wintermute.eecg.utoronto.ca /home/elgamma8/research/release/vtr-verilog-to-routing 67888 162 96 1200 1141 1 675 340 13 13 169 clb auto 25.9 MiB 0.18 5120 92848 24971 61178 6699 66.3 MiB 0.19 0.00 52 9637 13 3.33e+06 2.76e+06 671819. 3975.26 1.14
k4_N10_memSize16384_memData64.xml single_wire.v common 2.10 vpr 59.81 MiB -1 -1 0.16 16372 1 0.17 -1 -1 29680 -1 -1 0 1 0 0 success v8.0.0-11920-g63becbef4-dirty release IPO VTR_ASSERT_LEVEL=2 GNU 9.4.0 on Linux-4.15.0-213-generic x86_64 2024-12-04T15:29:41 betzgrp-wintermute.eecg.utoronto.ca /home/elgamma8/research/release/vtr-verilog-to-routing 61244 1 1 1 2 0 1 2 3 3 9 -1 auto 21.3 MiB 0.00 2 3 0 3 0 59.8 MiB 0.01 0.00 2 1 1 30000 0 1489.46 165.495 0.01
k4_N10_memSize16384_memData64.xml single_ff.v common 2.13 vpr 59.62 MiB -1 -1 0.15 16244 1 0.17 -1 -1 29552 -1 -1 1 2 0 0 success v8.0.0-11920-g63becbef4-dirty release IPO VTR_ASSERT_LEVEL=2 GNU 9.4.0 on Linux-4.15.0-213-generic x86_64 2024-12-04T15:29:41 betzgrp-wintermute.eecg.utoronto.ca /home/elgamma8/research/release/vtr-verilog-to-routing 61048 2 1 3 4 1 3 4 3 3 9 -1 auto 21.2 MiB 0.00 6 9 6 0 3 59.6 MiB 0.01 0.00 16 5 1 30000 30000 2550.78 283.420 0.01
arch circuit script_params vtr_flow_elapsed_time vtr_max_mem_stage vtr_max_mem error odin_synth_time max_odin_mem parmys_synth_time max_parmys_mem abc_depth abc_synth_time abc_cec_time abc_sec_time max_abc_mem ace_time max_ace_mem num_clb num_io num_memories num_mult vpr_status vpr_revision vpr_build_info vpr_compiler vpr_compiled hostname rundir max_vpr_mem num_primary_inputs num_primary_outputs num_pre_packed_nets num_pre_packed_blocks num_netlist_clocks num_post_packed_nets num_post_packed_blocks device_width device_height device_grid_tiles device_limiting_resources device_name pack_mem pack_time initial_placed_wirelength_est placed_wirelength_est total_swap accepted_swap rejected_swap aborted_swap place_mem place_time place_quench_time min_chan_width routed_wirelength min_chan_width_route_success_iteration logic_block_area_total logic_block_area_used min_chan_width_routing_area_total min_chan_width_routing_area_per_tile min_chan_width_route_time
k4_N10_memSize16384_memData64.xml ch_intrinsics.v common 1.17 vpr 63.18 MiB -1 -1 0.21 18728 3 0.06 -1 -1 32704 -1 -1 72 99 1 0 success v8.0.0-12603-g716d96fe0-dirty release IPO VTR_ASSERT_LEVEL=2 Clang 18.1.3 on Linux-6.8.0-58-generic x86_64 2025-05-01T22:32:41 betzgrp-wintermute /home/zhan6738/VTR/vtr-verilog-to-routing/vtr_flow/tasks 64692 99 130 353 483 1 220 302 13 13 169 clb auto 23.4 MiB 0.03 1748 641 31674 5814 13912 11948 63.2 MiB 0.03 0.00 36 1209 9 3.33e+06 2.28e+06 481319. 2848.04 0.18
k4_N10_memSize16384_memData64.xml diffeq1.v common 2.84 vpr 66.43 MiB -1 -1 0.32 23332 23 0.28 -1 -1 33440 -1 -1 78 162 0 5 success v8.0.0-12603-g716d96fe0-dirty release IPO VTR_ASSERT_LEVEL=2 Clang 18.1.3 on Linux-6.8.0-58-generic x86_64 2025-05-01T22:32:41 betzgrp-wintermute /home/zhan6738/VTR/vtr-verilog-to-routing/vtr_flow/tasks 68020 162 96 1200 1141 1 690 341 14 14 196 clb auto 26.8 MiB 0.11 8696 5304 81261 22686 53433 5142 66.4 MiB 0.09 0.00 46 10726 18 4.32e+06 2.79e+06 735717. 3753.66 1.10
k4_N10_memSize16384_memData64.xml single_wire.v common 0.50 vpr 61.17 MiB -1 -1 0.06 17192 1 0.02 -1 -1 29568 -1 -1 0 1 0 0 success v8.0.0-12603-g716d96fe0-dirty release IPO VTR_ASSERT_LEVEL=2 Clang 18.1.3 on Linux-6.8.0-58-generic x86_64 2025-05-01T22:32:41 betzgrp-wintermute /home/zhan6738/VTR/vtr-verilog-to-routing/vtr_flow/tasks 62636 1 1 1 2 0 1 2 3 3 9 -1 auto 22.6 MiB 0.00 2 2 3 0 3 0 61.2 MiB 0.00 0.00 2 1 1 30000 0 1489.46 165.495 0.00
k4_N10_memSize16384_memData64.xml single_ff.v common 0.51 vpr 61.02 MiB -1 -1 0.05 17192 1 0.02 -1 -1 29212 -1 -1 1 2 0 0 success v8.0.0-12603-g716d96fe0-dirty release IPO VTR_ASSERT_LEVEL=2 Clang 18.1.3 on Linux-6.8.0-58-generic x86_64 2025-05-01T22:32:41 betzgrp-wintermute /home/zhan6738/VTR/vtr-verilog-to-routing/vtr_flow/tasks 62484 2 1 3 4 1 3 4 3 3 9 -1 auto 22.4 MiB 0.00 6 6 9 6 0 3 61.0 MiB 0.00 0.00 16 5 1 30000 30000 2550.78 283.420 0.00