├── .gitignore
├── images
    ├── CFG.png
    ├── llvm.png
    ├── .DS_Store
    ├── shinchan .jpg
    ├── trx_arch.png
    ├── first_step.png
    ├── memory_seg.png
    ├── trx_size_udp.png
    ├── javascript_meme.jpg
    ├── compiler_all_stages.png
    ├── meme_trx_size_limit.png
    ├── compiler_three_pillers.png
    ├── header_rust_compiler2.jpg
    ├── reintialization_attack.jpg
    ├── rust_compiler_footer.png
    └── solana_limitation_bye.gif
├── README.md
├── reinitialization_attack.md
├── rust_compiler_for_dummies.md
└── solana_limitations.md


/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | 


--------------------------------------------------------------------------------
/images/CFG.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/CFG.png


--------------------------------------------------------------------------------
/images/llvm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/llvm.png


--------------------------------------------------------------------------------
/images/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/.DS_Store


--------------------------------------------------------------------------------
/images/shinchan .jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/shinchan .jpg


--------------------------------------------------------------------------------
/images/trx_arch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/trx_arch.png


--------------------------------------------------------------------------------
/images/first_step.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/first_step.png


--------------------------------------------------------------------------------
/images/memory_seg.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/memory_seg.png


--------------------------------------------------------------------------------
/images/trx_size_udp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/trx_size_udp.png


--------------------------------------------------------------------------------
/images/javascript_meme.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/javascript_meme.jpg


--------------------------------------------------------------------------------
/images/compiler_all_stages.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/compiler_all_stages.png


--------------------------------------------------------------------------------
/images/meme_trx_size_limit.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/meme_trx_size_limit.png


--------------------------------------------------------------------------------
/images/compiler_three_pillers.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/compiler_three_pillers.png


--------------------------------------------------------------------------------
/images/header_rust_compiler2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/header_rust_compiler2.jpg


--------------------------------------------------------------------------------
/images/reintialization_attack.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/reintialization_attack.jpg


--------------------------------------------------------------------------------
/images/rust_compiler_footer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/rust_compiler_footer.png


--------------------------------------------------------------------------------
/images/solana_limitation_bye.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/baindlapranayraj/proof-of-blogs/HEAD/images/solana_limitation_bye.gif


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | <img src = "https://i.pinimg.com/1200x/bb/a3/d1/bba3d17e08596bbc965e562e36287043.jpg"/>
2 | 
3 | # Welcome to my Proof of blogs
4 | 


--------------------------------------------------------------------------------
/reinitialization_attack.md:
--------------------------------------------------------------------------------
  1 | <img
  2 |  width="1000px"
  3 |  height="500px"
  4 |  src="./images/reintialization_attack.jpg"
  5 | />
  6 | 
  7 | # 🌙 Reinitialization Attack
  8 | 
  9 | Program Example (Btw I use Neovim 🙂): [github link](https://github.com/baindlapranayraj/reinitialization-attack-demo)  
 10 | To understand how reinitialization attacks work, we must first understand the inner workings of the `init` and `init_if_needed` constraints in Anchor ⚓️.
 11 | 
 12 | ## **`init`:**
 13 | 
 14 | In simple words, `init` ensures the account is created and initialized **only once** for a given address (determined by seeds and bump). `init` internally performs the following two sets of instructions:
 15 | 
 16 | ### **System Level:**
 17 | 
 18 | - It uses the `create_account` instruction in the System Program.
 19 | - It allocates on-chain space for the account.
 20 | - It funds the rent-exempt lamports using `Rent::minimum_balance`.
 21 | - It assigns an owner program (e.g., your custom program).
 22 | 
 23 | ### Initialization **of an Account:**
 24 | 
 25 | - Your program defines the account data structure using the `#[account]` struct.
 26 | - Anchor writes the **8-byte discriminator** at the start of the account data, where it acts as a type checker.
 27 | - Stores the initial data as shown in the example below for the PDA account struct.
 28 | 
 29 | ```rust
 30 | #[account]
 31 | #[derive(InitSpace)]
 32 | pub struct User {
 33 |   pub user_pubkey: Pubkey,
 34 |   #[max_len(30)]
 35 |   pub user_name: Option<String>,
 36 |   pub balance: u64,
 37 |   pub user_vault_bump: u8,
 38 |   pub user_bump: u8,
 39 | }
 40 | ```
 41 | 
 42 | ```rust
 43 | #[account(
 44 |     init,
 45 |     payer = user,
 46 |     space = 8 + User::INIT_SPACE,
 47 |     seeds = [
 48 |               USER,
 49 |               user.key().to_bytes().as_ref(),
 50 |               chau_config.key().to_bytes().as_ref()
 51 |             ],
 52 |     bump
 53 | )]
 54 | pub user_profile: Account<'info, User>,
 55 | ```
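
The `space = 8 + User::INIT_SPACE` expression can be worked out by hand. The sketch below is plain Rust, not Anchor code, and the constant names are made up for illustration. It assumes the usual `InitSpace` sizing: 32 bytes for a `Pubkey`, 1 byte for the `Option` tag, and a 4-byte length prefix plus `max_len` bytes for a `String`.

```rust
// Illustrative constants (names are ours, not Anchor's).
const DISCRIMINATOR: usize = 8; // Anchor's type tag at the start of the data
const PUBKEY: usize = 32;
const U64: usize = 8;
const U8: usize = 1;
const OPTION_TAG: usize = 1; // one byte to encode Some/None
const STRING_LEN_PREFIX: usize = 4; // string length stored as a u32

/// Space for an `Option<String>` field annotated with `#[max_len(n)]`.
const fn option_string(max_len: usize) -> usize {
    OPTION_TAG + STRING_LEN_PREFIX + max_len
}

/// Mirrors `User::INIT_SPACE` for the struct above:
/// user_pubkey + user_name + balance + user_vault_bump + user_bump.
const USER_INIT_SPACE: usize = PUBKEY + option_string(30) + U64 + U8 + U8;

/// Mirrors `space = 8 + User::INIT_SPACE`.
const USER_ACCOUNT_SPACE: usize = DISCRIMINATOR + USER_INIT_SPACE;
```

Under these assumptions the fields add up to 77 bytes, so the account needs 85 bytes once the discriminator is included.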
 56 | 
 57 | ### Security of `init` Constraint 🛡️:
 58 | 
 59 | The security of the `init` constraint is rigid, unlike `init_if_needed`. You cannot create a fake PDA and pass it to the program, nor can you initialize the same account twice, as doing so will throw an error.
 60 | 
 61 | **Anchor checks whether the account is initialized by evaluating the following conditions:**
 62 | 
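 63 | 1. If the account is no longer owned by the System Program, or it already holds data, the System Program refuses to create it again and the instruction fails with an error.
 64 | 2. If an account has zero lamports, then Anchor `init` can create and initialize the account (because it is a new account). If the address was pre-funded but is still an empty account owned by the System Program, `init` tops it up to the rent-exempt minimum, then allocates space and assigns the owner.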
 65 | 
 66 | **How Anchor checks the PDAs 🔍:**
 67 | 
 68 | 1. When Anchor serializes your account, it prepends an 8-byte discriminator (a tag) that identifies the account type. The discriminator is the first 8 bytes of the SHA-256 hash of `account:<StructName>` (for the PDA above, `account:User`).
 69 | 2. During deserialization, this discriminator acts as a type checker. If a malicious account is passed, Anchor compares the discriminator in the account’s data to the expected discriminator for the account type. If they don’t match, Anchor will reject the account, preventing the malicious account from being used.
 70 | 3. On Solana, all account data is stored as raw bytes, so without a tag Anchor cannot tell whether an account holds a `User` or something else. The discriminator acts like a label, so Anchor can deserialize the bytes back into the correct struct.
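
To make the type-check idea concrete, here is a minimal sketch in plain Rust. It is not Anchor's implementation: `USER_DISCRIMINATOR` holds placeholder bytes (Anchor derives the real value from a SHA-256 hash), and `check_discriminator` and `AccountError` are hypothetical names.

```rust
/// Placeholder tag for illustration only; the real value comes from a hash.
const USER_DISCRIMINATOR: [u8; 8] = [1, 2, 3, 4, 5, 6, 7, 8];

#[derive(Debug, PartialEq)]
enum AccountError {
    TooShort,
    DiscriminatorMismatch,
}

/// Returns the bytes after the tag if the first 8 bytes match `expected`.
fn check_discriminator<'a>(
    data: &'a [u8],
    expected: &[u8; 8],
) -> Result<&'a [u8], AccountError> {
    if data.len() < 8 {
        return Err(AccountError::TooShort);
    }
    let (tag, body) = data.split_at(8);
    if tag != &expected[..] {
        // Wrong account type: refuse to deserialize it.
        return Err(AccountError::DiscriminatorMismatch);
    }
    Ok(body)
}
```

Only when the tag matches does the caller get the remaining bytes to deserialize, which is the property that lets a mismatched account be rejected early.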
 71 | 
 72 | ### **Attack Scenario 🔪:**
 73 | 
 74 | - Let’s say an instruction expects the `User` PDA from the first example, and an attacker passes a different account (call it `Fake`) in its place.
 75 | - Anchor first compares the discriminator stored in the `Fake` account’s data with the expected `User` discriminator. If they don’t match, Anchor rejects the account and throws an error.
 76 | 
 77 | You might (and you should) start appreciating Anchor. It takes care of a lot of important security checks, as well as the serialization and deserialization of accounts (and some other footguns), under the hood.
 78 | 
 79 | Note: Here, the `Account` type provided by Anchor is a wrapper around `AccountInfo` (shown below), which lets Anchor verify program ownership.
 80 | 
 81 | ```rust
 82 | pub struct AccountInfo<'a> {
 83 |      /// Public key of the account
 84 |      pub key: &'a Pubkey,
 85 |      /// The lamports in the account.  Modifiable by programs.
 86 |      pub lamports: Rc<RefCell<&'a mut u64>>,
 87 |      /// The data held in this account.  Modifiable by programs.
 88 |      pub data: Rc<RefCell<&'a mut [u8]>>,
 89 |      /// Program that owns this account
 90 |      pub owner: &'a Pubkey,
 91 |      /// The epoch at which this account will next owe rent
 92 |      pub rent_epoch: u64,
 93 |      /// Was the transaction signed by this account's public key?
 94 |      pub is_signer: bool,
 95 |      /// Is the account writable?
 96 |      pub is_writable: bool,
 97 |      /// This account's data contains a loaded program (and is now read-only)
 98 |      pub executable: bool,
 99 | }
100 | ```
101 | 
102 | ## `init_if_needed`:
103 | 
104 | `init_if_needed` behaves like `init`, but it is less strict (and it must be enabled through the `init-if-needed` Cargo feature of `anchor-lang`). When you use `init_if_needed` instead of `init`, Anchor follows these checks:
105 | 
106 | 1. Does the account not exist yet? Then create and initialize it, then move forward.
107 | 2. Does the account exist but is not initialized? Then initialize it (resetting the data), then move forward.
108 | 3. Does the account exist and is already initialized? Then Anchor skips initialization and the instruction runs against the existing account (this is dangerous without proper validations).
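
The three checks above can be sketched as a small decision function. This is a simplified model in plain Rust that only mirrors the list; `AccountState`, `Action`, and `init_if_needed_action` are hypothetical names, not Anchor types.

```rust
/// The three situations described in the list above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AccountState {
    Missing,
    Uninitialized,
    Initialized,
}

/// What the constraint does in each situation.
#[derive(Debug, PartialEq)]
enum Action {
    CreateAndInitialize,
    Initialize,
    SkipInitialization,
}

fn init_if_needed_action(state: AccountState) -> Action {
    match state {
        AccountState::Missing => Action::CreateAndInitialize,
        AccountState::Uninitialized => Action::Initialize,
        // The risky branch: the handler runs against existing data,
        // so it must not blindly overwrite fields.
        AccountState::Initialized => Action::SkipInitialization,
    }
}
```

The last arm is where the instruction handler needs its own validation, because nothing in this flow stops it from rewriting state that already exists.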
109 | 
110 | ### **Attack Scenario (Reinitialization Attack) 🔪:**
111 | 
112 | > **Note:**
113 | >
114 | > Anchor considers an account "**uninitialized**" when:
115 | >
116 | > - The 8-byte discriminator is missing or invalid (doesn't match the expected hash).
117 | > - The account ownership doesn't match the expected program.
118 | > - The account has zero lamports (effectively making it non-existent on-chain).
119 | 
120 | ### Attacker Flow:
121 | 
122 | ### **Normal Flow of Code:**
123 | 
124 | In a normal scenario, `init_if_needed` is used for accounts that might need lazy initialization. For example:
125 | 
126 | ```rust
127 | #[account(
128 |   init_if_needed,
129 |   payer = user,
130 |   space = 8 + User::INIT_SPACE,
131 |   seeds = [b"user", user.key().as_ref()],
132 |   bump
133 | )]
134 | pub user_account: Account<'info, User>,
135 | ```
136 | 
137 | Here, the account is safely initialized only if it doesn't exist or is uninitialized.
138 | 
139 | ### **How a Malicious User Exploits This:**
140 | 
141 | An attacker can exploit `init_if_needed` by:
142 | 
143 | 1. **Uninitializing an account** by draining its lamports (setting balance to zero) or changing its ownership.
144 | 2. **Forcing reinitialization** by passing the same account to an instruction that uses `init_if_needed`, tricking the program into resetting the account's state.
145 | 
146 | ### **Methods of Uninitializing an Account:**
147 | 
148 | 1. **Draining Lamports:**
149 |    - An attacker can withdraw all lamports from an account, making it "uninitialized" in Anchor's eyes (since zero-lamport accounts are treated as nonexistent).
150 | 2. **Changing Ownership:**
151 |    - If an attacker changes the account's owner to another program, Anchor will consider it uninitialized for the original program.
152 | 3. **Corrupting the Discriminator:**
153 |    - If an attacker modifies the first 8 bytes (discriminator) of the account data, Anchor will treat it as uninitialized.
154 | 
155 | ### **What Happens After the Attack?**
156 | 
157 | - The account's data is **reset to default values** (e.g., balance = 0, user_name = None).
158 | - An attacker can abuse this to:
159 |   - Reset their debt in a lending protocol.
160 |   - Regain access to a locked account.
161 |   - Exploit logic that depends on the account's state.
162 | 
163 | ### **Security Practices for Using `init_if_needed` 🛡️:**
164 | 
165 | 1. **Avoid `init_if_needed` for Critical State Accounts:**
166 |    - If an account stores important data (e.g., user balances, protocol settings), use `init` instead to prevent reinitialization.
167 | 2. **Add Explicit Checks:**
168 |    - Manually verify whether an account is already initialized before using `init_if_needed`.
169 | 
170 | ```rust
171 | #[account]
172 | #[derive(InitSpace)]
173 | pub struct User {
174 |    pub user_pubkey: Pubkey,
175 |    #[max_len(30)]
176 |    pub user_name: Option<String>,
177 |    pub balance: u64,
178 |    pub is_initialized: bool,
179 |    pub user_vault_bump: u8,
180 |    pub user_bump: u8,
181 | }
182 | ```
183 | 
184 | ```rust
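185 | // `is_initialized` is the flag added to `User` above; `ErrorCode` is assumed to be your program's error enum.
185 | require!(!ctx.accounts.user_account.is_initialized, ErrorCode::AlreadyInitialized);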
186 | ```
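
The effect of such a flag can be modelled outside of Anchor. The following is a minimal sketch in plain Rust, with a trimmed-down `User` and hypothetical names (`GuardError`, `initialize`), showing that a second initialization attempt is rejected instead of resetting the balance.

```rust
/// Trimmed-down stand-in for the on-chain `User` account.
#[derive(Debug, Default)]
struct User {
    balance: u64,
    is_initialized: bool,
}

#[derive(Debug, PartialEq)]
enum GuardError {
    AlreadyInitialized,
}

/// Sets up the account once; later calls fail instead of resetting state.
fn initialize(user: &mut User) -> Result<(), GuardError> {
    if user.is_initialized {
        return Err(GuardError::AlreadyInitialized);
    }
    user.balance = 0;
    user.is_initialized = true;
    Ok(())
}
```

Calling `initialize` twice on the same `User` returns `Err(GuardError::AlreadyInitialized)` the second time, and whatever balance was stored in between stays untouched.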
187 | 
188 | ### **Conclusion:**
189 | 
190 | While `init_if_needed` provides flexibility, it introduces risks if misused. **Always prefer `init` for security-critical accounts** and only use `init_if_needed` when absolutely necessary, with proper safeguards. Anchor's discriminator check helps prevent some attacks, but developers must implement additional checks to fully secure their programs against reinitialization exploits.
191 | 
192 | ## References:
193 | 
194 | [solana developer course](https://solana.com/developers/courses/program-security/reinitialization-attacks) <br/>
195 | [rare skill blog](https://www.rareskills.io/post/init-if-needed-anchor) <br/>
196 | [init vs init_if_needed](https://medium.com/@calc1f4r/init-vs-init-if-needed-a-deep-dive-d33fe59e4de5)  <br/>
197 | [security concern in init_if_needed](https://syedashar1.medium.com/program-security-in-anchor-framework-solana-smart-contract-security-b619e1e4d939) <br/>
199 | [stackoverflow_discussion](https://solana.stackexchange.com/questions/4948/what-is-anchor-8-bytes-discriminator) <br/>
200 | 
201 | ## Source Code:
202 | 
203 | https://github.com/baindlapranayraj/reinitialization-attack-demo
204 | 


--------------------------------------------------------------------------------
/rust_compiler_for_dummies.md:
--------------------------------------------------------------------------------
  1 | <img 
  2 |   width="1500px"
  3 |  height="460px"
  4 |  src="./images/header_rust_compiler2.jpg"
  5 | />
  6 | 
  7 | # 🍊 Rust Compiler For Dummies
  8 | 
  9 | ## Setting Up the Stage:
 10 | 
 11 | As you may already know, every program we write is ultimately converted into binary instructions (those fancy zeros and ones). Why? Because the CPU cannot understand our source code; all it understands is 0s and 1s. Since we cannot realistically write programs or interact with memory in raw 0s and 1s, we need an abstraction for writing programs in a human-readable fashion.
 12 | 
 13 | The thinnest abstraction layer for writing code that interacts with memory is **Assembly**. It is a low-level language that lets us write human-readable code, works directly with memory, and provides fine-grained control over computer operations.
 14 | Although Assembly offers high efficiency and speed, it is not commonly used because it is prone to memory bugs and is complex to write. So how can we interact with memory safely?
 15 | 
 16 | The second level of abstraction gives us **high-level programming languages** (like Rust, C, and Go). These languages are more human-readable than Assembly, with syntactic sugar that hides much of the complexity and memory management from the programmer. With this, programmers can write programs with less complexity and fewer errors. Different languages handle memory allocation and deallocation in different ways: some manage it efficiently, while others do not, each with its own trade-offs. But wait a sec... didn't I just tell you the CPU cannot understand anything other than 0s and 1s (binary)?
 17 | 
 18 | We cannot execute high-level programs directly on the CPU; we must convert these sugar-coated programs into binary code. This translation is done by **compilers**. We will focus on how Rust code is converted to binary code and what internal checks occur during compilation. We will peel it layer by layer and understand each and every stage. With this in mind, let's get started 😄.
 19 | 
 20 | ## High-Level Overview of Compilation Layers
 21 | 
 22 | The code written by developers is human-readable, allowing others to easily read and understand it. However, the CPU cannot understand human-written code. We need to convert source code into binary code, which is the only format the CPU can understand.
 23 | 
 24 | Although several steps are required to produce binary code, at a high level they are divided into three main pillars: **Frontend, Middle, and Backend**. This breaks down the complex process of converting source code to machine code.
 25 | 
 26 | <img 
 27 |   width="1500px"
 28 |   height="560px" 
 29 |   src = "./images/compiler_three_pillers.png" 
 30 | />
 31 | <br/>
 32 | 
 33 | At the frontend, you have Rust code. At the backend, you have the binary machine code generated by **LLVM (Low Level Virtual Machine)** that runs directly on the target machine. In the middle, all the Rust-specific ownership and borrowing checks happen.
 34 | 
 35 | We will peel back each layer and understand how Rust compilation works. If we zoom in a little on the three pillars of Rust compilation, we get the stages shown in the image below.
 36 | 
 37 | <img 
 38 |   width="1500px"
 39 |   height="560px" 
 40 |   src = "./images/compiler_all_stages.png" 
 41 | />
 42 | <br/>
 43 | 
 44 | I don't expect you to know all the terms shown in the picture above, but don't worry: by the end of this blog, you will understand all of them. Let's go step by step, peeling back each layer to understand what happens while compiling our code.
 45 | 
 46 | Keeping this big picture in mind, let's start peeling the Orange 🍊.
 47 | 
 48 | ### Layer One: Lexing, Parsing, and AST
 49 | 
 50 | Let's take an example as source code:
 51 | 
 52 | ```rust
 53 | fn main() {
 54 |     // Let's do some investigation :)
 55 |     let some = String::from("chinna");
 56 |     println!("Say my name:");
 57 |     println!("{}", some);
 58 | 
 59 |     time_pass(&some);
 60 | }
 61 | 
 62 | fn time_pass(pass: &String) {
 63 |     println!("Time passing with this guy: {}", pass);
 64 | }
 65 | ```
 66 | 
 67 | This is where compilation starts. The compiler first reads the `.rs` file as plain text, then breaks this linear text into **tokens** like `fn`, `some`, and `{`. This is called **Lexing**.
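
To get a feel for lexing, here is a toy lexer in plain Rust. It is only a sketch: the `Token` enum and `lex` function are made up for this post, only `fn` and `let` are treated as keywords, and the real lexer inside `rustc` handles far more (strings, comments, lifetimes, multi-character operators, and so on).

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Keyword(String),
    Ident(String),
    Number(String),
    Symbol(char),
}

/// Splits source text into a flat list of tokens, skipping whitespace.
fn lex(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = source.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next();
        } else if c.is_alphabetic() || c == '_' {
            // Collect a whole word, then decide: keyword or identifier.
            let mut word = String::new();
            while let Some(&w) = chars.peek() {
                if w.is_alphanumeric() || w == '_' {
                    word.push(w);
                    chars.next();
                } else {
                    break;
                }
            }
            if matches!(word.as_str(), "fn" | "let") {
                tokens.push(Token::Keyword(word));
            } else {
                tokens.push(Token::Ident(word));
            }
        } else if c.is_ascii_digit() {
            let mut number = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() {
                    number.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Number(number));
        } else {
            // Everything else is treated as a one-character symbol.
            tokens.push(Token::Symbol(c));
            chars.next();
        }
    }
    tokens
}
```

Feeding it `fn main() { let x = 42; }` yields a flat sequence starting with `Keyword("fn")`, `Ident("main")`, `Symbol('(')`. There is no nesting yet; building the tree is the parser's job.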
 68 | 
 69 | Then the compiler converts these tokens into a tree-like structure called an AST (Abstract Syntax Tree). The AST still closely resembles the source code, just in tree form. This step is known as **Parsing**. You can see the AST version of our example code [here](https://github.com/baindlapranayraj/rektoff/blob/main/rektoff-office-hour/AST.txt). Macros are also expanded in this layer.
 70 | 
 71 | <img src="./images/first_step.png" />
 72 | 
 73 | The AST captures all the syntactic code into a tree-like structure. You may ask, why do we need to do this?
 74 | 
 75 | Well, compilers cannot work with this linear source code directly. The source code is sugar-coated syntax designed for human readability, not for compilers. The Abstract Syntax Tree (AST) abstracts away certain details; it is a tree data structure that represents the syntactic structure of the source code. You can learn more about ASTs on this [Wikipedia page](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
 76 | 
 77 | ### Layer Two: AST Lowering (HIR and THIR)
 78 | 
 79 | Well, after parsing the tokens and converting them into an AST, the next layer begins. At this point, the AST closely resembles the source code, which still contains a lot of syntactic sugar, such as `for` and `match`.
 80 | 
 81 | We need to peel away this syntactic sugar to simplify the AST. The result of this desugaring process is a form of the AST known as **HIR (High-Level Intermediate Representation)**. HIR is still close to what the user originally wrote, but it removes syntactic sugar, for example by converting a `for` loop into a `loop` with explicit iteration logic. After removing all the fluff, HIR is a more compiler-friendly representation of the AST.
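
As a rough illustration of that desugaring, the two functions below compute the same sum. The second one is a hand-written approximation of what a `for` loop turns into (an iterator, a `loop`, and a `match` on `next()`); the exact HIR the compiler produces differs in its details, and the function names are ours.

```rust
/// The sugared version, as you would normally write it.
fn sum_with_for(values: &[u32]) -> u32 {
    let mut total = 0;
    for v in values {
        total += v;
    }
    total
}

/// Roughly what the `for` loop above desugars to.
fn sum_desugared(values: &[u32]) -> u32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match iter.next() {
            Some(v) => total += v,
            None => break, // iterator exhausted: leave the loop
        }
    }
    total
}
```

Both functions return the same result for any slice; the second just spells out the iteration logic that the `for` syntax hides.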
 82 | 
 83 | This process of simplifying or transforming the AST by removing syntactic sugar is known as **lowering**. By the way, you can inspect the HIR of your code by running this command: `rustc +nightly -Z unpretty=hir-tree src/main.rs`.
 84 | 
 85 | The HIR is then lowered further, and the compiler checks whether all types in the code are used correctly. For example, you cannot add an integer to a string, though of course you can do shit like this in **_JavaScript_** 🤡.
 86 | 
 87 | <div align="center">
 88 |  <img  src="./images/javascript_meme.jpg"/>
 89 | </div>
 90 | 
 91 | After all the type checks are done, the code is represented as **THIR (Typed High-Level Intermediate Representation)**. As the name suggests, THIR is a lowered version of the HIR where all the types have been filled in, which is possible once type checking has completed.
 92 | 
 93 | After HIR is lowered to THIR and before MIR is built, the unsafety checker walks through the THIR and checks every expression for unsafe operations (like raw pointer dereferences, calls to unsafe functions, `static mut` accesses, and union field accesses), verifying that these only appear inside an unsafe context (unsafe blocks or functions).
 94 | 
 95 | Because `unsafeck` needs typed expressions, it’s placed after type checking and THIR construction but before MIR building. This positioning allows it to enforce Rust’s safety guarantees early and precisely, ensuring all unsafe code is properly enclosed in explicit unsafe contexts.
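
Here is a small example of what that check looks for. The function name is ours; the point is that creating a raw pointer is allowed anywhere, but dereferencing it only compiles inside an `unsafe` block.

```rust
/// Reads an `i32` back through a raw pointer.
fn read_through_raw_pointer(value: &i32) -> i32 {
    // Creating a raw pointer is a safe operation.
    let ptr: *const i32 = value;
    // Dereferencing it is not, so it must sit inside an unsafe context.
    // The pointer comes from a valid reference, so this read is sound.
    unsafe { *ptr }
}
```

If you remove the `unsafe { ... }` wrapper and write `*ptr` directly, the compiler rejects the function with an error about dereferencing a raw pointer outside an unsafe block.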
 96 | 
 97 | ### Layer Three: Mid-level Intermediate Representation (MIR)
 98 | 
 99 | After lowering, the HIR becomes a compiler-friendly abstraction of the AST. From here, the next layer of abstraction begins, and this is the heart of the Rust compiler. MIR (Mid-level Intermediate Representation) is the phase where many classic memory bugs (like data races and use-after-free errors) can be detected. If such bugs are found, the compiler simply throws an error.
100 | 
101 | MIR represents your code as a **Control Flow Graph (CFG)**. Think of this as a detailed flowchart. Every `if`, `loop`, and `match` is broken down into basic blocks and explicit "go-to" jumps between them.
102 | 
103 | To guarantee safety, the compiler can't just check the "happy path." It must analyze every possible path your code could take. What happens if this `if` is true? What if it's false? What if this loop runs zero times?
104 | 
105 | The CFG makes all these paths explicit. The borrow checker can systematically walk this flowchart, tracking the state of every variable (owned, borrowed, moved) through every possible branch and loop, ensuring no rule is ever violated. You can learn more about CFGs [here](https://en.wikipedia.org/wiki/Control-flow_graph), and you can see the MIR version of our code example [here](https://github.com/baindlapranayraj/rektoff/blob/main/rektoff-office-hour/MIR.txt).
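
A tiny example of per-path tracking, using hypothetical function names. The value `name` is moved on one branch and only borrowed on the other, and the borrow checker accepts this because each path on its own is valid.

```rust
/// Takes ownership of the string and returns its length.
fn consume(s: String) -> usize {
    s.len()
}

fn branchy(flag: bool) -> usize {
    let name = String::from("chinna");
    if flag {
        // On this path `name` is moved into `consume`.
        consume(name)
    } else {
        // On this path `name` is still owned here and only borrowed.
        name.len()
    }
    // Using `name` after the `if` would be rejected:
    // it may already have been moved on the `flag == true` path.
}
```

Adding a use of `name` after the `if`/`else` makes the compiler report a use of a moved value, because at least one path through the graph reaches that point with `name` already gone.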
106 | 
107 | <div align="center">
108 |  <img width=800 height=500 src="./images/CFG.png"/>
109 | </div>
110 | 
111 | Well long story short, MIR provides the structure for the compiler to exhaustively check every possible path and guarantee that Rust's safety invariants are met. This system is incredibly robust and is the foundation of Rust's safety.
112 | 
113 | The `unsafe` keyword creates a critical exception. It is a promise from the programmer that they will uphold Rust's safety rules manually. If that promise is broken, the compiler's foundational assumptions are invalidated. This can compromise the behavior of the entire program, no matter how "safe" the surrounding code appears.
114 | 
115 | Ultimately, a Rust program is considered **sound** if it is free from undefined behavior. Safe code that passes the compiler's checks achieves this soundness automatically.
116 | 
117 | ### Layer Four: LLVM Code Generation
118 | 
119 | We are getting into the final stages of compilation. After performing all the Rust borrow and ownership checks during the MIR phase, the compiler applies optimizations at the MIR level, such as removing dead code and simplifying control flow.
120 | 
121 | MIR is then translated into LLVM IR (Low Level Virtual Machine Intermediate Representation), a platform-independent intermediate representation used by the LLVM backend. **LLVM IR is comparable to assembly**, but it is a bit more high-level and human-readable.
122 | 
123 | The code-generation phase of the Rust compiler is mainly done by **LLVM (Low Level Virtual Machine)**. LLVM is a collection of tools for building compilers, most notably used by the C/C++ compiler Clang. Finally, after all the rigorous optimizations, LLVM IR is translated into machine code for the target platform (e.g., x86_64 or ARM64). Consequently, binaries compiled for one architecture cannot run on another without emulation or translation: you cannot run an x86_64 binary on an ARM64 processor, or vice versa.
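
One way to see that a binary is tied to its target is to ask the compiled program what it was built for. This sketch uses the standard library's `std::env::consts` and the `cfg!` macro; the function names are ours.

```rust
/// Describes the target this binary was compiled for, e.g. "x86_64-linux".
fn target_description() -> String {
    format!("{}-{}", std::env::consts::ARCH, std::env::consts::OS)
}

/// Decided at compile time: true only when built for x86_64.
fn is_x86_64_build() -> bool {
    cfg!(target_arch = "x86_64")
}
```

Both values are baked in during compilation, so building the same source with a different `--target` produces a binary that reports (and only runs on) that other architecture.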
124 | 
125 | <div align="center">
126 |  <img width=800 height=500 src="./images/llvm.png"/>
127 | </div>
128 | 
129 | ## References:
130 | 
131 | [rust_compiler_video_by_daniel](https://youtu.be/Ju7v6vgfEt8?si=-ElYGBaXZvUwt98m)<br/>
132 | [stack_overflow_discussion](https://stackoverflow.com/questions/43385142/how-is-rust-compiled-to-machine-code)<br/>
133 | [llvm_video](https://youtu.be/3WojCM9r0Ls?si=kfTtzQ-BrsPvs-5q)<br/>
134 | [stack_overflow_discussion_of_llvm](https://stackoverflow.com/questions/2354725/what-exactly-is-llvm)<br/>
135 | [rust_book_on_undefined_behavior](https://google.github.io/learn_unsafe_rust/undefined_behavior.html#unsoundness)
136 | 
137 | <img 
138 |   width="1500px"
139 |  height="460px"
140 |  src="./images/rust_compiler_footer.png"
141 | />
142 | 


--------------------------------------------------------------------------------
/solana_limitations.md:
--------------------------------------------------------------------------------
  1 | <img 
  2 |   width="1000px"
  3 |  height="430px"
  4 |  src="./images/shinchan .jpg"
  5 | />
  6 | 
  7 | # 🦀 Deep Dive: Solana Limitations (Still writing)
  8 | 
  9 | GM GM everyone 😁,
 10 | 
 11 | In this blog, we will go through the various types of Solana resource limitations you might encounter while developing smart contracts. You may run into errors containing words like "limit" or "exceed." These errors represent boundaries predefined by Solana programs to maintain fairness and performance on the blockchain. If you’d like to get more information on these topics, you’re in the right place!
 12 | 
 13 | Here's what you can expect from this blog:
 14 | 
 15 | - CU Limitations
 16 | - Transaction size limitations
 17 | - Stack size
 18 | 
 19 | # Limitations of Solana:
 20 | 
 21 | Solana is a high-performance public blockchain that stands apart from traditional blockchains like Bitcoin and Ethereum due to its unique approach to transaction processing: it separates logic and state into different accounts, which allows Solana to process many transactions in parallel.
 22 | 
 23 | However, programs running on Solana are subject to several types of resource limitations. These limitations ensure that programs use system resources fairly while maintaining high performance.
 24 | 
 25 | Knowing these boundaries is very helpful, and as the program designer, you must be mindful of these limitations to create programs that are affordable, fast, safe, and functional.
 26 | 
 27 | ## 1. Compute Unit Limitations:
 28 | 
 29 | ### What are Compute Units?
 30 | 
 31 | As the name suggests, a CU (Compute Unit) is the fundamental measurement of the computational work (CPU cycles and memory usage) performed by a transaction or instruction on Solana. It's similar to "Gas" on Ethereum but is much more predictable and low-latency.
 32 | 
 33 | Every operation your smart contract executes on-chain, such as reading or writing accounts, performing cryptographic operations (like zk-ElGamal), verifying signatures, or serializing and deserializing data, consumes a certain number of compute units (CUs). This is roughly proportional to the amount of work done by the nodes.
 34 | 
 35 | If you perform simple transactions, nodes can process them efficiently, resulting in lower CU consumption. However, if you perform complicated mathematical operations or heavy loops, nodes use more memory and CPU and take more time to run the program (smart contract), resulting in higher CU consumption.
 36 | 
 37 | _For example, a simple transaction like sending SOL from wallet A to wallet B takes around 3000 Compute Units._
 38 | 
 39 | ```rust
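 40 | // Assumes: use anchor_lang::system_program::{transfer, Transfer};
 41 | let transfer_amount_accounts = Transfer {
 42 |     from: ctx.accounts.signer.to_account_info(),
 43 |     to: ctx.accounts.recipient.to_account_info(),
 44 | };
 45 | 
 46 | // Use a separate name so the instruction's `ctx` is not shadowed.
 47 | let cpi_ctx = CpiContext::new(
 48 |     ctx.accounts.system_program.to_account_info(),
 49 |     transfer_amount_accounts,
 50 | );
 51 | 
 52 | // `amount` is in SOL here; this whole transfer takes around 3000 CU.
 53 | transfer(cpi_ctx, amount * LAMPORTS_PER_SOL)?;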
 54 | ```
 55 | 
 56 | The code above is written in Anchor ⚓️. It performs a simple transfer of SOL, which takes around 3000 CUs.
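
One detail worth noting in the snippet above is the `amount * LAMPORTS_PER_SOL` multiplication, which can overflow a `u64` for very large inputs. A minimal sketch of a checked conversion in plain Rust follows; `sol_to_lamports` is a hypothetical helper, and the constant is redefined locally so the example stands on its own.

```rust
/// 1 SOL = 1_000_000_000 lamports.
const LAMPORTS_PER_SOL: u64 = 1_000_000_000;

/// Converts whole SOL to lamports, returning `None` instead of overflowing.
fn sol_to_lamports(sol: u64) -> Option<u64> {
    sol.checked_mul(LAMPORTS_PER_SOL)
}
```

In a program you could turn the `None` case into a custom error before building the CPI, rather than letting the arithmetic panic or wrap.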
 57 | 
 58 | ### Compute Unit Budget
 59 | 
 60 | - As we know, heavy mathematical operations or loops consume a large number of compute units. However, each instruction gets a default budget of 200,000 CUs, and a whole transaction is capped at 1.4 million CUs.
 61 | 
 62 | - If a transaction exhausts its CU budget, it fails: all state changes are reverted, and the fee is not refunded to the signer who sent the transaction. (This mechanism prevents attackers from running never-ending or computationally intensive programs on nodes, which could slow down or halt the chain.)
 63 | 
 64 | ```
 65 | Error: exceeded maximum number of instructions allowed (200000) compute units
 66 | ```
 67 | 
 68 | - Personally, I encountered this error while building my Chaubet project. When a bettor buys shares, [this function](https://github.com/baindlapranayraj/Chaubet/blob/86b5c91de727dd173a0593a7378a0acb3eb25b2a/programs/chaubet/src/utils/helper.rs#L89C5-L132C6) performs some heavy mathematical computations and checks. Due to this, the instruction exceeded the 200,000 CU limit, and the entire instruction was reverted.
 69 | 
 70 | - We can increase the compute limit with `SetComputeUnitLimit`: by adding this instruction to our transaction, we request a specific compute unit limit. But 🍑 we can only raise the CU budget up to 1.4 million units.
 71 | 
 72 | ```typescript
 73 | const computeLimitIx = ComputeBudgetProgram.setComputeUnitLimit({
 74 |   units: 500000,  // Increased from 200k to 500k CUs.
 75 | });
 76 | ```
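
The budget rules above can be pictured with a tiny model. This is only a sketch under my own assumptions, not how the validator implements metering: the `ComputeMeter` type and its methods are hypothetical names, and the only facts taken from the text are the 200,000 CU default and the 1.4 million CU ceiling.

```rust
/// Default budget mentioned above.
const DEFAULT_CU_LIMIT: u64 = 200_000;
/// Upper bound that `SetComputeUnitLimit` can request.
const MAX_CU_LIMIT: u64 = 1_400_000;

/// Error returned by the toy meter when the budget runs out.
#[derive(Debug, PartialEq)]
struct BudgetExceeded;

/// Hypothetical toy meter: tracks how many compute units are left.
struct ComputeMeter {
    remaining: u64,
}

impl ComputeMeter {
    /// `requested` mimics an optional limit request; it is clamped to the max.
    fn new(requested: Option<u64>) -> Self {
        let limit = requested.unwrap_or(DEFAULT_CU_LIMIT).min(MAX_CU_LIMIT);
        Self { remaining: limit }
    }

    /// Charges `units`; fails (and drains the meter) once the budget is exhausted.
    fn consume(&mut self, units: u64) -> Result<(), BudgetExceeded> {
        if units > self.remaining {
            self.remaining = 0;
            return Err(BudgetExceeded);
        }
        self.remaining -= units;
        Ok(())
    }
}
```

With the default limit, a 3,000 CU transfer leaves 197,000 CUs, while a single 500,000 CU step fails unless a higher limit was requested up front.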
 77 | 
 78 | ### Why do we have a limited Compute Unit Budget?
 79 | 
 80 | In short and simple terms, Solana has CU limitations to ensure fair resource allocation. But what does that mean?
 81 | 
 82 | Solana validators are individual computers (or nodes) that process blockchain transactions and maintain the network’s state (these are the basic fundamentals you must know 😒). Each validator has limited CPU power and memory, just like any regular computer. When programs run inside these nodes, they allocate some memory to process all the instructions.
 83 | 
 84 | Now, if a malicious user sends a transaction that contains infinite loops, it can use a huge amount of memory and may slow or even crash the system. Since the blockchain is a network of shared computers/nodes, if one user performs a huge number of CPU tasks and uses a large amount of CPU/memory, it hogs the system, starving other users.
 85 | 
 86 | This limit on the CU budget helps the Solana network prevent denial-of-service (DoS) attacks and resource exhaustion on validators.
 87 | 
 88 | ## 2. Transaction Size Limit
 89 | 
 90 | Before getting into the transaction size limit, let's peek at what transactions are and what they contain.
 91 | A transaction on Solana is a request sent by users to interact with the network, typically to mutate data such as transferring lamports (the native token unit) between accounts.
 92 | 
 93 | Each transaction contains one or more instructions that specify the operations to perform on the blockchain. The execution logic for those instructions lives in on-chain programs, stored as raw bytes in program accounts.
 94 | 
 95 | <img
 96 |  width="1000px"
 97 |  height="550px"
 98 |  src="./images/trx_arch.png"
 99 | />
100 | 
101 | ### The Transaction Contains:
102 | 
103 | - **Array of signatures**:- A sequence of signatures included in the transaction.
104 | - **Message**:- This is the core part of the transaction. It contains everything needed to describe what the transaction intends to do.
105 | 
106 |   As you can see in the image above, the Message contains:
107 | 
108 | - Header:- Indicates the number of signers and read-only accounts **(3 bytes)**
109 | - Array of Accounts:- Contains all the accounts required to perform the transaction, provided by the client (each Pubkey is 32 bytes)
110 | - Recent Blockhash:- A recent blockhash attached to the transaction (32 bytes)
111 | - Array of Instructions:- Contains all instructions invoked by the signer (the size depends on the number of accounts and the amount of data each instruction carries).
112 | 
113 | ### Transaction Size Limitations
114 | 
115 | | **Component**    | **Size**               | **Description**                                                     |
116 | | ---------------- | ---------------------- | ------------------------------------------------------------------- |
117 | | Signature        | 64 bytes               | Each signature included in the transaction                          |
118 | | Message Header   | 3 bytes                | Indicates number of signers and read-only accounts                  |
119 | | Account Pubkey   | 32 bytes (per account) | Public key for each account involved in the transaction             |
120 | | Recent Blockhash | 32 bytes               | Recent blockhash attached to the transaction                        |
121 | | Instructions     | Varies                 | Depends on the number of instructions, their accounts, and data     |
122 | 
123 | <br/>
124 | 
125 | Every Solana transaction has a size limit of **1,232 bytes**. The total transaction size is the sum of the bytes for all signatures, accounts involved, the blockhash, and the instructions (including their accounts and data), and must not exceed 1,232 bytes. Because of this limit, developers need to fit all signers, account addresses, and required data within the 1,232-byte constraint.
126 | 
127 | You may have encountered the `transaction size limit exceeded` error when you tried to pass too many accounts from the client side or pack too many instructions into a single transaction.
128 | 
129 | **You might wonder: why only 1,232 bytes?** <br/>
130 | In short: the transaction size limit of 1,232 bytes in Solana is closely tied to how data is transmitted over the internet using IPv6 (Internet Protocol version 6). But let's break it down more.
131 | 
132 | **Here's why:**
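
To make the sum concrete, here is a rough size estimator built from the component sizes in the table. Treat it as a sketch: it assumes the legacy (non-versioned) transaction layout, where every array is preceded by a short compact length prefix and each instruction also carries a 1-byte program index. The prefix sizes are my addition, not something the table lists, and the names `Ix`, `compact_len`, and `estimate_tx_size` are hypothetical.

```rust
const SIGNATURE_BYTES: usize = 64;
const HEADER_BYTES: usize = 3;
const PUBKEY_BYTES: usize = 32;
const BLOCKHASH_BYTES: usize = 32;
const MAX_TX_BYTES: usize = 1232;

/// Bytes used by a compact length prefix (assumption: legacy wire format).
fn compact_len(n: usize) -> usize {
    if n < 0x80 {
        1
    } else if n < 0x4000 {
        2
    } else {
        3
    }
}

/// Hypothetical description of one instruction inside the message.
struct Ix {
    accounts: usize, // number of account indices (1 byte each)
    data_len: usize, // bytes of instruction data
}

/// Size of one instruction: program index + account indices + data.
fn ix_size(ix: &Ix) -> usize {
    1 + compact_len(ix.accounts) + ix.accounts + compact_len(ix.data_len) + ix.data_len
}

/// Rough serialized size of a whole transaction.
fn estimate_tx_size(signatures: usize, account_keys: usize, ixs: &[Ix]) -> usize {
    compact_len(signatures)
        + signatures * SIGNATURE_BYTES
        + HEADER_BYTES
        + compact_len(account_keys)
        + account_keys * PUBKEY_BYTES
        + BLOCKHASH_BYTES
        + compact_len(ixs.len())
        + ixs.iter().map(ix_size).sum::<usize>()
}

/// True when the estimate fits inside the 1,232-byte limit.
fn fits_limit(size: usize) -> bool {
    size <= MAX_TX_BYTES
}
```

For a plain SOL transfer (1 signature, 3 account keys, one instruction with 2 account indices and 12 bytes of data) the estimate comes out at 215 bytes, comfortably under the limit. Add a few dozen extra account keys and the 32 bytes per key eat the remaining space very quickly.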
133 | 
134 | #### Let’s go back to some fundamentals of computer networking briefly.
135 | 
136 | So, what is computer networking?
137 | 
138 | In simple terms, it's a group of computers or nodes connected together to share data 📦 among them. This can involve communication between two local nodes/computers (LAN) or between computers across the globe (Internet).
139 | 
140 | In order to communicate with nodes effectively, without sending data 📦 to the wrong destination node, we need a set of rules or protocols to follow. **IP (Internet Protocol)** is the universal addressing system for nodes on a network. IP provides a unique address for every node or computer and uses these addresses to route and deliver data packets 📦 to the correct destination. Data sent over a network isn’t sent all at once. It’s broken into small pieces called packets.
141 | 
142 | Solana also uses UDP (User Datagram Protocol) on top of IP to send packets 📦 more quickly (unlike Ethereum, which uses TCP; TCP does a lot of checks, which hinders speed when packets are large). UDP just fires packets 📦 as fast as possible, with no guarantee they’ll arrive. But 🍑 if a packet is too large (greater than the MTU), it might get fragmented (split), and we may lose some data, which reduces reliability.
143 | 
144 | <img
145 |  width="1000px"
146 |  height="550px"
147 |  src="./images/trx_size_udp.png"
148 | />
149 | 
150 | OK, now why are we learning all this stuff? How is it related to the transaction size limit?
151 | Hold your horses, we are getting there. Let's connect some dots 😌.
152 | 
153 | Solana nodes, like other networked computers, communicate using the Internet Protocol (IP), often IPv6, the latest version. They must follow IP's rules for addressing and routing data. Solana nodes send serialized transactions to each other using the UDP transport protocol, which is fast and efficient.
154 | 
155 | However, if a transaction (UDP packet) is larger than the network’s Maximum Transmission Unit (MTU), it is fragmented (split) by the IP layer, not by UDP itself. UDP simply sends the packets; it does not handle fragmentation. If any fragment is lost in transit, the entire transaction is lost, since UDP provides no error correction or retransmission.
156 | 
157 | Solana sidesteps fragmentation by design: to avoid it, the packet/transaction size must not exceed the IPv6 minimum MTU (Maximum Transmission Unit) of **1280 bytes**. After removing the headers (IP and UDP headers = 48 bytes),
158 | the remaining 1,232 bytes are allocated for the transaction.
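
A quick back-of-the-envelope sketch of why fragmentation hurts. It assumes each fragment is lost independently with the same probability, which is a simplification I am making for illustration, and it ignores the per-fragment header details of real IP fragmentation. The function names are hypothetical.

```rust
/// How many packets a payload needs if each packet carries at most `max_per_packet` bytes.
fn fragments_needed(payload_len: usize, max_per_packet: usize) -> usize {
    // Ceiling division written out by hand.
    (payload_len + max_per_packet - 1) / max_per_packet
}

/// Chance that every fragment arrives, assuming independent losses.
/// One missing fragment means the whole transaction is gone.
fn delivery_probability(fragments: usize, loss_rate: f64) -> f64 {
    (1.0 - loss_rate).powi(fragments as i32)
}
```

With a 1% loss rate, a transaction that fits in one packet arrives 99% of the time, while one split into four fragments arrives only about 96% of the time, and nothing in UDP will resend the missing piece.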
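
The arithmetic above, written out as constants. The names are my own for this sketch; the numbers are the ones from the paragraph.

```rust
/// Minimum MTU every IPv6 link must support.
const IPV6_MIN_MTU: usize = 1280;
/// Header overhead quoted above (IP + UDP headers).
const HEADER_OVERHEAD: usize = 48;
/// What is left for the serialized transaction.
const MAX_TX_SIZE: usize = IPV6_MIN_MTU - HEADER_OVERHEAD;

/// True when a serialized transaction fits into a single unfragmented packet.
fn fits_in_one_packet(serialized_len: usize) -> bool {
    serialized_len <= MAX_TX_SIZE
}
```

So `MAX_TX_SIZE` works out to 1,232, and anything even one byte larger no longer fits into a single packet.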
159 | 
160 | ### Solana’s New Update: Increasing Transaction Size Limit to 4000 bytes
161 | 
162 | With Solana’s adoption of modern transport protocols like QUIC, larger messages can be delivered more reliably. QUIC doesn’t have the same strict MTU constraints as raw UDP; it is like a cross-breed between TCP (it handles fragmentation) and UDP (fast as f\*ck).
163 | 
164 | In the older version, where the transaction size was limited to 1,232 bytes, it was difficult to create complex transactions for ZK and DeFi applications. Increasing the number of accounts on the client side (with each account’s public key being 32 bytes) and listing all those public keys could quickly hit the transaction size limit.
165 | 
166 | Because of this limitation, developers have to write optimized code in which all accounts and user signatures are squeezed into a tiny 1,232-byte space.
167 | 
168 | <img
169 |  width="1000px"
170 |  height="550px"
171 |  src="./images/meme_trx_size_limit.png"
172 | />
173 | 
174 | The new [SIMD-296 proposal](https://github.com/solana-foundation/solana-improvement-documents/pull/296/commits/bbc29c909085589989ca5f258550ce4447e68a89) increases the transaction size limit to **4k bytes**. This change allows for more instructions, larger data payloads, and smoother execution of complex dApps, especially those involving multiple interactions or requiring rich metadata.
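
To get a feel for what the bigger limit buys, here is a rough upper bound on how many account keys fit in a transaction. Assumptions to keep in mind: I read "4k" as 4,096 bytes, I use the legacy layout with 1-byte length prefixes, and I ignore the bytes used by the instructions themselves, so real transactions fit fewer accounts than this. The function name is hypothetical.

```rust
/// Upper bound on account keys that fit under `tx_limit` bytes.
/// Ignores instruction bytes, so real transactions fit fewer.
fn max_account_keys(tx_limit: usize, signatures: usize) -> usize {
    let fixed = 1                 // signature count prefix
        + signatures * 64         // signatures
        + 3                       // message header
        + 1                       // account count prefix
        + 32                      // recent blockhash
        + 1;                      // instruction count prefix
    tx_limit.saturating_sub(fixed) / 32
}
```

With one signature this gives at most 35 keys under 1,232 bytes and at most 124 keys under 4,096 bytes, which is why account-heavy ZK and DeFi transactions feel the current limit first.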
175 | 
176 | ## 3. Stack Size Limitations
177 | 
178 | 1. Explain about general programming language memory allocation.
179 | 2. Write an example and explain all the things related to it, e.g. Frame, Stack, Heap, and Data/Global persistent data.
180 | 3. Explain how Solana manages the memory
181 | 4. Connect the Dots with Solana BPF Virtual Machine.
182 | 5. Explain how we can optimize for the Stack Size Limitations.
183 | 
184 | In Solana programs (smart contracts), there's a limitation on the stack size - the amount of memory allocated for local variables and function calls. The current stack frame size limit is **4KB per frame**.
185 | 
186 | ### What is Stack Memory?
187 | 
188 | Stack memory is used for:
189 | 
190 | - Local variables in functions
191 | - Function parameters
192 | - Return addresses
193 | - Temporary data
194 | 
195 | ### Stack Size Constraints
196 | 
197 | ```rust
198 |     // This function's stack frame needs more than 4KB
199 |     pub fn handle(_ctx: Context<SendAmount>) -> Result<()> {
200 |         // Allocate a large buffer on the stack
201 | 
202 |         let mut buffer = [0u8; 50000]; // 50,000 bytes (each u8 is 1 byte, so 50,000 * 1 = 50,000 bytes)
203 | 
204 |         // Fill the buffer with some pattern
205 |         for i in 0..buffer.len() {
206 |             buffer[i] = (i % 256) as u8; // Fill with repeating 0..255
207 |         }
208 | 
209 |         // Find the sum of all bytes (just for fun)
210 |         let sum: u64 = buffer.iter().map(|&b| b as u64).sum();
211 |         msg!("Sum of all bytes: {}", sum); // on-chain programs log with `msg!`; `println!` is not supported
212 | 
213 |         Ok(())
214 |     }
215 | ```
216 | 
217 | > ⚠️ **Warning Example:**
218 | >
219 | > When you allocate a large buffer or array on the stack, you may encounter an error like:
220 | >
221 | > ```
222 | > Error: Function _ZN16cu_optimizations16cu_optimizations6handle17h60f6d64a7e5552edE Stack offset of 1048576 exceeded max offset of 4096 by 1044480 bytes, please minimize large stack variables. Estimated function frame size: 1048576 bytes. Exceeding the maximum stack offset may cause undefined behavior during execution.
223 | > ```
224 | >
225 | > This warning indicates your function's stack frame exceeds Solana's 4KB stack limit. To resolve this, avoid allocating large arrays or buffers on the stack—use heap allocation or refactor your code to reduce stack usage.
226 | 
227 | ```
228 | error Access violation in program section at address 0x1fff02ff8 of size 8
229 | ```
230 | 
231 | ### Solana BPF Memory Regions
232 | 
233 | | **Region**       | **Start Address** | **Purpose**                                    | **Access Rules**                 |
234 | | ---------------- | ----------------- | ---------------------------------------------- | -------------------------------- |
235 | | Program Code     | 0x100000000       | Executable bytecode (SBF instructions)         | Read-only, execute-only          |
236 | | Stack Data       | 0x200000000       | Function call frames and local variables       | Read/write (4KB per stack frame) |
237 | | Heap Data        | 0x300000000       | Dynamic memory allocation (default 32KB)       | Bump allocator (no deallocation) |
238 | | Input Parameters | 0x400000000       | Serialized instruction data & account metadata | Read-only during execution       |
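
The table can be turned into a small lookup that tells you which region a virtual address belongs to. This is a simplified sketch: it assumes each region runs up to the start of the next one, which is my simplification, and the `Region` enum and `region_of` function are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
enum Region {
    ProgramCode,
    Stack,
    Heap,
    Input,
    Unmapped,
}

// Start addresses taken from the table above.
const PROGRAM_START: u64 = 0x1_0000_0000;
const STACK_START: u64 = 0x2_0000_0000;
const HEAP_START: u64 = 0x3_0000_0000;
const INPUT_START: u64 = 0x4_0000_0000;

/// Classifies a virtual address by the region whose start it follows.
fn region_of(addr: u64) -> Region {
    if addr >= INPUT_START {
        Region::Input
    } else if addr >= HEAP_START {
        Region::Heap
    } else if addr >= STACK_START {
        Region::Stack
    } else if addr >= PROGRAM_START {
        Region::ProgramCode
    } else {
        Region::Unmapped
    }
}
```

Feeding in the address from the access-violation message above, `0x1fff02ff8`, lands just below the stack start, inside the program region, which matches the "program section" wording in that error.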
239 | 
240 | To work around stack limitations:
241 | 
242 | 1. Use heap allocation when necessary
243 | 2. Break large functions into smaller ones
244 | 3. Avoid large stack-allocated arrays
245 | 4. Use references instead of moving large data structures
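
Points 1, 3, and 4 can be sketched in plain Rust like this. It is an illustration, not code from the cu_optimizations repo: the buffer size is my own choice (8 KiB is too big for a 4KB stack frame but still fits in the default 32KB heap from the table), and the function names are hypothetical.

```rust
/// 8 KiB: larger than a 4KB stack frame, smaller than the default 32KB heap.
const BUFFER_LEN: usize = 8 * 1024;

/// Builds the repeating 0..255 pattern in a heap-allocated Vec,
/// so the bytes never live in the function's stack frame.
fn fill_pattern(len: usize) -> Vec<u8> {
    (0..len).map(|i| (i % 256) as u8).collect()
}

/// Takes the buffer by reference: only a pointer and a length
/// go on the stack, not a copy of the data.
fn sum_bytes(buffer: &[u8]) -> u64 {
    buffer.iter().map(|&b| u64::from(b)).sum()
}
```

Calling `sum_bytes(&fill_pattern(BUFFER_LEN))` keeps the stack frame tiny because the data sits on the heap. Keep in mind that the heap uses a bump allocator, so the 50,000-byte buffer from the earlier example would not fit in the default 32KB heap either.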
246 | 
247 | [solana_optimization_github](https://github.com/solana-developers/cu_optimizations) <br/>
248 | [rare_skill_blog_post](https://www.rareskills.io/post/solana-compute-unit-price) <br/>
249 | [solana github discussion SIMD-0296](https://github.com/solana-foundation/solana-improvement-documents/pull/296/commits/bbc29c909085589989ca5f258550ce4447e68a89)<br/>
250 | [frank_castle tweet on solana transaction size limit](https://x.com/0xcastle_chain)<br/>
251 | [a great video to understand memory management in programming](https://youtu.be/vc79sJ9VOqk?si=hTqpylYjBO88hvJ9)
252 | 
253 | <img
254 |  width="1000px"
255 |  height="430px"
256 |  src="./images/solana_limitation_bye.gif"
257 | />
258 | 


--------------------------------------------------------------------------------