What is an Alpha? (1)

📚Basic Concepts

Alphas: mathematical models that seek to predict the future price movements of various financial instruments.

Quantitative investing: investing through predetermined mathematical models based on stock characteristics, without emotional intervention. The rules about when, how much, and which stocks to invest in are defined in advance.

To verify if these rules work well, we need to implement our idea using code and test its potential performance in the market, a process called backtesting.

🧪Trials

📈-returns

  1. “returns” is a data field that holds each company's stock return on each date.
  2. Adding a minus sign to returns means we bet against returns.

Therefore, the formula we just ran expresses the idea of investing opposite to company returns. In other words, we predicted that companies with high returns yesterday will see price decreases, while those with low returns will see price increases.

Related Concept: Price Reversion

Price reversion is when an asset's price moves back to its average after deviating. It's a key concept in trading and mean-reversion strategies.

🏆rank(-returns)

rank: This operator sorts the input values and maps them to evenly spaced values between 0 and 1.

What is an Alpha? (2)

🧐Performance Metrics

🔍PnL Graph

A PnL (Profit and Loss) graph visually represents the cumulative profits and losses over a specific period, often derived from backtesting financial strategies or simulations.

It helps evaluate the performance of an investment or trading strategy by showing trends, peaks, and drawdowns. This graph is a key tool for assessing risk and profitability.

📝IS Summary

When you simulate an Alpha, BRAIN summarizes key performance values in the IS Summary.

Sharpe

Sharpe measures the risk-adjusted returns earned by the Alpha. Higher values of Sharpe are better.

Sharpe is calculated as annualized returns divided by their annualized standard deviation, which is equivalent to multiplying the daily ratio by \(\sqrt{250}\):  Sharpe = Avg. Annualized Returns / Annualized Std. Dev. of Returns
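As a rough illustration of the formula above, the sketch below computes Sharpe from a list of daily returns. The function name `sharpe`, the use of the sample standard deviation, and the input numbers are assumptions made for this example, not BRAIN's implementation.

```python
import math
import statistics

def sharpe(daily_returns, days_per_year=250):
    """Annualized mean return divided by annualized standard deviation."""
    mean = statistics.mean(daily_returns)
    std = statistics.stdev(daily_returns)  # sample std. dev. (an assumption)
    annualized_return = mean * days_per_year
    annualized_std = std * math.sqrt(days_per_year)
    return annualized_return / annualized_std

daily_returns = [0.01, -0.01, 0.02, 0.0]  # made-up daily returns
print(round(sharpe(daily_returns), 4))
```

Because the mean scales with 250 and the standard deviation with the square root of 250, the result equals the daily ratio multiplied by \(\sqrt{250}\).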

Turnover

Turnover is the percentage of capital that the Alpha trades each day. Higher turnover may mean higher transaction costs during trading. It is expressed by the formula Value Traded / Value Held.
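A minimal sketch of Value Traded / Value Held for a single day follows. It assumes that the value traded is the sum of absolute position changes and the value held is the sum of absolute positions. The function name `daily_turnover` and the numbers are hypothetical.

```python
def daily_turnover(prev_positions, new_positions):
    """Fraction of the held value that was traded to reach the new positions."""
    traded = sum(abs(new - old) for old, new in zip(prev_positions, new_positions))
    held = sum(abs(p) for p in new_positions)
    return traded / held

yesterday = [6.0, -6.0, 4.0, -4.0]  # made-up positions in $Mn
today = [4.0, -4.0, 6.0, -6.0]
print(daily_turnover(yesterday, today))  # traded 8 out of 20 held
```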

Fitness

Fitness of an Alpha is a function of returns, turnover and Sharpe: Fitness = Sharpe * Sqrt(Abs(Returns) / Max(Turnover, 0.125))

Good Alphas have high fitness. You can optimize the performance of your Alphas by increasing Sharpe (or returns) and reducing turnover. Improving one factor normally has an adverse impact on the other factor. As you work on optimizing your Alpha, an improvement in its fitness is a sign that your changes are having a positive impact.
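To see how the three inputs interact, here is a small sketch of the fitness calculation. It assumes the square root covers the whole ratio of absolute returns to the floored turnover. The function name and the sample values are made up for illustration.

```python
import math

def fitness(sharpe, returns, turnover):
    """Sharpe scaled by sqrt(|returns| / max(turnover, 0.125))."""
    return sharpe * math.sqrt(abs(returns) / max(turnover, 0.125))

# Same Sharpe and returns, different turnover (all values are made up).
print(fitness(2.0, 0.10, 0.40))
print(fitness(2.0, 0.10, 0.20))
print(fitness(2.0, 0.10, 0.05))  # turnover below 0.125 is floored at 0.125
```

The floor at 0.125 means that pushing turnover below 12.5% no longer improves fitness in this formula.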

Returns

Returns indicates how much profit an Alpha can generate. Since BRAIN simulations assume a long-short portfolio (which we'll explain in the next step), the total investment amount equals half of the book size.

Drawdown

Even with good Alpha performance, significant losses can occur during certain periods. Depending on market conditions, large losses might make it difficult to continue investing in that Alpha.

Drawdown represents the percentage of the largest loss incurred during any year in your backtesting. As a practice, you should target a return-to-drawdown ratio greater than one. The higher the ratio of returns to drawdown, the better it may be for your alpha.
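The sketch below shows one common way to measure a drawdown: the largest drop of cumulative PnL from a previous peak. It runs over a whole series rather than year by year, and the function name `max_drawdown` and the PnL numbers are assumptions for illustration, not BRAIN's exact calculation.

```python
def max_drawdown(daily_pnl):
    """Largest peak-to-trough drop of the cumulative PnL curve."""
    cumulative = 0.0
    peak = 0.0
    worst = 0.0
    for pnl in daily_pnl:
        cumulative += pnl
        peak = max(peak, cumulative)            # highest level reached so far
        worst = max(worst, peak - cumulative)   # deepest fall below that level
    return worst

daily_pnl = [3, -1, -4, 2, 5, -2]  # made-up daily PnL
print(max_drawdown(daily_pnl))     # 5.0: the curve falls from 3 to -2
```

Dividing this amount by the invested capital gives a percentage that can be compared with returns to form the return-to-drawdown ratio.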

Margin

Margin represents how much PnL you obtained relative to the traded amount. It's calculated by dividing total PnL by the total traded amount. Note that Margin uses basis points (bps, ‱, or ten thousandths) as its unit of measurement, not %!
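A short sketch of the unit conversion, with a hypothetical function name and made-up totals:

```python
def margin_bps(total_pnl, total_traded):
    """PnL per unit of traded value, expressed in basis points (1 bps = 0.01%)."""
    return total_pnl / total_traded * 10_000

# $50,000 of PnL on $100,000,000 traded is about 5 bps, i.e. 0.05%.
print(margin_bps(50_000, 100_000_000))
```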

The platform?

🔍What Happens in the Platform When Running a Simulation

🏛Position

When starting a simulation, BRAIN calculates how much to invest in each stock by setting positions.

Positions consist mainly of long positions and short positions. In BRAIN, a long position means buying stocks, betting that the stock price will rise. Conversely, a short position means selling stocks, betting that the stock price will fall.

When you press the simulate button, BRAIN calculates positions for each stock through the process shown in the table below. Don't worry if it looks complicated! It's simple when you look at it step by step.

Navigator_Whats_Going_On.png

🧮Expression Calculation

First, the simulator calculates the results of the expression formula.

rank() ranks the input values and spreads them evenly between 0 and 1 in ascending order, so the smallest input maps to 0 and the largest to 1. For example, if there are 5 stocks, it returns the 5 values 0, 0.25, 0.5, 0.75, 1, assigned according to the order of the input values.

In the table above, column D, Alpha value on 2-Feb: rank(-returns), shows this process.
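The rank() step can be sketched in plain Python as follows. This is an illustration only: tie handling and the single-value case are assumptions, and the return values are made up.

```python
def rank(values):
    """Map values to evenly spaced numbers in [0, 1]; the smallest input gets 0."""
    n = len(values)
    if n == 1:
        return [0.5]  # arbitrary choice for a single value (an assumption)
    # Indices of the inputs, ordered from smallest to largest value.
    order = sorted(range(n), key=lambda i: values[i])
    ranked = [0.0] * n
    for position, index in enumerate(order):
        ranked[index] = position / (n - 1)
    return ranked

returns = [0.02, -0.01, 0.00, 0.03, -0.02]  # made-up daily returns
alpha = rank([-r for r in returns])
print(alpha)  # [0.25, 0.75, 0.5, 0.0, 1.0]
```

The stock with the lowest return (-0.02) receives the highest Alpha value, which matches the idea of betting against returns.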

After calculating rank(), it goes through a process called neutralization.

🌐Market Neutralization

Neutralization means removing specific influences from an Alpha. Since overall market movements often influence stock prices, having only long positions for many stocks carries a high risk of market exposure.

Therefore, BRAIN assumes the construction of a long-short portfolio where half the positions are long positions and the other half are short positions.

Before constructing a long-short portfolio, let's see what happens when we only take long positions. Click on Settings, then change Neutralization to None and check the results.

⚖️Market Neutralization

🐂long-only portfolio

Navigator_Long_Only_Graph.png

You can see that rank(returns), the opposite of the expression we simulated earlier, still produces a similar PnL pattern. This means that a portfolio consisting of only long positions struggles to escape the influence of market risk.

That's why BRAIN allows users to construct a long-short portfolio where half the positions are long positions and half are short positions. This setup helps minimize market risk influence and retain only pure signal effects.

⚖️long-short portfolio

Let's return to the table we saw earlier. To neutralize market risk, we go through three main steps:

  1. Subtract the mean from each value to center around zero. (Centered around 0)
  2. Divide each value by the sum of absolute values to create Normalized weights where the total of absolute values equals 1. (Normalized weights)
  3. Multiply Normalized weights by book size to get actual positions. (Assign $20Mn capital)

Navigator_Whats_Going_On.png

After creating positions through these three steps each day, we calculate each stock's PnL daily and combine them to produce the final PnL.

In the table above, we can see that out of 8 stocks, 5 predictions were correct and 2 were wrong, resulting in a final profit of 0.03.
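The three steps and the daily PnL can be sketched as below. The function names, the five Alpha values, and the next-day returns are made up for illustration and are not the numbers from the table.

```python
def to_positions(alpha_values, book_size):
    """Center, normalize, and scale Alpha values into dollar positions."""
    mean = sum(alpha_values) / len(alpha_values)
    centered = [v - mean for v in alpha_values]       # step 1: center around 0
    total_abs = sum(abs(c) for c in centered)         # assumes not all values are equal
    weights = [c / total_abs for c in centered]       # step 2: |weights| sum to 1
    return [w * book_size for w in weights]           # step 3: assign capital

def daily_pnl(positions, next_day_returns):
    """Sum of position times return over all stocks."""
    return sum(p * r for p, r in zip(positions, next_day_returns))

alpha = [0.0, 0.25, 0.5, 0.75, 1.0]                   # e.g. output of rank()
positions = to_positions(alpha, book_size=20_000_000)
print(positions)
print(daily_pnl(positions, [-0.01, 0.00, 0.01, 0.02, 0.01]))
```

The long and short sides each add up to half of the book, so the net exposure to the market is approximately zero.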

🍂Decay

Sometimes Alpha positions might change too much daily. For example, with rank(-returns), if the Alpha's premise is true, all stocks' long and short positions need to change every day.

However, changing too many positions in one day can hurt the portfolio's stability. In such cases, we can slow down the rate of Alpha position changes through a method called Decay.

Decay means bringing forward a part of past positions to the present. When determining today's positions, it means applying a certain percentage of yesterday's or the day before's positions.

Let's see the table below.

Navigator_Decay_Excel.png

If applying a 3-day Decay, the final position is a weighted average where today's position (2-Feb) is multiplied by 3, yesterday's position (1-Feb) by 2, and the day before's position (31-Jan) by 1.

Expressed as a formula, it becomes \((Position(02Feb)*3 + Position(01Feb)*2 + Position(31Jan)*1)/6\).

The general form is like below:

\[ Decay\_linear(x,n)=\frac{x[date]*n+x[date-1]*(n-1)+\dots+x[date-(n-1)]*1}{n+(n-1)+\dots+1} \]

Decay allows easy control of the rate of position changes. However, note that applying Decay may weaken signal strength as current Alpha values reflect more slowly. Typically, values under 10 are used for Decay.

Settings

🌐Language / Instrument Type

BRAIN supports backtest simulation using a simplified language called Fast Expression, targeting only stocks (Equity). Don't worry about Fast Expression - a detailed explanation will follow later!

🌍Region

Region determines which area's stocks you want to simulate.

Currently, you can simulate in the United States (USA) region. Once you reach the required level and become a BRAIN consultant, you'll be able to run simulations in more regions.

Regions supported at the consultant level include Asia (ASI), Europe (EUR), China (CHN), Korea (KOR), Taiwan (TWN), Hong Kong (HKG), Japan (JPN), Americas (AMR), and Global (GLB) which allows simultaneous simulation of stocks from all regions.

🪐Universe

Universe is a group of US stocks defined on the basis of their liquidity.

For example, you can select universes targeting the TOP N stocks (N can be 3000, 1000, 500, or 200), meaning the simulation will run on the N most liquid stocks.

🐢Delay

Delay is an option that determines how much delay the data has. Using today's data immediately isn't as easy as you might think.

For example, considering closing prices, we can't know them until the stock market ends for the day. Due to this constraint, it's common to create Delay1 Alphas that use data accumulated up to yesterday.
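The idea of Delay can be sketched by shifting a data series so that each day only sees values that were already available. The function name `apply_delay` and the prices are hypothetical.

```python
def apply_delay(series, delay):
    """Return the value known on each day when data arrives `delay` days late."""
    padded = [None] * delay + list(series)
    return padded[:len(series)]

close = [100.0, 101.0, 99.0, 102.0]  # made-up closing prices
print(apply_delay(close, 1))  # [None, 100.0, 101.0, 99.0]
print(apply_delay(close, 0))  # [100.0, 101.0, 99.0, 102.0]
```

With a delay of 1, the position for a given day is computed from the previous day's close, so no information from the future leaks into the backtest.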

However, if you want to reflect market changes faster, creating Delay0 Alphas might be a good idea. But Delay0 Alphas have many constraints and require stricter conditions for submission.

In Delay0, even data fields with the same names as in Delay1 might have different values. The close price data mentioned above uses actual closing prices in Delay1, but in Delay0, it uses prices from slightly before market close along with middle values.

🏛Neutralization

When creating Alphas, positions often concentrate in specific industries. For example, imagine oil prices rose yesterday, causing all oil & gas stocks to rise while airline stocks, which use oil as fuel, all fell.

In such situations, Alphas like rank(-returns) might concentrate long or short positions in those two industries. Like the market risk exposure we looked at earlier, this Alpha becomes exposed to industry risk.

While we hedged market risk by taking equal long and short positions, how can we hedge industry risk?

Like hedging market risk, we can hedge through neutralization. However, instead of neutralizing across all stocks, we neutralize only within companies in the same industry.

This way, we can take long positions in oil & gas stocks that rose less and short positions in those that rose more. Similarly, we take long positions in airline stocks that fell more and short positions in those that fell less.
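A minimal sketch of neutralizing within industries: subtract each industry's mean from its members. The function name, group labels, and returns are made up, and BRAIN's own neutralization may involve further steps.

```python
from collections import defaultdict

def group_neutralize(values, groups):
    """Subtract the mean of each group from the values in that group."""
    members = defaultdict(list)
    for value, group in zip(values, groups):
        members[group].append(value)
    means = {g: sum(v) / len(v) for g, v in members.items()}
    return [value - means[group] for value, group in zip(values, groups)]

# Yesterday: two oil stocks rose, two airline stocks fell (made-up returns).
returns = [0.04, 0.02, -0.03, -0.01]
groups = ["oil", "oil", "airline", "airline"]
alpha = group_neutralize([-r for r in returns], groups)
print([round(a, 4) for a in alpha])  # [-0.01, 0.01, 0.01, -0.01]
```

The oil stock that rose less and the airline stock that fell more end up long, while each industry's positions sum to zero.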

🍂Decay

Decay is a setting that determines how much past positions to reflect. As we looked at earlier in detail, higher Decay values lower Alpha turnover. However, note that the Alpha's Sharpe ratio might decrease as information becomes delayed.

✂️Truncation

When creating Alphas, positions might become concentrated in specific stocks. In such cases, you can use Truncation, which limits the maximum weight a single stock can have. In the TOP3000 universe, a value of about 0.01 (1%) is typical. However, in smaller universes like TOP200, allowing larger maximum positions might be better.
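As an illustration, Truncation can be pictured as clipping each stock's weight to a maximum absolute value. This sketch only clips. Whether and how BRAIN rescales the remaining weights afterwards is not covered here, and the function name and weights are hypothetical.

```python
def truncate(weights, max_weight):
    """Clip every weight to the range [-max_weight, max_weight]."""
    return [max(-max_weight, min(max_weight, w)) for w in weights]

weights = [0.03, -0.02, 0.005, -0.015]  # made-up portfolio weights
print(truncate(weights, 0.01))          # [0.01, -0.01, 0.005, -0.01]
```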

🧮Pasteurization, Unit Handling, NaN Handling

These three options are somewhat difficult concepts to use from the start, but simply put:

  • Pasteurization determines whether to include stocks not in the universe in calculations or leave them as NaN.
  • Unit Handling detects and warns about mismatched data field units during simulation. It only provides warnings without affecting actual simulation, so only the Verify option is available.
  • NaN Handling is an option to choose whether to replace missing values (NaN) in data fields with 0.
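The Pasteurization and NaN Handling options can be pictured with the sketch below. The function names and the universe mask are assumptions for illustration, not BRAIN's API.

```python
import math

def pasteurize(values, in_universe):
    """Keep values for stocks in the universe; set the rest to NaN."""
    return [v if ok else math.nan for v, ok in zip(values, in_universe)]

def nan_to_zero(values):
    """Replace missing values (NaN) with 0."""
    return [0.0 if math.isnan(v) else v for v in values]

values = [1.0, 2.0, 3.0]
in_universe = [True, False, True]    # the second stock is outside the universe
pasteurized = pasteurize(values, in_universe)
print(pasteurized)                   # [1.0, nan, 3.0]
print(nan_to_zero(pasteurized))      # [1.0, 0.0, 3.0]
```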

⏳Test Period

Test Period allows a subset of PnL to be hidden as a validation period. This PnL can be viewed by clicking the 'Show test period' button. With this feature, you can check the performance of the hidden period and assess the robustness of the alpha based on the performance during this period.

Data Fields

📁How to Use Data in BRAIN

BRAIN enables easy access to financial market data using predefined names. In this step, we'll learn how to find the data you want. Let's first look at the data classification system in BRAIN.

  • Dataset Categories
  • Datasets
  • Data fields

🏷️Dataset Categories

Dataset Categories divide data into 17 main categories. You can see these categories by clicking "Data" at the top of the platform screen. (BRAIN shows only 7 categories before you become a consultant.) Notable examples include Fundamental data from company financial statements and PV data related to stock prices and transaction volumes.

📦Datasets

Datasets are collections of data with the same theme. They're usually named by adding numbers to the dataset category name. For example, PV1 dataset has price/volume-related data from the stock market, including price information like opening, high, low, closing prices, and information like 20-day average transaction volume. The fundamental dataset provides extensive data from financial statements including company assets, capital, and liabilities.

🔢Data fields

Data fields are the actual matrix-form data used in the platform. You can access the contents of a data field through its name in the simulator. The returns expression we used earlier accessed the returns data field, which contains return information.

🔍Finding Desired Data

BRAIN provides a Data Section to find desired data fields. You can search by dataset or data field names, or explore from categories.

Remember to set your desired region, delay, and universe in the top right before searching, as available data fields differ by region and universe!

Operators

🛠️Using Operators in BRAIN

Just like we applied rank() to -returns to transform values within the matrix, operators process matrices within data fields. BRAIN provides various operators, including simple arithmetic operations and more complex ones.

➗Arithmetic Operators

Arithmetic operators cover basic math operations and rounding.

💡Logical Operators

Logical operators evaluate expressions and return true or false values. In BRAIN, true equals 1 and false equals 0.

⏰Time Series Operators

Time series operators perform operations related to past d-day values for specific stocks. For example, ts_mean(x,d) calculates the average of x over d days.
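A rough sketch of what ts_mean(x,d) computes for one stock is shown below. Returning None until d values are available is an assumption of this example. How BRAIN treats incomplete windows is not described here.

```python
def ts_mean(series, d):
    """Average of the last d values on each day; None until d values exist."""
    out = []
    for t in range(len(series)):
        if t + 1 < d:
            out.append(None)
        else:
            window = series[t + 1 - d : t + 1]
            out.append(sum(window) / d)
    return out

prices = [1, 2, 3, 4, 5]   # made-up daily values for one stock
print(ts_mean(prices, 3))  # [None, None, 2.0, 3.0, 4.0]
```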

❌Cross Sectional Operators

Cross-sectional operators compare or process values across target stocks at a specific point in time. For example, rank(x) orders x values at a specific time and distributes them from 0 to 1.

📐Vector Operators

When searching for data fields, you might find vector-type data fields. Instead of having a single value per stock per day, these store multiple values (in vector format). To convert these into Alpha positions, you need to transform them into a single representative value like mean or median. These operators serve this purpose.
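To illustrate, the sketch below collapses a vector-type field, with several values per stock on one day, into a single value per stock. The names `vector_mean` and `vector_median` are hypothetical and are not BRAIN operator names.

```python
import statistics

def vector_mean(vector_field):
    """One mean per stock; None when a stock has no values that day."""
    return [statistics.mean(v) if v else None for v in vector_field]

def vector_median(vector_field):
    """One median per stock; None when a stock has no values that day."""
    return [statistics.median(v) if v else None for v in vector_field]

# Made-up field: each inner list holds one stock's values for a single day.
field = [[1.0, 3.0], [2.0, 4.0, 9.0], []]
print(vector_mean(field))    # [2.0, 5.0, None]
print(vector_median(field))  # [2.0, 4.0, None]
```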

🎭Transformational Operators

Transformational operators enable transformation of values within matrices through specific operations.

👪Group Operators

When exploring data fields, you might find group-type data fields that group companies based on specific criteria. For example, the industry data field is a group data field that classifies companies by industry. Group operators include operations like calculating representative values (mean/sum/median) within groups or performing neutralization within groups.