0️⃣Data and its operations

data fields

📁How to Use Data in BRAIN

BRAIN gives you access to financial market data through predefined names. In this step, we'll learn how to find the data you want. Let's first look at the data classification system in BRAIN, which has three levels:

  • Dataset Categories
  • Datasets
  • Data fields

🏷️Dataset Categories

Dataset Categories divide data into 17 main categories. You can see these categories by clicking "Data" at the top of the platform screen. (Only 7 categories are shown until you become a consultant.) Notable examples include Fundamental data, drawn from company financial statements, and PV data, which covers stock prices and transaction volumes.

📦Datasets

Datasets are collections of data with the same theme. They're usually named by adding a number to the dataset category name. For example, the PV1 dataset holds price/volume data from the stock market, including price information such as opening, high, low, and closing prices, as well as derived values such as the 20-day average transaction volume. The fundamental dataset provides extensive data from financial statements, including company assets, capital, and liabilities.

🔢Data fields

Data fields are the actual matrix-form data used in the platform. In the simulator, you access the contents of a data field by its name. For example, the returns we used earlier refers to the returns data field, which contains return information.

🔍Finding Desired Data

BRAIN provides a Data Section to find desired data fields. You can search by dataset or data field names, or explore from categories.

Remember to set your desired region, delay, and universe in the top right before searching, as available data fields differ by region and universe!

Operator

🛠️Using Operators in BRAIN

Just like we applied rank() to -returns to transform values within the matrix, operators process matrices within data fields. BRAIN provides various operators, including simple arithmetic operations and more complex ones.

➗Arithmetic Operators

Arithmetic operators cover basic math operations (such as addition, subtraction, multiplication, and division) as well as rounding.

💡Logical Operators

Logical operators evaluate expressions and return true or false values. In BRAIN, true equals 1 and false equals 0.

⏰Time Series Operators

Time series operators perform operations related to past d-day values for specific stocks. For example, ts_mean(x,d) calculates the average of x over d days.
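To make the idea of a trailing d-day window concrete, here is a minimal plain-Python sketch of what ts_mean(x,d) computes for a single stock. It is an illustration, not BRAIN's implementation: the function below only mirrors the operator's name, and returning None for days without a full d-day history is an assumption of this sketch.

```python
def ts_mean(series, d):
    """Trailing d-day mean of one stock's daily values (oldest value first).

    Days that do not yet have d values of history yield None
    (an assumption of this sketch, not documented BRAIN behavior).
    """
    out = []
    for t in range(len(series)):
        if t + 1 < d:
            out.append(None)
        else:
            window = series[t + 1 - d : t + 1]  # the last d values up to day t
            out.append(sum(window) / d)
    return out


closes = [10.0, 11.0, 12.0, 13.0, 14.0]
print(ts_mean(closes, 3))  # [None, None, 11.0, 12.0, 13.0]
```

Each output value only looks backward in time, which is what distinguishes a time series operator from a cross-sectional one.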

❌Cross Sectional Operators

Cross-sectional operators compare or process values across target stocks at a specific point in time. For example, rank(x) orders x values at a specific time and distributes them from 0 to 1.
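The following plain-Python sketch shows one way such a ranking can be computed for a single day. It is an illustration only: the helper name cross_sectional_rank is hypothetical, and the handling of ties and of a single-stock universe are assumptions, not BRAIN's documented behavior.

```python
def cross_sectional_rank(values):
    """Map one day's values across stocks to evenly spaced ranks in [0, 1].

    The smallest value gets 0.0 and the largest gets 1.0. Ties keep their
    input order and a single stock gets 0.5 (both are assumptions).
    """
    n = len(values)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: values[i])  # stock indices, low to high
    ranks = [0.0] * n
    for position, idx in enumerate(order):
        ranks[idx] = position / (n - 1)
    return ranks


neg_returns = [0.02, -0.01, 0.00, 0.05]  # one value per stock, same day
print([round(r, 3) for r in cross_sectional_rank(neg_returns)])
# [0.667, 0.0, 0.333, 1.0]
```

Because only the ordering matters, an extreme value such as 0.05 cannot dominate the result the way it would in the raw data.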

📐Vector Operators

When searching for data fields, you might find vector-type data fields. Instead of having a single value per stock per day, these store multiple values (in vector format). To convert these into Alpha positions, you need to transform them into a single representative value like mean or median. These operators serve this purpose.
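As a rough illustration of that reduction step, the sketch below collapses a per-stock list of values into one number per stock. The data, the ticker names, and the helper reduce_vectors are all hypothetical, and returning None for an empty vector is an assumption of this sketch.

```python
from statistics import mean, median

# Hypothetical vector field for one day: each stock holds zero or more values.
day_vectors = {"AAA": [1.0, 2.0, 6.0], "BBB": [4.0], "CCC": []}


def reduce_vectors(vectors, reducer):
    """Collapse each stock's vector to a single value; empty vectors give None."""
    return {stock: (reducer(v) if v else None) for stock, v in vectors.items()}


print(reduce_vectors(day_vectors, mean))    # {'AAA': 3.0, 'BBB': 4.0, 'CCC': None}
print(reduce_vectors(day_vectors, median))  # {'AAA': 2.0, 'BBB': 4.0, 'CCC': None}
```

The choice of reducer matters: for AAA the mean is pulled up by the value 6.0, while the median is not.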

🎭Transformational Operators

Transformational operators enable transformation of values within matrices through specific operations.

👪Group Operators

When exploring data fields, you might find group-type data fields that group companies based on specific criteria. For example, the industry data field is a group data field that classifies companies by industry. Group operators include operations like calculating representative values (mean/sum/median) within groups or performing neutralization within groups.
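The two group operations mentioned above can be sketched in plain Python as follows. The function names group_mean and group_neutralize are used here only for illustration, and treating neutralization as subtracting the group mean is an assumption of this sketch rather than a statement of how BRAIN implements it.

```python
from collections import defaultdict


def group_mean(values, groups):
    """Replace each stock's value with the mean of its group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [totals[g] / counts[g] for g in groups]


def group_neutralize(values, groups):
    """Subtract the group mean so every group sums to zero (assumed definition)."""
    means = group_mean(values, groups)
    return [v - m for v, m in zip(values, means)]


values = [1.0, 3.0, 10.0, 20.0]             # one value per stock
industry = ["tech", "tech", "bank", "bank"]  # hypothetical group labels
print(group_mean(values, industry))        # [2.0, 2.0, 15.0, 15.0]
print(group_neutralize(values, industry))  # [-1.0, 1.0, -5.0, 5.0]
```

After neutralization, each stock is compared only with its own industry peers, so a difference in level between industries no longer drives the signal.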

PV Data

PV data has information related to price and volume. Since it includes price itself, which is essential for predicting stock prices, it's one of the most useful data types when first creating Alphas.

💸Price data

PV data includes stock prices (open, high, low, and close) and other trading-related information such as the volume of shares traded and market capitalization. The four prices are the values shown in a candlestick chart.

[Figure: Navigator_PV_data.png]

  • Open is the first traded price when the stock market opens for the day.
  • Close is the last traded price when the stock market ends for the day.
  • High is the highest price traded during the day.
  • Low is the lowest price traded during the day.

📦Volume

Volume indicates the number of shares investors transacted that day.

You can use the adv20 data field to access the 20-day average volume. If you want to calculate the average for a different number of days, you can use ts_mean(volume,N).

📋VWAP (Volume-Weighted Average Price)

VWAP, the volume-weighted average price, is another way to represent a day's stock price. Since a few low-volume trades can distort single-trade indicators like the closing price, VWAP can be a better measure of that day's price.

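In formula terms, \(\mathrm{VWAP} = \frac{\sum_i p_i v_i}{\sum_i v_i}\), where \(p_i\) and \(v_i\) are the price and volume of the \(i\)-th trade of the day.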
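A small worked example of this formula, written as a plain-Python sketch with made-up trades (the prices and share counts are illustrative, not real data):

```python
def vwap(trades):
    """Volume-weighted average price from (price, shares) pairs for one day."""
    total_volume = sum(shares for _, shares in trades)
    if total_volume == 0:
        return None  # no trades, so VWAP is undefined
    return sum(price * shares for price, shares in trades) / total_volume


# Hypothetical day: most shares trade near 100, then a tiny last trade at 110.
trades = [(100.0, 900), (101.0, 90), (110.0, 10)]
print(vwap(trades))   # 100.19
print(trades[-1][0])  # 110.0, the closing price
```

Here the close (110.0) is set by a trade of only 10 shares, while VWAP (100.19) stays near the price at which almost all of the day's volume changed hands.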

💡Alpha Ideas

Most Alphas using PV data come from these two main ideas:

  • Momentum: the assumption that stocks which have performed well in the past will continue to perform well, while stocks that have performed poorly will continue to perform poorly.
    • The momentum effect typically appears over longer periods (several months or more).
  • Reversion: the hypothesis that whatever rises today will fall tomorrow, and whatever falls today will rise tomorrow. This "something" can be anything: price, volume, the correlation between two quantities, or any other indicator or variable you think of while developing your Alpha.
    • The reversion effect typically appears over shorter periods (days or weeks).

The rank(-returns) we created first is a simple example of implementing the reversion effect.

🔥Let's try it out!

Shall we try implementing a reversion Alpha using VWAP?

VWAP is the average price weighted by volume for the day, while the closing price is the last traded price. By comparing the two, we can see how the last traded price relates to the day's average. For example, if the closing price is much higher than VWAP, we can infer that the price rose near market close. Conversely, a closing price well below VWAP suggests that the price fell near the close.

Since reversion theory assumes prices return to their mean, we can implement an Alpha using the formula "vwap/close". This takes long positions when closing prices fall below VWAP, and short positions when they rise above it.
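To see how the "vwap/close" expression could turn into long and short positions, here is a plain-Python sketch for a single day. It assumes the signal is demeaned across stocks and scaled so that absolute weights sum to 1; that step, the helper name vwap_close_weights, and the sample prices are assumptions for illustration, not a description of BRAIN's simulator.

```python
def vwap_close_weights(vwaps, closes):
    """Turn vwap/close into long/short weights for one day.

    Assumed steps: compute the ratio, subtract the cross-sectional mean,
    then scale so the absolute weights sum to 1.
    """
    ratios = [v / c for v, c in zip(vwaps, closes)]
    avg = sum(ratios) / len(ratios)
    demeaned = [r - avg for r in ratios]
    gross = sum(abs(x) for x in demeaned)
    if gross == 0:
        return [0.0] * len(ratios)  # identical ratios carry no relative signal
    return [x / gross for x in demeaned]


vwaps = [100.0, 50.0, 20.0]
closes = [98.0, 50.0, 21.0]  # first stock closed below VWAP, third above
weights = vwap_close_weights(vwaps, closes)
print([round(w, 3) for w in weights])
```

Under these assumptions the first stock receives a positive (long) weight and the third a negative (short) weight. Note that after demeaning, the sign depends on whether a stock's ratio is above or below the day's average ratio, not strictly on whether it is above or below 1.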

When you simulate this, you'll notice that while the Sharpe ratio is high, the turnover is excessive. The submission criteria require turnover below 70%. You can reduce turnover by applying operators (e.g., trade_when) or by changing settings (e.g., Decay).
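The sketch below illustrates why smoothing a signal lowers turnover. It makes two assumptions that the text above does not confirm: that decay acts like a linearly weighted average of recent values (newest weighted most), and that turnover can be approximated by the sum of absolute day-over-day weight changes. The helper names are hypothetical.

```python
def linear_decay(history, n):
    """Weighted average of the last n values: newest gets weight n, oldest 1."""
    window = history[-n:]
    weights = range(1, len(window) + 1)
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)


def turnover(prev_weights, new_weights):
    """Sum of absolute weight changes between two days (assumed proxy)."""
    return sum(abs(b - a) for a, b in zip(prev_weights, new_weights))


# Hypothetical signal for one stock that flips sign every day.
signal = [1.0, -1.0, 1.0, -1.0]

raw_change = turnover([signal[2]], [signal[3]])
decayed_change = turnover([linear_decay(signal[:3], 3)],
                          [linear_decay(signal[:4], 3)])
print(raw_change)                # 2.0
print(round(decayed_change, 3))  # 0.667
```

In this toy case the decayed signal moves about a third as much per day as the raw one. The trade-off is that a smoothed signal also reacts more slowly to new information, so stronger decay can lower the Sharpe ratio along with turnover.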