Try to generate general prompts for LUX English AR

- Support outputting fund name, share name, TER, performance fees and OGC
- Only output data points and values that can be found in the page text.
- Output fund-level data and share-level data separately.
- List some special cases to cover as many cases as possible.
Blade He 2024-08-28 16:44:19 -05:00
parent 32676728f6
commit 134b365b68
2 changed files with 173 additions and 53 deletions

@ -11,17 +11,47 @@ The markdown table(s) will be as output with key: "table_contents".
3. Extract data from the above parsed text and table contents.
3.1 Use the above parsed text and table contents as the context.
3.2 Data extraction from parsed table contents
There may be TER and performance fees data in the context. The TER reported name could be:
Total Expense Ratio, TER, Annualised TER including performance fees, etc.
There may be TOR, TER, performance fees and OGC data in the context.
The TOR reported name could be:
TOR, Turnover Ratio, Portfolio Turnover, Portfolio turnover ratio, PTR, etc.
The TER reported name could be:
TER, Total Expense Ratio, Total Fund Charge, Gross Expense Ratio, All in fee, Total Net Expense Ratio, Weighted Average Expense Ratio, Synthetic total Expense Ratio, Annualised TER including performance fees, etc.
The performance fees reported name could be:
performance fees, performance fees ratio, etc.
Special cases:
performance fees, performance fees ratio, Performance, etc.
The OGC reported name could be:
OGC, OGF, Ongoing Charge, Operation Charge, On Going Charges, Operating Charge, Ongoing Fund Charge, etc.
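The reported-name lists above amount to a lookup from table labels to four canonical data points. A minimal Python sketch, assuming labels are matched case-insensitively after collapsing whitespace; `REPORTED_NAMES` and `canonical_data_point` are hypothetical names, not part of the prompt pipeline. Because every list in the prompt ends with "etc.", the table is deliberately incomplete.

```python
from typing import Dict, List, Optional

# Hypothetical synonym table, copied from the reported-name lists in the prompt.
# The prompt's lists end with "etc.", so this table is deliberately incomplete.
REPORTED_NAMES: Dict[str, List[str]] = {
    "tor": ["TOR", "Turnover Ratio", "Portfolio Turnover", "Portfolio turnover ratio", "PTR"],
    "ter": [
        "TER", "Total Expense Ratio", "Total Fund Charge", "Gross Expense Ratio",
        "All in fee", "Total Net Expense Ratio", "Weighted Average Expense Ratio",
        "Synthetic total Expense Ratio", "Annualised TER including performance fees",
    ],
    "performance fees": ["performance fees", "performance fees ratio", "Performance"],
    "ogc": [
        "OGC", "OGF", "Ongoing Charge", "Operation Charge", "On Going Charges",
        "Operating Charge", "Ongoing Fund Charge",
    ],
}


def _normalize(label: str) -> str:
    # Collapse runs of whitespace and ignore case, e.g. " ongoing  CHARGE " -> "ongoing charge".
    return " ".join(label.split()).casefold()


# Reverse index: normalized label -> canonical data point.
_LOOKUP: Dict[str, str] = {
    _normalize(name): key for key, names in REPORTED_NAMES.items() for name in names
}


def canonical_data_point(label: str) -> Optional[str]:
    """Return 'tor', 'ter', 'performance fees' or 'ogc' for a known label, else None."""
    return _LOOKUP.get(_normalize(label))


print(canonical_data_point("Total Expense Ratio"))  # ter
print(canonical_data_point("PERFORMANCE"))          # performance fees
```

Unknown labels return `None`, so a caller can decide whether to skip the column or fall back to the model's judgment.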
Data business features:
1. In most cases, the data is in the table(s) of the context.
2. TOR is fund-level data.
- The full fund name should be the main fund name + the sub-fund name, e.g., if the main fund name is Black Rock European and the sub-fund name is Growth, the full fund name is: Black Rock European Growth.
- The sub-fund name may appear as the first-column values in the table.
3. TER, performance fees and OGC are share-class-level data.
4. Their values are percentages.
- The TER, performance fees and OGC values should be less than 100.
- The TOR value can be more than 100, e.g. 126.33.
- The TOR and performance fees can be negative numbers, e.g. -7.99.
5. If there are multiple data values in the same row, please extract the latest one.
6. One fund can have multiple share classes, each with its own TER, performance fees or OGC values.
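The fund-name rule and the value ranges in the list above can be expressed as small checks. This sketch assumes data points are keyed with the lowercase names used in the output format ("ter", "performance fees", "ogc", "tor"); `full_fund_name` and `is_plausible` are hypothetical helpers. Treating negative TER and OGC values as implausible is this sketch's reading of rule 4, which only names TOR and performance fees as possibly negative.

```python
def full_fund_name(main_fund: str, sub_fund: str) -> str:
    """Join the main fund name and the sub-fund name with a single space."""
    return f"{main_fund.strip()} {sub_fund.strip()}"


def is_plausible(data_point: str, value: float) -> bool:
    """Range checks derived from the 'Data business features' list (assumed reading)."""
    if data_point == "tor":
        # TOR may exceed 100 (e.g. 126.33) and may be negative (e.g. -7.99).
        return True
    if value >= 100:
        # TER, performance fees and OGC are percentages below 100.
        return False
    if value < 0:
        # Among share-level data points, only performance fees may be negative.
        return data_point == "performance fees"
    return True


print(full_fund_name("Black Rock European", "Growth"))  # Black Rock European Growth
```

Such checks are best used to flag suspicious model output for review rather than to silently drop values.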
Special cases:
1. Performance fees are part of TER.
If both "TER including performance fees" (or "TER with performance") and "TER excluding performance fees" (or "TER without performance") exist,
the TER should be "TER including performance fees" (or "TER with performance").
If both "TER including performance fees" and "TER excluding performance fees" exist,
the TER should be "TER including performance fees".
The performance fees should be:
"TER including performance fees - TER excluding performance fees" or "TER with performance fees - TER without performance fees".
The performance fees value can be negative, e.g., -0.27 or -0.18.
TER including performance fees - TER excluding performance fees.
Here is an example:
GAMAX FUNDS FCP\nClass\nTER (excluding Performance Fees)\nTER (including Performance Fees)\nGAMAX FUNDS - ASIA PACIFIC\nA\n2.07%\n2.07%\n
The output should be:
[
{"fund name": "GAMAX FUNDS - ASIA PACIFIC", "share data": [{"share name": "A", "ter": 2.07, "performance fees": 0}]}
]
The performance fees value is TER (including Performance Fees) - TER (excluding Performance Fees) = 2.07 - 2.07 = 0
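The arithmetic in special case 1 is a single subtraction. A minimal sketch, assuming both TER columns have already been parsed to floats; `ter_and_performance_fees` is a hypothetical helper, and rounding to four decimals is an added choice (not stated in the prompt) to hide binary floating-point noise, since `1.5 - 1.2` evaluates to `0.30000000000000004` in Python.

```python
from typing import Dict


def ter_and_performance_fees(ter_including: float, ter_excluding: float) -> Dict[str, float]:
    """Special case 1: TER is the 'including performance fees' figure and
    performance fees = TER including - TER excluding (the result may be negative)."""
    return {
        "ter": ter_including,
        "performance fees": round(ter_including - ter_excluding, 4),
    }


# GAMAX FUNDS - ASIA PACIFIC, class A: 2.07% excluding and 2.07% including.
share = {"share name": "A", **ter_and_performance_fees(2.07, 2.07)}
print(share)  # {'share name': 'A', 'ter': 2.07, 'performance fees': 0.0}
```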
2. Combo TER value table.
2.1 Both Feeder fund TER and Master fund TER exist.
@ -31,7 +61,7 @@ Please output separately as below:
- "feeder fund share class" and "TER feeder" values
- "Master fund" and "TER Master" values
Here is an example:
Feeder fund (share class)\nMaster fund\nTER\nFeeder\nTER Master\nTotal\nGlobal Portfolio Solution DKK -\nBalanced Class TI\nDanske Invest SICAV Global Portfolio\nSolution \u2013 Balanced Class X\n0.1475%\n0.7025%\n0.850%\n
Feeder fund (share class)\nMaster fund\nTER\nFeeder\nTER Master\nTotal\nGlobal Portfolio Solution DKK -\nBalanced Class TI\nDanske Invest SICAV Global Portfolio\nSolution Balanced Class X\n0.1475%\n0.7025%\n0.850%\n
The output should be:
[
@ -39,20 +69,51 @@ The output should be:
{"fund name": "Danske Invest SICAV Global Portfolio Solution DKK", "share data": [{"share name": "Balanced Class X", "ter": 0.7025}]}
]
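The feeder/master rule can be sketched as producing two separate share-level records from one table row, with an optional sanity check that the feeder and master TERs add up to the reported total (0.1475 + 0.7025 = 0.850 in the example). The function name `feeder_master_records`, the tolerance, and how fund and share names are split out of the row are assumptions for illustration; the caller is expected to supply the names.

```python
import math
from typing import Any, Dict, List, Optional


def feeder_master_records(
    feeder_fund: str, feeder_share: str, ter_feeder: float,
    master_fund: str, master_share: str, ter_master: float,
    ter_total: Optional[float] = None, tol: float = 1e-3,
) -> List[Dict[str, Any]]:
    """Return separate records for the feeder share class and the master fund share class."""
    # Optional consistency check against the "Total" column.
    if ter_total is not None and not math.isclose(ter_feeder + ter_master, ter_total, abs_tol=tol):
        raise ValueError(
            f"TER feeder + TER master != total ({ter_feeder} + {ter_master} vs {ter_total})"
        )
    return [
        {"fund name": feeder_fund, "share data": [{"share name": feeder_share, "ter": ter_feeder}]},
        {"fund name": master_fund, "share data": [{"share name": master_share, "ter": ter_master}]},
    ]


records = feeder_master_records(
    "<feeder fund>", "Balanced Class TI", 0.1475,
    "<master fund>", "Balanced Class X", 0.7025,
    ter_total=0.850,
)
```

Raising on a mismatch, rather than correcting it, keeps a mis-parsed row visible instead of producing a plausible but wrong TER.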
The TER and performance fees values are percentages, which means they should be less than 100.
In most cases, the data is in the table(s) of the context.
If there are multiple TER/performance fees values in the same row, please extract the latest one.
If possible, please extract the fund name, share class name, and TER or performance fees values as the output.
One fund can have multiple share classes, each with its own TER value.
The output should be in JSON format, like:
3. Latest data in time-series tables
Some data tables have multiple date-time columns; please extract the data from the latest date-time column.
Here is an example:
PERFORMANCE\nHISTORICAL PERFORMANCE\nHISTORICAL PERFORMANCE\nFrom \n1 July \nFrom \n19 July \nFrom \n1 January \nFrom \n27 April \nFrom \n19 July \nFrom \n1 January \n2021\nFrom \n22 May \n2021\nFrom \n16 July \n2021\nFrom \n21 September \n2021\nto 30 June 2023\nto 31 December 2022\nto 31 December 2021\nAsia Total Return Fund Class I5 (CHF Hedged) Acc\n6.73%\n \n-13.32%\n \n \n 6.04%\n \n \n \n
The output should be:
[
{"fund name": "Asia Total Return Fund", "share data": [{"share name": "Class I5 (CHF Hedged) Acc", "performance fees": 6.73}]}
]
The keyword for performance fees is PERFORMANCE; the value 6.73 is the first number, which belongs to the latest date-time column.
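Taking the first number works in this example because the columns run from the latest period to the oldest. A minimal sketch, assuming the row has been split into cells on the `\n` separators of the parsed text and that blank cells mean a period has no value; `first_number` is a hypothetical helper, and tables ordered oldest-first would need the reverse.

```python
import re
from typing import List, Optional

# A cell holding a signed decimal number with an optional trailing percent sign.
_NUMBER = re.compile(r"^\s*(-?\d+(?:\.\d+)?)\s*%?\s*$")


def first_number(cells: List[str]) -> Optional[float]:
    """Return the first numeric cell in a row, assuming columns are ordered latest first."""
    for cell in cells:
        match = _NUMBER.match(cell)
        if match:
            return float(match.group(1))
    return None  # no value in any period


# Values after "Asia Total Return Fund Class I5 (CHF Hedged) Acc" in the example.
row = "6.73%\n \n-13.32%\n \n \n 6.04%\n \n \n \n".split("\n")
print(first_number(row))  # 6.73
```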
4. TER reported name priority
If both Expense Ratio and Synthetic total Expense Ratio exist, please extract the value of Synthetic total Expense Ratio.
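Special case 4 is a priority rule between two reported names. A minimal sketch, assuming the candidate TER columns for one share class are available as a label-to-value mapping; only the Synthetic-over-plain ordering comes from the prompt, and the names `TER_PRIORITY` and `pick_ter` are hypothetical.

```python
from typing import Dict, Optional

# Highest priority first, per special case 4.
TER_PRIORITY = ["synthetic total expense ratio", "expense ratio"]


def pick_ter(columns: Dict[str, float]) -> Optional[float]:
    """Return the value of the highest-priority TER label present, else None."""
    # Compare labels case-insensitively with whitespace collapsed.
    normalized = {" ".join(label.split()).casefold(): value for label, value in columns.items()}
    for name in TER_PRIORITY:
        if name in normalized:
            return normalized[name]
    return None


print(pick_ter({"Expense Ratio": 1.20, "Synthetic total Expense Ratio": 1.45}))  # 1.45
```

Keeping the priority as an ordered list means further reported names could be ranked by inserting them, without changing the lookup logic.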
Output requirements:
1. If possible, please extract the fund name, share name, and TOR, TER, performance fees and OGC values as the output.
2. The required output items are "fund name" and "share name".
3. Only output data points that have a relevant value.
4. Fund-level data ("fund name" and "TOR") and share-level data ("fund name", "share name", "ter", "performance fees", "ogc") should be output separately.
5. The output should be in JSON format, like:
[{
"fund name": "fund 1",
"share data": [{"share name": "share 1", "ter": 1.23, "performance fees": 0.2},{"share name": "share 2", "ter": 2.56, "performance fees": 1.2}],
"TOR": 35.26
},
{
"fund name": "fund 2",
"share data": [{"share name": "share a", "ter": 1.16, "performance fees": 0.5},{"share name": "share b", "ter": 1.45, "performance fees": 1.1}],
"TOR": -28.26
},
{
"fund name": "fund 3",
"TOR": 115.52
},
{
"fund name": "fund 1",
"share data": [{"share name": "share 1", "ter": 1.23, "performance fees": 0.2, "ogc": 0.05},{"share name": "share 2", "ter": 2.56, "performance fees": 1.2, "ogc": 1.16}]
},
{
"fund name": "fund 2",
"share data": [{"share name": "share a", "ter": 1.16, "performance fees": -0.15},{"share name": "share b", "ter": 1.45}]
},
{
"fund name": "fund 3",
"share data": [{"share name": "share a", "performance fees": 0.57, "ogc": 0.18},{"share name": "share b", "performance fees": -0.11}]
}]
Only output JSON data.
If the share class name can't be found in the context, please output empty JSON data: []
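On the consuming side, the output requirements can be sketched as a builder that keeps fund-level (TOR) and share-level records in separate objects, drops data points that have no value, and falls back to `[]` when no share class name was found. This is a sketch under assumptions: the input shapes and the name `build_output` are hypothetical, and returning `[]` even when TOR values exist is a literal reading of the last instruction.

```python
import json
from typing import Any, Dict, List

SHARE_DATA_POINTS = ("ter", "performance fees", "ogc")


def build_output(fund_tor: Dict[str, float], share_rows: List[Dict[str, Any]]) -> str:
    """fund_tor maps fund name -> TOR; each share row holds 'fund name', 'share name'
    and optional data points, where None means the value was not found."""
    shares_by_fund: Dict[str, List[Dict[str, Any]]] = {}
    for row in share_rows:
        # "fund name" and "share name" are required items.
        if not row.get("fund name") or not row.get("share name"):
            continue
        # Only output data points that have a value (0 is kept, None is dropped).
        values = {k: row[k] for k in SHARE_DATA_POINTS if row.get(k) is not None}
        if values:
            shares_by_fund.setdefault(row["fund name"], []).append(
                {"share name": row["share name"], **values}
            )
    if not shares_by_fund:
        return "[]"  # no share class name found in the context
    # Fund-level and share-level records are emitted as separate objects.
    output: List[Dict[str, Any]] = [{"fund name": f, "TOR": tor} for f, tor in fund_tor.items()]
    output += [{"fund name": f, "share data": s} for f, s in shares_by_fund.items()]
    return json.dumps(output)


print(build_output({"fund 3": 115.52},
                   [{"fund name": "fund 2", "share name": "share b", "ter": 1.45, "ogc": None}]))
```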

@ -3,17 +3,46 @@ Context:
Instructions:
Read the context carefully.
There may be TER and performance fees data in the context. The TER reported name could be:
Total Expense Ratio, TER, Annualised TER including performance fees, etc.
There may be TOR, TER, performance fees and OGC data in the context.
The TOR reported name could be:
TOR, Turnover Ratio, Portfolio Turnover, Portfolio turnover ratio, PTR, etc.
The TER reported name could be:
TER, Total Expense Ratio, Total Fund Charge, Gross Expense Ratio, All in fee, Total Net Expense Ratio, Weighted Average Expense Ratio, Synthetic total Expense Ratio, Annualised TER including performance fees, etc.
The performance fees reported name could be:
performance fees, performance fees ratio, etc.
Special cases:
performance fees, performance fees ratio, Performance, etc.
The OGC reported name could be:
OGC, OGF, Ongoing Charge, Operation Charge, On Going Charges, Operating Charge, Ongoing Fund Charge, etc.
Data business features:
1. In most cases, the data is in the table(s) of the context.
2. TOR is fund-level data.
- The full fund name should be the main fund name + the sub-fund name, e.g., if the main fund name is Black Rock European and the sub-fund name is Growth, the full fund name is: Black Rock European Growth.
- The sub-fund name may appear as the first-column values in the table.
3. TER, performance fees and OGC are share-class-level data.
4. Their values are percentages.
- The TER, performance fees and OGC values should be less than 100.
- The TOR value can be more than 100, e.g. 126.33.
- The TOR and performance fees can be negative numbers, e.g. -7.99.
5. If there are multiple data values in the same row, please extract the latest one.
6. One fund can have multiple share classes, each with its own TER, performance fees or OGC values.
Special cases:
1. Performance fees are part of TER.
If both "TER including performance fees" (or "TER with performance") and "TER excluding performance fees" (or "TER without performance") exist,
the TER should be "TER including performance fees" (or "TER with performance").
If both "TER including performance fees" and "TER excluding performance fees" exist,
the TER should be "TER including performance fees".
The performance fees should be:
"TER including performance fees - TER excluding performance fees" or "TER with performance fees - TER without performance fees".
The performance fees value can be negative, e.g., -0.27 or -0.18.
TER including performance fees - TER excluding performance fees.
Here is an example:
GAMAX FUNDS FCP\nClass\nTER (excluding Performance Fees)\nTER (including Performance Fees)\nGAMAX FUNDS - ASIA PACIFIC\nA\n2.07%\n2.07%\n
The output should be:
[
{"fund name": "GAMAX FUNDS - ASIA PACIFIC", "share data": [{"share name": "A", "ter": 2.07, "performance fees": 0}]}
]
The performance fees value is TER (including Performance Fees) - TER (excluding Performance Fees) = 2.07 - 2.07 = 0
2. Combo TER value table.
2.1 Both Feeder fund TER and Master fund TER exist.
@ -23,7 +52,7 @@ Please output separately as below:
- "feeder fund share class" and "TER feeder" values
- "Master fund" and "TER Master" values
Here is an example:
Feeder fund (share class)\nMaster fund\nTER\nFeeder\nTER Master\nTotal\nGlobal Portfolio Solution DKK -\nBalanced Class TI\nDanske Invest SICAV Global Portfolio\nSolution \u2013 Balanced Class X\n0.1475%\n0.7025%\n0.850%\n
Feeder fund (share class)\nMaster fund\nTER\nFeeder\nTER Master\nTotal\nGlobal Portfolio Solution DKK -\nBalanced Class TI\nDanske Invest SICAV Global Portfolio\nSolution Balanced Class X\n0.1475%\n0.7025%\n0.850%\n
The output should be:
[
@ -31,21 +60,51 @@ The output should be:
{"fund name": "Danske Invest SICAV Global Portfolio Solution DKK", "share data": [{"share name": "Balanced Class X", "ter": 0.7025}]}
]
The TER and performance fees values are percentages, which means they should be less than 100.
The performance fees value can be negative, e.g. -0.2 or -0.67.
In most cases, the data is in the table(s) of the context.
If there are multiple TER/performance fees values in the same row, please extract the latest one.
If possible, please extract the fund name, share class name, and TER or performance fees values as the output.
One fund can have multiple share classes, each with its own TER value.
The output should be in JSON format, like:
3. Latest data in time-series tables
Some data tables have multiple date-time columns; please extract the data from the latest date-time column.
Here is an example:
PERFORMANCE\nHISTORICAL PERFORMANCE\nHISTORICAL PERFORMANCE\nFrom \n1 July \nFrom \n19 July \nFrom \n1 January \nFrom \n27 April \nFrom \n19 July \nFrom \n1 January \n2021\nFrom \n22 May \n2021\nFrom \n16 July \n2021\nFrom \n21 September \n2021\nto 30 June 2023\nto 31 December 2022\nto 31 December 2021\nAsia Total Return Fund Class I5 (CHF Hedged) Acc\n6.73%\n \n-13.32%\n \n \n 6.04%\n \n \n \n
The output should be:
[
{"fund name": "Asia Total Return Fund", "share data": [{"share name": "Class I5 (CHF Hedged) Acc", "performance fees": 6.73}]}
]
The keyword for performance fees is PERFORMANCE; the value 6.73 is the first number, which belongs to the latest date-time column.
4. TER reported name priority
If both Expense Ratio and Synthetic total Expense Ratio exist, please extract the value of Synthetic total Expense Ratio.
Output requirements:
1. If possible, please extract the fund name, share name, and TOR, TER, performance fees and OGC values as the output.
2. The required output items are "fund name" and "share name".
3. Only output data points that have a relevant value.
4. Fund-level data ("fund name" and "TOR") and share-level data ("fund name", "share name", "ter", "performance fees", "ogc") should be output separately.
5. The output should be in JSON format, like:
[{
"fund name": "fund 1",
"share data": [{"share name": "share 1", "ter": 1.23, "performance fees": 0.2},{"share name": "share 2", "ter": 2.56, "performance fees": 1.2}],
"TOR": 35.26
},
{
"fund name": "fund 2",
"share data": [{"share name": "share a", "ter": 1.16, "performance fees": 0.5},{"share name": "share b", "ter": 1.45, "performance fees": 1.1}],
"TOR": -28.26
},
{
"fund name": "fund 3",
"TOR": 115.52
},
{
"fund name": "fund 1",
"share data": [{"share name": "share 1", "ter": 1.23, "performance fees": 0.2, "ogc": 0.05},{"share name": "share 2", "ter": 2.56, "performance fees": 1.2, "ogc": 1.16}]
},
{
"fund name": "fund 2",
"share data": [{"share name": "share a", "ter": 1.16, "performance fees": -0.15},{"share name": "share b", "ter": 1.45}]
},
{
"fund name": "fund 3",
"share data": [{"share name": "share a", "performance fees": 0.57, "ogc": 0.18},{"share name": "share b", "performance fees": -0.11}]
}]
Only output JSON data.
If the share class name can't be found in the context, please output empty JSON data: []